DNA sequencing is the process of determining the precise order of nucleotides in a DNA molecule. There are only four DNA bases - adenine, guanine, cytosine and thymine - but they can be arranged in an enormous number of permutations, making sequencing no easy task.
All living species and viruses containing DNA - including animals, plants, bacteria, and archaea - may have their DNA sequenced. From these organisms, we are able to sequence DNA from individual genes, chromosomes, entire genomes, and mitochondria.
Elucidation of the DNA sequences has provided scientists with a wealth of information.
Furthermore, entire fields have emerged from the ability to view DNA sequences. Patient diagnoses, biotechnology, forensic biology, virology and biological systematics are just a few of the fields that have either emerged or further developed due to the advent of DNA sequencing.
There are two types of DNA sequencing performed: de novo and resequencing.
We've come a long way since the first generation of DNA sequencing in the 1970s. The first human genome, completed in 2003, took more than a decade and cost about $3bn. As of 2015, sequencing an entire human genome costs a little less than $1,000 and takes a matter of days.
Sanger sequencing was one of the earliest DNA sequencing techniques. In its automated form, the method was primarily based on capillary electrophoresis. However, even with automation and optimization, it proved too slow and costly for large-scale work. New techniques therefore emerged that used cyclic methods, in which dNTPs are added one after another across a massive number of parallel reactions. These methods belong to a family of techniques known as Next Generation Sequencing (NGS).
Massive parallelization made it possible to process thousands to millions of sequences concurrently. As a result, data output grew at a rate that exceeded Moore's law, more than doubling each year since NGS was introduced.
Not only did NGS bring about a wealth of information, but it also uncovered new scientific ideas and revolutionized the way we worked in life sciences.
In this series, we'll go through each DNA sequencing technique. Let's begin with one of the first sequencing techniques that came about in the 1970s.
Before the popular Sanger sequencing came about, there were two DNA sequencing methods introduced by Allan Maxam and Walter Gilbert in 1973 and 1976.
The first is known as wandering-spot analysis, which reported a sequence of a *whopping* 24 base pairs.
The second, more effective yet still limited method used chemical sequencing: base-specific chemical reactions cleave end-labeled DNA strands at particular nucleotides. The resulting DNA fragments are then run through a gel to resolve the sequence order.
Maxam-Gilbert sequencing was at one point more popular than the Sanger method. Purified DNA could be used directly, while the Sanger method required that each read start be cloned for production of single-stranded DNA.
Cons included difficulties scaling up, and the handling of X-rays and radiolabeling, which were harmful to technicians.
Gilbert, W., and A. Maxam. "The Nucleotide Sequence of the Lac Operator." Proceedings of the National Academy of Sciences 70.12 (1973): 3581-3584. Web.
Sanger sequencing was developed by Frederick Sanger and his colleagues in 1977. The development of this technique won Sanger the Nobel Prize in Chemistry in 1980.
From the 1980s to the mid-2000s, Sanger sequencing dominated DNA sequencing, and it carried the Human Genome Project (HGP) to its successful completion in 2003. Although this technique has largely been replaced by next generation sequencing methods, it is still used today for smaller-scale projects.
A dideoxynucleotide (ddNTP) is an artificial molecule that lacks a hydroxyl group at both the 2' and 3' carbons of the sugar moiety. Compare this to a regular deoxynucleoside triphosphate (dNTP), which still has the hydroxyl group on the 3' carbon of the sugar.
The 3'-OH group is needed to form a phosphodiester bond between two nucleotides - this is what allows a DNA strand to elongate.
During DNA replication, an incoming nucleoside triphosphate is linked by its 5' α-phosphate group to the 3' hydroxyl group of the last nucleotide of the growing chain. With ddNTP, where there is no 3' hydroxyl group, this reaction cannot take place, so elongation is terminated.
Here is an image of how DNA elongation regularly occurs (with dNTP instead of ddNTP).
Now that we have seen the chemistry behind the ddNTP, let's look at how Sanger Sequencing works.
There are three main steps in Sanger Sequencing, as outlined below.
The first step is to fragment the DNA and clone the fragments into vectors.
The second step is to anneal a synthetic oligonucleotide, 17 to 24 nucleotides long, to the template. (An oligonucleotide is just a fancy name for a short strand of DNA.) The oligonucleotide acts as the primer: it provides the 3' hydroxyl group that is necessary to initiate DNA synthesis.
In order to recognize the sequence and identify precisely the first nucleotide of the target DNA, the primer is usually positioned 10 to 20 nucleotides away from the target DNA.
Four different reaction vials are made, each with the four standard dNTPs and DNA polymerase.
The difference among the vials is the type of ddNTP: each vial contains a single kind of ddNTP (ddATP, ddGTP, ddCTP or ddTTP), at a ratio of about 1 ddNTP per 100 dNTPs.
After DNA synthesis occurs, each reaction vial will have a unique set of single-stranded DNA molecules of varying lengths. However, all of the DNA molecules will have the same primer sequence at their 5' ends.
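To make the role of the 1:100 ddNTP-to-dNTP ratio more concrete, here is a minimal Python sketch of a single reaction vial. It is a toy model, not a description of real polymerase kinetics: the function name `simulate_vial`, the example template, and the assumption that each matching position terminates independently with probability 1/101 are all illustrative choices. Fragment lengths are counted from the first base after the primer.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def simulate_vial(template, dd_base, n_strands=20000, dd_fraction=1 / 101, seed=0):
    """Return the set of fragment lengths produced in one vial.

    template    -- template bases, in the order the polymerase copies them
    dd_base     -- the single ddNTP present in this vial ("A", "C", "G" or "T")
    dd_fraction -- assumed chance that a ddNTP is used instead of the dNTP
    """
    rng = random.Random(seed)
    # The newly synthesized strand is complementary to the template.
    new_strand = "".join(COMPLEMENT[b] for b in template)
    lengths = set()
    for _ in range(n_strands):
        for position, base in enumerate(new_strand, start=1):
            # Only the ddNTP present in this vial can terminate the strand.
            if base == dd_base and rng.random() < dd_fraction:
                lengths.add(position)
                break  # no 3'-OH, so elongation stops here
    return lengths

# The new strand for this template is ATGCCTAG, so the ddATP vial
# should end up with fragments that stop at the two A positions.
print(simulate_vial("TACGGATC", "A"))
```

Because only about 1 in 100 incorporations terminates the strand, most strands read past the first matching base, which is why every matching position ends up represented once enough strands are synthesized.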
The resulting DNA fragments are then denatured by heat since base-paired loops of ssDNA may cause difficulty in resolving bands when running a gel. Additionally, one may add formamide to prevent base pairing.
Now that we have fragments of varying lengths, we need to line them up according to size to determine the sequence.
For the bands to be visible, the ddNTPs have to be radioactively or fluorescently labeled beforehand (fluorescent labels are what automated sequencing machines use). The DNA strands are then separated using gel electrophoresis. The shortest fragments travel farthest down the gel, so reading the bands from top to bottom gives the sequence of the synthesized strand from 3' to 5'.
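Reading the gel amounts to sorting the fragments by length and noting which lane each length shows up in. Here is a small sketch of that idea, assuming the four lanes are given as a hypothetical dictionary that maps each ddNTP base to the set of fragment lengths seen in its lane (lengths counted from the first base after the primer). The function name `read_gel` and the example data are made up for illustration.

```python
def read_gel(lanes):
    """Rebuild the synthesized strand (5' to 3') from fragment lengths per lane."""
    length_to_base = {}
    for base, lengths in lanes.items():
        for n in lengths:
            if n in length_to_base:
                raise ValueError(f"fragment length {n} appears in two lanes")
            length_to_base[n] = base
    # Shortest fragments run farthest, so ascending length corresponds to
    # reading the gel from the bottom up.
    return "".join(length_to_base[n] for n in sorted(length_to_base))

lanes = {"A": {1, 7}, "T": {2, 6}, "G": {3, 8}, "C": {4, 5}}
synthesized = read_gel(lanes)        # 5' to 3': ATGCCTAG
top_to_bottom = synthesized[::-1]    # 3' to 5': GATCCGTA
print(synthesized, top_to_bottom)
```

The sketch assumes a perfectly resolved gel in which every length appears in exactly one lane; real gels need band calling and quality checks that are not modeled here.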
Alternatively, we can label each of the four ddNTPs with a different fluorescent dye and use dye-terminator sequencing instead. Each of the four ddNTPs then emits light at a different wavelength. Here, we use capillary electrophoresis, with a single lane to capture the nucleotide sequence.
The Sanger method is fast, reliable and accurate, but it is limited to short reads of around 500 nucleotides per run. To sequence longer stretches, we can use a technique called primer walking.
In Sanger sequencing, we attached a primer about 10-20 base pairs upstream of the start of the target sequence. Since a read ends at around 500 nucleotides, anything beyond that point cannot be read in a single run.
To get around this, we add a second primer that is around 10-20 base pairs upstream of the termination of our first sequence. We can then sequence the next ~500 base pairs, and repeat this process until the entire cloned DNA is sequenced.
In this diagram, we can see that we added four primers - P1, P2, P3, and P4.
Upstream simply means up the path that transcription acts on. So in our diagram above, upstream would be to the left. Downstream means down the stream as transcription occurs, so in our diagram this would be to our right.
To avoid any ambiguity, both strands of DNA are sequenced to double-check our work. Additionally, the reaction vessel is kept at stringent annealing conditions to avoid any spurious binding of nonidentical sequences. Furthermore, primers are chosen to be at least 24 nucleotides long so that each one binds only at its intended site.
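The primer-walking loop described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions: `primer_walk` is a hypothetical helper, each read is assumed to be exactly `read_length` bases with no errors, and the primer is treated as a single point placed `primer_offset` bases upstream of the end of the previous read.

```python
def primer_walk(template, read_length=500, primer_offset=15):
    """Cover `template` with successive reads, each started near the end of the last.

    Returns (reads, assembled), where reads is a list of (start, end) positions
    and assembled is the sequence stitched together from the reads.
    """
    if not 0 <= primer_offset < read_length:
        raise ValueError("primer_offset must be smaller than read_length")
    reads = []
    assembled = ""
    start = 0
    while True:
        end = min(start + read_length, len(template))
        read = template[start:end]
        # Keep only the bases that extend past what is already assembled.
        assembled += read[len(assembled) - start:]
        reads.append((start, end))
        if end == len(template):
            break
        # Place the next primer slightly upstream of where this read ended.
        start = end - primer_offset
    return reads, assembled

template = "ACGT" * 300   # a made-up 1,200-base insert
reads, assembled = primer_walk(template)
print(reads)              # [(0, 500), (485, 985), (970, 1200)]
```

The small overlap between consecutive reads is what lets us design each new primer from sequence we have already read.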
Instead of irreversibly terminating primer extension like the Sanger method, the reversible chain terminators method uses a cycle that consists of nucleotide incorporation, fluorescence imaging and cleavage. The figure below shows a modified nucleotide with a cleavable dye and a reversible blocking group. Once the dye and the blocking group are removed, a new nucleotide may come in.
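As a rough illustration of the incorporate, image, cleave cycle, here is a toy Python model. The dye colors are arbitrary placeholders rather than the colors used by any particular instrument, `sequence_by_cycles` is a hypothetical name, and the sketch assumes every cycle works perfectly on a single template.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Arbitrary, made-up dye colors: one per base.
DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
COLOR_TO_BASE = {color: base for base, color in DYE.items()}

def sequence_by_cycles(template, n_cycles):
    """Read up to n_cycles bases of the strand synthesized against `template`."""
    calls = []
    for cycle in range(min(n_cycles, len(template))):
        # 1. Incorporation: one blocked, dye-labeled nucleotide is added.
        incorporated = COMPLEMENT[template[cycle]]
        # 2. Imaging: the dye color tells us which base went in.
        color = DYE[incorporated]
        calls.append(COLOR_TO_BASE[color])
        # 3. Cleavage: dye and blocking group are removed, so the next
        #    cycle can add another nucleotide.
    return "".join(calls)

print(sequence_by_cycles("TACGGATC", 5))   # ATGCC
```

The key contrast with the Sanger chemistry is that termination is undone at the end of every cycle, so one strand yields one base call per cycle.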
The steps for such a process can be outlined as follows:
There are some limitations to this method which include:
Shotgun sequencing is a type of de novo sequencing, meaning it can assemble an entire genome that has not been sequenced before.
Shotgun sequencing is used to analyze DNA sequences longer than 1000 base pairs, up to entire chromosomes. The basic methodology is to break multiple copies of the same genome at random positions, sequence the resulting fragments, and reassemble them based on their overlapping regions.
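To show what "reassemble based on overlapping regions" can look like, here is a minimal greedy assembler in Python. Greedy merging is only one simple strategy and is swapped in here for illustration; it is not how production assemblers work. The function names, the `min_overlap` threshold, and the 20-base example genome are all made up.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_overlap=3):
    """Repeatedly merge the pair of contigs with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_overlap)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)

# Three overlapping reads taken from the made-up genome AGCTTAGGCATCGAATCCGT
reads = ["TCGAATCCGT", "AGCTTAGGCA", "AGGCATCGAA"]
print(greedy_assemble(reads))   # ['AGCTTAGGCATCGAATCCGT']
```

If the reads do not overlap enough, the function returns several contigs instead of one, which mirrors the gaps a real assembly can leave.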
Statistically speaking, there is a chance of false contigs coming up. This occurs when the assembler joins segments whose overlap occurred purely by chance. This may be corrected by paired-end or mate-pair sequencing.
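A back-of-the-envelope calculation gives a feel for how often chance overlaps arise. The sketch below assumes that bases are independent and equally likely, which real genomes (with their repeats) do not satisfy, so treat the numbers as a rough lower bound. Under that assumption, a specific overlap of k bases matches by chance with probability (1/4)^k, and there are n(n-1) ordered pairs of reads. The function name is hypothetical.

```python
def expected_chance_overlaps(n_reads, overlap_len):
    """Expected number of ordered read pairs that overlap by chance.

    Assumes independent, uniformly distributed bases and counts only
    overlaps of exactly `overlap_len` bases.
    """
    ordered_pairs = n_reads * (n_reads - 1)
    return ordered_pairs * 0.25 ** overlap_len

for k in (5, 10, 20):
    print(k, expected_chance_overlaps(1000, k))
```

With 1,000 reads, requiring a 10-base overlap still leaves roughly one chance match on average, while requiring 20 bases makes one very unlikely. This is one reason assemblers insist on a minimum overlap length.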
Additionally, transforming bacterial cells with the cloned fragments can take a long time.
Now that you've learned about the most basic DNA sequencing techniques, let's learn about Next Generation Sequencing techniques, the technological leap which made the $1,000 genome a reality!