DNA Sequencing Techniques
An Introduction to DNA Sequencing
DNA sequencing is the process of determining the precise order of nucleotides in a DNA molecule. There are only four DNA bases - adenine, guanine, cytosine and thymine - but the number of possible orderings grows so quickly with sequence length that sequencing is no easy task.
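To get a feel for how fast the number of possible orderings grows, here is a minimal Python sketch. It assumes each position can independently hold any of the four bases; the function name `count_possible_sequences` is ours, for illustration only.

```python
def count_possible_sequences(length: int) -> int:
    """Number of distinct DNA sequences of the given length.

    Each position can hold one of four bases (A, G, C, T),
    so there are 4 ** length possibilities.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    return 4 ** length


for n in (1, 10, 100):
    print(f"{n} bases: {count_possible_sequences(n)} possible sequences")
```

Even a stretch of just 10 bases already has more than a million possible orderings.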
What can be sequenced?
All living species and viruses containing DNA - including animals, plants, bacteria, and archaea - may have their DNA sequenced. From these organisms, we can sequence DNA from individual genes, chromosomes, entire genomes, and mitochondria.
What can you do with a DNA sequence?
Elucidation of the DNA sequences has provided scientists with a wealth of information.
- Geneticists are now able to understand the function of genes by finding distinctive coding regions such as DNA-binding sites, receptor recognition sites and transmembrane domains.
- Scientists have been able to better predict homology among species. Evolutionary biologists may describe how organisms are related.
- Doctors can take a more personalized approach to medicine, tailoring therapeutics to a person's genetic makeup. Genetic testing, such as paternity or prenatal testing, is becoming more and more commonplace.
- Criminal investigators can use DNA profiling to identify suspects, or exonerate the accused.
- Metagenomics, the study of genetic material recovered directly from environmental samples, allows us to identify organisms present in bodies of water, sewage, dirt, etc.
Furthermore, entire fields have emerged from the ability to view DNA sequences. Patient diagnoses, biotechnology, forensic biology, virology and biological systematics are just a few of the fields that have either emerged or further developed due to the advent of DNA sequencing.
Two types of DNA sequencing
There are two types of DNA sequencing performed: de novo and resequencing.
- In de novo sequencing, the DNA is sequenced for the first time. This means there are no reference genomes available to align reads to.
- In resequencing, on the other hand, a reference genome already exists, and the new reads are aligned to it.
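To make the idea of "aligning reads to a reference" concrete, here is a small Python sketch of what resequencing relies on. It is a deliberately naive approach (slide the read along the reference and count mismatches), not how real aligners work, and the function name `map_read`, the `max_mismatches` parameter and the toy sequences are all made up for illustration.

```python
def map_read(reference, read, max_mismatches=1):
    """Return (position, mismatches) for the best placement of `read`
    on `reference`, or None if no placement is within `max_mismatches`."""
    best = None
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        mismatches = sum(1 for a, b in zip(window, read) if a != b)
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (start, mismatches)
    return best


reference = "ACGTTAGCCGATAGGCTA"
for read in ("TAGCCG", "GATCGG", "TTTTTT"):
    print(read, "->", map_read(reference, read))
```

In de novo sequencing there is no `reference` to search, which is why the reads have to be assembled from their overlaps with each other instead.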
DNA sequencing approaches
We've come a long way since the first generation of DNA sequencing in the 1970s. The first human genome, completed in 2003, took more than a decade and cost about $3bn. As of 2015, sequencing an entire human genome costs roughly $1,000 and takes a matter of days.
Sanger sequencing was one of the earliest DNA sequencing techniques. In its automated form, the method relied on capillary electrophoresis. However, even with automation and optimization, it was too slow and costly for large-scale work. New techniques therefore emerged that used cyclic methods, in which dNTPs are added one step at a time across a massive number of reactions running in parallel. These methods belong to a family of techniques known as Next Generation Sequencing (NGS).
Next Generation Sequencing
Massive parallelization made it possible to process thousands to millions of sequences concurrently. This resulted in data output increasing at a rate that exceeded Moore's law, more than doubling each year since its inception.
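A quick bit of arithmetic shows how large that gap becomes. The sketch below assumes Moore's law means a doubling roughly every two years (the commonly quoted form) and treats sequencing output as doubling exactly once a year, which is a lower bound on the "more than doubling" described above. The function name is ours.

```python
def fold_increase(years, doubling_time_years):
    """Growth factor after `years` for a quantity that doubles
    every `doubling_time_years`."""
    return 2 ** (years / doubling_time_years)


years = 10
print("Moore's law (assumed 2-year doubling):", fold_increase(years, 2))  # 32.0
print("Sequencing output (1-year doubling): ", fold_increase(years, 1))   # 1024.0
```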
Not only did NGS bring about a wealth of information, but it also uncovered new scientific ideas and revolutionized the way we worked in life sciences.
In this series, we'll go through each DNA sequencing technique. Let's begin with one of the first sequencing techniques that came about in the 1970s.
Maxam-Gilbert Chemical Sequence Method
Before the popular Sanger sequencing came about, there were two DNA sequencing methods introduced by Alan Maxam and Walter Gilbert, published in 1973 and 1977.
The first, known as wandering-spot analysis, reported a sequence of a *whopping* 24 base pairs.
The second method, more effective yet still limited, was chemical sequencing. It uses chemical reactions to cleave DNA strands at specific bases. The resulting DNA fragments are then run through a gel to resolve the sequence order. The steps are as follows:
- Denature a double-stranded DNA to single-stranded by increasing temperature.
- Radioactively label one 5' end of the DNA fragment to be sequenced by a kinase reaction using gamma-32P ATP.
- Cleave the DNA strand at specific positions using chemical reactions. For example, we can use one of two chemicals followed by piperidine. Dimethyl sulphate selectively attacks purines (A and G), while hydrazine selectively attacks pyrimidines (C and T). The chemical treatments outlined in Maxam and Gilbert's paper cleaved at G, A+G, C and C+T. A+G means that it cleaves at A, but occasionally at G as well.
- Now in four reaction tubes, we will have several differently sized DNA strands.
- Fragments are electrophoresed in high-resolution acrylamide gels for size separation.
- These gels are placed under X-ray film, which then yields a series of dark bands which show the location of radiolabeled DNA molecules. The fragments are ordered by size and so we can deduce the sequence of the DNA molecule.
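The logic of reading the four lanes can be sketched in a few lines of Python. This is a simplified model under stated assumptions: every reaction cleaves cleanly at each of its target bases, the A+G and C+T reactions hit both bases equally, and a cut at position i leaves a labeled fragment of exactly i nucleotides. All names are ours, for illustration only.

```python
# Which bases each of the four chemical reactions cleaves at.
REACTIONS = {
    "G": {"G"},
    "A+G": {"A", "G"},
    "C": {"C"},
    "C+T": {"C", "T"},
}


def run_reactions(sequence):
    """Lengths of the 5'-labeled fragments produced in each reaction.

    A cut at index i leaves a labeled piece of i nucleotides. A cut at
    the very first base leaves nothing labeled, so it gives no band.
    """
    return {
        name: [i for i, base in enumerate(sequence) if base in targets and i > 0]
        for name, targets in REACTIONS.items()
    }


def read_gel(lanes):
    """Deduce the sequence from which lanes show a band at each length."""
    bands = {}
    for name, lengths in lanes.items():
        for length in lengths:
            bands.setdefault(length, set()).add(name)
    calls = []
    for length in sorted(bands):
        present = bands[length]
        if "G" in present:        # band in both G and A+G lanes
            calls.append("G")
        elif "A+G" in present:    # band in A+G lane only
            calls.append("A")
        elif "C" in present:      # band in both C and C+T lanes
            calls.append("C")
        else:                     # band in C+T lane only
            calls.append("T")
    return "".join(calls)


dna = "GATTCAGC"
lanes = run_reactions(dna)
print(lanes)
print(read_gel(lanes))  # ATTCAGC: everything after the first base
```

Note that in this model the first base cannot be read, because cutting there leaves no labeled fragment to show up on the film.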
Maxam-Gilbert sequencing was at one point more popular than the Sanger method. Purified DNA could be used directly, while the Sanger method required that each read start be cloned for production of single-stranded DNA.
Cons included difficulties scaling up, and the handling of X-rays and radiolabeling, which were harmful to technicians.
Gilbert, W., and A. Maxam. "The Nucleotide Sequence of the Lac Operator." Proceedings of the National Academy of Sciences 70.12 (1973): 3581-3584. Web.
Sanger sequencing: the chain-termination method
Sanger sequencing was developed by Frederick Sanger and his colleagues in 1977. The development of this technique won Sanger the Nobel Prize in Chemistry in 1980.
From the 1980s to the mid-2000s, Sanger sequencing was the dominant DNA sequencing method, and it made possible the completion of the Human Genome Project (HGP) in 2003. Although it has largely been replaced by next generation sequencing methods, it is still used today for smaller-scale projects.
What is a dideoxynucleotide?
A dideoxynucleotide (ddNTP) is an artificial molecule that lacks a hydroxyl group at both the 2' and 3' carbons of the sugar moiety. Compare this to a regular deoxynucleotide triphosphate (dNTP), which lacks only the 2' hydroxyl and keeps the hydroxyl group on the 3' carbon.
The 3'-OH group is needed to form a phosphodiester bond between two nucleotides - this is what allows a DNA strand to elongate.
DNA elongation cannot occur with ddNTPs
During DNA replication, an incoming nucleoside triphosphate is linked by its 5' α-phosphate group to the 3' hydroxyl group of the last nucleotide of the growing chain. With ddNTP, where there is no 3' hydroxyl group, this reaction cannot take place, so elongation is terminated.
Here is an image of how DNA elongation regularly occurs (with dNTP instead of ddNTP).
Now that we have seen the chemistry behind the ddNTP, let's look at how Sanger Sequencing works.
There are four main steps in Sanger Sequencing, as outlined below.
1) Clone DNA strands into a vector
The first step is to fragment the DNA and clone the fragments into vectors.
2) Attach primer
The second step is to anneal a synthetic oligonucleotide, 17 to 24 nucleotides long, to the template. (An oligonucleotide is just a fancy name for a short strand of DNA.) The oligonucleotide acts as the primer: it provides the 3' hydroxyl group that is necessary to initiate DNA synthesis.
In order to recognize the sequence and identify precisely the first nucleotide of the target DNA, the primer is usually positioned 10 to 20 nucleotides away from the target DNA.
3) Add four dNTPs + 1 ddNTP
Four different reaction vials are prepared, each containing the four standard dNTPs and DNA polymerase.
The difference among the vials is the type of ddNTP added. Each vial has 1 ddNTP per 100 dNTPs.
After DNA synthesis occurs, each reaction vial will hold a unique set of single-stranded DNA molecules of varying lengths. However, all of the DNA molecules will have the same primer sequence at their 5' ends.
The resulting DNA fragments are then denatured by heat, since base-paired loops of ssDNA may make bands difficult to resolve when running a gel. Additionally, one may add formamide to prevent base pairing.
4) Find sequence by gel electrophoresis
Now that we have fragments of varying lengths, we need to line them up by size to determine the sequence.
Here, the ddNTPs must be radioactively or fluorescently labeled beforehand (automated sequencing machines use fluorescent labels). The DNA strands are then separated using gel electrophoresis. The shortest fragments travel farthest, so the bottom of the gel corresponds to the 5' end of the newly synthesized strand, and reading from top to bottom gives the sequence from 3' to 5'.
Alternatively, we can label each of the four ddNTPs with a different fluorescent dye and use dye-terminator sequencing instead. Each of the four ddNTPs then emits light at a different wavelength. Here, we use capillary electrophoresis, with a single lane to capture the nucleotide sequence.
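The whole procedure can be mimicked with a short Python sketch. It is an idealized model, not a simulation of real chemistry: we assume every possible terminated fragment is present, that the template passed in is only the region to be read (written 5' to 3'), and that a ddNTP is incorporated as readily as the matching dNTP. The function names and the 20-nucleotide primer length are illustrative choices.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Reverse complement of a DNA string written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def sanger_lanes(template, primer_length=20):
    """Fragment lengths expected in each of the four ddNTP reactions.

    The polymerase builds the reverse complement of `template`; in the
    vial containing ddATP, fragments end wherever the new strand has an A,
    and likewise for the other three vials.
    """
    new_strand = reverse_complement(template)
    lanes = {base: [] for base in "ACGT"}
    for position, base in enumerate(new_strand, start=1):
        lanes[base].append(primer_length + position)
    return lanes


def read_lanes(lanes):
    """Read bands from shortest to longest: the new strand, 5' to 3'."""
    base_at_length = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    return "".join(base_at_length[length] for length in sorted(base_at_length))


def termination_probability(k, ddntp_per_dntp=0.01):
    """Chance that a chain stops exactly at the k-th site where the
    ddNTP could be incorporated (assumes equal incorporation efficiency)."""
    p = ddntp_per_dntp / (1 + ddntp_per_dntp)
    return (1 - p) ** (k - 1) * p


lanes = sanger_lanes("GATTACA")
new_strand = read_lanes(lanes)
print(lanes)
print(new_strand)                      # TGTAATC
print(reverse_complement(new_strand))  # GATTACA, the original template
print(termination_probability(1))
```

The last function hints at why a low ddNTP ratio matters: because termination at any one site is rare, chains survive long enough to produce fragments of every length.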
Advantages of Sanger sequencing:
- Less toxic, and involves less radioactivity, than the Maxam-Gilbert method.
- Easier to automate - Leroy Hood and coworkers used fluorescently labeled ddNTPs and primers to build the first high-throughput DNA sequencing machine. Automation was an early step in the fall of the cost of sequencing a human genome, from roughly $100 million to roughly $10,000 USD by 2011.
- Highly accurate long sequence reads of about 700 base pairs.
- Easier to get started. The kits that are commercially available contain reagents necessary for sequencing - pre-aliquoted and ready to use.
Disadvantages of Sanger sequencing:
- Poor quality in the first 15-40 bases of the sequence, due to primer binding, and deteriorating quality of sequencing traces after 700-900 bases.
- Time consuming, especially because the fragments must be separated by electrophoresis. Expensive because of the relatively large volumes of chemicals used.
- DNA fragments are cloned before sequencing, so a read may include parts of the cloning vector.
- Only short DNA fragments, 300-1000 nucleotides long, can be sequenced in a single reaction. Longer strands cannot be read because the gel lacks the resolving power to separate large DNA fragments that differ in length by only one nucleotide.
- To extend these reads, we may use a technique known as primer walking, which we will see in the next lesson.
Primer walking
The Sanger method is fast, reliable and accurate, but it is limited to short reads of around 500 nucleotides per run. To extend the length that can be read, we can use a technique called primer walking.
In Sanger sequencing, we attached a primer about 10-20 base pairs upstream of the start of the target sequence. Since each read ends at around 500 nucleotides, anything beyond that point cannot be read.
To get around this, we add a second primer that binds around 10-20 base pairs upstream of where our first read ended. We can then sequence the next ~500 base pairs, and repeat this process until the entire cloned DNA is sequenced.
In this diagram, we can see that we added four primers - P1, P2, P3, and P4.
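The bookkeeping behind primer walking can be sketched as follows. This assumes every read is exactly 500 nucleotides long and that each new primer is placed 20 nucleotides before the end of the previous read; both numbers, the 1,900-nucleotide example and the function name `primer_walk` are illustrative.

```python
def primer_walk(template_length, read_length=500, overlap=20):
    """Plan the reads needed to cover a template of `template_length`.

    Returns a list of (start, end) pairs, one per primer. Each new primer
    is placed `overlap` nucleotides before the end of the previous read.
    """
    if template_length <= 0:
        raise ValueError("template_length must be positive")
    if not 0 <= overlap < read_length:
        raise ValueError("overlap must be smaller than read_length")
    reads = []
    start = 0
    while True:
        end = min(start + read_length, template_length)
        reads.append((start, end))
        if end == template_length:
            return reads
        start = end - overlap


for number, (start, end) in enumerate(primer_walk(1900), start=1):
    print(f"P{number}: reads positions {start} to {end}")
```

With these assumed numbers, a 1,900-nucleotide insert needs four primers, with each read overlapping the previous one by 20 nucleotides.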
Upstream vs. downstream
Upstream means toward the 5' end of the strand, against the direction in which transcription proceeds. So in our diagram above, upstream would be to the left. Downstream means toward the 3' end, in the direction transcription proceeds, so in our diagram this would be to the right.
To avoid any ambiguity, both strands of DNA are sequenced to double-check our work. Additionally, the reaction is kept under stringent annealing conditions to avoid spurious binding of primers to nonidentical sequences. Furthermore, primers are designed to be at least 24 nucleotides long so that they are unlikely to bind at more than one site.
Reversible chain terminators
Instead of terminating primer extension irreversibly, as the Sanger method does, the reversible chain terminator method uses a cycle that consists of nucleotide incorporation, fluorescence imaging and cleavage. The figure below shows a modified nucleotide with a cleavable dye and reversible blocking group. Once the blocking group is removed, a new nucleotide may come in.
The steps for such a process can be outlined as follows:
- Have four dNTPs, each with a different fluorescent label. These labels should not interfere with base pairing or phosphodiester bond formation.
- Each dNTP should terminate DNA elongation temporarily with a blocking group on the 3' carbon of the sugar moiety.
- In each cycle, just one dNTP is incorporated into the elongating strand, and its dye emits a characteristic color.
- Depending on the color emitted, record the particular nucleotide.
- Cleave the blocking group and fluorescent dye with a palladium catalyst.
- Restore a 3' hydroxyl so that the growing strand can elongate again.
- Repeat the cycle with the next nucleotide.
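The cycle above can be expressed as a simple loop. The Python sketch below is an idealized model: it assumes every cycle incorporates exactly one nucleotide and that cleavage always succeeds. The dye colors are an arbitrary mapping chosen for illustration, and the template is written in the order the polymerase reads it (3' to 5').

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Arbitrary illustrative dye assignment: one color per labeled dNTP.
DYE_COLOR = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
BASE_FOR_COLOR = {color: base for base, color in DYE_COLOR.items()}


def sequence_by_cycles(template_3_to_5, max_cycles=36):
    """Call one base per cycle of incorporation, imaging and cleavage."""
    calls = []
    for template_base in template_3_to_5[:max_cycles]:
        # Incorporation: one blocked, dye-labeled nucleotide pairs with
        # the template, and the 3' block stops any further extension.
        incoming = COMPLEMENT[template_base]
        # Imaging: the color of the dye identifies the nucleotide.
        color = DYE_COLOR[incoming]
        calls.append(BASE_FOR_COLOR[color])
        # Cleavage: dye and blocking group are removed, restoring the
        # 3' hydroxyl, so the next loop iteration can extend the strand.
    return "".join(calls)


print(sequence_by_cycles("TACGGATC"))  # ATGCCTAG
```

The `max_cycles` cap reflects the read-length limit: once the cycles run out, the rest of the template goes unread.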
There are some limitations to this method which include:
- Incomplete cleavage of blocking groups.
- Difficulties incorporating fluorescent nucleotides.
- Only up to 36 nucleotides per run.
Shotgun sequencing
Shotgun sequencing is a type of de novo sequencing, meaning it can assemble an entire genome that has not been sequenced before.
Shotgun sequencing is used to analyze DNA sequences longer than 1000 base pairs, up to entire chromosomes. The basic methodology is to break up multiple copies of the same genome at random places, sequence the fragments, and reassemble them based on overlapping regions.
- Genomic DNA is fragmented by sonication or hydrodynamic shearing.
- All sticky-end fragments are blunt-ended using the polymerase and exonuclease activities of T4 DNA polymerase.
- T4 polynucleotide kinase is added so that 5' ends are phosphorylated.
- Fragments are separated by size into small (~1kb), medium (~8kb) and large (~40kb) fragments.
- A library is created for each size class in plasmids and transformed into E. coli cells.
- Vector DNA is purified from each library and amplified.
- Each DNA strand is sequenced (we can attach a primer in the vector, upstream of the insert, then use any sequencing by synthesis method).
- A computer program called a base caller filters out poor calls.
- The assembler finds overlapping segments and generates long contiguous stretches of nucleotides, called contigs.
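The final assembly step can be illustrated with a toy greedy assembler in Python. This is only a sketch under strong assumptions: reads are error-free, all come from the same strand, and the largest overlap is always the right one to merge. Real assemblers are far more sophisticated. The function names and the `min_overlap` parameter are ours.

```python
def overlap_length(left, right, min_overlap):
    """Length of the longest suffix of `left` that equals a prefix of
    `right`, or 0 if it is shorter than `min_overlap`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Repeatedly merge the pair of sequences with the largest overlap
    until no pair overlaps by at least `min_overlap`."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    size = overlap_length(left, right, min_overlap)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


reads = ["ACCTGA", "ATGCGT", "CGTACC"]
print(greedy_assemble(reads))  # ['ATGCGTACCTGA']
```

With a small `min_overlap`, unrelated reads can share a few bases purely by chance and get merged anyway, which is exactly how a false contig arises.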
Statistically speaking, there is a chance that false contigs will arise. This happens when the assembler joins segments whose overlap occurred by chance. This may be corrected with paired-end or mate-pair sequencing.
Additionally, transforming bacterial cells can take a long time.
Now that you've learned about the most basic DNA sequencing techniques, let's learn about Next Generation Sequencing techniques, the technological leap that made the $1,000 genome a reality!