Since James Watson and Francis Crick described the structure of DNA in 1953, scientists have worked toward an efficient and cost-effective way to sequence it. Sequencing is extremely important, with numerous applications in diagnosis, biotechnology and drug discovery, among many other uses. This essay reviews the history of the most important DNA sequencing technologies, which have evolved rapidly in recent years and continue to do so.
Determining the sequence of DNA posed several difficulties that protein sequencing did not. The chemical properties of the four DNA bases (adenine, cytosine, guanine and thymine) are relatively similar, whereas those of the 20 amino acids vary widely. In addition, DNA molecules are much longer than polypeptide chains, it was not yet known how to purify and separate individual DNA strands, and no base-specific DNases were known that would have allowed a method analogous to protein sequencing with different proteases. Interestingly, some RNA molecules did not share these obstacles. For instance, tRNAs were small and could be purified, and RNases with different base specificities were known, which allowed methods similar to protein sequencing. Thus, in 1965, Robert Holley was able to determine the nucleotide sequence of an alanine tRNA isolated from yeast (Holley et al., 1965). Another breakthrough had come in the late 1950s and early 1960s, when viral DNA was purified by Sinsheimer in 1959 (Sinsheimer, 1959) and later by Kaiser and Hogness in 1960 (Kaiser and Hogness, 1960).
The next major breakthrough in DNA sequencing came when Wu and Kaiser used the incorporation of radiolabeled nucleotides to determine a short partial sequence, that of the cohesive ends of bacteriophage lambda DNA (Wu and Kaiser, 1968). Unfortunately, this method applied only to short stretches of DNA, and only to lambda and related bacteriophage genomes. Another key innovation was the discovery of type II restriction enzymes by Hamilton Smith and colleagues (Smith and Wilcox, 1970), which cut DNA at specific palindromic sequences, usually 4 to 6 base pairs long. These enzymes could cut long DNA molecules into shorter, more manageable fragments that could be separated by gel electrophoresis. This technique led to the first major nucleic acid sequencing efforts, which relied on two-dimensional chromatography.
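To make "palindromic" concrete: a type II recognition site reads the same as its own reverse complement. The Python sketch below checks this property and then "digests" a made-up sequence at every EcoRI site (GAATTC, cut between G and A on the top strand). The input sequence and function names are hypothetical, and only the top strand is modeled, so the sticky ends produced on the double-stranded molecule are ignored.

```python
# Minimal sketch: palindromic restriction sites and a top-strand-only digest.
# EcoRI's site GAATTC and its G^AATTC cut are real; the input DNA is made up.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_palindromic(site: str) -> bool:
    """A site is palindromic if it equals its own reverse complement."""
    return site == reverse_complement(site)


def digest(seq: str, site: str, cut_offset: int) -> list[str]:
    """Cut the top strand `cut_offset` bases into every occurrence of `site`."""
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])  # whatever remains after the last cut
    return fragments


if __name__ == "__main__":
    ecori = "GAATTC"
    print(is_palindromic(ecori))  # True
    dna = "TTAGAATTCCGGATCCAAGAATTCGT"
    print(digest(dna, ecori, cut_offset=1))  # ['TTAG', 'AATTCCGGATCCAAG', 'AATTCGT']
```

The fragments of different lengths are what gel electrophoresis would then separate.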
Frederick Sanger then introduced his "plus and minus" method for DNA sequencing. The method took advantage of improved polyacrylamide gel electrophoresis (PAGE), which could separate the products of DNA synthesis by chain length and could distinguish chains that differed in length by a single nucleotide. Sanger used a primer and DNA polymerase to synthesize a population of products of varying lengths, each labeled with 32P. These products were then divided into eight samples and used as primers in a second round of DNA synthesis, in which synthesis was stopped in a sequence-specific way either by supplying only one of the four nucleotides (the "plus" system) or by supplying only three of the four (the "minus" system). The products were then run on PAGE gels, where molecules of differing lengths appeared as distinct bands. In this way, sequences of approximately 50 nucleotides could be determined in a single experiment (Sanger and Coulson, 1975). One drawback of this technique, however, was that it was difficult to determine the length of homopolymer runs (e.g. AAAAA), since only the beginning and the end of such a run produced bands in the gel.
This issue was solved in 1977, when Sanger and colleagues developed the dideoxy chain-termination method (Sanger et al., 1977), known as Sanger sequencing, which is still widely used today. Instead of withholding one or three of the four nucleotides to stop synthesis, the method uses nucleotide analogs, dideoxynucleotide triphosphates (ddNTPs), which lack the 3'-OH group needed to attach the next nucleotide, so a chain that incorporates one cannot be extended further. The classical chain-termination method uses a single-stranded DNA template, a DNA primer, DNA polymerase, radiolabeled or fluorescently labeled nucleotides, and ddNTPs. The DNA sample is divided into four reactions, each of which contains the standard dNTPs, DNA polymerase and one of the four ddNTPs. Because the ddNTP is present at a low concentration relative to the normal dNTPs, termination happens at random, and each reaction yields a set of fragments ending at every position where its base occurs. The four reactions are run in adjacent lanes of the same PAGE gel. This technique allowed even homopolymer runs to be sequenced correctly, and reads of approximately 100 nucleotides could be obtained from PAGE gels.
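The logic of reading a four-lane dideoxy gel can be captured in a toy model. The Python sketch below assumes ideal conditions (every possible terminated product is present, and band position depends only on product length) and writes the template 5' to 3', so the newly synthesized strand is its reverse complement; the function names and the example template are hypothetical. Reading bands from shortest to longest across the four lanes recovers the synthesized strand, and every base in a homopolymer run produces its own band.

```python
# Toy model of dideoxy (Sanger) chain termination and gel reading.
# Assumptions: all terminated products are present, the primer adds no length,
# and band position depends only on product length.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesized_strand(template: str) -> str:
    """New strand written 5'->3': the reverse complement of a 5'->3' template."""
    return "".join(COMPLEMENT[base] for base in reversed(template))


def run_reactions(template: str) -> dict[str, list[int]]:
    """Lengths of the chain-terminated products in each of the four ddNTP reactions.

    A product of length n appears in the ddX reaction whenever base n of the new
    strand is X, because an incorporated ddX has no 3'-OH and stops synthesis.
    """
    new_strand = synthesized_strand(template)
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read bands from shortest (bottom of the gel) to longest, one base per band."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)


if __name__ == "__main__":
    lanes = run_reactions("GATTTTTC")
    print(lanes)  # {'A': [2, 3, 4, 5, 6], 'C': [8], 'G': [1], 'T': [7]}
    print(read_gel(lanes))  # GAAAAATC
```

Each of the five A's in the run gives a separate band in the ddATP lane, which is why the dideoxy method resolves homopolymers that the plus and minus method could not.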
A new variant of the Sanger method used a different fluorescent dye for each of the four base-specific reactions, so that the products could be combined and analyzed in a single gel lane rather than in four. This technique was developed by Leroy Hood at Caltech in collaboration with Applied Biosystems (ABI) in 1986 (Smith et al., 1986); in the later dye-terminator chemistry, each ddNTP carries its own dye, so sequencing can take place in a single reaction rather than in four. These dye-based techniques became the dominant sequencing technology until about 2005, with obtainable read lengths of approximately 1000 nucleotides. Shortly after the paper was published, ABI released an automated sequencer, the ABI 370A, which Craig Venter's group used in 1987 to sequence a gene (Gocayne et al., 1987). Automated sequencing led to the expressed sequence tag (EST) approach to gene discovery, in which cDNA copies of mRNA are cloned at random and then sequenced on automated sequencers; this approach led to the discovery of several novel human genes (Adams et al., 1991). It was used by many genome projects, and today the EST database contains over 43 million ESTs from over 1300 different organisms. Until 1995, the only genomes that had been completely sequenced were those of viruses and organelles, which are relatively small (below 1 Mb).
Venter introduced several major improvements that allowed larger genomes to be sequenced, beginning with Haemophilus influenzae (1.83 Mb) (Fleischmann et al., 1995), which was sequenced using the whole-genome shotgun (WGS) method. In this technique, genomic DNA is randomly fragmented and cloned to produce a random library in E. coli. Clones are sequenced at random, and the reads are assembled into the complete genome sequence by a computer program that compares all the reads and aligns overlapping sequences. In conjunction with this, Venter's team used a "paired ends" (pairwise end sequencing) strategy (Edwards et al., 1990), in which both ends of each cloned DNA fragment are sequenced so that the reads can be placed relative to one another. Because the randomly sheared DNA was size-selected before cloning, the approximate distance between the two end reads of each clone was known. Overlapping reads were first assembled into contiguous sequences (contigs), and when two contigs contained reads from opposite ends of the same clone, the two could be linked and ordered into scaffolds. In the following years, the genomes of E. coli, Saccharomyces cerevisiae, B. subtilis, C. elegans and Arabidopsis thaliana were completed, followed eventually by the human genome, to which the WGS approach was also applied.
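The assembly step, in which a program compares all reads and joins overlapping ones, can be illustrated with a toy greedy overlap assembler. This Python sketch is not the software used by Venter's team; it assumes error-free reads taken from a single strand, no repeated sequence, and a hypothetical minimum overlap of 3 bases, and it simply keeps merging the two sequences with the longest suffix-prefix overlap until no overlaps remain. The reads are made up.

```python
from itertools import permutations


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` matching a prefix of `b` (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    # Drop reads lying entirely inside another read; they add no new sequence.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_k, best_pair = 0, None
        for a, b in permutations(contigs, 2):
            k = overlap(a, b, min_len)
            if k > best_k:
                best_k, best_pair = k, (a, b)
        if best_pair is None:
            return contigs  # no overlaps left: these are the final contigs
        a, b = best_pair
        contigs.remove(a)
        contigs.remove(b)
        contigs.append(a + b[best_k:])  # join the two sequences across their overlap


if __name__ == "__main__":
    # Hypothetical error-free reads sampled from the made-up sequence TTACGGATCCGATAGC.
    reads = ["CCGATAGC", "TTACGGAT", "GGATCCGA"]
    print(greedy_assemble(reads))  # ['TTACGGATCCGATAGC']
```

Real shotgun assemblers must also cope with sequencing errors, reads from both strands and repeats longer than a read; when overlaps alone cannot order the contigs, the paired-end links described above are what connect them into scaffolds.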
Beginning in 2005, next-generation sequencing techniques were developed that feature massively parallel sequencing: far more reads can be obtained in a single run than the 96 produced by a modern capillary electrophoresis-based Sanger sequencer. These methods have also brought improvements in speed, efficiency, cost and accuracy. The first commercially available massively parallel method was developed by 454 Life Sciences and uses the pyrosequencing technique (Nyrén et al., 1993), which, instead of dideoxy chain termination, measures the pyrophosphate released upon nucleotide incorporation using luminometric (light-based) detection. One drawback is that the length of a homopolymer run must be inferred from the amount of pyrophosphate, and hence light, released, which can lead to single-base insertion and deletion errors (indels). Another key innovation is the Illumina (formerly Solexa) sequencing technology (Bennett, 2004). Instead of pyrosequencing, it uses reversible chain-terminating nucleotides, so that only one base is added per cycle, which leads to fewer indels in homopolymer runs than with 454. Drawbacks of the Illumina platform are that its read lengths are usually shorter than those of 454 and that it produces more single-base substitution errors, owing to the modified polymerase and dye-terminator nucleotides. One of the most recent innovations is Pacific Biosciences single-molecule real-time (SMRT) sequencing, which offers very long reads and fast cycle times and can detect DNA methylation. Because of the longer reads, whole genomes can be assembled more easily and accurately (English et al., 2012).
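Why pyrosequencing struggles with homopolymers can be shown with a simplified flowgram model. The Python sketch below assumes a cyclic flow order of T, A, C, G (treated here as an assumption), a signal per flow that ideally equals the number of identical bases incorporated during that flow, and a naive base caller that rounds each signal to the nearest whole number; real 454 base callers are considerably more sophisticated. The read TAAAGC and the noisy signal values are made up.

```python
FLOW_ORDER = "TACG"  # assumed cyclic order in which nucleotides are flowed over the template


def ideal_flowgram(seq: str, n_cycles: int) -> list[float]:
    """Noise-free signal for each flow: the number of bases incorporated during that flow."""
    signals = []
    i = 0
    for _ in range(n_cycles):
        for nucleotide in FLOW_ORDER:
            count = 0
            # A whole homopolymer run is extended in one flow, so light scales with run length.
            while i < len(seq) and seq[i] == nucleotide:
                count += 1
                i += 1
            signals.append(float(count))
    return signals


def call_bases(signals: list[float]) -> str:
    """Naive caller: round each flow's signal to the nearest whole run length."""
    calls = []
    for flow_index, signal in enumerate(signals):
        nucleotide = FLOW_ORDER[flow_index % len(FLOW_ORDER)]
        calls.append(nucleotide * max(0, round(signal)))
    return "".join(calls)


if __name__ == "__main__":
    clean = ideal_flowgram("TAAAGC", n_cycles=2)
    print(clean)  # [1.0, 3.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0]
    print(call_bases(clean))  # TAAAGC
    # Hypothetical noisy measurement: the A flow reads 3.62 instead of 3.
    noisy = [1.05, 3.62, 0.1, 0.94, 0.02, 0.08, 1.1, 0.03]
    print(call_bases(noisy))  # TAAAAGC (a single-base insertion)
```

Because a run of three and a run of four differ only in signal strength, a modest error in the A flow becomes an indel; reversible terminators avoid this by adding at most one base per cycle.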