In 1977, Fred Sanger developed a DNA sequencing method, known as dideoxy or chain-termination sequencing, that was later used to decode the human genome. To read the order of the A, T, C and G nucleotides in a piece of DNA, the template DNA is copied repeatedly. Each copying reaction stops when a modified nucleotide, called a chain terminator, is incorporated. The copies therefore vary in length, and by analyzing these fragments the original sequence can be read.
The reaction is initiated at either end of the target DNA. First, the two DNA strands are separated. A short piece of DNA, a 'primer' that is complementary to a known sequence, then binds to the template. An enzyme called DNA polymerase binds to the primer and starts to make a new strand of DNA by incorporating free nucleotides that are complementary to the target DNA. The enzyme continues to extend the strand until it randomly incorporates a fluorescently labeled nucleotide. These labeled nucleotides are chemically altered so that they terminate the DNA strand, and the enzyme falls away. This is repeated many times, generating a large number of fragments of different lengths, each ending in a fluorescently labeled base.
At this stage, the fragments are floating freely in one of many tiny wells in a plate. To sort the fragments by size and detect which bases have been added, the plate is loaded onto a sequencing machine. Inside the machine, the samples are transferred into glass capillaries, where an electric field drives the negatively charged DNA molecules through a gel matrix. As they move, longer DNA fragments are slowed down more by the gel than shorter fragments. Therefore, when all DNA fragments from the sequencing reaction have run through the capillary, they are sorted by size, from shortest to longest. A laser at the end of the capillary excites the final fluorescent base of each fragment, which is recorded as a coloured peak or bar.
Each coloured bar therefore represents the final base of one strand of DNA made in the sequencing reaction, ordered from shortest to longest, so the sequence of the original piece of DNA can be decoded. Each reaction typically produces 500-800 letters of DNA sequence.
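The read-out described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function names, the termination probability `p_terminate` and the number of copies are hypothetical choices made for illustration, and strand orientation is ignored. Each copy of the template is extended until a labeled terminator happens to be incorporated, and the sequence is then read by sorting the fragments by length and taking the final base of each.

```python
import random

# Watson-Crick pairing used by the polymerase when copying the template.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def synthesize_fragment(template, rng, p_terminate=0.2):
    """Copy the template base by base until a terminator is incorporated.

    Returns (fragment_length, final_labeled_base), or None when the copy
    reaches the end of the template without being terminated.
    p_terminate is an illustrative value, not a measured one.
    """
    for length, base in enumerate(template, start=1):
        new_base = COMPLEMENT[base]
        if rng.random() < p_terminate:
            return (length, new_base)
    return None


def read_sequence(fragments):
    """Sort fragments by length and read the final base of each length."""
    final_base_by_length = {}
    for length, base in fragments:
        final_base_by_length[length] = base
    return "".join(final_base_by_length[n] for n in sorted(final_base_by_length))


def sanger_read(template, n_copies=5000, seed=0):
    """Simulate many copying reactions and decode the resulting fragments."""
    rng = random.Random(seed)
    copies = (synthesize_fragment(template, rng) for _ in range(n_copies))
    return read_sequence([frag for frag in copies if frag is not None])


if __name__ == "__main__":
    print(sanger_read("ATGCCGTA"))
```

The decoded string is the complement of the template, because the labeled bases belong to the newly synthesized strand. With enough copies, every fragment length is represented, which is why the reaction is repeated many times.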
Sanger's method is based on the use of dideoxynucleotides (ddNTPs) in addition to the normal deoxynucleotides (dNTPs) found in DNA. Dideoxynucleotides are essentially the same as normal nucleotides except that they carry a hydrogen atom on the 3' carbon instead of a hydroxyl group (OH). These modified nucleotides, when integrated into a sequence, prevent the addition of further nucleotides (Speed, 1992). This occurs because a phosphodiester bond cannot form between the dideoxynucleotide and the next incoming nucleotide, and thus the DNA chain is terminated. Sanger sequencing is referred to as the first-generation sequencing technology; over the next several decades, technical advances automated, dramatically sped up, and further refined the process.
Sanger sequencing involves using a purified DNA polymerase enzyme to synthesize DNA chains of varying lengths. The key feature of the reaction mixture is the inclusion of dideoxynucleotide triphosphates (ddNTPs). These chain-terminating dideoxynucleotides lack the 3' hydroxyl (OH) group needed to form the phosphodiester bond between one nucleotide and the next during DNA strand elongation. Thus, when a dideoxynucleotide is incorporated into the growing strand, it blocks further strand extension. The result of many of these reactions is a number of DNA fragments of varying length. These fragments are then separated by size using gel or capillary electrophoresis, a method in which an electric field pulls molecules through a gel substrate or a hairlike capillary. This procedure is sensitive enough to distinguish DNA fragments that differ in size by only a single nucleotide.
With the Sanger process, four parallel sequencing reactions are used to sequence a single sample. Each reaction involves a single-stranded template, a specific primer to start the reaction, the four standard deoxynucleotides (dATP, dGTP, dCTP, and dTTP), and DNA polymerase. The polymerase adds bases to a DNA strand that is complementary to the single-stranded sample template. One of the four dideoxynucleotides (ddATP, ddGTP, ddCTP, or ddTTP) is then added to each reaction at a lower concentration than the standard deoxynucleotides. Because the dideoxynucleotides lack the 3′ OH group, the growing DNA strand terminates whenever one is incorporated by DNA polymerase. Four different ddNTPs are used so that the chain does not always terminate at the same nucleotide (i.e., A, G, C, or T). This produces a variety of strand lengths for analysis. Then, by running the resulting samples in four lanes on a gel (one for each dideoxynucleotide added), researchers can see the fragments line up by size and know which base is at the end of each fragment. This makes the DNA sequence simple to read.
As a commonly used technology in molecular biology, DNA sequencing has seen significant advances over more than thirty years since the 1970s (Kahvejian et al, 2008). In 1977, Sanger developed a sequencing method, which can be regarded as the first sequencing technology (Kahvejian et al, 2008), based on enzymatic synthesis from a single-stranded DNA template with chain termination by dideoxynucleotides (Sanger et al, 1977). Meanwhile, Maxam and Gilbert reported another method based on chemical degradation of end-radiolabeled DNA fragments (Maxam and Gilbert, 1977).
Although the two methods differ distinctly in principle, both rely on 'four-lane, high-resolution polyacrylamide gel electrophoresis' to separate the labeled fragments and allow the base sequence to be read in a staggered ladder-like fashion (Craham et al. 2001, p1). These two prominent efforts were responsible for the introduction of the first automated DNA sequencers, led by Caltech (Smith et al, 1986), which were subsequently commercialized by Applied Biosystems (ABI), the European Molecular Biology Laboratory (EMBL) and Pharmacia Amersham (Pareek et al, 2011). The commercialization of the sequencing method contributed to its rapid development worldwide.
With the first automated fluorescent DNA sequencing equipment, a complete gene locus for the hypoxanthine-guanine phosphoribosyltransferase (HPRT) gene was sequenced, using the paired-end sequencing approach for the first time (Edwards et al, 1990). In 1996, ABI introduced the ABI Prism 310, the first commercial DNA sequencer to use capillary electrophoresis rather than a slab gel (Nyren, 2007). Two years later, the considerable labor of pouring slab gels was replaced by automated reloading of the capillaries with polymer matrix in the ABI Prism 3700, which had 96 capillaries (Nyren, 2007). This automated DNA sequencer was successfully applied to the sequencing work of the Human Genome Project (HGP). The achievements of Sanger-based sequencing thus range from the first small phage genome, 5386 bases in length, to the human genome of about 3 billion bases (Lander et al. 2001). It is remarkable that such progress has been made using methods that are refinements of the basic 'dideoxy' method introduced by Sanger in 1977.
Hybridization to tiling arrays
The concept of allele-specific hybridization (ASH) has been used for resequencing and genotyping by expanding a probe set that targets a specific position in the genome so that it interrogates each of the four possible nucleotides. A tiling array can be fabricated with probe sets targeting each position in the reference genome. Read length is given by the probe length (often 25 bp), and base calling is performed by examining the signal intensities of the different probes in each set. Accuracy is an issue and depends on the ability of the assay to discriminate between exact matches and those with a single base difference. Performance may vary significantly because regions differ in base composition (and therefore in thermal annealing properties), resulting in problems with false positives as well as with large inaccessible regions composed of repetitive sequence stretches [36,37]. The throughput is an obvious benefit, since all bases are interrogated simultaneously, and the concept has been applied to resequencing human chromosome 21 by Perlegen and to HIV. By representing all possible sequences for a given probe length, de novo sequencing can be performed and overlapping sequences used for sequence assembly. In a recent report, the genomes of bacteriophage λ and Escherichia coli were resequenced by "shotgun sequencing by hybridization" with an accuracy of 99.93% and a raw throughput of 320 Mbp/day.
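Base calling from probe-set intensities can be sketched as follows. This is a minimal illustration, not the procedure of any particular array platform: the function names and the `min_ratio` threshold are assumptions introduced here. For each position, the probe with the strongest signal gives the call, and the position is reported as 'N' when the best probe is not clearly brighter than the runner-up, which mirrors the difficulty of discriminating an exact match from a single-base mismatch.

```python
BASES = "ACGT"


def call_base(intensities, min_ratio=1.5):
    """Call one position from the signals of its four probes.

    intensities maps each base to the signal of the probe interrogating it.
    min_ratio is a hypothetical discrimination threshold: the call is
    rejected ('N') when the best signal is less than min_ratio times the
    second-best signal.
    """
    ranked = sorted(BASES, key=lambda b: intensities[b], reverse=True)
    best, second = ranked[0], ranked[1]
    if intensities[best] <= 0:
        return "N"  # no usable signal at this position
    if intensities[second] > 0 and intensities[best] / intensities[second] < min_ratio:
        return "N"  # match and mismatch probes are not well separated
    return best


def call_sequence(probe_sets, min_ratio=1.5):
    """Call every tiled position in order."""
    return "".join(call_base(p, min_ratio) for p in probe_sets)


if __name__ == "__main__":
    example = [
        {"A": 900, "C": 100, "G": 120, "T": 80},  # clear A
        {"A": 300, "C": 310, "G": 50, "T": 40},   # ambiguous A/C
        {"A": 10, "C": 20, "G": 15, "T": 700},    # clear T
    ]
    print(call_sequence(example))
```

Raising `min_ratio` trades fewer false positives for more uncalled positions, which is the accuracy trade-off the paragraph above describes.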
The Genome Sequencer FLX by 454 Life Sciences and Roche depends on an emulsion PCR followed by parallel and individual Pyrosequencing of the clonally amplified beads in a PicoTiterPlate (see Fig. 1B). Emulsion PCR is a clonal amplification performed in an oil-aqueous emulsion. The sample DNA is first fragmented by shearing; unlike digestion of a genome with restriction endonucleases, shearing provides randomly fragmented pieces of more or less similar length. By adding general adaptor sequences to the fragments, only one primer pair is required for amplification. In the emulsion PCR, a primer-coated bead, a DNA fragment and the other components necessary for PCR (including the second general primer) are isolated in a water micro-reactor, favoring a 1:1 bead to fragment ratio. Once the emulsion is broken, beads not carrying any amplified DNA are removed in an enrichment process [12,40]. The amplified and enriched beads are then distributed on the PicoTiterPlate, where a well (44 μm in diameter) allows fixation of one bead (28 μm in diameter). However, out of the 1.6 million wells, not all will contain a bead and not all of those that do will give a useful sequence.
Once the DNA-carrying beads have been distributed on the PicoTiterPlate, Pyrosequencing is performed. Pyrosequencing is a sequencing-by-synthesis method in which a successful nucleotide incorporation event is detected as emitted photons. Since the single-stranded DNA fragments on the beads have been amplified with general tags, a general primer is annealed, permitting elongation towards the bead. The emission of photons upon incorporation depends on a series of enzymatic steps. Incorporation of a nucleotide by a polymerase releases pyrophosphate (PPi), which ATP sulphurylase converts to adenosine triphosphate (ATP) using adenosine phosphosulphate (APS). Finally, the enzyme luciferase (together with D-luciferin and oxygen) uses the newly formed ATP to emit light. Another enzyme, apyrase, is used to degrade unincorporated dNTPs as well as to stop the reaction by degrading ATP.
In the 454 system, the Pyrosequencing technology is adapted as follows. The enzymes luciferase and ATP sulphurylase are immobilized on smaller beads surrounding the larger amplicon-carrying beads. All other reagents are supplied through a flow, allowing reagents to diffuse to the templates in the PicoTiterPlate. Polymerase and one exclusive dNTP per cycle generate one or more incorporation events, and the emitted light is proportional to the number of incorporated nucleotides. Photons are detected by a CCD camera and, after each round, apyrase is flowed through in order to degrade excess nucleotides. The washing procedure for the removal of byproducts permits read lengths of 250 bp in the GS FLX system and over 400 bp in the recently upgraded instrument, the GS FLX Titanium. Read length is limited by negative frame shifts (incorporation of nucleotides in each cycle is not 100% complete) and positive frame shifts (nucleotides that are not fully degraded by the apyrase can be incorporated after the next nucleotide), which eventually generate high levels of noise. Approximately 1.2 million wells will each give one unique sequence of 400 bp on average, generating slightly less than 500 million bases (Mb) in one single run. Whole-genome sequencing has been performed on bacterial genomes in single runs. An oversampling of 20× permits the identification of PCR-introduced errors and the calling of homopolymeric errors. 454 Life Sciences is a competitor in the Archon X PRIZE, and by moving the parallel Pyrosequencing technology onto a microchip they believe the system will achieve the "scalability it needs to win".
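The relationship between nucleotide flows and light signals can be sketched in a few lines. The sketch below is an idealized illustration rather than the 454 base caller: the function names and the flow order "TACG" are assumptions, and real signals are noisy for the frame-shift reasons given above. One nucleotide is offered per flow, a homopolymer run is incorporated all at once, and the signal is taken as proportional to the run length.

```python
def simulate_flowgram(read, flow_order="TACG", max_flows=400):
    """Return a list of (nucleotide, signal) pairs for an ideal reaction.

    read is the strand being synthesized; flow_order is a hypothetical
    cyclic order in which the four dNTPs are supplied. The signal equals
    the number of bases incorporated in that flow (0 if none).
    """
    signals = []
    pos = 0
    flow = 0
    while pos < len(read) and flow < max_flows:
        nucleotide = flow_order[flow % len(flow_order)]
        run = 0
        # A homopolymer stretch is extended within a single flow.
        while pos < len(read) and read[pos] == nucleotide:
            run += 1
            pos += 1
        signals.append((nucleotide, run))
        flow += 1
    return signals


def decode_flowgram(signals):
    """Round each light signal to a run length and rebuild the read."""
    return "".join(nuc * int(round(signal)) for nuc, signal in signals)


if __name__ == "__main__":
    flows = simulate_flowgram("TTAGGGC")
    print(flows)
    print(decode_flowgram(flows))
```

Because a run of identical bases is read from a single intensity, small errors in the measured signal translate directly into homopolymer length errors, which is why oversampling is used to call them. As a check on the figures quoted above, 1.2 million wells at 400 bp each give 480 million bases, consistent with "less than 500 million bases" per run.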
The Illumina 1G Genome Analyzer relies on clonal bridge amplification on a flow cell surface, generating 10 million single-molecule clusters per square centimeter. Bridge amplification is performed after immobilization of oligonucleotides complementary to the adaptor sequences on a surface [15,43,44]. Sheared and adaptor-ligated sample DNA fragments can be attached to the solid support and, due to the dense lawn of adaptor-complementary sequences on the surface, each will anneal to a nearby primer. A double-stranded bridge forms after elongation, and denaturing frees the two strands, both now fixed on the surface. Repeated cycles form colony-like local clusters, each containing approximately 1000 copies and with a diameter of about 1 μm (see Fig. 1C). Sequencing is then carried out with fluorescently labeled nucleotides that are also reversible terminators. One base is incorporated and interrogated at a time, since further elongation of the chain is prevented. When all colonies have been scanned at the end of a cycle and the base determined for each colony, the fluorophores are cleaved off and the terminating bases are activated, allowing another round of nucleotide incorporation (see Fig. 1C). The presence of and competition among all four nucleotides is claimed to reduce the chance of misincorporation. Incomplete incorporation of nucleotides and insufficient removal of reversible terminators or fluorophores may explain the relatively short read length of 35 bases. Although read lengths are shorter than those of the 454 system, the throughput is much higher and, as of February 2008, 1.5 Gbp are generated in each run, which takes approximately 3 days. The use of paired-end libraries will generate about 3 Gbp in a single run. The raw accuracy is said to be 98.5% and the consensus accuracy (3× coverage) 99.99%. The cost per base is approximately 1% of the cost for Sanger sequencing [15,45]. A variant of Illumina's sequencing-by-synthesis chemistry was recently reported in which a hybrid of sequencing by synthesis and the Sanger method promises longer reads.
Ligating degenerated probes
Strategies for sequencing-by-ligation have been presented in the form of Massively Parallel Signature Sequencing (MPSS) and Polony sequencing [40,47]. MPSS was demonstrated as signature sequencing of expression libraries of in vitro cloned microbeads, i.e. beads carrying multiple copies of a single DNA sequence. Signature sequencing was carried out by restriction enzyme mediated exposure of four nucleotides in each cycle followed by ligation of an interrogator probe. This process was repeated for 4-5 cycles, i.e. querying 16-20 bases in total. An overhang of four bases would require 256 different complementary probes and just as many fluorophores for immediate recognition. Instead, the use of 16 (4×4) probes, each with a unique decoder binding site, has enabled single dye detection. Resequencing of a bacterial genome was used to demonstrate the Polony sequencing method. A mate-paired library was clonally amplified with an emulsion PCR on 1 μm beads and subsequently immobilized in a polyacrylamide gel. Each DNA-carrying bead (polony) represented two 17-18 bp genomic sequences flanked by different universal sequences. 
Due to the nature of the mate-pair construction, the two genomic sequences were separated by approximately 1 kb in the genome. Sequencing-by-ligation (see Fig. 1D) could then be performed using degenerate nonamers, where each known nucleotide was associated with one of four fluorophores. By using four different anchor primers, degenerate sequencing-by-ligation could be performed from each end of the tags. Seven bases could be obtained when sequencing in the 5′ to 3′ direction and six bases from 3′ to 5′. Ligated primers were removed after each round, yielding information on 26 bases from each amplicon in a pattern of: 7 bases, a gap of 4-5 bases, 6 bases, then a gap of approximately 1 kb (from the mate-pair construction) and then another 7 bases, a gap of 4-5 bases, followed finally by 6 bases. These two methods have spawned the development of the commercial SOLiD system (Sequencing by Oligonucleotide Ligation and Detection) from Applied Biosystems, where clonal amplicons on 1 μm beads are generated by an emulsion PCR, either from fragments or mate-paired libraries. The beads are enriched, so that 80% of them generate signals, and attached to a glass surface, forming a very high-density random array. Sequencing-by-ligation is performed by ligating 3′-degenerated and 5′-labeled probes to the amplicons and detecting the color. Accuracy is improved by implementing a two-base encoding system that leads to interrogation of each base twice. A sequencing run takes 6-10 days and the output is high, approximately 3-6 Gbp per run given a read length of 25-35 bases per clonally amplified bead. An open source implementation of the Polony sequencing technology is the Danaher Motion Polonator G.007, where 200 such modules will be used by a team competing in the Archon X PRIZE race. They are hoping to reach $10K per genome during 2008 through further improvements and optimizations of the technology.
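The idea behind two-base encoding can be made concrete with a short sketch. The text above states only that each base is interrogated twice; the specific colour scheme below (bases numbered A=0, C=1, G=2, T=3, with the colour of a dinucleotide given by the XOR of its two numbers) is the commonly described colour-space convention and is used here as an assumption, as are the function names. Each colour reports on two adjacent bases, so every base contributes to two colours.

```python
# Assumed 2-bit labels for the bases; the colour of a dinucleotide is the
# XOR of the labels of its two bases (values 0-3, i.e. four dyes).
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def encode_colors(seq):
    """Return (first_base, colors) for a base sequence."""
    colors = [BASE_TO_CODE[a] ^ BASE_TO_CODE[b] for a, b in zip(seq, seq[1:])]
    return seq[0], colors


def decode_colors(first_base, colors):
    """Rebuild the base sequence from a known first base and the colours."""
    bases = [first_base]
    for color in colors:
        bases.append(CODE_TO_BASE[BASE_TO_CODE[bases[-1]] ^ color])
    return "".join(bases)


def differing_positions(colors_a, colors_b):
    """Indices at which two equally long colour reads disagree."""
    return [i for i, (x, y) in enumerate(zip(colors_a, colors_b)) if x != y]


if __name__ == "__main__":
    first, reference = encode_colors("ATGGC")
    _, variant = encode_colors("ATCGC")  # single base change G -> C
    print(reference, variant, differing_positions(reference, variant))
    print(decode_colors(first, reference))
```

Under this scheme a genuine single-base difference changes two adjacent colours, whereas an isolated single-colour difference is more likely a measurement error; this is one way to see how interrogating each base twice can improve accuracy.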
Future generation of DNA sequencing technologies
The initial sequencing and mapping of the human genome is estimated to have cost about $3 billion [9,10]. The genome of Craig Venter, determined a year ago, cost around $70 million. Resequencing a human genome with the Sanger sequencing method would today cost approximately $10 million [14,50], while the 454 system enables a 10-fold reduction in cost and about a 20-fold reduction in time. Illumina claims to be able to sequence a human genome with the 1G Analyzer for approximately $100,000. Neither 454 nor Illumina has shown data describing the exact workload and reagent cost, but this important information will hopefully be revealed to the scientific community soon. Although progress in the last few years has brought a significant reduction in sequencing cost, it is still too early and too expensive to use these platforms to routinely sequence human genomes at a larger scale. The realization of the $1000 genome requires novel approaches, and there is immense activity in the field. As mentioned above, in 2004 NHGRI initiated the "Advanced Sequencing Technology Development Projects", where grants were approved for some 20 novel ideas and approaches to develop cutting-edge, low-cost sequencing for the future. Today around 35 projects in industry and academia have been granted a total of $56 million for technology development in the quest for the $1000 genome. A key feature among most contenders is to look at single molecules. Although it is challenging to sequence single DNA fragments, there are advantages such as improved read length, since molecules do not get out of phase, and a significant drop in reagent cost. A number of routes to the future are being pursued, such as sequencing-by-synthesis approaches like 454 and Solexa but without the prior amplification step, and indirect approaches using physical recognition of the DNA strand and the investigation of bases using nano pores or equivalents.
The concept of sequencing-by-synthesis without a prior amplification step, i.e. single-molecule sequencing, is currently pursued by a number of companies. Helicos Biosciences has an instrument, the HeliScope™, with a claimed throughput of 1.1 Gbp per day (as of October 2008). Single fragments are labeled with Cy3 for localization of template strands on an array, and a predefined, Cy5-labeled nucleotide (for instance "A") is incorporated, detected by a fluorescence microscope and cleaved off in each cycle [55,56]. Four cycles, one for each nucleotide, constitute a "quad", and multiple quad runs are claimed to produce read lengths of up to ~55 bases (see Fig. 2A). At 20 bases or longer, 86% of the strands are available and at 30 bases, around 50%. The first order for a HeliScope instrument was announced at the beginning of February 2008, and the company claims its machine is able to sequence a human genome for $72,000.
A different, although very promising, approach is taken by Pacific Biosciences. The technology, denoted Single-molecule Real Time Sequencing-by-synthesis (SMRT™), has in a proof-of-concept study shown read lengths of single DNA fragments of over 1500 bases in 3000 parallel reactions. The heart of the technology is the so-called zero-mode waveguide (ZMW), which is essentially a nanometer-scale well with a diameter of 70 nm (see Fig. 2B). Light bulges inward at the opening, permitting illumination of a detection volume of 20 zl (1 zl = 10^-21 l) where a single DNA polymerase is immobilized. Nucleotides, fluorescently labeled at the terminal phosphate, are incorporated by the polymerase, exposing their base-specific fluorophore for a few milliseconds, which is enough for detection. Benefits are long read lengths of thousands of bases in one stretch and high speed (10 bases per second and molecule). The technology is still at the proof-of-concept stage and no commercial instrument is ready. Thousands of ZMWs in parallel may in a future instrument (no sooner than 2010) generate 100 gigabases per hour. A second-generation instrument capable of sequencing a human genome for $1000 is an additional number of years in the future. The Menlo Park based company has received grants from the NHGRI but has not signed up for the X PRIZE race so far.
Unlike Pacific Biosciences, Visigen Biotechnologies of Houston, TX, is a contender in the X PRIZE; its platform consists of an engineered polymerase and modified nucleotides for single-molecule detection. A polymerase immobilized on a surface and modified with a fluorescence resonance energy transfer (FRET) donor incorporates nucleotides modified with different acceptors, allowing base-specific and real-time detection of incorporation events (see Fig. 2C). A theoretical throughput of 1 million bases per instrument second has been given, although no proof-of-concept study has been presented. Applied Biosystems had completed an equity investment in Visigen as of December 2005. 
Intelligent Biosystems is pursuing an array-based sequencing-by-synthesis approach similar to the Illumina 1G system and claims it will launch an instrument by the end of 2008 that might reduce the cost of a genome to $10,000. The company received a grant from NHGRI in 2006 and is not competing for the X PRIZE. Further in the future may lie sequencing approaches that utilize physical recognition of nucleic bases. One alternative is nano pores, where the aim is to sequence a DNA strand that is pulled electrophoretically through a synthetic or natural pore, only 1.5 nm wide, while changes in conductivity are measured (see Fig. 2D). A common issue with nano pores is the sensitivity of detection. By converting single bases to longer Design Polymers, as proposed by LingVitae, such problems may be circumvented. Reveo, an X PRIZE contender, is developing a Personal Genome Sequencer (PGS) based on nano-knife edges permitting non-destructive detection of bases in single DNA strands by measuring the electron tunneling characteristics of each base (see Fig. 2D).
Fig. 2. (A) Helicos Biosciences. True single-molecule sequencing (tSMS™) is achieved by initially adding a poly A sequence to the 3′-end of each fragment, which allows hybridization to complementary poly T sequences in a flow cell. After hybridization, the poly T sequence is extended and a complementary sequence is generated. In addition, the template is fluorescently labeled at the 3′-end and thus, illumination of the surface reveals the location of each hybridized template. This process allows generation of a map of the single-molecule landscape before the labeled template is removed. Fluorescently labeled nucleotides are added, one in each cycle, followed by imaging. A cleavage step removes the fluorophore and permits nucleotide incorporation in the next round. (B) Pacific Biosciences. A zero-mode waveguide contains a single polymerase macromolecule immobilized at the bottom (hexagon), nucleotides (circles) that are fluorescently labeled at the triphosphates (colored triangles) and a DNA strand which permits single-molecule real time (SMRT™) sequencing. Single incorporation events are possible to detect with this design since an excitation beam penetrates the lower 20-30 nm of the waveguide, i.e. approximately a volume of 20 zl. This volume is sufficient to detect the incorporated nucleotide while avoiding excitation of unincorporated nucleotides, thereby reducing the noise. (C) Visigen Biotechnologies. A slightly different approach for single-molecule real time sequencing is to immobilize a modified polymerase (hexagon) on a glass surface. The polymerase is engineered to carry a fluorescent donor molecule and by coding the four different nucleotides with different acceptors, base-specific FRET emission upon incorporation will reveal the sequence. (D) Nano Pores and Nano-Knife Edge Probes. Nano pores and nano-knife edge probes are two approaches for physical and direct recognition of bases. Bases in a DNA strand can be recognized either by threading through a nano pore (left), measuring a change in conductivity, or using an array of nano-knife edge probes (right) tuned to recognize each base in a stretched, immobilized DNA strand by detecting the unique electron tunneling characteristics for each base.
Application of NGS
Protein coding gene annotation using transcriptome sequence data
Ion Torrent - The Ion Torrent system (http://www.iontorrent.com) is unique among NGS technologies in that detection is not based on fluorescent dyes but on using semiconductor technology to measure the pH change caused by the release of an H+ ion upon nucleotide incorporation (Rothberg et al., 2011). By sequentially adding nucleotides, the machine is able to detect which nucleotide has been incorporated into the growing strand. There are now two systems available that use this technology: the Ion PGM, for laboratory applications, and the new Ion Proton, which provides higher throughput. The new Proton system is touted to have 165 million sensors, with up to a 250-bp read length upon release of the next hardware chip, which is projected to have 660 million sensors. For both the PGM and the new Proton systems, each hardware chip improvement increases the throughput. With the new 318 chip, the PGM sequencer can produce over 1000 Mb of sequence with 11.1 million sensors. The other attraction of the Ion systems is that sample preparation costs are relatively low compared to other systems. Publications on research that has used the Ion Torrent system currently focus on the shotgun sequencing of microbial genomes (e.g., Howden et al., 2011; Rothberg et al., 2011), but this system has clearly made its way into programs pursuing plant-based objectives, and more publications can be expected as the Ion Torrent market continues to grow.