Next Generation DNA Sequencing Techniques Biology Essay



Escherichia coli is one of several types of pathogenic bacteria that are a major cause of disease. Infection with these bacteria may cause death because they can secrete toxins that have a strong impact on the human body (Doyle, 1984). The species has many strains. For example, E. coli O157 is a dangerous strain that causes several diseases. Genomics, the discipline concerned with the study of the genetic material of organisms, is an appropriate way to identify and understand this strain, and DNA sequencing has played a predominant role in furthering understanding of its genetic material (Karch, 2005). DNA sequencing technology has its own history of advancement (Sanger, 1988). Until recently, the vast majority of DNA sequence production depended on versions of the Sanger biochemistry (Sanger, 1977). More recently, motivation has emerged for developing completely new strategies for DNA sequencing (Shendure, 2004). These strategies are collectively called next-generation DNA sequencing and include several platforms, which have played a pivotal role in furthering the understanding of the genetic material of microorganisms. Second-generation DNA sequencing platforms include Roche 454 pyrosequencing, Illumina, SOLiD and the Polonator, all of which rely on PCR-based amplification.

This project discusses Escherichia coli O157:H7 and explains classic DNA sequencing and second-generation DNA sequencing for this strain, including how next-generation DNA sequencing platforms are applied to Escherichia coli O157:H7. It also sets out the advantages of modern approaches in comparison with classic methods.

Escherichia coli O157:H7

Escherichia coli O157:H7 is a leading food-borne pathogenic bacterium that causes haemorrhagic colitis, diarrhoea and haemolytic uraemic syndrome. The strain discussed here was isolated from an outbreak in Sakai city, Japan, in 1996. It carries two plasmids: a 3.3-kb plasmid termed pOSAK1 and a 93-kb plasmid termed pO157. The complete chromosome sequence of an O157 strain isolated from the Sakai outbreak has been reported. The chromosome is 5.5 Mb in size, 859 kb larger than that of the laboratory strain K-12 (Cohen et al, 1991). A 4.1-Mb sequence is highly conserved between the two strains and may represent the fundamental backbone of the E. coli chromosome. The remaining 1.4-Mb sequence consists of O157:H7-specific sequences, many of which are horizontally transferred foreign DNA (Blattner et al, 1997). The predominant role of bacteriophages in the emergence of O157 is apparent from the presence of 24 prophages and prophage-like elements that occupy more than half of the O157-specific sequences. The O157 chromosome encodes 1632 proteins and 20 tRNAs that are absent from K-12. Among these, approximately 131 proteins are thought to have virulence-related functions. Genome-wide codon usage analysis suggested that the O157-specific tRNAs are involved in the efficient expression of the strain-specific genes. The complete set of O157-specific genes sheds new light on the pathogenicity and physiology of O157 and opens the way to a full understanding of the molecular mechanisms underlying O157:H7 infection.


Classically, diagnosis has been performed by culturing on MacConkey agar and then typing with antiserum. However, many typing antisera have shown cross-reactions with non-E. coli O157 colonies. Moreover, not all E. coli O157 strains associated with haemolytic-uraemic syndrome (HUS) are non-sorbitol fermenters. PCR, fluorescence and antibody-based assays are more modern ways of diagnosing this type of E. coli (Anagnou et al, 1991).

Materials and Methods

DNA extraction

A single Escherichia coli O157 colony from a fresh culture on Columbia blood agar was inoculated into nutrient broth and incubated overnight at 37°C. The liquid culture was used to prepare DNA as described, except that phenol extraction was omitted and the corresponding supernatants were immediately precipitated with isopropanol (Wilson, 2001).

Sequencing and assembly of E. coli O157

The initial step of sequencing was completed by the whole-genome random shotgun technique. Researchers created a pUC17-based library containing 1- to 2-kb inserts and sequenced 55,155 clones using a forward sequencing primer. The sequence data were then processed with phred (a base-calling program that assigns a quality value to each base) and phrap (a program that assembles sequence reads into contiguous sequences) (Makino et al, 1998). After assembly, they selected two groups of clones: clones containing inserts whose sequences began within 1.5 kb of the ends of contiguous sequences (contigs), and clones containing inserts whose opposite ends covered regions of uncertain sequence (lower than a phred quality value of 20) (Ewing et al, 1998). A total of 19,965 clones selected according to these criteria were sequenced using the reverse primer. This strategy was quite successful in reducing the number of random clones to be sequenced and in improving the sequence quality. Furthermore, the researchers made a lambda-based library with 20-kb inserts. They selected 80 clones that contained sequences non-homologous to the K-12 sequence at either end of the insert and determined the entire sequence of each insert by the random strategy. The sequences obtained were assembled into 111 contigs larger than 1 kb in size. At this stage, they checked by visual inspection the sequence traces of all regions with low quality values, and every region with any uncertainty (286 regions) was amplified by PCR and reanalysed by direct sequencing of the PCR products. Finally, they carried out gap closing by PCR according to the physical map of the genome and the results of systematic chromosomal mapping (Ohnishi et al, 1999). The physical map deduced from the entire chromosomal sequence determined in this study agreed with the experimentally determined physical map, confirming the accuracy of the final assembly.
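The quality threshold of 20 mentioned above can be made concrete with a short sketch. It assumes the standard phred definition, Q = -10 log10(P), where P is the estimated probability that a base call is wrong, so Q = 20 corresponds to one expected error in 100 bases. The function names and the example quality values are hypothetical illustrations, not part of the phred or phrap software.

```python
import math


def phred_to_error_probability(q):
    """Estimated probability that a base call with phred quality q is wrong."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Phred quality value for an estimated error probability p."""
    return -10 * math.log10(p)


def low_quality_regions(qualities, threshold=20):
    """Return (start, end) index pairs, end exclusive, for every run of
    consecutive bases whose quality is below the threshold."""
    regions = []
    start = None
    for i, q in enumerate(qualities):
        if q < threshold and start is None:
            start = i  # a low-quality run begins here
        elif q >= threshold and start is not None:
            regions.append((start, i))  # the run ended at the previous base
            start = None
    if start is not None:
        regions.append((start, len(qualities)))
    return regions


# Hypothetical per-base quality values for a short stretch of sequence.
example_qualities = [30, 25, 12, 8, 22, 40, 15]
flagged = low_quality_regions(example_qualities)
```

Regions flagged in this way would be the candidates for resequencing with the reverse primer or for PCR amplification and direct sequencing, as in the strategy described above.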

Conventional DNA sequencing (Sanger sequencing)

Since the early 1990s, DNA sequence production has been carried out almost exclusively with capillary-based, semi-automated implementations of the Sanger biochemistry (Sanger, 1977). E. coli O157 has a genome of 5.44 x 10^6 base pairs of DNA containing 5416 genes. In high-throughput production pipelines, the E. coli DNA to be sequenced is prepared by one of two methods. In the first, for shotgun de novo sequencing, randomly fragmented DNA is cloned into a high-copy-number plasmid, which is then used to transform bacterial cells (Avery, 2002). In the second, for targeted resequencing of Escherichia coli O157 DNA, PCR amplification is carried out with primers that flank the target. The output of both methods is an amplified template, either as many copies of a single plasmid insert present within a spatially separated bacterial colony that can be picked, or as many PCR amplicons within a single reaction volume. The sequencing biochemistry takes place in a cycle sequencing reaction, in which cycles of template denaturation, primer annealing (45-60°C) and primer extension (72°C) are performed. Each primer is complementary to known sequence immediately flanking the region of interest. Each round of primer extension is stochastically terminated by the incorporation of fluorescently labelled dideoxynucleotides (ddNTPs) (Avery, 2002). In the resulting mixture of end-labelled extension products, the label on the terminating ddNTP of any given fragment corresponds to the nucleotide identity of its terminal position (Lawrence, 1997). The DNA sequence of Escherichia coli O157 is determined by electrophoretic separation of the single-stranded, end-labelled extension products in a capillary-based polymer gel. Laser excitation of the fluorescent labels as fragments of discrete lengths exit the capillary, coupled to four-colour detection of emission spectra, gives the readout that is represented in a classic sequencing trace (Brzuszkiewicz, 2011). Software translates these traces into DNA sequence, while also generating error probabilities for each base call (Ewing, 1998). The approach taken for subsequent analysis, for example genome assembly or variant identification, depends on precisely what is being sequenced and why. Simultaneous electrophoresis in 96 or 384 independent capillaries provides a limited level of parallelisation.
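The chain-termination principle described above can be illustrated with a toy simulation. This is a minimal sketch under simplifying assumptions: every extension step has the same probability of incorporating a labelled ddNTP, electrophoresis is modelled as a perfect sort by fragment length, and the template is written in the order in which the polymerase reads it. The function name, the termination probability and the example template are hypothetical.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def simulate_sanger_read(template, n_molecules=2000, stop_probability=0.2, seed=0):
    """Simulate stochastic chain termination and read the sequence back.

    Each simulated molecule is extended base by base along the template.
    At every step a labelled ddNTP is incorporated with probability
    stop_probability, which ends extension and marks the fragment with
    the identity of its terminal base.
    """
    rng = random.Random(seed)
    label_by_length = {}
    for _ in range(n_molecules):
        for length, template_base in enumerate(template, start=1):
            if rng.random() < stop_probability:
                # The fluorescent label reports the base added last.
                label_by_length[length] = COMPLEMENT[template_base]
                break
        # Molecules that never terminate carry no label and are ignored.
    # Electrophoresis orders fragments by length, so reading the labels from
    # the shortest to the longest fragment reconstructs the synthesised strand.
    # A length with no fragment would simply be missing from the read.
    return "".join(label_by_length[n] for n in sorted(label_by_length))


read = simulate_sanger_read("TACGGATCCATG")
```

With enough molecules every fragment length is represented, and the labels read in order of length give the complement of the template.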

The use of next-generation DNA sequencing approaches for E. coli O157

PCR amplification

E. coli O157 DNA was analysed by PCR, which consists of three stages. The first stage is melting, in which the DNA fragment is separated into two strands at approximately 92°C. The second stage is annealing at 45-60°C, in which the two primers, the first primer (5'-TCCGGCTCGTATTGTGTGGA-3') and the second primer (5'-GTGCTGCAAGGCGATTATGG-3'), bind to the separated strands. The third stage, elongation at 70°C, then produces the complementary DNA sequences: the two primers are extended enzymatically by a DNA polymerase. These three stages are repeated over several cycles to generate further copies of the sequence (see fig.1), typically 25 to 40 cycles (Szalanski et al, 2003).


Figure 1: PCR amplification of DNA from E. coli O157 using various annealing temperatures, MgCl2 concentrations and numbers of cycles. Lane M, 100 bp DNA ladder marker; lanes 1-17, E. coli O157.
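Two quantities behind the PCR step can be estimated with simple arithmetic. The sketch below applies the Wallace rule of thumb for short primers, Tm = 2(A+T) + 4(G+C) in °C, to the two primers quoted above, and computes the ideal number of copies after n cycles assuming perfect doubling in every cycle. Both are approximations chosen here for illustration; real annealing temperatures are usually optimised empirically, as the figure suggests, and real amplification efficiency is below 100%. The function names are hypothetical.

```python
FORWARD_PRIMER = "TCCGGCTCGTATTGTGTGGA"
REVERSE_PRIMER = "GTGCTGCAAGGCGATTATGG"


def gc_content(primer):
    """Fraction of G and C bases in the primer."""
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Approximate melting temperature in degrees C by the Wallace rule:
    2 degrees for each A or T and 4 degrees for each G or C."""
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def ideal_copies(cycles, starting_copies=1):
    """Copies after the given number of cycles if every cycle doubles the template."""
    return starting_copies * 2 ** cycles


forward_tm = wallace_tm(FORWARD_PRIMER)
reverse_tm = wallace_tm(REVERSE_PRIMER)
copies_after_30 = ideal_copies(30)
```

By this rule both primers have the same estimated melting temperature, which is what one wants for a primer pair, and annealing is normally run a few degrees below that value.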

454 pyrosequencing approach for E. coli O157

The 454 pyrosequencing technology relies on sequencing by synthesis and consists of the cyclic flowing of nucleotide reagents (repeatedly flowing T, A, C, G) over a PicoTiterPlate. The plate consists of about one million wells, and each well holds roughly one bead carrying copies of a single-stranded DNA fragment to be sequenced (see fig.3). When the flowed nucleotide is complementary to the template at the next position in a well, the growing DNA strand in that well is extended with one or more nucleotides by a polymerase. This incorporation triggers a reaction that emits a detectable light signal, which is recorded by a camera. The light intensity is converted into a flow value, a non-negative number with two decimal places that is proportional to the length of the homopolymer run. The number of nucleotides incorporated in the flow is estimated by simply rounding the flow value to the nearest integer (Margulies, 2005).
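The conversion from flow values to bases can be sketched in a few lines. The example assumes the T, A, C, G flow order given above and the simple rounding rule described in the text; the function name and the flow values are hypothetical, and production base callers use more elaborate signal models than plain rounding.

```python
def decode_flowgram(flow_values, flow_order="TACG"):
    """Turn a list of flow values into a base sequence.

    The i-th flow value belongs to the nucleotide flowed at step i, which
    cycles through flow_order. Each value is rounded to the nearest integer
    to estimate the homopolymer length incorporated in that flow.
    """
    bases = []
    for i, value in enumerate(flow_values):
        nucleotide = flow_order[i % len(flow_order)]
        run_length = int(value + 0.5)  # round half up; values are non-negative
        bases.append(nucleotide * run_length)
    return "".join(bases)


# Hypothetical flow values for two rounds of T, A, C, G.
flows = [1.02, 0.05, 2.10, 0.97, 0.03, 3.12, 0.00, 1.04]
sequence = decode_flowgram(flows)
```

Because the signal is only proportional to the run length, a value such as 3.12 is easy to call but a value near 5.5 is not, which is why homopolymer runs are the main source of insertion and deletion errors on this platform.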

Use of the Illumina Genome Analyser approach for E. coli O157

Fragment libraries of E. coli O157 were generated with the Illumina paired-end DNA sample preparation kit according to the manufacturer's instructions and quantified on an Agilent DNA 1000 chip. To make double-stranded adapters, the oligonucleotides (5′-ACGTGACTAACAGTATTAG-3′) and (5′-ATACGCATAACCGAT-3′) were annealed. The input and output DNA was ligated to the double-stranded adapters and then quantified by quantitative PCR (qPCR) using the primers Ad_T_qPCR1 (5′-CTTTCCAGTCCTCAGCTC-3′) and Ad_B_qPCR2 (5′-ATTCCTGCGGACAATCGTCATAACTTC-3′) and SYBR green. Two hundred nanograms of adapter-ligated fragments were used to specifically amplify transposon insertion sites (see fig.2). Twenty-four cycles of PCR were performed with the transposon-specific forward primer MiniTn5-P5-3pr-3 (5′-AACAGTGACGCCCGGGTATGTG-3′), which contains the Illumina P5 end for attachment to the flow cell, and the reverse primer RInV3.3 (5′-CAAGACACTACCGACCGATCT-3′), which contains the Illumina P7 end. PCR products were size-separated in an agarose gel, and fragments of 355 to 455 bp were excised and recovered with QiaExII gel extraction columns following the manufacturer's instructions, but without heating (Quail, 2009). DNA was eluted in 30 μl of elution buffer and quantified by qPCR against standards of known concentration, using primers Syb (5′-ATGATACGGACCGACTATCCGAG-3′) and Syb_RP7 (5′-CAAGCAGAAGTCGAG-3′) (Quail, 2009). The DNA fragment libraries were sequenced for 36 cycles according to the manufacturer's instructions on single-end flow cells with an Illumina sequencer, using the custom sequencing primer MiniTn5-3pr-seq3 (5′-AGGCTGCGCAGTCACTTGTGTA-3′), which binds 10 bp from the transposon end (Elaine, 2008). There were 12.2 and 13.1 million reads obtained for the input and output pools, respectively. Totals of 12.1 million (96.3%) of the input reads and 12.4 million (93.7%) of the output reads contained perfect matches to the 3′ end of mini-Tn5Km2 (De Lorenzo, 1990), and these reads were included in downstream analyses. The remainder of each sequence read was mapped to the EDL933 genome and pO157 with NovoAlign. Totals of 9.6 million input reads (78.2%) and 10.3 million output reads (80.3%) were mapped to unique positions in the E. coli O157 genome. Subsequent analyses were performed with R, version 2.8.0.
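The read-filtering step described above, keeping only reads that begin with a perfect match to the transposon end and passing the remainder on for mapping, might look like the following sketch. It is not the pipeline used in the cited work: the 10-base transposon end shown is a made-up placeholder (the real mini-Tn5Km2 end sequence is not given here), and the function names and example reads are hypothetical.

```python
# Placeholder only: NOT the real mini-Tn5Km2 end sequence.
HYPOTHETICAL_TRANSPOSON_END = "ACGTACGTAC"


def trim_transposon_reads(reads, transposon_end):
    """Keep reads that start with a perfect match to transposon_end and
    return only the flanking genomic part of each kept read."""
    return [
        read[len(transposon_end):]
        for read in reads
        if read.startswith(transposon_end)
    ]


def percent(part, whole):
    """Share of 'part' in 'whole', in percent."""
    return 100.0 * part / whole


# Three made-up reads; the second has a mismatch in the transposon end.
example_reads = [
    "ACGTACGTACGGATTC",
    "ACGTACGTAAGGATTC",
    "ACGTACGTACTTTGCA",
]
flanks = trim_transposon_reads(example_reads, HYPOTHETICAL_TRANSPOSON_END)
kept_percent = percent(len(flanks), len(example_reads))
```

The trimmed flanking sequences are what would then be handed to an aligner to locate each insertion site in the genome.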

Figure 2. Illumina Genome Analyser. The workflow begins with similar fragmentation and adapter ligation steps, followed by an isothermal step that amplifies each fragment into a cluster (Elaine, 2008). The cluster fragments are denatured, annealed with a sequencing primer and subjected to sequencing by synthesis using 3′-blocked, labelled nucleotides.

Cyclic-array sequencing methods for E. coli O157

In the 454 method, clonally amplified 28-μm beads generated by emulsion PCR serve as sequencing features and are randomly deposited onto a microfabricated array of picolitre-scale wells. With pyrosequencing, each cycle consists of the introduction of a single nucleotide species, followed by the addition of substrate (luciferin, adenosine 5'-phosphosulphate) to drive light production in wells where polymerase-driven incorporation of that nucleotide has occurred. This is followed by an apyrase wash to remove unincorporated nucleotide (see fig.3) (Margulies, 2005).

Figure 3. 454 method. Library construction ligates 454-specific adapters to DNA fragments and couples amplification beads with DNA fragments in an emulsion polymerase chain reaction to amplify the fragments before sequencing. The beads are loaded into the picotitre plate (Elaine, 2008). The bottom panel shows the pyrosequencing reaction that takes place on nucleotide incorporation to report sequencing by synthesis.

In the Solexa technology, an array of clonally amplified sequencing features is generated directly on a surface by bridge PCR. Each sequencing cycle involves the addition of a mixture of four modified deoxynucleotide species, each bearing one of four fluorescent labels and a reversibly terminating moiety at the 3' hydroxyl position. A DNA polymerase drives synchronous extension of the primed sequencing features. This is followed by imaging in four channels and then by cleavage of both the fluorescent labels and the terminating moiety.

In the SOLiD and Polonator methods, clonally amplified 1-μm beads are used to generate a disordered, dense array of sequencing features. Sequencing is performed with a ligase instead of a polymerase (Nagalakshmi et al, 2008). With SOLiD, each sequencing cycle of E. coli O157:H7 introduces a partially degenerate population of fluorescently labelled octamers. The population is structured such that the label correlates with the identity of the central 2 bp in the octamer (the correlation with 2 bp, instead of 1 bp, is the basis of two-base encoding) (McKernan, 2006). After ligation and imaging in four channels, the labelled portion of the octamer is cleaved via a modified linkage between bases 5 and 6, leaving a free end for another cycle of ligation. Several such cycles interrogate an evenly spaced, discontiguous set of bases. The system is then reset by denaturation of the extended primer, and the process is repeated with a different offset, for example a primer set back from the original position by one or several bases, so that a different set of bases is interrogated on the next round of serial ligations (see fig. 4).

Figure 4. (a) SOLiD sequencing by ligation first anneals a universal sequencing primer and then goes through successive ligations of the appropriate labelled 8-mer, each followed by detection (Elaine, 2008). (b) Two-base encoding of the SOLiD data greatly eases the discrimination of base-calling errors from true polymorphisms.
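Why two-base encoding helps to separate errors from polymorphisms can be seen in a small sketch. It assumes the commonly described colour-space scheme in which the bases are numbered A=0, C=1, G=2, T=3 and the colour of an adjacent pair is the XOR of the two numbers; the function names and the example sequences are hypothetical, and the sketch models only the encoding, not the ligation chemistry.

```python
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = {code: base for base, code in BASE_TO_CODE.items()}


def encode_colours(sequence):
    """One colour (0-3) for every pair of adjacent bases."""
    codes = [BASE_TO_CODE[base] for base in sequence]
    return [left ^ right for left, right in zip(codes, codes[1:])]


def decode_colours(first_base, colours):
    """Rebuild the base sequence from a known first base and the colours."""
    code = BASE_TO_CODE[first_base]
    bases = [first_base]
    for colour in colours:
        code ^= colour  # the next base is fixed by the previous base and the colour
        bases.append(CODE_TO_BASE[code])
    return "".join(bases)


def changed_positions(colours_a, colours_b):
    """Indices at which two colour reads differ."""
    return [i for i, (a, b) in enumerate(zip(colours_a, colours_b)) if a != b]


reference = encode_colours("ATGGC")
variant = encode_colours("ATCGC")  # single base change G -> C at position 2
snp_signature = changed_positions(reference, variant)
```

A true single-base polymorphism changes two adjacent colours, whereas an isolated measurement error changes only one, so a lone colour mismatch against the reference is more likely to be an error than a variant.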

The pros and cons of the traditional sequencing methods and the next-generation approaches

Limitations, practical aspects of implementation and the distinct differences between traditional sequencing and the next-generation approaches determine which general method is the best choice for any given project. Sanger sequencing demands a great deal of time and labour and is not convenient for sequencing whole E. coli genomes. However, the applications of traditional (Sanger) sequencing have diversified, and for small-scale projects in the kilobase-to-megabase range it will likely remain the technology of choice for the immediate future. This is a result of its greater granularity, that is, its ability to run efficiently at either small or large production scales relative to the new technologies. Nevertheless, despite the limitations of the new platforms relative to conventional sequencing in terms of read length and accuracy, large-scale projects will quickly come to rely entirely on next-generation sequencing approaches. As an example of the advantages of the modern platforms, consider that large-scale resequencing studies for identifying germline variation or lung cancer mutations have depended on Sanger-based resequencing, which in turn depends on one-at-a-time PCR amplification of each targeted region (Wood, 2007). In this context, the requirements of a conventional sequencing approach involve high costs beyond reagents. These include robotic handling of reagents, processing of multiple samples in 96- or 384-well formats, maintenance of capillary-based sequencers, bioinformatics infrastructure to handle the flow of data and dedicated support staff to maintain complex equipment. In a recent informal survey carried out by researchers of the overall cost of traditionally sequencing 100 genes from 100 samples, assuming each gene has about 10 exons, quotes from non-commercial sequencing centres and commercial sequencing service providers ranged from thousands to a million dollars. Obviously, this cost is beyond the range of individual laboratories. In addition to reducing the per-base cost of sequencing by several orders of magnitude, second-generation instruments have fewer infrastructure requirements; instead, the principal problem is downstream data management.
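The scale of the survey scenario above is easier to appreciate as arithmetic. The sketch below counts the individual PCR amplifications implied by 100 genes, 100 samples and about 10 exons per gene, assuming one amplicon per exon, and the number of plates needed to hold them, assuming standard 96- and 384-well plate formats. Both assumptions and the function names are illustrative rather than taken from the survey.

```python
import math


def amplicon_count(genes, samples, exons_per_gene):
    """Number of separate PCR amplifications, assuming one amplicon per exon."""
    return genes * samples * exons_per_gene


def plates_needed(reactions, wells_per_plate):
    """Smallest number of plates that can hold all reactions."""
    return math.ceil(reactions / wells_per_plate)


reactions = amplicon_count(genes=100, samples=100, exons_per_gene=10)
plates_96 = plates_needed(reactions, 96)
plates_384 = plates_needed(reactions, 384)
```

Every one of these reactions has to be set up, amplified and sequenced one at a time, which is where the robotics, staff and instrument costs listed above come from.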

There are significant differences among the next-generation platforms themselves (Table 1) that may give one or another an advantage for particular applications. Some applications, for example resequencing, may be more tolerant of short read lengths than de novo assembly. For applications that depend on tag counting, one would prefer a given amount of sequencing to be split into as many reads as possible. The overall accuracy and the particular error distributions of the technologies may also be quite relevant. Mate-paired reads, which are helpful in de novo assembly and for mapping variants, are currently available with all of the next-generation platforms, but the distance over which the read pairs can be separated may be an important factor (Nagalakshmi et al, 2008). Finally, the cost of sequencing differs significantly between the second-generation platforms, and researchers, as consumers, hope for more competition between vendors than was the case with traditional sequencing in the past five years. Comparisons of per-base costs can be useful but are occasionally misleading: for example, accurate bases may be worth more than less accurate ones, and a shorter run time may matter more than a lower cost.

Table 1. Second-generation DNA sequencing technologies

Platform    Feature generation   Sequencing by synthesis                    Cost per megabase   Cost per instrument   Read length
454         Emulsion PCR         Polymerase (pyrosequencing)                -                   -                     250 bp
Solexa      Bridge PCR           Polymerase (reversible terminators)        -                   -                     36 bp
SOLiD       Emulsion PCR         Ligase (octamers with two-base encoding)   -                   -                     35 bp
Polonator   Emulsion PCR         Ligase (nonamers)                          -                   -                     13 bp
HeliScope   Single molecule      Polymerase (asynchronous extensions)       -                   -                     35 bp


Escherichia coli O157:H7 is a harmful bacterium that causes several diseases. In recent years, in order to understand and identify this strain, researchers have begun to analyse its genetic material. After extraction and isolation of the DNA, conventional and second-generation DNA sequencing technologies are applied to understand it. These approaches are highly practical for E. coli O157 DNA sequencing. However, their shared features extend beyond the technologies themselves to the quantity and quality of the data that are produced. The decrease in the cost of sequencing the DNA of this strain by several orders of magnitude is democratising the extent to which individual researchers can pursue projects at a scale previously accessible only to major genome centres. The notable increase in interest in this field is also evident in the number of groups that are now working on DNA sequencing approaches to supplant even the modern technologies discussed here. It is difficult to peer even a few years into the future, but we anticipate that next-generation sequencing technologies such as Illumina, SOLiD, Polonator and HeliScope, which are used for Escherichia coli O157:H7, will become widespread, commoditised and routine over the next several years. We also anticipate that the difficulties will rapidly shift from mastering the technologies themselves to the question of how best to extract biologically useful or clinically helpful insights from an extremely large amount of data.