Recent Advances in DNA Sequencing Technologies
Recent advances in DNA sequencing technologies have led to efficient methods for determining the sequence of DNA. DNA sequencing was born in 1977, when Sanger et al. proposed the chain termination method and Maxam and Gilbert proposed their own method in the same year. Sanger's method proved to be the more favourable of the two. Because Sanger's method was laborious, time-consuming and expensive, more efficient sequencing technologies have been developed ever since; Hood et al. proposed automated sequencers involving dye-labelled terminators. Due to the lack of available computational power prior to 1995, sequencing an entire bacterial genome was considered out of reach. This became a reality when Venter and Smith proposed shotgun sequencing in 1995. Pyrosequencing was introduced by Ronaghi in 1996; this method produces the sequence in real time and is applied by 454 Life Sciences. An indirect method of sequencing DNA, called sequencing by hybridisation, was proposed by Drmanac in 1987, and this method led to the DNA array used by Affymetrix. Nanopore sequencing is a single-molecule sequencing technique in which single-stranded DNA passes through a lipid bilayer via an ion channel while the ion conductance is measured. Synthetic nanopores are being produced as a substitute for the lipid bilayer. Illumina sequencing is one of the latest sequencing technologies to be developed; it involves DNA clustering on flow cells and four dye-labelled terminators performing reversible termination. DNA sequencing has not only been used to sequence DNA in the laboratory but has also been applied to the real world, for example in the Human Genome Project and DNA fingerprinting.
Reliable DNA sequencing became a reality in 1977, when Frederick Sanger perfected the chain termination method to sequence the genome of bacteriophage φX174. Before Sanger's proposal of the chain termination method, there was the "plus and minus" method, also presented by Sanger, along with Coulson. The "plus and minus" method depended on the use of DNA polymerase to copy specific sequences of DNA under controlled conditions. This method was considered efficient and simple; however, it was not accurate.
In addition to Sanger's chain termination sequencing, another method of DNA sequencing, involving restriction enzymes, was introduced by Maxam and Gilbert and was also reported in 1977, the same year as Sanger's method. The Maxam and Gilbert method is discussed in more detail later in this essay.
The proposal of these two methods spurred many further DNA sequencing methods, and as the technology developed, so did DNA sequencing. This literature review examines the various DNA sequencing technologies, their applications in the real world and the tools that have aided DNA sequencing, e.g. PCR. The review begins with a discussion of the chain termination method by Sanger.
The Chain Termination Method
Sanger discovered that the inhibitory activity of 2',3'-dideoxythymidine triphosphate (ddTTP) on DNA polymerase I was dependent on its incorporation into the growing oligonucleotide chain in the place of thymidylic acid (dT). In the structure of ddT there is no 3'-hydroxyl group; a hydrogen atom is in its place. With the hydrogen in place of the hydroxyl group, the chain cannot be extended any further, so termination occurs at the position where dT would be incorporated. Figure 1 shows the structure of dNTP and ddNTP.
In order to remove the 3'-hydroxyl group and replace it with a hydrogen atom, each nucleotide has to undergo a chemical procedure. A different procedure is employed for each of the four triphosphates.
ddATP was prepared from the starting material 3'-O-tosyl-2'-deoxyadenosine, which was treated with sodium methoxide in dimethylformamide to produce 2',3'-dideoxy-2',3'-didehydroadenosine, an unsaturated compound. The double bond between carbons 2' and 3' of the cyclic ether was then hydrogenated with a palladium-on-carbon catalyst to give 2',3'-dideoxyadenosine (ddA). The ddA was then phosphorylated in order to add the triphosphate group. Purification then took place on a DEAE-Sephadex column using a gradient of triethylamine carbonate at pH 8.4. Figure 2 is a schematic representation of the production of ddA prior to phosphorylation.
In the preparation of ddTTP, thymidine was tritylated (+C(Ph3)) at the 5'-position and a methanesulphonyl (+CH3SO2) group was introduced at the 3'-OH group. The methanesulphonyl group was substituted with iodine by refluxing the compound in 1,2-dimethoxyethane in the presence of NaI. After chromatography on a silica column, the 5'-trityl-3'-iodothymidine was hydrolysed in 80% acetic acid to remove the trityl group. The resultant 3'-iodothymidine was hydrogenated to produce 2',3'-dideoxythymidine, which was subsequently phosphorylated. Once phosphorylated, ddTTP was purified on a DEAE-Sephadex column with a triethylammonium hydrogen carbonate gradient. Figure 3 is a schematic representation of the production of ddT prior to phosphorylation.
When preparing ddGTP, the starting material was N-isobutyryl-5'-O-monomethoxytrityldeoxyguanosine. After tosylation of the 3'-OH group, the compound was converted to the 2',3'-didehydro derivative with sodium methoxide. The isobutyryl group was partly removed during this treatment with sodium methoxide and was removed completely by incubation in the presence of NH3 overnight at 45 °C. During the overnight incubation period, the didehydro derivative was reduced to the dideoxy derivative and then converted to the triphosphate. The triphosphate was purified by fractionation on a DEAE-Sephadex column using a triethylamine carbonate gradient. Figure 4 is a schematic representation of the production of ddG prior to phosphorylation.
The preparation of ddCTP was similar to that of ddGTP, but started from N-anisoyl-5'-O-monomethoxytrityldeoxycytidine. However, the purification process was omitted for ddCTP, as it produced a very low yield; the solution was therefore used directly in the experiment described in the paper. Figure 5 is a schematic representation of the production of ddC prior to phosphorylation.
With the four dideoxy samples prepared, the sequencing procedure can commence. The dideoxy samples are placed in separate tubes, along with primers (restriction fragments obtained from the φX174 replicative form), DNA polymerase and the four dNTPs. The polymerase begins strand synthesis from the primer using the dNTPs; when a ddNTP is incorporated into the growing polynucleotide, it terminates further strand synthesis. This is due to the lack of the hydroxyl group at the 3' position of the ddNTP, which prevents the next nucleotide from attaching to the strand. The products of the four tubes are separated by gel electrophoresis on acrylamide gels (see Gel-Electrophoresis). Figure 6 shows the sequencing procedure.
Reading the sequence is straightforward. The band that has moved the furthest is located first; this represents the smallest piece of DNA and is the strand terminated by incorporation of the dideoxynucleotide at the first position in the template. The track in which this band occurs is noted. For example (shown in Figure 6), the band that moved the furthest is in track A, so the first nucleotide in the sequence is A. To find the next nucleotide, the next most mobile band is located; this corresponds to a DNA molecule which is one nucleotide longer than the first, and in this example the band is in track T. Therefore the second nucleotide is T, and the overall sequence so far is AT. The process is continued along the autoradiograph until the individual bands start to close in and become inseparable, and therefore hard to read. In general it is possible to read up to 400 nucleotides from one autoradiograph with this method. Figure 7 is a schematic representation of an autoradiograph.
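The reading procedure described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name read_gel and the representation of each track as a list of fragment lengths are hypothetical, and a real autoradiograph would give band positions rather than exact lengths.

```python
def read_gel(tracks):
    """Read a sequence from four tracks of an idealised sequencing gel.

    tracks maps a base letter to the fragment lengths (in nucleotides)
    observed in that track. Shorter fragments migrate further, so sorting
    by length visits the bands from most mobile to least mobile.
    """
    bands = []
    for base, lengths in tracks.items():
        for length in lengths:
            bands.append((length, base))
    bands.sort()  # shortest fragment (furthest-migrating band) first
    return "".join(base for _, base in bands)


# Hypothetical gel: the most mobile band is in track A, the next in track T.
example_tracks = {"A": [1, 5], "T": [2, 6], "G": [3], "C": [4]}
print(read_gel(example_tracks))
```

Sorting by fragment length is the computational equivalent of scanning the autoradiograph from the most mobile band upwards.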
Ever since Sanger perfected the method of DNA sequencing, there have been advances in sequencing methods, along with notable achievements. Certain achievements, such as the Human Genome Project, are discussed later in this review.
Gel-Electrophoresis is defined as the movement of charged molecules in an electric field. DNA molecules, like many other biological compounds, carry an electric charge. In the case of DNA, this charge is negative. Therefore, when DNA is placed in an electric field, it migrates towards the positive pole (as shown in Figure 8). Three factors affect the rate of migration: shape, electrical charge and size. The polyacrylamide gel comprises a complex network of pores through which the molecules must travel to reach the anode.
Maxam and Gilbert Method
The Maxam and Gilbert method was proposed before Sanger's method, in the same year. Whereas Sanger's method involves the enzymatic synthesis of radiolabelled fragments from unlabelled DNA strands, the Maxam-Gilbert method involves chemical cleavage of prelabelled DNA strands in four different ways to form four different collections of labelled fragments. Both methods use gel electrophoresis to separate the DNA target molecules. However, Sanger's chain termination method has proven to be simpler and easier to use than the Maxam and Gilbert method. Indeed, textbooks generally explain Sanger's method of DNA sequencing rather than Maxam and Gilbert's.
With Maxam and Gilbert's method, two chemical cleavage reactions take place. One reaction acts on guanine and adenine, the two purines, and the other cleaves the DNA at cytosine and thymine, the pyrimidines. A specific reagent is used for each cleavage reaction: the purine-specific reagent is dimethyl sulphate and the pyrimidine-specific reagent is hydrazine. Each of these reactions is carried out in a different way, as each of the four bases has different chemical properties.
The cleavage reaction for guanine and adenine involves using dimethyl sulphate to add a methyl group to the guanines at the N7 position and to the adenines at the N3 position. The glycosidic bond of a methylated purine is unstable and breaks easily on heating at neutral pH, leaving the sugar free. Treatment with 0.1 M alkali at 90 °C will then cleave the sugar from the neighbouring phosphate groups. When the resulting end-labelled fragments are resolved on a polyacrylamide gel, the autoradiograph contains a pattern of dark and light bands. The dark bands arise from breakage at the guanines, which methylate at a rate 5-fold faster than adenines. Because the guanine bands appear stronger than the adenine bands, the pattern can be misinterpreted. Therefore an adenine-enhanced cleavage reaction is also carried out. Figure 9 shows the structural changes of guanine when undergoing the structural modifications involved in Maxam-Gilbert sequencing.
With adenine-enhanced cleavage, the glycosidic bond of methylated adenosine is less stable than that of methylated guanosine; thus gentle treatment with dilute acid at the methylation step preferentially releases the adenine, allowing darker adenine bands to appear on the autoradiograph.
The chemical cleavage at the cytosine and thymine residues involves hydrazine instead of dimethyl sulphate. The hydrazine cleaves the base, leaving ribosylurea. After partial hydrazinolysis in 15-18 M aqueous hydrazine at 20 °C, the DNA is cleaved with 0.5 M piperidine. The piperidine (a cyclic secondary amine), as the free base, displaces all the products of the hydrazine reaction from the sugars and catalyses the β-elimination of the phosphates. The final pattern contains bands of similar intensity from the cleavages at the cytosines and thymines. For cleavage at cytosine alone, the presence of 2 M NaCl preferentially suppresses the reaction of thymine with hydrazine.
Once the cleavage reaction has taken place, each original strand is broken into a labelled fragment and an unlabelled fragment. All the labelled fragments start at the 5' end of the strand and terminate at the base that precedes the site of the cleaved nucleotide along the original strand. Only the labelled fragments are recorded after gel electrophoresis.
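The relationship between the cleavage reactions and the labelled fragments can be illustrated with a short Python sketch. It is a deliberately idealised model, not the published protocol: only the dark bands of each reaction are represented, every susceptible position is assumed to be cleaved in some molecule, and the names REACTIONS, run_reactions and read_sequence are hypothetical.

```python
# Idealised lanes: only the dark (strong) bands of each reaction are modelled.
REACTIONS = {
    "G": "G",      # dimethyl sulphate reaction, dark bands
    "A": "A",      # adenine-enhanced cleavage, dark bands
    "C+T": "CT",   # hydrazine
    "C": "C",      # hydrazine in the presence of 2 M NaCl
}


def run_reactions(strand):
    """Return, for each lane, the lengths of the 5'-labelled fragments.

    Cleavage at index i destroys that base, so the labelled fragment
    that remains contains the i bases that precede it.
    """
    return {
        lane: {i for i, base in enumerate(strand) if base in targets}
        for lane, targets in REACTIONS.items()
    }


def read_sequence(lanes, strand_length):
    """Deduce the base at each position from the lanes showing a band."""
    sequence = []
    for i in range(strand_length):
        if i in lanes["C"]:
            sequence.append("C")
        elif i in lanes["C+T"]:
            sequence.append("T")  # band in C+T but not in C
        elif i in lanes["G"]:
            sequence.append("G")
        elif i in lanes["A"]:
            sequence.append("A")
        else:
            sequence.append("N")  # no band: base cannot be called
    return "".join(sequence)


strand = "GATCCTGA"  # hypothetical 5'-labelled strand
print(read_sequence(run_reactions(strand), len(strand)))
```

A thymine is inferred from a band that appears in the C+T lane but not in the C lane, which mirrors the role of the NaCl-suppressed reaction described above.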
For many years DNA sequencing was done by hand, which is both laborious and expensive. Before automated sequencing, about 4 x 10^6 bases of DNA had been sequenced following the introduction of the Sanger and the Maxam and Gilbert methods. Both methods require four sets of reactions and a subsequent electrophoresis step in adjacent lanes of a high-resolution polyacrylamide gel. With the automated sequencing procedures, four different fluorophores are used, one in each of the base-specific reactions. The reaction products are combined and co-electrophoresed, and the DNA fragments generated in each reaction are detected near the bottom of the gel and identified by their colour. Sanger's method was chosen as the basis for automated sequencing, because it has been proven to be the most durable and efficient method of DNA sequencing and was the choice of most investigators in large-scale sequencing. Figure 10 shows how a typical sequence is generated using an automated sequencer.
The selection of the dyes was the central development of automated DNA sequencing. The fluorophores that were selected had to meet several criteria. For instance, the absorption and emission maxima had to be in the visible region of the spectrum, which is between 380 nm and 780 nm, and each dye had to be easily distinguishable from the others. The dyes should also not impair the hybridisation of the oligonucleotide primer, as this would decrease the reliability of synthesis in the sequencing reactions. Figure 11 shows the structures of the dyes used in a typical automated sequencing procedure, where X is the moiety to which the dye is bound.
Table 1 shows which dye is covalently attached to which nucleotide in a typical automated DNA sequencing procedure.
In designing the instrumentation of the fluorescence detection apparatus, the primary consideration was sensitivity. As the concentration of each band on the co-electrophoresis gel is around 10 M, the instrument needs to be capable of detecting dye concentrations of that order. This level of detection can readily be achieved by commercial spectrofluorimeter systems. Unfortunately, detection from a gel leads to a much higher background scatter, which in turn leads to a decrease in sensitivity. This is solved by using a laser excitation source in order to obtain maximum sensitivity. Figure 12 is a schematic diagram of the instrument, with an explanation of the instrumentation employed.
When analysing the data, Hood found some complications. Firstly, the emission spectra of the different dyes overlapped; to overcome this, multicomponent analysis was employed to determine the amounts of the four dyes present in the gel at any given time. Secondly, the different dye molecules impart non-identical electrophoretic mobilities to the DNA fragments. This meant that fragments of equal base length did not necessarily migrate at the same rate. The third major complication in analysing the data comes from the imperfections of the enzymatic methods; for instance, there are often regions of the autoradiograph that are difficult to sequence.
These complications were overcome in five steps:
- High frequency noise is removed by using a low-pass Fourier filter.
- A time delay (1.5-4.5 s) between measurements at different wavelengths is partially corrected for by linear interpolation between successive measurements.
- A multicomponent analysis is performed on each set of four data points; this computation yields the amount of each of the four dyes present in the detector as a function of time.
- The peaks present in the data are located.
- The mobility shift introduced by the dyes is corrected for using empirically determined correction factors.
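The multicomponent analysis in the third step amounts to solving a small linear system: each of the four measured signals is a weighted sum of the four dye amounts. The sketch below shows one way this could be done; the crosstalk matrix EMISSION, the dye-to-base order "CATG" and all function names are hypothetical values chosen for illustration and are not taken from Hood's work.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        known = sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (a[r][n] - known) / a[r][r]
    return x


# Hypothetical crosstalk matrix: EMISSION[channel][dye] is the signal that
# one unit of a dye produces in a detection channel (overlapping spectra).
EMISSION = [
    [1.0, 0.3, 0.0, 0.0],
    [0.2, 1.0, 0.3, 0.0],
    [0.0, 0.2, 1.0, 0.3],
    [0.0, 0.0, 0.2, 1.0],
]


def dye_amounts(signal):
    """Estimate the amount of each dye from one set of four measurements."""
    return solve_linear_system(EMISSION, signal)


def call_base(signal, bases="CATG"):
    """Call the base whose dye is most abundant (dye order is an assumption)."""
    amounts = dye_amounts(signal)
    return bases[max(range(4), key=lambda i: amounts[i])]


print(call_base([0.0, 0.6, 2.0, 0.4]))
```

In practice the matrix would be measured from the pure dyes, and the calculation would be repeated for every time point before peak location and mobility correction.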
Since the publication of Hood's proposal of fluorescence detection in automated DNA sequence analysis, research has focussed on developing detection approaches that are better in terms of sensitivity.
Bacterial and Viral Genome Sequencing (Shotgun Sequencing)
Prior to 1995, many viral genomes had been sequenced using Sanger's chain termination technique, but no bacterial genome had been sequenced. The genomes that had been sequenced included the 229 kb genome of cytomegalovirus and the 192 kb genome of vaccinia, as well as the 187 kb mitochondrial and 121 kb chloroplast genomes of Marchantia polymorpha.
Viral genome sequencing was based upon the sequencing of clones usually derived from extensively mapped restriction fragments, or λ or cosmid clones. Despite advances in DNA sequencing technology, the sequencing of genomes had not progressed beyond clones of the order of ~250 kb in size, owing to the lack of computational approaches that would enable the efficient assembly of a large number of fragments into an ordered single assembly.
In response, Venter and Smith proposed shotgun sequencing in 1995, which enabled Haemophilus influenzae (H. influenzae) to become the first bacterial genome to be sequenced. H. influenzae was chosen because its base composition is similar to that of the human genome, with 38% of the sequence made up of G + C. Table 2 shows the procedure of shotgun sequencing.
When constructing the library, ultrasonic waves were used to randomly fragment the genomic DNA into fairly small pieces of about the size of a gene. The fragments were purified and then attached to plasmid vectors. The plasmid vectors were then inserted into an E. coli host cell to produce a library of plasmid clones. The E. coli host cell strains had no restriction enzymes, which prevented any deletions, rearrangements and loss of the clones.
The fragments are randomly sequenced using automated sequencers (dye-labelled terminators), with the use of T7 and SP6 primers to sequence the ends of the inserts, giving coverage of the fragments by a factor of 6.
Table 2 (Reference 17)
- Random small insert and large insert library construction: shear genomic DNA randomly to ~2 kb and 15 to 20 kb, respectively.
- Verify the random nature of the library and maximize random selection of small insert and large insert clones for template production.
- High-throughput DNA sequencing: sequence a sufficient number of sequence fragments from both ends for 6x coverage.
- Assemble random sequence fragments and identify repeat regions.
- Gap closure, physical gaps: order all contigs (fingerprints, peptide links, λ clones, PCR) and provide templates for closure.
- Complete the genome sequence by primer walking.
- Inspect the sequence visually and resolve sequence ambiguities, including frameshifts.
- Identify and describe all predicted coding regions (putative identifications, starts and stops, role assignments, operons, regulatory regions).
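The 6x coverage target in Table 2 can be related to the number of sequence reads by simple arithmetic. The sketch below assumes the usual idealisation that reads are placed uniformly at random along the genome, in which case the fraction of bases left unsequenced is approximately e^(-coverage). The genome length and read length used in the example are hypothetical round numbers, not values taken from the H. influenzae project.

```python
import math


def coverage(num_reads, read_length, genome_length):
    """Average number of times each base is sequenced."""
    return num_reads * read_length / genome_length


def reads_needed(target_coverage, read_length, genome_length):
    """Smallest number of reads that reaches the target coverage."""
    return math.ceil(target_coverage * genome_length / read_length)


def expected_fraction_unsequenced(cov):
    """Poisson estimate of the fraction of bases covered by no read."""
    return math.exp(-cov)


# Hypothetical example: a 1.8 Mb genome sequenced with 500-base reads.
n = reads_needed(6, 500, 1_800_000)
print(n, coverage(n, 500, 1_800_000))
print(round(expected_fraction_unsequenced(6) * 100, 2), "% of bases expected to be missed")
```

Under this model a small fraction of the genome remains unsequenced even at 6x coverage, which is consistent with the need for the gap closure stages listed in Table 2.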
Once the sequencing reactions have been completed, the fragments need to be assembled, and this is done using the software TIGR Assembler (The Institute for Genomic Research). The TIGR Assembler simultaneously clusters and assembles fragments of the genome. In order to obtain the speed necessary to assemble more than 10^4 fragments, an algorithm is used to build a table of all 10-bp oligonucleotide subsequences and so generate a list of potential sequence fragment overlaps. The algorithm begins with an initial contig (a single fragment); to extend the contig, a candidate fragment is chosen on the basis of its overlapping oligonucleotide content. The initial contig and the candidate fragment are aligned by a modified version of the Smith-Waterman algorithm, which allows optimal gapped alignments. The contig is extended by the fragment only if strict criteria for the quality of the match are met. The algorithm automatically lowers these criteria in regions of minimal coverage and raises them in regions with a possible repetitive element.
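The idea of using a table of short oligonucleotide subsequences to propose candidate overlaps can be sketched as follows. This is a simplified illustration and not the TIGR Assembler itself: it covers only the candidate-generation step, omits the Smith-Waterman alignment and the adaptive match criteria, and uses hypothetical function names and toy fragments (with k = 8 rather than 10 so that the example stays short).

```python
from collections import defaultdict


def build_kmer_index(fragments, k=10):
    """Map every k-bp subsequence to the set of fragments that contain it."""
    index = defaultdict(set)
    for frag_id, seq in enumerate(fragments):
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(frag_id)
    return index


def candidate_overlaps(fragments, k=10):
    """Count shared k-mers for every pair of fragments that share any.

    Pairs with many shared k-mers are the candidates that a full
    assembler would go on to verify with a proper alignment.
    """
    shared = defaultdict(int)
    for ids in build_kmer_index(fragments, k).values():
        ordered = sorted(ids)
        for pos, a in enumerate(ordered):
            for b in ordered[pos + 1:]:
                shared[(a, b)] += 1
    return dict(shared)


# Toy fragments: the end of the first overlaps the start of the second.
toy_fragments = [
    "ACGTACGTTGCAAGGCT",
    "TGCAAGGCTTACCGGATA",
    "GGGGGGGGGGGGGGG",
]
print(candidate_overlaps(toy_fragments, k=8))
```

Building the index takes time proportional to the total length of the fragments, which avoids comparing every fragment against every other fragment directly.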
TIGR Assembler is designed to take advantage of huge clone sizes. It also enforces a constraint that sequences from the two ends of the same template point toward one another in the contig and are located within a certain range of base pairs of each other. Therefore the TIGR Assembler provides the computational power to assemble the fragments.
Once the fragments have been aligned, the TIGR Editor is used to proofread the sequence and check for any ambiguities in the data.
This technique does require care; for instance, the small insert library should be constructed and end-sequenced concurrently. It is essential that the sequence fragments are of the highest quality and are rigorously checked for any contamination.
Most DNA sequencing methods require gel electrophoresis; however, in 1996, at the Royal Institute of Technology, Stockholm, Ronaghi proposed pyrosequencing. This is an example of sequencing-by-synthesis, where DNA molecules are clonally amplified on a template, and this template then undergoes sequencing. This approach relies on the detection of DNA polymerase activity through an enzymatic luminometric assay of the inorganic pyrophosphate (PPi) that is released during DNA synthesis, and offers the advantage of real-time detection. Ronaghi used Nyren's description of an enzymatic system consisting of DNA polymerase, ATP sulphurylase and luciferase to couple the release of PPi, obtained when a nucleotide is incorporated by the polymerase, with light emission that can be easily detected by a luminometer or photodiode.
When PPi is released, it is immediately converted to adenosine triphosphate (ATP) by ATP sulphurylase, and the level of generated ATP is sensed by luciferase, which produces photons. The unused ATP and deoxynucleotide are degraded by the enzyme apyrase. The presence or absence of PPi, and therefore the incorporation or non-incorporation of each nucleotide added, is ultimately assessed on the basis of whether or not photons are detected. There is minimal time lapse between these events, and the conditions of the reaction are such that iterative addition of the nucleotides and PPi detection are possible. The PPi released by nucleotide incorporation is detected by ELIDA (Enzymatic Luminometric Inorganic pyrophosphate Detection Assay). Within the ELIDA, the PPi is converted to ATP with the help of ATP sulphurylase, and the ATP reacts with luciferin to generate light, at more than 6 x 10^9 photons at a wavelength of 560 nm, which can be detected by a photodiode, photomultiplier tube or charge-coupled device (CCD) camera. As mentioned before, the DNA molecules need to be amplified by the polymerase chain reaction (PCR), which is discussed later.
Ronaghi observed that dATP interfered with the detection system. This interference is a major problem when the method is used to detect a single-base incorporation event. The problem was rectified by replacing dATP with dATPαS (deoxyadenosine α-thiotriphosphate). Adding a small amount of dATP (0.1 nmol) induces an instantaneous increase in light emission, followed by a slow decrease until a steady-state level is reached (as Figure 11 shows). This makes it impossible to start a sequencing reaction by adding dATP; the reaction must instead be started by addition of DNA polymerase. The signal-to-noise ratio also became higher for dATP compared to the other nucleotides. On the other hand, addition of 8 nmol dATPαS (80-fold higher than the amount of dATP) had only a minor effect on luciferase (as Figure 14 shows). dATPαS is less than 0.05% as effective as dATP as a substrate for luciferase.
Pyrosequencing has been adapted by 454 Life Sciences for sequencing by synthesis, in an instrument known as the Genome Sequencer (GS) FLX.
The 454 system uses random single-stranded DNA (ssDNA) fragments; each fragment is bound to a bead under conditions that allow only one fragment per bead. Once the fragment is attached to the bead, clonal amplification occurs within an emulsion. The emulsified beads are purified, placed in microfabricated picolitre wells and then subjected to pyrosequencing. A lens array in the detection part of the instrument focuses luminescence from each well onto the chip of a CCD camera. The CCD camera images the plate every second in order to detect the progression of the pyrosequencing.
The pyrosequencing machine generates raw data in real time in the form of bioluminescence generated from the reactions, and the data are presented on a pyrogram.
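How a pyrogram translates into a sequence can be shown with a minimal sketch. It assumes that the light signal is proportional to the number of identical nucleotides incorporated in one dispensation, so that a signal of about twice the single-base level indicates two bases; the dispensation order, the signal values and the function name decode_pyrogram are all hypothetical.

```python
def decode_pyrogram(dispensation_order, signals, unit_signal=1.0):
    """Convert pyrogram peak heights into the synthesised sequence.

    dispensation_order lists the nucleotide added at each step and
    signals holds the light intensity recorded for that step. Each
    intensity is rounded to the nearest whole number of incorporations;
    a value near zero means the nucleotide was not incorporated.
    """
    sequence = []
    for nucleotide, signal in zip(dispensation_order, signals):
        count = round(signal / unit_signal)
        sequence.append(nucleotide * count)
    return "".join(sequence)


# Hypothetical run: T is incorporated once, C twice, then A and G once each.
order = "TACGTACG"
peaks = [1.02, 0.05, 1.9, 0.0, 0.1, 0.96, 0.03, 1.1]
print(decode_pyrogram(order, peaks))
```

The decoded string is the newly synthesised strand, which is complementary to the template being sequenced.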
Sequencing by Hybridisation
The methods discussed so far (chain termination, Maxam and Gilbert, and pyrosequencing) are all direct methods of sequencing DNA, where each base position is determined individually. There are also indirect methods of sequencing DNA, in which the DNA sequence is assembled based on experimental determination of the oligonucleotide content of the chain. One promising method of indirect DNA sequencing is called Sequencing by Hybridisation, in which sets of oligonucleotide probes are hybridised under conditions that allow the detection of complementary sequences in the target nucleic acid.
Sequencing by Hybridisation (SBH) was proposed by Drmanac et al. in 1987 and is based on Doty's observation that when DNA is heated in solution, the double strand melts to form single-stranded chains, which then renature spontaneously when the solution is cooled. This raises the possibility of one piece of DNA recognising another, and hence led to Drmanac's proposal that oligonucleotide probes be hybridised under these conditions, allowing the complementary sequence in the DNA target to be detected.
In SBH, an oligonucleotide probe (an n-mer probe, where n is the length of the probe) corresponds to a substring of the DNA sample. The process is similar to doing a keyword search in a page full of text. The set of positively expressed probes is known as the spectrum of the DNA sample. For example, if the single-stranded DNA 5'GGTCTCG 3' is sequenced using 4-mer probes, four probes will hybridise onto the sequence successfully. The remaining probes will form hybrids with a mismatch at the end base and will be denatured during selective washing. The four probes that match at the end base will form fully matched hybrids, which will be retained and detected. Each positively expressed probe serves as a platform to decipher the next base, as seen in Figure 16.
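The way a spectrum of overlapping probes determines the sequence can be sketched in Python using the example strand above. The sketch works on the target strand directly rather than on complementary probes, and it assumes an ideal, error-free spectrum in which every overlap is unambiguous; the function names spectrum and reconstruct are hypothetical.

```python
def spectrum(target, n=4):
    """Return the set of all n-mers that occur in the target strand."""
    return {target[i:i + n] for i in range(len(target) - n + 1)}


def reconstruct(probes):
    """Rebuild the sequence by chaining probes that overlap by n - 1 bases.

    Raises ValueError when the spectrum does not define a single chain,
    for example when a repeated (n - 1)-mer makes the next base ambiguous.
    """
    probes = set(probes)
    n = len(next(iter(probes)))
    suffixes = {p[1:] for p in probes}
    starts = [p for p in probes if p[:-1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("no unique starting probe")
    sequence = starts[0]
    remaining = probes - {sequence}
    while remaining:
        tail = sequence[-(n - 1):]
        matches = [p for p in remaining if p[:-1] == tail]
        if len(matches) != 1:
            raise ValueError("ambiguous or incomplete spectrum")
        sequence += matches[0][-1]  # each probe deciphers one further base
        remaining.remove(matches[0])
    return sequence


probes = spectrum("GGTCTCG", n=4)
print(sorted(probes))
print(reconstruct(probes))
```

Longer targets with repeated subsequences generally need longer probes or additional information, because several reconstructions can then share the same spectrum.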
The probes that have successfully hybridised onto the sequence need to be detected. This is achieved by labelling the probes with dyes such as Cyanine3 (Cy3) and Cyanine5 (Cy5), so that the degree of hybridisation can be detected by imaging devices.
SBH methods are ideally suited to microarray technology due to their inherent potential for parallel sample processing. An important advantage of using a DNA array rather than a multiple probe array is that all the resulting probe-DNA hybrids in any single probe hybridisation are of identical sequence. One of the main types of DNA hybridisation array format is the oligonucleotide array, which is currently patented by Affymetrix. The commercial uses of this are discussed under application of the DNA Array (Affymetrix).
Due to the small size of the hybridisation array and the small amount of target present, it is a challenge to acquire the signals from a DNA array. These signals must first be amplified before they can be detected by the imaging devices. Signals can be boosted by two means, namely target amplification and signal amplification. In target amplification, such as PCR, the amount of target is increased to enhance signal strength, while in signal amplification the amount of signal per unit is increased.
Nanopore sequencing was proposed in 1996 by Branton et al., who showed that individual polynucleotide molecules can be characterised using a membrane channel. Nanopore sequencing is an example of single-molecule sequencing, in which the concept of sequencing-by-synthesis is followed, but without the prior amplification step. This is achieved by measuring the ionic conductance as a nucleotide passes through a single ion channel in a biological membrane or planar lipid bilayer. The measurement of ionic conductance is routine in neurobiology and biophysics, as well as in pharmacology (Ca2+ and K+ channels) and biochemistry. Although most channels undergo voltage-dependent or ligand-dependent gating, there are several large ion channels (e.g. Staphylococcus aureus α-hemolysin) which can remain open for extended periods, thereby allowing continuous ionic current to flow across a lipid bilayer. If a transmembrane voltage is applied across an open channel of appropriate size, it should draw DNA molecules through the channel as extended linear chains whose presence would detectably reduce ionic flow. It was assumed that the reduction in ionic flow would allow single-channel recordings to characterise the length, and hence other characteristics, of the polynucleotide.
In the proposal by Branton, α-hemolysin was used to form a single channel across a lipid bilayer separating two buffer-filled compartments. α-Hemolysin is a monomeric, 33 kD, 293-residue protein that is secreted by the human pathogen Staphylococcus aureus. The nanopores are produced when α-hemolysin subunits are introduced into a buffered solution that is separated by a lipid bilayer into two compartments (known as cis and trans): the head of the α-hemolysin molecule is on the cis side, and the stem end on the trans side. The polynucleotide inserts into the cis side of the bilayer pore, which can carry an ionic current of approximately 120 pA (picoamperes). The lipid bilayer containing the nanopore also influences its function as an ion channel. Currently most nanopore sequencing is done using α-hemolysin. Figure 17 shows the structure of an α-hemolysin nanopore in the lipid bilayer and a double-stranded DNA becoming a single-stranded DNA (ssDNA) and passing through the nanopore.
ssDNA is ~1.3 nm in diameter and the α-hemolysin nanopore has a diameter of 1.5 nm. Originally Branton thought that the diameter of the pore was 2.6 nm; this was later rectified.
During the experimentation Branton also realised that the diameter of the α-hemolysin pore was too narrow for double-stranded DNA to pass from the cis side into the trans side; therefore the double-stranded DNA had to be denatured.
Nanopores have proven to be useful as an analytical technique for determining the concentration and size distribution of particles down to the sub-micrometre scale. As they measure the ionic conductance while the nucleotide passes through the pore, they act as Coulter counters, in which molecules carrying a net electrical charge are electrophoretically driven through the pore, producing measurable changes in ionic conductivity. Owing to these changes in conductivity, each nucleotide can be distinguished by its characteristic effect on the ion conductance. Figure 18 shows the ion conductance for poly A and poly C.
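The Coulter-counter principle can be illustrated by detecting blockade events in a recorded current trace. The sketch below is a simple threshold detector and not the analysis used by Branton: the open-channel current of 120 pA follows the value quoted above, while the 50% threshold, the sample values and the function name find_blockades are hypothetical.

```python
def find_blockades(current_pa, open_current=120.0, threshold_fraction=0.5):
    """Locate intervals where the ionic current drops below a threshold.

    Returns a list of (start_index, duration_in_samples) tuples. A
    blockade begins when the current falls below the chosen fraction of
    the open-channel current and ends when it recovers.
    """
    limit = open_current * threshold_fraction
    events = []
    start = None
    for i, value in enumerate(current_pa):
        if value < limit and start is None:
            start = i                       # molecule enters the pore
        elif value >= limit and start is not None:
            events.append((start, i - start))
            start = None                    # molecule has left the pore
    if start is not None:                   # trace ended during a blockade
        events.append((start, len(current_pa) - start))
    return events


# Hypothetical trace in pA: two blockades of three and two samples.
trace = [120, 119, 15, 14, 16, 121, 120, 118, 14, 15, 120]
print(find_blockades(trace))
```

The duration of each event gives a rough measure of how long the molecule occupied the pore, and the depth of the blockade is the quantity that differs between polynucleotides such as poly A and poly C.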
To date there have been two general approaches: the α-hemolysin nanopores mentioned above, and synthetic solid-state nanopores that are being developed using various conventional fabrication techniques. However, nanopores made with α-hemolysin have limitations of size, variation and stability. This is because the protein is usually labile, lipid membranes are fragile, the pore diameter is fixed and the range of safe electrical operation is narrow. To overcome these difficulties, solid-state synthetic nanopores are being fabricated by various means, bearing in mind that the properties of the nanopores must be carefully selected to respond sensitively to the molecules that are being detected. Table 3 outlines the various synthetic methods in brief.
Ion Beam Sculpting 
Latent track etching 
Electron beam-induced fine tuning 
Inorganic nanotubes 
Illumina sequencing is one of the latest sequencing technologies; it was proposed by Simon Bennett in 2004 and was originally called Solexa sequencing. In 2007, Illumina acquired the technology for $600 million. This technology is more than 100 times more efficient than the chain termination method and is a base-by-base comparison of DNA sequences. It is an example of sequencing by synthesis; another known example is the 454 system mentioned under pyrosequencing. Illumina sequencing is also much faster and cheaper than 454 systems.
Illumina sequencing is based on single molecules that are attached covalently to a flow cell and amplified to generate clusters of identical molecules. The flow cell consists of 8 lanes that are separately loadable. Each lane on the flow cell has a capacity of around 5 million reads, so a full flow cell generates 40 million reads or more in the space of 3 days, amounting to more than 1.3 Gbp (gigabase-pairs). Figure 19 shows the flow cell that is used by Illumina.
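The throughput figures above can be checked with simple arithmetic. The sketch below uses the 8 lanes and roughly 5 million reads per lane quoted in this paragraph, together with the 36-base read length quoted elsewhere in this section; the function name `flow_cell_yield` is introduced here purely for illustration.

```python
LANES_PER_FLOW_CELL = 8      # from the text
READS_PER_LANE = 5_000_000   # "around 5 million reads" per lane
READ_LENGTH_BASES = 36       # read length quoted in this section


def flow_cell_yield(lanes: int, reads_per_lane: int, read_length: int):
    """Return (total reads, total bases) for one run of a flow cell."""
    total_reads = lanes * reads_per_lane
    total_bases = total_reads * read_length
    return total_reads, total_bases


reads, bases = flow_cell_yield(
    LANES_PER_FLOW_CELL, READS_PER_LANE, READ_LENGTH_BASES
)
print(f"{reads:,} reads, {bases / 1e9:.2f} Gbp")
```

Under these assumptions a run gives 40 million reads and about 1.44 Gbp, which is consistent with the "more than 1.3 Gbp" quoted in the text.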
On the flow cell, clusters are produced via clonal bridge amplification, generating 10 million single-molecule clusters per square centimetre. Bridge amplification is performed after the DNA fragments are attached to the flow cell by ligated adaptors. The free end of each attached fragment anneals to a nearby primer on the surface, and free nucleotides are added to the flow cell for elongation. A double-stranded bridge forms after elongation; denaturing then produces two strands that are fixed on the surface of the flow cell. The cycle is repeated until there are clusters on the flow cell, each containing approximately 1000 copies within a diameter of 1 µm.
Once the clusters are produced on the flow cell, sequencing is carried out by adding a terminator-enzyme mix (a mixture consisting of four fluorescently labelled reversible chain terminators and DNA polymerase) to the flow cell. This addition leads to reversible termination. Laser excitation is applied to produce the fluorescent signal, which is detected via an imaging device. The fluorescent labels and terminating groups are then removed and the second cycle can commence. For the second cycle, the terminator-enzyme mix is added again and the process is repeated until the end of the run.
Because all four of the fluorescently labelled reversible chain terminators are present in the reaction, the sequencing accuracy increases, as the risk of misincorporation of nucleotides is reduced. At present Illumina sequencing can read lengths of 36 bases, but this figure is expected to rise in the near future.
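The cycle described above (add the terminator-enzyme mix, image, remove the label and terminating group, repeat) can be sketched as a toy simulation. This is a simplified illustration rather than the instrument's actual base-calling: the dye colours and the function name `sequence_by_synthesis` are hypothetical, strand orientation is ignored, and incorporation is assumed to be error-free.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Hypothetical dye colours, one per terminator (assumed for illustration).
DYE_FOR_BASE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
BASE_FOR_DYE = {dye: base for base, dye in DYE_FOR_BASE.items()}


def sequence_by_synthesis(template: str, cycles: int = 36) -> str:
    """Read up to `cycles` bases from a template, one base per cycle."""
    read = []
    for position in range(min(cycles, len(template))):
        # 1. Terminator-enzyme mix: the polymerase incorporates the one
        #    terminator complementary to the template base; the terminating
        #    group stops any further extension in this cycle.
        incorporated = COMPLEMENT[template[position]]
        # 2. Laser excitation and imaging: the dye colour identifies the base.
        signal = DYE_FOR_BASE[incorporated]
        read.append(BASE_FOR_DYE[signal])
        # 3. The label and terminating group are removed, so the next
        #    cycle can extend the strand by one more base.
    return "".join(read)


print(sequence_by_synthesis("ACGTTGCA"))
```

The read is the complement of the template because the signal comes from the newly synthesised strand; the default of 36 cycles mirrors the read length quoted in the text.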
Illumina sequencing has led to breakthrough research; one example is the use of Illumina sequencers to obtain high-resolution maps of several histone modifications. Typical patterns of histone methylation exhibited at promoters, insulators, enhancers and transcribed regions were identified. These findings by Barski et al gave insights into the role of histone methylation in genome function.
Amplification of DNA by Polymerase Chain Reaction (PCR)
Kary Mullis proposed the Polymerase Chain Reaction (PCR) in 1985 and it became a useful tool for DNA sequencing. PCR is used to amplify a gene sequence and is capable of producing a selective enrichment of a DNA sequence by a factor of 10^6. PCR amplification involves two oligonucleotide primers that flank the DNA segment to be amplified, and repeated cycles of denaturing at 95 °C, annealing of the primers to their complementary sequences, and extension of the annealed primers with DNA polymerase (Taq polymerase). The primers hybridise to opposite strands of the target sequence and are oriented so that DNA synthesis by the polymerase proceeds across the region between the primers, effectively doubling the amount of DNA. Moreover, since the extension products are also complementary to and capable of binding primers, each successive cycle essentially doubles the amount of DNA synthesised in the previous cycle. This results in the exponential accumulation of the specific target fragment by approximately 2^n, where n is the number of cycles. Before the use of Taq DNA polymerase, the Klenow fragment of E. coli DNA polymerase I was used, but its major drawback was thermal stability, as it could not withstand the 95 °C temperature required to denature the DNA segment. Figure 20 shows a schematic representation of how PCR amplifies DNA.
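The exponential relationship above can be made concrete with a short calculation. The sketch assumes ideal doubling in every cycle, which real reactions only approximate; the function names `copies_after` and `cycles_for_enrichment` are introduced here for illustration.

```python
import math


def copies_after(cycles: int, starting_copies: int = 1) -> int:
    """Copies of the target after n cycles, assuming perfect doubling."""
    return starting_copies * 2 ** cycles


def cycles_for_enrichment(factor: float) -> int:
    """Smallest number of ideal cycles giving at least `factor`-fold gain."""
    return math.ceil(math.log2(factor))


n = cycles_for_enrichment(1e6)
print(n, copies_after(n))
```

Under ideal doubling, about 20 cycles are enough to reach the million-fold enrichment quoted in the text, since 2^20 = 1,048,576.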
Applications of DNA Sequencing
Human Genome Project
DNA sequencing has led to one of the world's leading projects, the sequencing of the human genome, which was led by James Watson. Watson, together with Francis Crick, proposed the double-helical structure of DNA in 1953. The purpose of the Human Genome Project was not only to explain the functions of a human on a chemical level but also to help us understand the genetic factors in a multitude of diseases such as cancer, Alzheimer's and schizophrenia. The first draft of the human genome sequence was published in 2001.
The completion of the human genome sequence allowed scientists to analyse each chromosome. One of the chromosomes that has been analysed is chromosome 8. Chromosome 8 has a length of 145,556,489 bases, and the majority of the genes found were related to development or signalling in the nervous system. One gene identified on chromosome 8 is CSMD1, which is associated with the nervous system and widely expressed in brain tissues.
Chromosome analysis of the human genome can shed light on evolution and lead on to many more exciting discoveries.
One of the major applications of DNA sequencing in the real world is DNA fingerprinting, which was proposed by Alec J. Jeffreys in 1985. DNA fingerprinting has been used in crime scene investigation and in establishing biological relationships. DNA fingerprinting originated from the discovery of minisatellite DNA. Minisatellites are variable regions consisting of short tandem repeats (STR) of a sequence, arising presumably by mitotic or meiotic unequal exchanges during replication.
To identify the minisatellites, a probe has to be made. The 33-bp myoglobin probe is used as it is capable of detecting other human minisatellites. The probe is prepared from the human myoglobin minisatellite by purification of a single 33-bp repeat element followed by head-to-tail ligation and cloning. Digestion of the resulting recombinant, pAV33.7, with BamHI plus EcoRI releases a 767-bp DNA insert comprised almost entirely of 23 repeats of the 33-bp sequence.
The myoglobin probe hybridises to these STRs and the DNA then undergoes PCR. The resulting DNA fingerprints are presented on an electrophoresis gel; the multiple hypervariable DNA fragments shown on the autoradiograph display somatic and germline stability and are specific to an individual. The positions of the bands vary with the size of the fragments.
An example of this technique is its use in forensic biology for crime scene investigation. DNA of high relative molecular mass (Mr) isolated from 4-year-old blood stains and semen stains made on cotton cloth can be digested and amplified via PCR to produce DNA fingerprints suitable for individual identification. Bands on a fingerprint that show fragments of similar size to those from the scene of the crime can identify the culprit. DNA fingerprinting has revolutionised forensic biology with regard to the identification of the culprit. As a result, thousands of court cases have been decided on the outcome of a DNA fingerprint. Figure 21 shows an example of a DNA fingerprint applied to a crime scene.
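The comparison of band positions described above can be illustrated with a small sketch. It is not a forensic procedure: the fragment sizes, the 2% tolerance and the function name `bands_match` are all hypothetical, and real casework relies on statistical analysis rather than a simple size comparison.

```python
from typing import List


def bands_match(
    sample: List[float], reference: List[float], tolerance: float = 0.02
) -> bool:
    """Return True if both profiles have the same number of bands and each
    pair of size-sorted bands agrees within a relative tolerance."""
    if len(sample) != len(reference):
        return False
    return all(
        abs(s - r) <= tolerance * r
        for s, r in zip(sorted(sample), sorted(reference))
    )


# Hypothetical fragment sizes in kilobases (invented for illustration).
crime_scene = [9.1, 6.4, 4.2, 2.3]
suspects = {
    "suspect_1": [9.0, 6.5, 4.2, 2.3],
    "suspect_2": [8.0, 6.4, 3.1, 2.3],
}

matches = [
    name for name, bands in suspects.items() if bands_match(bands, crime_scene)
]
print(matches)
```

A tolerance is needed because band positions on a gel are measured with some error, so fragments of the same true size rarely give exactly the same reading.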
Apart from forensics, DNA fingerprinting is widely used in the diagnosis of genetic disorders such as cystic fibrosis and Huntington's disease, which can be detected in unborn babies as well as newborns. Detecting these diseases on a fingerprint enables early treatment of an affected child.
DNA Array (Affymetrix)
As mentioned before, sequencing by hybridisation (SBH) has been applied to the DNA array. The DNA array is a powerful tool for high-throughput identification and quantification of nucleic acids. DNA arrays are often referred to as DNA microarrays, as they vastly increase the number of genes that can be studied in an experiment. There are many formats of DNA array, including microarrays, macroarrays, oligonucleotide arrays and microelectronic arrays. The format discussed here is the oligonucleotide array, as it involves the application of SBH and is currently patented by Affymetrix.
Affymetrix developed the first successful technique for oligonucleotide synthesis on a chip, known as the Affymetrix GeneChip HIV 440 assay, which is usually shortened to Affymetrix assay. This array format involves the use of photolithography, a technique for manufacturing high-density oligonucleotide probe arrays through the parallel synthesis of a large number of DNA sequences. These probes are capable of acquiring a large amount of genetic information from biological samples, e.g. for the identification of genetic diseases.
Figure 22 shows the light-directed synthesis of an oligonucleotide (photolithography). A surface bearing photoprotected hydroxyls (X-O) is illuminated through a photolithographic mask (M1), generating free hydroxyl groups (O-H) in the photodeprotected regions. The hydroxyl groups are then coupled to a deoxynucleoside phosphoramidite (5'-photoprotected). A new mask pattern is applied, and a second photoprotected phosphoramidite is coupled. The process is repeated until the desired set of products is obtained.
The oligonucleotide probes are synthesised in situ on a glass support (shown in Figure 24), which is prepared by cleaning the glass in concentrated NaOH, followed by thorough rinsing in water. The surface is then coated in 10% (v/v) bis(2-hydroxyethyl)aminopropyltriethoxysilane, a silane coupling agent with a hydroxyl functional group that serves as the synthesis site. With the silane coupling agent in place, the synthesis linker is attached by reacting the derivatised substrate with 4,4'-dimethoxytrityl (DMT)-hexaethyloxy-O-cyanoethyl phosphate. The photolabile protecting group (X) is α-methyl-2-nitropiperonyloxycarbonyl (MeNPOC). When regions of the surface are exposed to illumination (hν), the photolabile MeNPOC group is removed, exposing a free hydroxyl group. A nucleoside phosphoramidite monomer is then added, and its phosphoramidite group reacts with the free hydroxyl group on the substrate. The next MeNPOC-protected nucleotide is added and coupled to the free hydroxyl group of the grafted molecule. The MeNPOC group protecting the 5' end of the added nucleotide is removed by hν, and this procedure is repeated as many times as needed to achieve the required length of the nucleotide chain (usually 25-mer or less).
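The mask-and-couple cycle described above can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not Affymetrix's production process: the grid coordinates, the target sequences and the function names `masks_for_targets` and `synthesise_array` are hypothetical, and the chemistry (deprotection, coupling yield, strand direction) is reduced to appending one letter per illuminated site.

```python
from typing import Dict, List, Set, Tuple

Site = Tuple[int, int]  # (row, column) position on the array


def masks_for_targets(targets: Dict[Site, str]) -> List[Tuple[str, Set[Site]]]:
    """Derive the (base, mask) steps needed to build the target probes.

    For each probe position there is at most one step per base, so an
    array of N-mers needs at most 4 * N steps however many sites it has.
    """
    length = max(len(seq) for seq in targets.values())
    steps = []
    for position in range(length):
        for base in "ACGT":
            mask = {
                site
                for site, seq in targets.items()
                if position < len(seq) and seq[position] == base
            }
            if mask:
                steps.append((base, mask))
    return steps


def synthesise_array(
    steps: List[Tuple[str, Set[Site]]], sites: List[Site]
) -> Dict[Site, str]:
    """Apply each step: only sites illuminated through the mask are
    deprotected and therefore couple the incoming base."""
    probes = {site: "" for site in sites}
    for base, mask in steps:
        for site in mask:
            probes[site] += base
    return probes


# Hypothetical 2 x 2 array of 3-mer probes.
targets = {(0, 0): "ACG", (0, 1): "AGT", (1, 0): "TCG", (1, 1): "ACT"}
steps = masks_for_targets(targets)
probes = synthesise_array(steps, list(targets))
print(len(steps), probes == targets)
```

The point of the sketch is that the number of synthesis steps depends on the probe length rather than on the number of probes, which is what makes parallel synthesis of a high-density array practical.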
There are more projects in the future that involve DNA sequencing. One is the 1000 Genomes Project, which aims to sequence the genomes of 1000 volunteers from around the globe (a follow-up to the Human Genome Project). The expected cost of this project is around $30 to $50 million, with the aim of uncovering more genetic factors in human health and disease. Institutions worldwide will participate in the project, including the Wellcome Trust Sanger Institute and the US National Human Genome Research Institute (NHGRI). With the genome sequences of 1000 volunteers, genetic studies can be undertaken on common diseases, which will lead to the discovery of causal variants among these common diseases.
Another ongoing project is the Cancer Genome Project. This is a 10-year project in which tumour samples are gathered from thousands of patients and then analysed to find the mutated regions. The mutated regions will be resequenced in order to identify the specific mutations.
DNA sequencing will find further use in medicine. Most recently it has uncovered a non-invasive diagnostic method for finding out whether a fetus has a genetic disease, through the analysis of placental-expressed mRNA in maternal plasma. The invasive test methods involve chorionic villus sampling (CVS) or amniocentesis.
The development of DNA sequencing will continue, and more affordable and much more efficient sequencing technologies will become available in the near future. The ultimate goal is the $1000 human genome, in which sequencing a human genome would cost $1000. At the moment no sequencing technology achieves that cost. When the human genome was first sequenced, it is estimated to have cost around $3 billion. The Illumina sequencing method is claimed to be able to sequence the human genome for about $100,000. So the hunt for an efficient method of sequencing continues.
One of the future sequencing technologies is nanopore sequencing, one of the single-molecule sequencing technologies. There are many more single-molecule sequencing technologies, such as the HeliScope from Helicos Biosciences and Single-Molecule Real-Time (SMRT) sequencing-by-synthesis. These technologies are currently being developed or are at the proof-of-concept stage.
Sanger's chain termination method of sequencing DNA spurred many different technologies to achieve the same goal, beginning with the introduction of automated sequencers. In time computational power improved, leading to whole-genome shotgun sequencing and the first bacterial genome to be sequenced. Pyrosequencing was then introduced; it detects the release of PPi and has been applied by 454 Life Sciences as a sequencing-by-synthesis technology known as the GS FLX. Another method of sequencing by synthesis is Illumina sequencing, which is faster and cheaper than the 454 system. An indirect method is sequencing by hybridisation, which led to the DNA array currently patented by Affymetrix. The future of DNA sequencing is nanopore sequencing, which is an example of single-molecule sequencing and requires no amplification. DNA sequencing has been applied to the real world in applications such as DNA fingerprinting and, of course, the Human Genome Project. DNA sequencing also has many uses in the forthcoming future.
- Brown T.A, (1998), Genetics, 3rd Edition, Chapman & Hall
- Sanger et al, (1977), DNA sequencing with chain-terminating inhibitors, Proc. Natl. Acad. Sci. USA, Vol 74, pp 5463-5467
- Pierce B.A, (2008), Genetics: A Conceptual Approach, 3rd Edition, W.H. Freeman and Co.
- McCartney et al, (1966), Purine Nucleoside. XIV. Unsaturated Furanosyl Adenosine, Nucleoside prepared via Base-Catalysed Elimination Reactions of 2'-Deoxyadenosine Derivative, J. Am. Chem. Soc, Vol 88, pp 1549-1553
- Geider et al, (1972), DNA synthesis in Nucleoside-Permeable Escherichia coli Cells. The Effects of Nucleotide Analogues on DNA Synthesis, Eur. J. Biochem, Vol 27, pp 554-563
- Cooper N.G, (1994), The Human Genome Project: Deciphering the Blueprint of Heredity, University Science Books
- Maxam & Gilbert, (1977), A new method for sequencing DNA, Proc. Natl. Acad. Sci. USA, Vol 74, pp 560-564
- Southern et al, (1975), Detection of specific sequences among DNA fragments separated by gel electrophoresis, J. Mol. Biol, Vol 98, pp 503-517
- Lehninger et al, (2004), Principles of Biochemistry, 4th Edition, W.H. Freeman and Co.
- McMurry, (2003), Fundamentals of Organic Chemistry, 5th Edition, Brooks/Cole
- Hood et al, (1986), Fluorescence detection in automated DNA sequence analysis, Nature, Vol 321, pp 674-679
- Lee et al, (1992), DNA sequencing with dye-labelled terminators and T7 DNA polymerase: effect of dyes and dNTPs on incorporation of dye-terminators and probability analysis of termination fragments, Nucleic Acids Research, Vol 20, pp 2471-2483
- Prescott et al, (2005), Microbiology, 6th Edition, McGraw-Hill
- Bankier, (1991), The DNA sequence of the human cytomegalovirus genome, DNA Seq, Vol 2, pp 1-12
- Goebel S et al, (1990), The complete DNA sequence of vaccinia virus, Virology, Vol 179, iss 1, pp 247-266
- Oda K et al, (1992), Gene organisation deduced from the complete sequence of liverwort Marchantia polymorpha mitochondrial DNA: a primitive form of plant mitochondrial genome, J. Mol. Biol, Vol 223, iss 1, pp 1-7
- Venter and Smith et al, (1995), Whole-Genome Random Sequencing and Assembly of Haemophilus influenzae Rd, Science, Vol 269, pp 496-512
- Waterman et al, (1988), Computer analysis of nucleic acid sequences, Methods in Enzymol, Vol 164, pp 765-793
- Ronaghi et al, (1996), Real-Time DNA sequencing using detection of pyrophosphate release, Analytical Biochemistry, Vol 242, pp 84-89
- Ronaghi and Elahi, (2004), Pyrosequencing: A tool for DNA sequencing analysis, Methods in Molecular Biology, Vol 255, pp 211-219
- Nyren et al, (1993), Solid-phase DNA minisequencing by an enzymatic luminometric inorganic pyrophosphate detection assay, Anal. Biochem, Vol 208, pp 171-175
- Egholm et al, (2005), Genome sequencing in microfabricated high-density picolitre reactors, Nature, Vol 437, pp 376-380
- Quinn et al, (2008), Assessing the feasibility of GS FLX Pyrosequencing for sequencing the Atlantic salmon genome, BMC Genomics, Vol 9
- Pettersson et al, (2009), Generations of sequencing technologies, Genomics, Vol 93
- Bentley et al, (2006), Whole-genome re-sequencing, Current Opinion in Genetics and Development, Vol 16, pp 545-552
- Drmanac et al, (2002), Sequencing by Hybridisation (SBH): Advantages, Achievements and Opportunities, Advances in Biochemical Engineering/Biotechnology, Vol 77, pp 75-104
- Drmanac et al, (1989 (reprint from 1987)), Sequencing of megabase plus DNA by Hybridization: Theory of the Method, Genomics, Vol 4, pp 114-128
- Doty et al, (1960), Strand separation and specific recombination in deoxyribonucleic acids: Physical chemical studies, Proc. Natl. Acad. Sci. USA, Vol 46, pp 461-476
- Xueji Zhang et al, (2007), Electrochemical Sensors, Biosensors and Their Biomedical Applications, 1st Edition, Academic Press
- Pease et al, (1994), Light-generated oligonucleotide arrays for rapid DNA sequence analysis, Proc. Natl. Acad. Sci. USA, Vol 91, pp 5022-5026
- Branton and Deamer et al, (1996), Characterisation of individual polynucleotide molecules using a membrane channel, Proc. Natl. Acad. Sci. USA, Vol 93, pp 13770-13773
- Rang et al, (2003), Pharmacology, 5th Edition, Elsevier
- Rhee et al, (2007), Nanopore Sequencing Technology: nanopore preparations, Trends in Biotechnology, Vol 25, pp 174-181
- Ashkenasy et al, (2005), Recognising a Single Base in an Individual DNA Strand: A Step Toward DNA Sequencing in Nanopores, Angewandte Chemie Int. Ed, Vol 44, pp 1401-1404
- Li et al, (2001), Ion beam sculpting at nanometer length scales, Nature, Vol 412, pp 166-169
- Saleh and Sohn, (2003), An artificial nanopore for molecular sensing, Nano Lett, Vol 3, pp 37-38
- Siwy et al, (2002), Fabrication of a synthetic nanopore ion pump, Phys. Rev. Lett, Vol 89
- Storm et al, (2003), Fabrication of solid-state nanopores with single-nanometre precision, Nat. Mater, Vol 2, pp 537-541
- Ito et al, (2004), A carbon nanotube-based Coulter nanoparticle counter, Acc. Chem. Res, Vol 37, pp 937-945
- Bennett S, (2004), Solexa Ltd, Pharmacogenomics, Vol 5, pp 433-438
- (2006), Illumina, Inc. to Purchase Solexa, Inc., Corporate Growth Report, Vol 206, pp 3
- Zhang et al, (2008), Using quality scores and longer reads improves accuracy of Solexa read mapping, BMC Bioinformatics, Vol 9, Art 128
- Dohm et al, (2008), Substantial biases in ultra-short read data sets from high-throughput DNA sequencing, Nucleic Acids Research, Vol 36, Art 105
- Barski et al, (2007), High-Resolution Profiling of Histone Methylation in the Human Genome, Cell, Vol 129, pp 823-837
- Watson, (1992), The Human Genome Project: Past, Present and Future, Science, Vol 248, pp 44-48
- Watson & Crick, (1953), Molecular structure of nucleic acids; a structure for deoxyribose nucleic acid, Nature, Vol 171, pp 737-738
- Venter et al, (2001), The Sequence of the Human Genome, Science, Vol 291, pp 1304-1351
- Lander et al, (2001), Initial sequencing and analysis of the human genome, Nature, Vol 409, pp 860-921
- Nusbaum et al, (2006), DNA sequence and analysis of human chromosome 8, Nature, Vol 439, pp 331-335
- Mullis et al, (1988), Primer-Directed Enzymatic Amplification of DNA with a Thermostable DNA Polymerase, Science, Vol 239, pp 487-491
- Mullis et al, (1985), Enzymatic Amplification of β-Globin Genomic Sequences and Restriction Site Analysis for Diagnosis of Sickle Cell Anemia, Science, Vol 230, pp 1350-1354
- Jeffreys, (1985), Hypervariable minisatellite regions in Human DNA, Nature, Vol 314, pp 67-75
- Jeffreys, (1985), Forensic Application of DNA Fingerprints, Nature, Vol 318, iss 6064, pp 577-579
- Grothues et al, (1988), Genome fingerprinting of Pseudomonas aeruginosa indicates colonization of cystic fibrosis siblings with closely related strains, J. Clin. Microbio, Vol 26, pp 1973-1977
- Pritchard et al, (1992), Recombination of 4P16 DNA markers in an unusual family with Huntington disease, Am. J. Human. Gen, Vol 50, pp 1218-1230
- Vahey et al, (1999), Performance of the Affymetrix GeneChip HIV PRT 440 Platform for Antiretroviral Drug Resistance Genotyping of Human Immunodeficiency Virus Type 1 Clades and Viral Isolates with Length Polymorphisms, J. Clin. Microbio, Vol 37, pp 2533-2537
- Lausted et al, (2004), POSaM: a fast, flexible, open-source, inkjet oligonucleotide synthesizer and microarrayer, Genome Biology, Vol 4
- Siva N, (2008), 1000 Genomes Project, Nature Biotechnology, Vol 26, pp 256
- Kaiser J, (2005), NCI Gears Up for Cancer Genome Project, Science, Vol 307, pp 1182
- Biever C, (2008), Promising Signs for Down's Blood Test, New Scientist, No 2677, pp 10
- Lo D et al, (2007), Plasma placental RNA allelic ratio permits noninvasive prenatal chromosomal aneuploidy detection, Nature Medicine, Vol 13, pp 218-223