Parallel Target Capture And DNA Sequencing Biology Essay



In this essay I will discuss the relatively new technique of massively parallel sequencing and how it is used to identify RNA editing sites. I will also discuss how RNA editing relates to nervous system function and to RNA editing-related human diseases. Finally, I will examine how the technique works and what its future holds.

"DNA sequence represents a single format onto which a broad range of biological phenomena can be projected for high throughput data collection" (Shendure and Ji, 2008). DNA sequencing was first described in 1977 by Maxam and Gilbert and by Sanger et al. Sanger sequencing was the first method to be widely used and is still used today for clinical DNA sequencing (Ansorge, 2009). In this method, the DNA is first prepared by PCR amplification of a region of interest. It is then sequenced by cycle sequencing, which involves several rounds of template denaturation, primer annealing and extension. The extension reactions contain a mixture of normal deoxynucleotides and chain-terminating dideoxynucleotides, each dideoxynucleotide labelled with a base-specific fluorescent colour, as seen in figure 1.
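The chain-termination idea can be illustrated with a small simulation. This is a minimal sketch, not a model of any real instrument: it assumes a perfect primer, at least one terminated fragment at every position, no sequencing errors, and a template written in the order the polymerase copies it. The dye names are hypothetical placeholders. Each fragment ends in a coloured dideoxynucleotide, so sorting fragments by length (as capillary electrophoresis does) and reading the colours in order recovers the newly synthesised sequence.

```python
from __future__ import annotations

import random

# Hypothetical dye assignment; real instruments use their own dye sets.
DYE_FOR_BASE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
BASE_FOR_DYE = {dye: base for base, dye in DYE_FOR_BASE.items()}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def synthesise_fragments(template: str) -> list[tuple[int, str]]:
    """Return (fragment length, dye) for every chain-terminated product.

    Incorporating a labelled ddNTP opposite template position i stops
    extension, giving a fragment of length i + 1 whose colour identifies
    the last (complementary) base that was added.
    """
    fragments = []
    for i, base in enumerate(template):
        new_base = COMPLEMENT[base]
        fragments.append((i + 1, DYE_FOR_BASE[new_base]))
    random.shuffle(fragments)  # fragments come out of the reaction unordered
    return fragments


def read_electropherogram(fragments: list[tuple[int, str]]) -> str:
    """Sort fragments by length, shortest first, and read their colours."""
    return "".join(BASE_FOR_DYE[dye] for _, dye in sorted(fragments))


print(read_electropherogram(synthesise_fragments("TACG")))  # ATGC
```

The printed sequence is the complement of the template, which is why the read is the newly synthesised strand rather than the template itself.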

Figure 1. Sanger sequencing workflow (Tucker et al., 2009)

Massively parallel sequencing techniques are under continuous development and are now beginning to be adopted. Platforms such as the Illumina Genome Analyzer, the SOLiD system and the Roche GS-FLX 454 Genome Sequencer allow genomic sequence to be produced on a much larger scale, and the number of human genomes sequenced with such instruments is increasing rapidly. Over the last few years these platforms have become widely available, reducing sequencing costs by over three orders of magnitude. Costs are still high despite recent developments, but they will continue to fall over the next few years, making it easier for smaller labs to adopt this approach. At present, parallel sequencing has little impact on clinical diagnostics, but with the promise of the $1000 genome close at hand it seems to be only a matter of time before it does.

"Massively parallel sequencing will allow simultaneous screening for mutations in hundreds of loci in genetically heterogeneous disorders, whole genome screening for novel mutations, and sequence-based detection of novel pathogens that cause human disease. In addition, it will also permit clinical applications of our expanding knowledge of pharmacogenetics, cancer genetics, epigenetic and complex traits."

The platforms "essentially massively parallelize individual reactions, sequencing hundreds of thousands to hundreds of millions of distinct, relatively short DNA sequences in a single run" (Lister et al., 2009).

Each of these new platforms uses a different sequencing method, which gives each a different read length and run time per single-end run. The Helicos machine differs from the others in that it sequences single DNA molecules without a prior amplification step; it gives reads of 55 bp and generates approximately 200 Mb of DNA sequence per day (ten Bosch and Grady, 2008). The Illumina Genome Analyzer uses a "sequencing by synthesis" method (Tucker et al., 2009). It gives quite short reads, approximately 36 bp in length, but sequences about 600 Mb of DNA per day. The SOLiD system is highly accurate, at 99.4%. It sequences by multiple cycles of hybridisation and ligation, generating approximately 500 Mb of sequence per day with read lengths of up to 35 bp. Table 1 summarizes these techniques (ten Bosch and Grady, 2008).
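To compare the platforms on a common footing, the quoted daily output can be divided by the read length to give an approximate number of reads per day. This is a back-of-the-envelope sketch using only the figures above; it assumes single-end reads and ignores failed reads, quality filtering and run set-up time.

```python
# Figures quoted above (ten Bosch and Grady, 2008).
PLATFORMS = {
    # name: (read length in bp, output in Mb per day)
    "Helicos": (55, 200),
    "Illumina Genome Analyzer": (36, 600),
    "SOLiD": (35, 500),
}


def reads_per_day(read_length_bp: int, output_mb_per_day: float) -> float:
    """Approximate reads per day as total bases per day / bases per read."""
    return output_mb_per_day * 1_000_000 / read_length_bp


for name, (length, output) in PLATFORMS.items():
    print(f"{name}: ~{reads_per_day(length, output) / 1e6:.1f} million reads/day")
```

This gives roughly 3.6, 16.7 and 14.3 million reads per day for Helicos, Illumina and SOLiD respectively. The short-read platforms therefore produce many more individual reads, which is what makes them "massively parallel".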

RNA Editing and Nervous System Function

"RNA editing has been defined as a co- or post-transcriptional RNA processing reaction other than capping, splicing or 3'-end formation that changes the nucleotide sequence of the RNA substrate" (Schaub and Keller, 2002).

RNA editing reactions occur in many organisms through different molecular mechanisms, such as base modification or the insertion or deletion of nucleotides (Schaub and Keller, 2002). In the nucleus of higher eukaryotes, the major type of RNA editing appears to be base conversion (Smith et al., 1997). The best-known reactions are hydrolytic deaminations, in which a genomically encoded cytidine (C) is converted to uridine (U) or an adenosine (A) is converted to inosine (I) (Schaub and Keller, 2002).


The A to I conversion is catalyzed by the adenosine deaminases that act on RNA (ADARs). Three ADARs have been identified in humans, ADAR1, ADAR2 and ADAR3 (The many roles of an RNA editor), but only ADAR1 and ADAR2 have been shown to be enzymatically active A to I editing enzymes in mammals. Both enzymes are essential, although they differ in their specificity (Higuchi et al., 2000). ADAR3 is expressed only in the brain and has no demonstrated catalytic activity: although it can bind single- or double-stranded RNA, it is catalytically inactive on synthetic dsRNA and on known pre-mRNA substrates in vitro (Schaub and Keller, 2002). Figure 2(a) shows the mechanism of A to I RNA editing by ADARs, and Figure 2(b) shows selected members of the ADAR family.

Figure 2 (Jepson and Reenan, 2008)

"ADAR mediated A to I editing can selectively alter codon specificity when taking place within mRNA coding regions and seemingly subtle changes in amino acid sequences can affect the functional properties of the corresponding proteins". A to I editing takes place in mRNA coding regions, introns and UTRs (Schaub and Keller, 2002).

Site-selective editing can increase the number of protein isoforms by changing amino acid codons and splicing patterns in transcripts. For example, editing of the α-amino-3-hydroxy-5-methyl-4-isoxazolepropionic acid (AMPA) glutamate receptor subunit B (GluR-B) alters the properties of the receptor. At the Q/R site, the glutamine codon CAG is changed to CIG, which is read as the arginine codon CGG because the cellular machinery recognises inosine in mRNA as guanosine. Receptors assembled with an edited GluR-B subunit are nearly impermeable to calcium because of this amino acid change in the second transmembrane helix of the GluR-B subunit (Ohlsen et al., 2007). Q/R editing requires an inverted repeat situated in the downstream intron (Higuchi et al., 2000), and editing at the Q/R site is necessary for normal brain development: mice lacking the intronic sequence complementary to the editing site cannot carry out Q/R editing, develop epileptic seizures and die by the age of three weeks (Brusa et al., 1995).
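The recoding at the Q/R site can be made concrete with a tiny codon lookup. This is a minimal sketch that assumes the translational machinery reads inosine exactly as guanosine; the codon table is deliberately truncated to the few codons needed here, and the function names are illustrative rather than taken from any library.

```python
# Truncated codon table covering only the codons discussed in this essay.
CODON_TABLE = {
    "CAG": "Gln (Q)",
    "CGG": "Arg (R)",
    "UAG": "Stop",
    "UGG": "Trp (W)",
}


def edit_a_to_i(mrna: str, position: int) -> str:
    """Deaminate the adenosine at `position` (0-based) to inosine."""
    if mrna[position] != "A":
        raise ValueError("ADARs deaminate adenosine only")
    return mrna[:position] + "I" + mrna[position + 1:]


def translate_codon(codon: str) -> str:
    """Translate one codon, reading inosine (I) as guanosine (G)."""
    return CODON_TABLE[codon.replace("I", "G")]


unedited = "CAG"
edited = edit_a_to_i(unedited, 1)           # CAG -> CIG
print(unedited, translate_codon(unedited))  # CAG Gln (Q)
print(edited, translate_codon(edited))      # CIG Arg (R)
```

The same lookup also covers the stop-codon case discussed later in the essay: editing UAG to UIG gives a codon read as UGG, so translation continues with tryptophan instead of terminating.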

RNA editing has a crucial role in nervous system function, as the genes targeted by ADARs are involved in rapid chemical and electrical neurotransmission, and several of the edited sites recode conserved and functionally vital amino acids (Hoopengardner et al., 2003). The depth and intricacy of the non-coding transcriptome in nervous system tissue provide a rich substrate for ADARs.

In all metazoans, nervous system tissues prominently express ADARs, which produce a range of modifications in mRNAs encoding proteins important for neuronal transmission and synaptic plasticity. Key elements of the electrical and chemical signalling machinery may depend on precise modulation of ADAR editing activity in response to cell identity, fluctuations in the synaptic microenvironment and changes in the external environment in order to achieve optimal levels of function.

ADARs recognise tiny differences in secondary structure among similar RNAs, which allows "elaboration of a codable language of signalling and recognition between ADARs and the transcriptome". Analysis of the secondary structure of the non-coding transcriptome revealed a profile similar to that of the coding transcriptome, with "an abundance of the same RNA structural motifs that promote binding and de-amination by ADAR enzymes."

Alu-containing RNAs harbour thousands of editing sites, distributed evenly between coding and non-coding transcripts. In the non-coding transcriptomes of nervous system cell types, potential ADAR targets number in the thousands, which greatly expands the likely importance of ADAR activity in the brain (St Laurent III et al., 2009).


The article "Genome-Wide Identification of Human RNA Editing Sites by Parallel DNA Capturing and Sequencing" (Li et al., 2009) describes how A to I editing increases transcriptome diversity and is necessary for normal brain function. During A to I editing, adenosine is converted into inosine, which is read as guanosine. At the time of that study, only 13 edited genes had been identified in the non-repetitive regions of the human genome.

The study reports a "specificity of profiling which is supported by observations of enrichment with known features of ADAR and validation by means of sequencing" (Li et al., 2009); sequencing was carried out on an Illumina Genome Analyzer (Li et al., 2009, online supporting material). The method can be applied to studies of RNA editing-related human diseases. By comparing genomic DNA with RNA from seven tissues of a single individual, several hundred human RNA editing sites were detected. The seven tissues were cerebellum, frontal lobe, corpus callosum, diencephalon, small intestine, kidney and adrenal. The limiting factor in identifying RNA editing targets is the number of locations that can be profiled by sequencing RNA and DNA samples.

In this study, a set of 59,437 genomic locations enriched with RNA editing sites was compiled. This compilation excluded repetitive regions such as Alu elements. The criteria that were essential to previous predictions of editing targets (RNA secondary structure, conservation and coding potential) were not taken into account.

For each predicted editing site, a padlock probe was designed with two hybridization arms that flank the target of interest (Li et al., 2009, online supporting material). In total, 41,046 probes were designed for 36,208 sites, because sites close to splicing junctions required two different probes, one targeting gDNA and one targeting cDNA. gDNA and cDNA from seven different tissues, all derived from a single individual, were used to identify RNA editing sites. The probe pool was hybridized to gDNA and to cDNA in separate reactions. The resulting amplicons were sequenced, and sites were identified where only the A allele was seen in gDNA but a fraction of the cDNA reads carried guanosine. Most sites were covered by multiple reads.
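The padlock-probe principle can be sketched in a few lines. This is a minimal, idealised illustration, not the probe-design pipeline used by Li et al. (2009): the 20-nt arm length and the backbone sequence are hypothetical placeholders, and capture assumes perfect hybridisation, gap filling and ligation. The 5' arm pairs with sequence upstream of the target and the 3' arm with sequence downstream, so extending the 3' end copies the gap before ligation closes the circle.

```python
from __future__ import annotations

from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")
BACKBONE = "CTTCAGCTTCCCGATATCCGACGGTAGTGT"  # hypothetical linker sequence


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def design_padlock_probe(reference: str, gap_start: int, gap_end: int,
                         arm_len: int = 20) -> str:
    """Return a probe (5'->3') whose arms flank reference[gap_start:gap_end]."""
    if gap_start < arm_len or gap_end + arm_len > len(reference):
        raise ValueError("not enough flanking sequence for the arms")
    upstream = reference[gap_start - arm_len:gap_start]
    downstream = reference[gap_end:gap_end + arm_len]
    # 5' arm pairs upstream of the gap; 3' arm pairs downstream of it.
    return revcomp(upstream) + BACKBONE + revcomp(downstream)


def capture(template: str, probe: str, arm_len: int = 20) -> Optional[str]:
    """Idealised capture: return the template bases copied between the arms."""
    upstream = revcomp(probe[:arm_len])
    downstream = revcomp(probe[-arm_len:])
    start = template.find(upstream)
    if start < 0:
        return None
    end = template.find(downstream, start + arm_len)
    if end < 0:
        return None
    return template[start + arm_len:end]
```

Because the same probe captures the gap from both gDNA and cDNA, an A in the gDNA capture and a G in some of the cDNA captures at the same position is the signature of an A to I editing site.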

For both frontal lobe cDNA and gDNA, two independent technical replicates were well correlated. Of the 57.8 million reads obtained in total, 55.5 million mapped to the target regions. To identify RNA editing sites, the authors searched for positions where gDNA was homozygous A and, in at least 2 of the 7 cDNA samples, more than 5% of reads were G with a log score greater than 2. A total of 239 such sites were identified, including 10 of the known edited genes. These sites were designated the class I set. To validate the class I set, 18 sites were selected at random, amplified by polymerase chain reaction and sequenced by the Sanger method.
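The class I criteria can be written as a simple filter over per-site read counts. This is a minimal sketch under stated assumptions: counts are already tallied per site and per tissue, and the per-tissue "log score" is passed in precomputed because its exact definition is not given here. The `SiteCounts` structure and function name are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SiteCounts:
    gdna: dict[str, int]          # base -> read count in genomic DNA
    cdna: list[dict[str, int]]    # one base -> read count dict per tissue
    log_scores: list[float]       # one precomputed log score per tissue


def is_candidate_editing_site(site: SiteCounts, min_g_fraction: float = 0.05,
                              min_tissues: int = 2,
                              min_log_score: float = 2.0) -> bool:
    """Apply class I-style criteria to one genomic position."""
    # Genomic DNA must be homozygous A: every gDNA read shows A.
    gdna_depth = sum(site.gdna.values())
    if gdna_depth == 0 or site.gdna.get("A", 0) != gdna_depth:
        return False
    # Count tissues where more than min_g_fraction of cDNA reads are G
    # and the log score exceeds the threshold.
    supporting = 0
    for counts, score in zip(site.cdna, site.log_scores):
        depth = sum(counts.values())
        if depth == 0:
            continue
        g_fraction = counts.get("G", 0) / depth
        if g_fraction > min_g_fraction and score > min_log_score:
            supporting += 1
    return supporting >= min_tissues
```

Keeping the thresholds as parameters makes it easy to see how relaxing them (for example `min_tissues=1`, or a lower `min_g_fraction`) admits additional, weaker candidates, as in the class II and class III sets described below.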

Frontal lobe cDNA and gDNA from additional donors were also tested. Fourteen of the 18 sites were clearly edited, the majority of them in all three donors. The findings of this study agree with previous studies showing that ADARs have a sequence preference, with strong G depletion at the nucleotide 5' to the editing site. Of the 239 class I sites, 55 are located in coding regions, and 38 of these change amino acids. One of them adds an extra 29 amino acids to the protein by changing a UAG stop codon to UGG, which encodes tryptophan.
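A 5'-neighbour preference such as G depletion is detected by comparing the base immediately 5' of edited adenosines with the same statistic over all adenosines. This is a minimal sketch on hypothetical toy data; it is not the statistical analysis used by Li et al. (2009), and it ignores strand and transcript-boundary details.

```python
from __future__ import annotations

from collections import Counter


def five_prime_neighbours(transcript: str, edit_positions: list[int]) -> Counter:
    """Count the base immediately 5' of each edited adenosine (0-based positions)."""
    counts: Counter = Counter()
    for pos in edit_positions:
        if transcript[pos] != "A":
            raise ValueError(f"position {pos} is not an adenosine")
        if pos > 0:
            counts[transcript[pos - 1]] += 1
    return counts


def background_neighbours(transcript: str) -> Counter:
    """Same tally over every adenosine, used as the expected background."""
    all_a = [i for i, base in enumerate(transcript) if base == "A"]
    return five_prime_neighbours(transcript, all_a)


def neighbour_frequencies(counts: Counter) -> dict[str, float]:
    """Convert counts to frequencies over the RNA alphabet."""
    total = sum(counts.values())
    if total == 0:
        return {base: 0.0 for base in "ACGU"}
    return {base: counts.get(base, 0) / total for base in "ACGU"}
```

A G frequency among edited sites that is well below the background G frequency would indicate the kind of depletion reported in the study; with real data one would also test whether the difference is statistically significant.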

In many previous attempts to identify new RNA editing sites, sequence conservation has been the main criterion. Because of widespread editing of Alu repetitive elements, editing has been shown to be enriched in the primate lineage. This study found that class I sites were enriched in genes with functions related to membranes, synapses and cell trafficking, and in many genes implicated in brain-related diseases, in agreement with previous observations.

When the criteria were relaxed to require editing in only one tissue, an extra 330 potential candidate editing sites, known as the class II set, were identified. One candidate from this set, GLI1 (glioma-associated oncogene homolog 1, site chr12:56150891), was selected for validation and was highly edited in the frontal lobe of all three donors. When the editing level threshold was reduced to 2%, an extra 141 sites were identified; these are known as the class III set. When 118 clones covering the class III site chr11:74994333 in MAP6 were sequenced, 13 carried a guanosine at the editing site (about 11%).

The additional sites found at the lower threshold suggest that many targets are edited at low levels. Most of the non-repetitive sites do not appear to be conserved beyond the primate lineage and may play a role in primate-specific functions. Many of the identified editing sites are located in non-coding RNAs associated with brain function. This approach can readily be extended to a range of tissues from normal and diseased individuals in order to identify more RNA editing sites and measure their editing levels.

Figure 3 shows a wide spectrum of RNA editing levels.

Figure 3. Wide spectrum of RNA editing levels. For each of the class I sites, the editing level was determined as (number of G reads) / (number of A and G reads) when the total number of A and G reads is at least 10. In each of the seven tested tissues, the sites are ranked by editing level. Overall, the RNA editing level is widely distributed from 0 to 100% in all tested tissues. (Li et al., 2009, online supporting material).
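The editing-level formula in the Figure 3 caption, together with the minimum-depth rule and the per-tissue ranking, can be sketched as follows. This is a minimal illustration; the site names and counts used with it are hypothetical.

```python
from __future__ import annotations

from typing import Optional


def editing_level(a_reads: int, g_reads: int, min_depth: int = 10) -> Optional[float]:
    """Editing level = G / (A + G), or None when A + G is below min_depth."""
    depth = a_reads + g_reads
    if depth < min_depth:
        return None  # too few reads to estimate the level reliably
    return g_reads / depth


def rank_sites(counts: dict[str, tuple[int, int]]) -> list[tuple[str, float]]:
    """Rank the sites of one tissue by editing level, highest first.

    `counts` maps a site identifier to its (A reads, G reads) in that tissue.
    """
    levels = {site: editing_level(a, g) for site, (a, g) in counts.items()}
    scored = [(site, level) for site, level in levels.items() if level is not None]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

For example, `rank_sites({"s1": (90, 10), "s2": (5, 15), "s3": (3, 2)})` ranks s2 (75%) above s1 (10%) and drops s3, which has only 5 reads. Applied to the MAP6 clone counts above (13 G out of 118), the same formula gives roughly 11%.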


Massively parallel sequencing is a relatively new technique that has both advantages and limitations.


The main advantage of massively parallel sequencing is that it overcomes the throughput limitations of the Sanger technique. Projects requiring many gigabases of sequence can therefore be performed more quickly and cheaply than with the Sanger method. Massively parallel sequencing has also uncovered a large amount of germline and somatic variation in normal individuals.

Another advantage is the ability to detect minor alleles precisely. "Also when a sample is sequenced to sufficiently high depth, the copy number of any particular segment can be inferred from the frequency with which that segment is found among the molecules sequenced." (Tucker et al., 2009).
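Both ideas in this paragraph reduce to simple ratios of read counts. This is a minimal sketch assuming reads are sampled uniformly across the genome, counts are normalised by segment length, and a reference segment of known copy number (here 2, as for a diploid autosome) is available; the function names are hypothetical.

```python
def minor_allele_fraction(minor_reads: int, total_reads: int) -> float:
    """Fraction of reads at a position that carry the minor allele."""
    return minor_reads / total_reads


def reads_per_kb(read_count: int, segment_length_bp: int) -> float:
    """Read density, normalised by segment length."""
    return read_count / (segment_length_bp / 1000)


def estimate_copy_number(test_reads: int, test_length_bp: int,
                         ref_reads: int, ref_length_bp: int,
                         ref_copy_number: int = 2) -> float:
    """Scale the test/reference read-density ratio by the reference copy number."""
    ratio = reads_per_kb(test_reads, test_length_bp) / reads_per_kb(ref_reads, ref_length_bp)
    return ratio * ref_copy_number
```

For instance, a 10 kb segment with 3,000 reads compared with a diploid 10 kb reference segment with 2,000 reads gives an estimated copy number of 3. The estimate only becomes precise at high depth, because sampling noise in the read counts shrinks as more molecules are sequenced.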


Along with these advantages there are also limitations. For example, because reads are short, de novo sequence assembly is harder and less complete, especially for highly repetitive or rearranged DNA segments and for novel genomes. Short reads also make interpretation difficult when it is essential to determine the phase of variants. At present, the error rate of the raw sequence data is higher than that of Sanger sequencing; a proofreading polymerase is being introduced into the sequencing process and may improve raw accuracy (Tucker et al., 2009).


The future of this technique is very promising, with planned uses in analysing the causes of disease and of mental and developmental disorders such as autism. It will also be used in the development of new drugs and diagnostics. The US DOE Joint Genome Institute is planning an important application of this method: it will focus its sequencing efforts on new plant and microbial targets that may be used to develop alternative energy sources, and on sequencing the genome of a marine red alga that may play a critical role in removing carbon dioxide from the atmosphere (Ansorge, 2009).

There are many applications of massively parallel sequencing. These include full-genome resequencing; mapping of structural rearrangements; RNA sequencing, analogous to expressed sequence tags or serial analysis of gene expression; large-scale analysis of DNA methylation by deep sequencing of bisulfite-treated DNA; and genome-wide mapping of DNA-protein interactions by deep sequencing of DNA fragments. As the technique develops over the next few years, many more applications will be identified (Shendure and Ji, 2008).
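The logic behind methylation analysis of bisulfite-treated DNA can be sketched as follows. This is a simplified illustration assuming complete conversion and error-free reads: bisulfite converts unmethylated cytosine to uracil, which is read as thymine after PCR (both steps are folded into one function here), while methylated cytosine is protected. Comparing reads with the reference then reveals which cytosines were methylated.

```python
from __future__ import annotations


def bisulfite_convert(seq: str, methylated_positions: set[int]) -> str:
    """Unmethylated C reads as T after treatment; methylated C stays C."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def call_methylation(reference: str, read: str) -> dict[int, bool]:
    """For each reference C, report True if the read still shows C (methylated)."""
    return {i: read[i] == "C" for i, base in enumerate(reference) if base == "C"}


reference = "ACGTCCGA"
read = bisulfite_convert(reference, methylated_positions={1})
print(read)                               # ACGTTTGA
print(call_methylation(reference, read))  # {1: True, 4: False, 5: False}
```

Deep sequencing simply repeats this comparison over millions of reads, so the fraction of reads retaining C at a position estimates its methylation level.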

RNA Editing Diseases

RNA editing has been linked to a number of human diseases. Figure 4 shows a schematic representation of how RNA editing modifies the encoded proteins and how abnormal editing can produce abnormal proteins that lead to disease. Because the majority of edited mRNAs are expressed in the CNS, defects in ADAR activity can lead to neurological phenotypes. "Down regulation of the editing at GluR-B Q/R has been reported in brain tumors such as glioblastoma multiforme, a malignant type of astrocytoma." A decrease in A to I editing is also present in tumors of the brain, lung, testis and kidney. Reduced RNA editing at the GluR-B Q/R site causes excessive cellular influx of calcium ions, which has been correlated with sporadic amyotrophic lateral sclerosis. Several mutations in the ADAR1 gene are linked to an autosomal-dominant skin disorder, dyschromatosis symmetrica hereditaria, which may be accompanied by neurological symptoms such as dystonia and mental deterioration. These findings highlight the fundamental role of ADARs in the CNS and emphasise that A to I editing is essential for normal cell behaviour, since it regulates gene expression at various levels (Galeano and Gallo, 2007).

Figure 4. RNA editing modifying proteins (Galeano and Gallo, 2007)


In conclusion, massively parallel sequencing is a technique with a bright future. As it is developing rapidly and costs are continuously falling, it is only a matter of time before it replaces the Sanger method in clinical applications. This essay has described how massively parallel sequencing works and the many kinds of study to which it can be applied, together with RNA editing-related human diseases and the impact of A to I editing. Applications of massively parallel sequencing will have a major impact on identifying new RNA editing diseases, as well as on full-genome resequencing, mapping of structural rearrangements, RNA sequencing, large-scale analysis of DNA methylation and genome-wide mapping of DNA-protein interactions.