Next-Generation Sequencing in Transcriptome Analysis


Transcriptome analysis has become a standard tool in clinical research. Patterns of gene expression in diseased individuals are important because these data can be used to elucidate etiology, refine diagnosis, predict prognosis, and improve treatment of disease based on the transcriptome of pathologic samples. Gene expression patterns vary during development, between cells and tissues, and in health and disease. Measuring gene expression has therefore received much attention in the past (Ruben et al. 2008). This concept has been applied mostly to cancer, in an effort to deduce patterns of gene expression that predict tumor progression and facilitate risk stratification of patients with early-stage disease. This approach has also been used to identify molecular pathways that are critical to tumor survival and growth. Transcriptome profiles of individual tumor samples can be used to tailor therapies to the collection of oncogenic stimuli that drive individual tumors. Realizing this goal will require, at the least, reliable means to quantify gene expression, robust bioinformatics tools to define and identify the critical oncogenic signaling pathways, and mechanism-based inhibitors that can be used to block these pathways. Two contrasting strategies exist to assess transcriptome activity globally: the first is based on hybridization, the second on DNA sequencing (Ruben et al. 2008).

Historically, cDNA arrays were the first tools to measure gene expression comprehensively, using a complex mixture of cDNA prepared from whole organs as hybridization probes (Gress et al. 1992). cDNA-clone-based transcriptomics benefited from the large-scale generation of expressed sequence tags (ESTs). Microarrays have been the method of choice for transcriptome profiling for more than a decade. These platforms can yield relative expression data on a large number of transcripts, and the impact of microarray technology on clinical experimentation has been enormous. However, microarray technology suffers from insufficient sensitivity, narrow dynamic range, and nonspecific hybridization. In addition, this technology can only provide information about the transcripts that are included on the array. Other limitations include inaccurate annotation of probes and differences in hybridization efficiency owing to the nucleotide sequences of the transcripts and their cognate probes. Microarrays in general interrogate only the more abundant transcripts, which may represent only 50% of the total transcriptome. Analysis of microarrays requires complex background-subtraction, normalization, and summarization algorithms, and the data output is relative rather than quantitative, making it very difficult to compare expression data from one experiment to another. In these respects, microarray technology has reached its limits in transcriptome analysis.

Over a decade ago, the Kinzler and Vogelstein group introduced a sequence-based approach to the analysis of gene expression. This technology, known as serial analysis of gene expression (SAGE), relies on cloning and sequencing small tagged cDNA fragments (Velculescu et al. 1997). The SAGE methodology is based on the assumption that a short sequence tag contains sufficient information to uniquely identify a transcript, provided that the tag is obtained from a defined position within each transcript. It is estimated that 99.8% of 21-base-pair tags theoretically occur only once in the human genome (Saha et al. 2002). The sequence tags are linked together to form long serial molecules that are cloned and sequenced, and the number of times a particular tag is observed provides the expression level of the corresponding transcript. SAGE has many technical advantages over hybridization-based protocols: background correction is not required and the output is quantitative. However, the throughput of DNA sequencing technology has, until recently, been limited by the need to amplify DNA through bacterial cloning and by the traditional Sanger approach of sequencing by chain termination.
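The core of SAGE quantification — extract one positionally defined tag per transcript, then count tags — can be sketched in a few lines of Python. The sequences below are toy data, not a real library; in LongSAGE (Saha et al. 2002) the 21-bp tag comprises the 4-bp NlaIII anchoring site (CATG) plus the 17 bp immediately downstream of its 3'-most occurrence.

```python
from collections import Counter

# Illustrative sketch of SAGE-style quantification (toy data). Each
# transcript is represented by the 17 bp immediately downstream of the
# 3'-most NlaIII anchoring site (CATG); together with the CATG itself
# this gives the 21-bp tag discussed in the text.
ANCHOR = "CATG"
TAG_LEN = 17

def extract_tag(cdna):
    """Return the tag downstream of the 3'-most anchoring site, or None."""
    pos = cdna.rfind(ANCHOR)
    start = pos + len(ANCHOR)
    if pos == -1 or start + TAG_LEN > len(cdna):
        return None  # no usable anchoring site in this sequence
    return cdna[start:start + TAG_LEN]

# Toy cDNA sequences standing in for a sequenced concatemer library.
cdnas = [
    "GGGCATGAAACCCGGGTTTAAACCA",  # tag AAACCCGGGTTTAAACC
    "TTTCATGAAACCCGGGTTTAAACCA",  # same tag -> same transcript
    "AAACATGTTTGGGCCCAAATTTGGC",  # a different transcript
]

# Counting how often each tag occurs gives relative expression levels.
tag_counts = Counter(t for t in map(extract_tag, cdnas) if t is not None)
print(tag_counts.most_common())
```

No background correction enters anywhere: the count itself is the (digital) measurement, which is the technical advantage over hybridization noted above.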

DNA sequencing is now gradually moving from Sanger-based technology, coupled with capillary electrophoresis, to next-generation sequencing (NGS) approaches based on sequencing-by-synthesis, or other concepts, coupled with massively parallel processing of templates (Meyer et al. 2008). While Sanger technology may yield up to 100 kb of sequence data per run on standard sequencing instruments, the new high-throughput platforms yield from several hundred Mb up to several tens of Gb of DNA sequence per run. Recently, NGS approaches for retrieving mtDNA genomes based on overlapping amplicons have been reported (Jex et al. 2008).

Next-generation sequencing (NGS) technology breaks through the limitations of traditional Sanger sequencing and is therefore capable of massively parallel sequencing. First, each fragment of DNA is amplified independently by polymerase chain reaction, in a way that keeps the amplification products spatially clustered. Then, Sanger's terminator technique is replaced by sequencing either by synthesis or by ligation. The current NGS platforms, including Roche's 454 Genome Sequencer, the Illumina Genome Analyzer, and Applied Biosystems' SOLiD, analyze up to tens of millions of DNA fragments simultaneously and generate gigabases of sequence information from a single run. NGS has revitalized SAGE technology. For example, Digital Gene Expression Tag Profiling on the Illumina Genome Analyzer analyzes up to 7 million tags per sample per flow-cell channel (8 channels total) from constructed libraries of unique, positionally known 20- or 21-base-pair cDNA tags. At this depth, a transcript expressed at the level of one copy per cell is called 22 times, based on the generally accepted estimate that one transcript per cell equals one copy per 350,000 transcripts (technical white paper, Illumina).
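The sampling depth claimed above follows from simple arithmetic. Using only the two figures quoted in the text, the expected number of observations works out to about 20 per channel — the same order as the ~22 cited from the Illumina white paper:

```python
# Back-of-the-envelope check using only the figures quoted in the text.
tags_per_channel = 7_000_000     # tags sequenced per sample per channel
transcripts_per_copy = 350_000   # one transcript per cell ~= 1 in 350,000

# Expected number of times a one-copy-per-cell transcript is sampled.
expected_observations = tags_per_channel / transcripts_per_copy
print(expected_observations)
```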

There are essentially two applications of NGS to transcriptome profiling currently in use or under development. One uses oligo-dT priming for first-strand cDNA synthesis and generates libraries that are enriched in the 3' untranslated regions of polyadenylated mRNAs; massively parallel sequencing of these 3'-tagged libraries yields quantitative measures of transcript abundance. The other application sequences full-length cDNA libraries and, in theory, collects both quantitative and qualitative information about the entire transcriptome. Because total cDNA libraries are much more complex than those derived from 3' ends, the depth of sequencing required for analysis of rare transcripts within them is much greater. Full-length cDNA sequencing has the potential to identify novel splicing patterns and heretofore undetected mutations that may underlie disease. The future of sequence-based transcript profiling is very bright.
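One way to see why rare transcripts demand greater depth is a simple random-sampling model (an assumption made here for illustration, not part of the cited work): if a transcript contributes a fraction f of the molecules in the library, the chance of sampling it at least once among N reads is 1 - (1 - f)^N.

```python
def p_detect(fraction, reads):
    """P(transcript sampled at least once) = 1 - (1 - f)^N."""
    return 1 - (1 - fraction) ** reads

# Hypothetical rare transcript: one copy per cell, ~1 in 350,000 transcripts.
f = 1 / 350_000
for n_reads in (100_000, 1_000_000, 10_000_000):
    print(n_reads, round(p_detect(f, n_reads), 3))
```

The greater complexity of a full-length library shrinks the fraction attributable to any one transcript, so N must grow accordingly before rare transcripts are reliably observed.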

Sequencing-based transcription profiling, using either 3' tag sequences or full-length cDNA libraries, has the potential to add new quantitative and qualitative dimensions to the application of gene expression profiling to diagnosis and therapy. New technologies, including single-molecule sequencing, are being developed rapidly and may reduce cost, enable even higher throughput, and significantly shorten run times. However, the challenges for bioinformatics and information-technology infrastructure remain.