Selection Pressure And Conserved Sequences In RNA Biology Essay


Much of the human genome consists of sequences that encode non-coding RNAs. Their role in genomic function is the basis of much current research. It is now known that they play various roles in gene regulation and other genetic processes, and their role in tumour formation and cancer has been reported in many studies (references). Long non-coding RNAs (lncRNAs) are a class of non-coding RNAs that have also been implicated in carcinogenesis. The proposed research project aims to characterise BUC6, a newly defined genetic sequence implicated in breast cancer that may encode an lncRNA. The project will study the as yet undefined transcriptome, examine its probable conserved sequences, and evaluate its evolutionary background with regard to sequence similarity with conserved coding genes across species. This study will help in understanding the role of lncRNAs in tumourigenesis and cancer, and may lead to the development of novel biomarkers for breast cancer, in addition to expanding our knowledge of the role of lncRNAs in genomic composition. The project will use a combination of bioinformatics and molecular biology tools.


Non-coding RNAs (ncRNAs), as the name implies, do not code for a functional protein. In humans, protein-coding genes constitute only about 2% of the genome, yet about 70% of the genome is transcribed into RNA. These ncRNAs seem to play an important role in gene regulation mechanisms.

Non-coding RNAs are classified by their length. In general, long ncRNAs (lncRNAs) range from more than 200 bases to about 100 kilobases in length. New discoveries indicate that lncRNAs may play important roles in a diverse range of molecular mechanisms. These lncRNAs are located within intergenic regions and are also transcribed as complex transcripts that overlap protein-coding genes.

Selection pressure and conserved sequences.

According to various sequence studies, lncRNAs are generally not well conserved along their full length. This may suggest that selection pressure acts mainly on short regions of the sequence, so that only those short regions are strongly conserved while the rest of the transcript diverges. Like mRNAs, lncRNAs seem to be transcribed by RNA polymerase II, but may not be further processed.
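The idea that conservation may be confined to short regions can be illustrated with a minimal sketch that scores each column of a multiple alignment and reports windows of high identity. The function names, the window size, the threshold and the toy alignment are all hypothetical choices made for illustration; they are not taken from the BUC6 data or from any particular tool.

```python
from typing import List, Tuple


def column_identity(alignment: List[str]) -> List[float]:
    """Fraction of sequences that share the most common non-gap base in each column."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for col in range(length):
        non_gap = [seq[col].upper() for seq in alignment if seq[col] != "-"]
        if not non_gap:
            scores.append(0.0)  # column made up entirely of gaps
            continue
        most_common = max(non_gap.count(base) for base in set(non_gap))
        scores.append(most_common / len(alignment))
    return scores


def conserved_windows(alignment: List[str], window: int = 6,
                      threshold: float = 0.95) -> List[Tuple[int, int]]:
    """Merged half-open (start, end) column ranges whose mean identity meets the threshold."""
    scores = column_identity(alignment)
    regions: List[Tuple[int, int]] = []
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if mean >= threshold:
            end = start + window
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # merge overlapping windows
            else:
                regions.append((start, end))
    return regions


# Hypothetical toy alignment: only columns 4 to 9 ("ACGGAT") are shared by all rows.
toy_alignment = [
    "ACGTACGGATTCAGT",
    "TTGAACGGATGACCA",
    "GACTACGGATCTGAC",
]
print(conserved_windows(toy_alignment))  # [(4, 10)]
```

In a real analysis the input would be an alignment produced by a multiple alignment tool, and the window and threshold would need to be chosen with the data in mind.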

Microarray and large-scale genome sequencing data indicate that a large part of the mammalian transcriptome consists of lncRNAs. Their involvement in gene regulation, chromatin modification, splicing, translation, transcription and degradation has been inferred from studies of complementary RNA interactions.

Role in Cancer

Cancer is a complex, multistep genetic and epigenetic disease, and the combined effect of mutations, down-regulation, over-expression and deletions of oncogenes and tumour suppressor genes has been well documented. Recent analytical approaches, including transcriptome analysis, high-throughput screening and other genomic studies, have indicated that defects in ncRNAs may have a role in the development of tumours. The role of microRNAs (short ncRNAs) in tumourigenesis has been well explained, and overexpression of specific lncRNAs has also been shown to be associated with several types of tumour. The first lncRNA to be described in this context was H19; since then, the number of lncRNAs with a defined role in different types of cancer has increased steadily. For example, increased expression of MALAT1 is associated with non-small cell lung cancer, uterine endometrial stromal sarcoma and hepatocellular carcinoma. Their possible role and overexpression make these lncRNAs potential biomarkers for cancer detection. Their study may also provide new insights into genomic organisation, gene evolution and selective pressure at the molecular level. A greater understanding of ncRNA behaviour should add a new dimension to our understanding of molecular events and their complexity in higher organisms.

Research objectives:

The BUC (Breast UniGene Cluster) genes, which are novel breast-associated genes, have been the subject of much recent research. A newly sequenced gene implicated in breast tumour formation is BUC6, a 2420 bp stretch of DNA. The apparently short length of the gene, together with the fact that the virtual mRNA sequence generated from it appears normal, without any disruption (see Appendix), suggests that it may encode an lncRNA. The gene seems to have strong sequence similarity to a specific stretch of the human ATP8A2 gene, which encodes a member of the P-type cation transport ATPase protein family. Two isoforms of the human protein are produced by alternative splicing. The ATP8A2 gene is located on chromosome 13 in humans. A cross-database search for ATP8A2 in the NCBI database gives 15 hits across species and kingdoms, showing certain conserved sequences that may be similar to the BUC6 sequence. We have also carried out a cross-species similarity search across genomes with the BUC6 sequence using the EMBL-EBI database. The information stored in these databases is generated by whole-genome shotgun sequencing, a method of large-scale sequencing. The results are included in the appendix.

The proposed research project focuses on some major questions regarding lncRNAs as a whole and the role of BUC6 in particular.

One of the major questions today is the role of lncRNAs in tumourigenesis and cancer. The project proposes to identify the genomic context of BUC6 and characterise its expression, which will help in studying its possible role in cancer.

Another aspect of lncRNAs that is not completely understood is their conservation during genomic evolution. By carrying out a contextual, hierarchical genomic study of the sequence, we aim to find out whether the BUC6 sequence has undergone convergent or divergent evolution across species.

This study will also try to increase our understanding of the nature of the selective pressure acting at the RNA level and in non-coding regions of the genome, thereby giving an insight into the nature of genomic evolution in higher living systems.

Overall, the proposed project aims to address some of the key questions concerning the interrelationship between ncRNAs and carcinogenesis, the molecular and cellular roles of ncRNAs, and the evolutionary history of lncRNAs.

Proposed Research Strategy

In keeping with the nature of the study, the project design consists of three distinct methodological stages: a genetic and evolutionary contextual study using bioinformatics tools; direct lncRNA quantification using real-time PCR (polymerase chain reaction); and, finally, cloning and possible expression profiling of the BUC6 sequence using a suitable cloning vector.

Evolutionary and Genomic study:

Similarity searches of the BUC6 sequence against ATP8A2 and other possible coding sequences, conserved sequence analysis, cross-species sequence similarity, and homology and evolutionary sequence analysis will be carried out using a number of tools for RNA, DNA and protein sequence alignment and cross-alignment, notably BLAST, FASTA and ClustalW. To identify genomic regions and tag coding sequences, ESTs derived from, but not confined to, the 5' and 3' ends of ATP8A2 will be used. The databases used will be NCBI (National Center for Biotechnology Information), which offers a large collection of genomic information and analytical options, EBI (European Bioinformatics Institute) and GenBank, along with other institutional genome browsers.

BLAST (Basic Local Alignment Search Tool) is a heuristic alignment tool that first finds short matches and then extends them to achieve a higher-scoring alignment. It performs well in both speed and sensitivity. FASTA (Fast-All) identifies runs of consecutive identical matches, the length of which is adjustable. Scoring matrices and penalties for insertions, deletions and mismatches are used to achieve an optimal alignment. ClustalW is the tool of choice for multiple sequence alignment; it combines phylogenetic and heuristic approaches. ESTs (Expressed Sequence Tags) are short cDNA (complementary DNA) sequences that are useful in determining gene sequence and in identifying transcripts.
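The heuristic of finding short matches and then extending them can be sketched in a few lines. This is a simplified, ungapped illustration and not the actual BLAST algorithm, which also uses neighbourhood words, gapped extension and alignment statistics. The function names, the word size, the scoring values and the example sequences are assumptions made for this sketch.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def index_words(subject: str, k: int) -> Dict[str, List[int]]:
    """Map every k-letter word in the subject to the positions where it starts."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(subject) - k + 1):
        index[subject[i:i + k]].append(i)
    return index


def _extend_one_way(query: str, subject: str, q: int, s: int, step: int,
                    match: int, mismatch: int, x_drop: int) -> Tuple[int, int]:
    """Walk from (q, s) in direction `step`; return (best score gain, length at that best)."""
    gain = best_gain = 0
    length = best_len = 0
    while 0 <= q < len(query) and 0 <= s < len(subject):
        gain += match if query[q] == subject[s] else mismatch
        length += 1
        if gain > best_gain:
            best_gain, best_len = gain, length
        elif best_gain - gain >= x_drop:
            break  # score has dropped too far below the best seen so far
        q += step
        s += step
    return best_gain, best_len


def seed_and_extend(query: str, subject: str, k: int = 4, match: int = 1,
                    mismatch: int = -2, x_drop: int = 3
                    ) -> Optional[Tuple[int, int, int, int]]:
    """Best ungapped hit as (score, query_start, subject_start, length), or None."""
    index = index_words(subject, k)
    best = None
    for q in range(len(query) - k + 1):
        for s in index.get(query[q:q + k], []):  # exact word matches act as seeds
            right_gain, right_len = _extend_one_way(
                query, subject, q + k, s + k, 1, match, mismatch, x_drop)
            left_gain, left_len = _extend_one_way(
                query, subject, q - 1, s - 1, -1, match, mismatch, x_drop)
            hit = (k * match + left_gain + right_gain,
                   q - left_len, s - left_len, left_len + k + right_len)
            if best is None or hit[0] > best[0]:
                best = hit
    return best


# Hypothetical sequences sharing the stretch "ACGTACGG".
print(seed_and_extend("TTACGTACGGAA", "GGGACGTACGGCC"))  # (8, 2, 3, 8)
```

Indexing the subject once and only extending around exact word hits is what makes this kind of search fast; the price is that similarities containing no exact word of length k are missed.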

RNA analysis using Real-Time PCR:

Real-time PCR is proposed to quantify the probable lncRNA expression from the BUC6 gene. Because the sequence is non-coding, a combined approach to PCR may be adopted. The gene sequence is transcribed into lncRNA; the transcribed lncRNA can then be used to generate cDNA for further analysis. The lncRNA can also be studied for its binding targets in tumour and normal breast cells.

Real-time, or quantitative, PCR can provide an accurate dynamic range of 7 to 8 orders of magnitude. The process does not require post-amplification manipulation and is very sensitive. It can be adapted to use reverse transcriptase to generate cDNA.
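The proposal does not state how the real-time PCR data will be converted into expression levels. One common option is relative quantification by the comparative Ct (2^-ddCt) method, sketched below as an assumption rather than as the project's chosen method. The sketch assumes roughly 100% amplification efficiency for both amplicons; the reference gene and all Ct values are hypothetical.

```python
def fold_change_ddct(ct_target_sample: float, ct_ref_sample: float,
                     ct_target_control: float, ct_ref_control: float) -> float:
    """Relative expression of a target in a sample versus a control (2^-ddCt).

    Assumes both amplicons double every cycle (about 100% efficiency).
    """
    # Normalise the target to the reference gene within each condition.
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    # Compare the normalised values between the two conditions.
    delta_delta = delta_sample - delta_control
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: the target crosses threshold two cycles earlier in
# tumour cells than in normal cells, with an unchanged reference gene.
print(fold_change_ddct(24.0, 18.0, 26.0, 18.0))  # 4.0
```

A lower Ct means more starting template, so a difference of two cycles corresponds to a fourfold difference in expression under the efficiency assumption.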

Expression profiling of the lncRNA using a suitable expression vector:

Depending on the results obtained in the above steps and the time available, the cDNA clone generated from the BUC6 RNA by RT-PCR (reverse transcriptase PCR) will be inserted into a suitable expression vector (the exact nature of the vector will be decided later, depending on the progress of the project) for transcriptional and functional studies.