Exaptation Of Mammalian Wide Interspersed Repeats Biology Essay


The proposed project will investigate the role played by Mammalian-wide Interspersed Repeats (MIR) in the human genome. MIR elements are thought to have originated approximately 130 million years ago and are considered to be the most ancient SINE family. They are speculated to play an important role in the evolution of novel genes through various mechanisms, and more than 1100 human genes are thought to have recruited them. In addition, the association of MIR elements with many disease-related genes makes them important for understanding the molecular mechanisms behind disease etiology.

The principal aims of the project will be:

To assess the conservation of MIR elements in a particular gene in the human genome

To investigate the role of MIR elements in providing novel components to endogenous genes, resulting in gene evolution

These aims will be achieved using the following experimental techniques:

Bioinformatics tools such as Spidey, BLAST, ClustalW and RepeatMasker

Laboratory techniques such as RT-PCR, cloning and DNA sequencing (if time permits).

Background to the Investigation


Repetitive DNA constitutes a major component of eukaryotic genomes. Almost 45% of the human genome is composed of repeated sequences (International Human Genome Sequencing Consortium). These repeated sequences are an important element in the evolution of complex organisms, and many of them are known to have contributed to the evolution of novel protein domains. It is therefore important to understand the origin of such repetitive sequences and their genetic roles.

Most repeat sequences in DNA fall into two major groups: tandemly repeated sequences and interspersed repeats. Interspersed repeats are largely inactive copies of transposable elements. They comprise several subclasses: short interspersed nuclear elements (SINEs), long interspersed nuclear elements (LINEs), DNA transposons and LTR-containing transposable elements. Each class of repeat has its own characteristic features. The role played by a repeat element is easiest to study when the insertion is recent, because the element has had less time to accumulate mutations. For older elements, it is usually assumed that the present role is similar to the one the element has played since its origin or since the time of exaptation. The older an element is, the more difficult it is to identify it and to understand its role, because mutations and other sequence changes have accumulated over time.

Mammalian-wide interspersed repeats (MIRs) are a class of repeats belonging to the SINEs. MIR is considered to be a tRNA-derived SINE because of its sequence similarity to tRNA. It is estimated that MIRs were introduced into the genome of an ancestor of mammals about 130 million years ago. They constitute about 0.4-1% of the mammalian genome, and there are approximately 368,000 copies of MIR in the human genome. MIR possibly originated in the Mesozoic era and was active in the ancestors of mammals and birds. It stopped being duplicated before the divergence of placental mammals, which possibly allowed the amplification and spread of many specific repeat sequences (JOHN R. MURNANE and JOSE F. MORALES, 1995).

MIR elements were originally observed in hepatoma cells infected with hepatitis B virus, when the regions flanking the site of viral integration were sequenced. In this experiment a conserved sequence of 70 bp was observed that did not share its properties with any known class of repeats. This class of repeat was subsequently found in many human genes and occasionally in rodents, located either in introns or in the 5' or 3' flanking regions (LAWRENCE A. DONEHOWER, et al., 1989).

MIR and SINE family of repeats

SINEs are interspersed repeats of about 100-300 bp that are found in most vertebrates and invertebrates. MIR, a ubiquitous sequence present in all placental mammals, shows several characteristics of SINEs, as concluded by Arian F.A. Smit (1995) after aligning over 80 sequences with similarity to MIR. A consensus sequence of 260 bp was identified, and the element is expected to have been transcriptionally active originally. It has consensus RNA polymerase III A and B boxes and an AT-rich 3' end, both of which are characteristic of a typical SINE. The third major characteristic, flanking direct repeats, is likely to have become unrecognisable because the MIR fragments have diverged considerably from the consensus. The first 80 bp at the 5' end of MIR were recognised to be similar to tRNA. A feature of MIR copies is that they are truncated at either or both ends while the core region is conserved. This could be due to an incomplete integration process. Alternatively, the central region may be better conserved than the terminal sequences because the latter have a higher content of mutagenic CpG sites. MIR thus appears to be a fusion product of a tRNA-derived SINE and an unrelated sequence.

Distribution of MIR in the human genome

Mammalian-wide interspersed repeats are found in the major groups of mammals, including placental mammals, marsupials and monotremes. MIR is considered to be the most ancient SINE family, and it is believed that amplification of MIR elements stopped in the ancestors of placental mammals.

The human genome is considered to be a mosaic of DNA segments called isochores, which are segments of DNA showing a uniform GC content. Giorgio Matassi (1998) described the distribution of MIR in the human genome in terms of these isochoric regions. On this basis the human genome is divided into four families of isochores, L, H1, H2 and H3, each characterised by a different GC content and gene density.
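The idea of assigning a DNA segment to an isochore family by its GC content can be illustrated with a minimal Python sketch. The GC cut-offs used here (41%, 46% and 53%) are assumptions chosen for illustration only; they are not taken from this essay or from the cited study, and the function names are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Return the GC fraction of a DNA sequence, ignoring N and other ambiguous bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


# ASSUMED upper GC bounds for each family (illustrative values only).
ISOCHORE_UPPER_BOUNDS = [(0.41, "L"), (0.46, "H1"), (0.53, "H2")]


def isochore_family(seq: str) -> str:
    """Assign a sequence window to L, H1, H2 or H3 using the assumed bounds."""
    gc = gc_content(seq)
    for upper, name in ISOCHORE_UPPER_BOUNDS:
        if gc < upper:
            return name
    return "H3"
```

In practice the function would be applied to long genomic windows rather than short strings, since isochores are defined over large stretches of DNA.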

The majority of MIR repeats were found in the L isochores, which have the lowest GC content and gene density, whereas MIR density was highest in the H2 isochores, which have a comparatively high GC content. The MIR elements found in the human genome are assumed to be the result of retroposition events followed by further evolutionary changes such as insertions and deletions. It is important to note that these sequences are still conserved at many positions, as they are stably integrated repeats.

MIR elements are found at different densities in different isochoric regions: most frequently in the L isochores and least frequently in the H3 isochores. This suggests a restriction on the movement of MIR elements between isochoric regions. One presumed reason for the low frequency of MIR in H3 is poor integration, because intergenic regions and introns are shortest in the H3 isochores. In addition, the intergenic sequences there are GC-rich and contain 3' untranslated regions that could have a regulatory role. Integration of MIR in this part of the genome might therefore disrupt gene function, which could be a further reason why MIR is rare there.

Proposed role of MIR in the gene evolution

There is considerable evidence that transposable elements can bring novelty to protein-coding genes by adding new exons to ancient genes. Exaptation of transposable elements, exon duplication and de novo exonisation from intronic regions can all add a new exon to an evolutionarily ancient gene. Exonisation of the primate-specific Alu element has been studied in detail, and a similar process of MIR exonisation is speculated. MIR is now known to play an important role in the expression of several mammalian genes through one of the mechanisms described below.

It may provide alternative splice sites and hence generate more than one transcript from the same gene. It may provide a polyadenylation signal for transcripts. It may introduce a new exon into an existing protein-coding gene and hence a novel domain into the protein. Finally, if it lies in an intronic sequence, some of its nucleotides may be included in the transcript and so contribute additional protein-coding information.

A possible role of MIR integration in gene control and evolution was proposed from the study of two genes, insulin-like growth factor 1 (IGF1) and dendrin. These are good examples of how a MIR element can be included in the mRNA sequence and thus become a functional part of the gene (DAVID C. HUGHES, 2000).

In IGF1 the 3' UTR sequence is about 400 nucleotides long and is encoded by a single large exon. The IGF1 transcript is transcribed and processed into multiple mRNAs. The 3' UTR contains many blocks of conserved sequence, of about 326 bp, which include the MIR, and the core MIR sequence of 89 bp is conserved among all the species studied. This transcript was found in humans, sheep and other vertebrates but not in fish and chickens, which led to two conclusions: the MIR insertion occurred early in mammalian evolution, and MIR integration resulted in the evolution of a new IGF1 transcript.

Similar observations were made for the dendrin gene, in which the presence of two MIR elements was confirmed by RepeatMasker. Although the function of the peptide encoded by this gene is not known, sequence analysis pointed to a novel role for MIR.

MIR can be inserted into either the sense or the antisense strand of DNA. Accordingly, it may be included in transcripts in either sense or antisense orientation, and such transcripts can form a heteroduplex. This also points to the importance of the conserved sequences. If such a heteroduplex offers an advantage, for example in post-transcriptional regulation, then the flanking sequences may converge to increase sequence specificity and strengthen binding between the sense and antisense strands. This hypothesis has not yet been tested. It may describe one of the roles played by MIR elements inserted in this pattern, but extensive research is needed to establish whether the same pattern of insertion is found in further gene sequences.

There are examples of genes in which MIR elements have been included in the mRNA as a result of the creation of alternative splice sites. One such observation in humans is the acetylcholine receptor gene, which gained an additional exon during the course of evolution because a new splice site arose within the conserved core region of the MIR consensus sequence. As a result of alternative splicing, ~26 bp of the MIR element has been incorporated into the coding region of this protein (JOHN R. MURNANE and JOSE F. MORALES, 1995).

The 3' end of the MIR sequence is homologous to four cDNA sequences. Two of these are human beta-tubulin and sheep follitropin receptor; the other two were not well characterised. The 3' end of the MIR sequence contains a poly(A) signal that is conserved in all four cDNAs, so the MIR sequence is used as the poly(A) signal for these genes. These are good examples of how MIR sequences can play a role in gene evolution by supplying a sequence that acts as a polyadenylation signal.
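A search for a candidate poly(A) signal within a MIR-derived 3' end can be sketched in a few lines of Python. The sketch assumes that the signal of interest is the canonical AATAAA hexamer; the essay does not state which motif is conserved in the four cDNAs, and the function name is hypothetical.

```python
import re

POLYA_SIGNAL = "AATAAA"  # canonical hexamer; ASSUMED to be the motif of interest


def find_polya_signals(seq: str) -> list:
    """Return 0-based start positions of every AATAAA hexamer (overlaps included)."""
    # Accept RNA input by converting U to T before searching.
    seq = seq.upper().replace("U", "T")
    # A zero-width lookahead lets overlapping occurrences be reported.
    return [m.start() for m in re.finditer(f"(?={POLYA_SIGNAL})", seq)]
```

A position near the 3' end of an aligned MIR copy that is shared by several cDNAs would be a candidate for further inspection; the presence of the hexamer alone does not show that it is used.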

There are indications that exonisation of MIR in humans is still going on. Exonisation of MIR can take place at any point during evolution and need not have occurred before the divergence of rodents from primates. For example, exonisation of MIR in the gene ARNTL occurred before this divergence. In contrast, the MIR exonisation seen in the gene TTLL6 provides a splice site in a testis-specific exon. This exon is observed only in humans, which indicates that the splicing signal was acquired only recently during human evolution (LAN LIN, et al., 2009). It also shows how MIR can be part of a transcript expressed in a tissue-specific manner and can contribute through alternative splice sites.

An unusual feature of the MIR element is that its central core region is much more conserved than its flanking regions. A possible reason is that mammalian cells have found a use for these sequences.

MIR and Diseases

There are indications that MIR elements are present in some genes that are important in disease, which has led to speculation about the role of MIR in this respect. MIR elements are associated with genes involved directly or indirectly in disease. There are many examples, of which a few are mentioned here.

Presenilin 2 (PSEN2, Website 1, OMIM ID +600759), a gene associated with Alzheimer's disease, is known to contain a MIR element. Transglutaminase 2 (TGM2, Website 2, OMIM ID *190196), which is associated with diseases such as celiac disease and Huntington disease, also contains a MIR element. Other genes, such as angiotensin converting enzyme 2 (ACE2, OMIM ID *300335) and synaptogyrin 1 (SYNGR1, OMIM ID *603925), also contain MIR.

The role of MIR in chronic granulomatous disease (CGD) is comparatively clear. In CGD, phagocytes cannot generate microbicidal products such as reactive oxygen metabolites. This is due to a novel mutation within intron 6 of the CYBB gene that activates a cryptic exon, leading to its inclusion in CYBB mRNA. Many tissues also include this exon in the mRNA in the absence of the pathological condition. The region included in the normally expressed CYBB mRNA contains a single ORF and hence has coding potential. The cryptic exon lies in a region showing high similarity to MIR. No splice sites were recognised in the homologous sequences of other primates and of non-primate mammals, and some insertions and deletions were observed in these sequences. It is therefore speculated that mutations in the MIR enabled this DNA to be spliced into the mRNA, resulting in CGD (ANDREAS RUMP, et al., 2006).

The wild-type p53 gene, a known tumour suppressor gene, also contains a MIR element in its 3' UTR, which could be involved in translational activation by promoting polyadenylation of the mRNA. Further investigation, for example through targeted mutation of this region, might reveal a novel mechanism of p53 gene regulation (A. M. D'ERCHIA, et al., 1999).

Interspersed repeat elements are thought to play an important role in unequal recombination events during meiotic crossover, which result in mutations. Such recombination events give rise to many genetic disorders, and it has been shown that they may also lead to carcinogenesis (Nystrom-Lahti et al., 1995). MIR may act in the same way and could thus be associated with the cause of disease.

Residual movement of MIR

It was proposed that MIR elements stopped being amplified before the divergence of rodents from primates. If so, the common ancestor of rodents and primates must already have carried MIR. This was examined in a study comparing the human and rodent genomes.

If all MIR copies had been inserted after the divergence, none would have an orthologue in the other species, but this is not entirely true. Conversely, if MIR activity stopped before the divergence of primates and rodents and the copies have since evolved neutrally, with no selection pressure, there should be little detectable similarity between these repeat elements in mouse and human, because rodents evolve faster than humans. Nevertheless, intragenomic MIR copies showed only about 20% sequence divergence from the mammalian consensus sequence. This implies that copies of MIR in mouse and human are under strong negative (purifying) selection.

In addition, at least 44% of human MIR sequences do not have an orthologue in the murine genome, and 16% of murine MIR sequences do not have a human homologue. This suggests that some copies arose after the split of rodents from primates. It can therefore be proposed that the activity of MIR elements did not stop completely at the divergence of primates and rodents and that some residual movement continued. This may not be correct, but it will become clearer once this kind of comparative study is carried out in other related genomes (J. C. SILVA, et al., 2003).

MIR elements are therefore expected to be a class of interspersed repeats that can contribute to knowledge about the exonisation of repeat elements in the human genome. They are not as young as Alu elements, in which the exonisation process is not yet complete, nor are they as old as those repeats in which extensive changes in nucleotide sequence make the exonisation process difficult to trace. The association of MIR elements with disease-causing genes makes the role played by MIR in the evolution of genes an important subject of study.

AIMS of the research project

Mammalian-wide Interspersed Repeats are a class of interspersed repeats whose importance in the mammalian genome has not been extensively investigated. The roles played by MIR elements in the human genome will be investigated in the following respects.

To assess the conservation of MIR elements in a particular gene in the human genome (gene to be decided before starting the project)

To investigate the role of MIR elements in providing novel components to endogenous genes: by providing a polyadenylation site; by providing an alternative splice site that produces a population of alternatively spliced transcripts; or, when present in an intronic region, by being included in the transcript and introducing extra amino acids, thereby playing a role in protein evolution. With these possibilities in view, the functional role played by MIR in the human genome, if any, will be investigated.

To investigate the role played by MIR in mRNA localisation.

Experimental approach

The project depends chiefly on the use of bioinformatics tools and molecular biology techniques.

The first part of the project is the use of bioinformatics tools to identify the presence of MIR in a gene and to gain information about the role it might play in that gene. Some of these tools are described here.

RepeatMasker:

It is a program that screens DNA sequences for interspersed repeats and low-complexity DNA sequences. The output of the program is a detailed annotation of the repeats present in the query sequence, together with a modified version of the query sequence in which all the annotated repeats are replaced with 'N'. The masked sequence can then be used for comparisons between sequences. (Website 6)
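How the RepeatMasker annotation could be filtered for MIR elements is sketched below in Python. The sketch assumes the whitespace-separated column layout of the standard `.out` annotation table and assumes that MIR copies are labelled with the class/family string `SINE/MIR`; the two sample rows, the sequence name `geneX` and all function names are invented for illustration.

```python
# Two made-up rows in the ASSUMED layout of a RepeatMasker .out table.
SAMPLE_OUT = [
    "   SW   perc perc perc  query   position in query   matching repeat     position in repeat",
    "score   div. del. ins.  sequence begin end (left)   repeat   class/family begin end (left) ID",
    "",
    "  687  25.4  5.1  1.2  geneX  1021  1190  (8810) +  MIRb   SINE/MIR    12  190  (78)  1",
    " 2250  10.3  0.3  0.0  geneX  3400  3702  (6298) C  AluSx  SINE/Alu   (9)  303    1   2",
]


def parse_repeatmasker_out(lines):
    """Parse annotation rows into dictionaries; header and blank lines are skipped."""
    hits = []
    for line in lines:
        fields = line.split()
        if len(fields) < 11 or not fields[0].isdigit():
            continue  # not a data row
        hits.append({
            "divergence": float(fields[1]),
            "query": fields[4],
            "start": int(fields[5]),   # 1-based, inclusive
            "end": int(fields[6]),
            "strand": "-" if fields[8] == "C" else "+",
            "repeat": fields[9],
            "family": fields[10],
        })
    return hits


def mir_hits(hits):
    """Keep only the hits annotated as MIR (ASSUMED label: SINE/MIR)."""
    return [h for h in hits if h["family"] == "SINE/MIR"]


def mask_hits(seq, hits):
    """Replace every annotated interval of seq with 'N', as in the masked output."""
    chars = list(seq)
    for h in hits:
        for i in range(h["start"] - 1, min(h["end"], len(chars))):
            chars[i] = "N"
    return "".join(chars)
```

For example, `mir_hits(parse_repeatmasker_out(SAMPLE_OUT))` keeps only the `MIRb` row, whose coordinates could then be compared with the exon structure of the gene.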


Spidey:

Spidey aligns mRNA sequences to genomic DNA. It is designed to achieve two goals in making this comparison:

It can find alignments regardless of intron size

Nearby pseudogenes or paralogs should not interfere with the identification of exons and introns

It can also perform cross-species alignments of mRNA to genomic DNA. Hence this tool can be used to study MIR sequences within mammals and between different species. (SARAH J. WHEELAN, DEANNA M. CHURCH and JAMES M. OSTELL, 2001)
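Once an mRNA-to-genome alignment has given the exon coordinates of a gene, they can be compared with the coordinates of an annotated MIR to see whether the repeat has been exonised. The following Python sketch shows one way to do this; it is not part of Spidey, the function name is hypothetical, and it assumes 1-based inclusive coordinates and non-overlapping exons.

```python
def classify_mir(mir, exons):
    """Classify a MIR interval relative to a list of exon intervals.

    mir   : (start, end) of the repeat, 1-based inclusive genomic coordinates
    exons : list of (start, end) exon intervals, ASSUMED not to overlap each other
    """
    m_start, m_end = mir
    overlap = 0
    for e_start, e_end in exons:
        # Length of the intersection of the two closed intervals (0 if disjoint).
        overlap += max(0, min(m_end, e_end) - max(m_start, e_start) + 1)
    length = m_end - m_start + 1
    if overlap == 0:
        return "intronic or intergenic"
    if overlap == length:
        return "fully exonic"
    return "partially exonised"
```

With exons at `(100, 200)` and `(500, 650)`, a repeat at `(180, 260)` is reported as partially exonised, which is the situation of most interest for this project.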

ClustalW2:

It is a multiple sequence alignment program for DNA or proteins. It produces biologically meaningful multiple sequence alignments of divergent sequences. It calculates the best match for the selected sequences and lines them up so that similarities and differences can be identified. Evolutionary relationships can also be seen by viewing cladograms or phylograms. (Website 7)
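Conservation of a MIR copy can be summarised from two rows of such an alignment as a percentage identity. The Python sketch below is a simple illustration and not a feature of ClustalW2; it assumes that gaps are written as `-` and that columns containing a gap are left out of the comparison, which is only one of several possible conventions.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of identical bases over the ungapped columns of two aligned rows."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # ASSUMPTION: gapped columns are ignored
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0
```

The divergence of a copy from the consensus is then `100 - percent_identity(copy, consensus)`.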


BLAST:

BLAST finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of the matches. BLAST can be used to infer functional and evolutionary relationships between sequences and helps to identify members of gene families. There are many types of BLAST, but nucleotide BLAST will be used most often. (Website 8)
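The notion of local similarity can be made concrete with a small Python sketch of the exact Smith-Waterman local alignment score. This is not the algorithm BLAST uses: BLAST is a faster heuristic that approximates this kind of search. The scoring values (match +2, mismatch -1, gap -2) are assumptions chosen for illustration.

```python
def local_alignment_score(a: str, b: str, match=2, mismatch=-1, gap=-2) -> int:
    """Best Smith-Waterman local alignment score between a and b (linear gap cost)."""
    prev = [0] * (len(b) + 1)   # scores for the previous row of the matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best
```

Two sequences that share only a short conserved core, as truncated MIR copies often do, still give a positive local score even when their flanks are unrelated.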

The laboratory experiments will make use of the following techniques:

1. Use of Exon Array:

The exon array is an expression profiling tool that probes each exon of a gene, rather than only the 3' end as in traditional microarray studies. It uses microarray technology to achieve two goals. The first is to study alternative splicing patterns at the exon level: with more than 4 probes per exon, it allows specific alterations in exon usage within a transcript to be detected. The second is to study gene expression: because it can detect all the alternative transcripts from a gene, it can carry out expression analysis more efficiently by measuring all transcripts from a gene.

Exon arrays can be used to study the inclusion of MIR in a gene, especially when the MIR lies in an intronic region and part of it is fused with a nearby exon, so that it forms part of the mature transcript and is ultimately translated into a peptide component. In this way the exonisation of MIR can be studied. (Website 5)
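One commonly used summary of exon-level array data is a splicing index, which compares the gene-normalised signal of an exon between two samples. The Python sketch below assumes that exon and gene signals are already background-corrected, positive intensities; the function name and the example values are hypothetical.

```python
import math


def splicing_index(exon_a: float, gene_a: float, exon_b: float, gene_b: float) -> float:
    """log2 ratio of gene-normalised exon signal in sample A versus sample B.

    ASSUMPTION: all four inputs are positive, background-corrected intensities.
    """
    normalised_a = exon_a / gene_a   # exon signal relative to whole-gene signal
    normalised_b = exon_b / gene_b
    return math.log2(normalised_a / normalised_b)
```

A value near 0 suggests similar inclusion of the exon in both samples, while a clearly positive value for a MIR-derived exon in one tissue would point to tissue-specific inclusion and would be a candidate for confirmation by RT-PCR.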

2. RT-PCR:

MIR can become part of a transcript through alternative splicing when a splice site is present in a MIR located in an intronic region. Exon array studies reveal only the splicing profile of an exon; they cannot show whether inclusion of the exon in the transcript is constitutive, or whether the exon is included only in alternatively spliced transcripts. MIR may also be included in a transcript in a tissue-specific manner. Hence, to study the splicing pattern in detail, RT-PCR combined with cDNA sequencing can be used, in which cDNA is synthesised from the population of mRNAs of a single gene. This method can be difficult to apply to MIR exons near a short terminal exon or to MIR exons with complex alternative splicing events at adjacent exons. Alternatively, cloning the gene with the MIR-containing transcript into an expression vector can also help.
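When primers are placed in the exons flanking a candidate MIR exon, the two splice forms give RT-PCR products of different sizes. The arithmetic is sketched below in Python; the function names and all lengths are hypothetical, and the sketch assumes that the MIR exon is either wholly included or wholly skipped.

```python
def expected_amplicons(upstream_len: int, mir_exon_len: int, downstream_len: int) -> dict:
    """Expected product sizes (bp) for primers in the two flanking exons.

    upstream_len   : bases amplified from the upstream exon (primer included)
    mir_exon_len   : length of the MIR-derived exon
    downstream_len : bases amplified from the downstream exon (primer included)
    """
    skipped = upstream_len + downstream_len
    return {"skipped": skipped, "included": skipped + mir_exon_len}


def preserves_reading_frame(mir_exon_len: int) -> bool:
    """True if inserting the exon keeps the downstream reading frame (length divisible by 3)."""
    return mir_exon_len % 3 == 0
```

For example, `expected_amplicons(80, 90, 95)` predicts bands of 175 bp and 265 bp, which would be resolved on a gel and could then be confirmed by sequencing the cDNA.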

3. DNA sequencing:

If time permits, cycle sequencing or another advanced sequencing technique can be used to sequence DNA from different mammals for comparison. It can be used to compare orthologous genes between different groups of mammals, so that the role of MIR in evolution can be studied. This will be needed if the genome of the organism to be studied has not been sequenced.


The Gantt chart explaining the time plan to achieve the experimental aims is shown below.

Various stages of research project:

Stage 1: Training to use bioinformatics tools

Stage 2: Use of bioinformatics tools to identify genes in the human genome that contain MIR.

Stage 3: Training in experimental techniques

Stage 4: Use of the RT-PCR technique to investigate the effect of MIR on gene transcripts

Stage 5: DNA sequencing, if the genome of the organism to be studied has not been sequenced. (Not yet finalised; the technique will be performed if time permits and depending on the results obtained from the previous stage.)

Stage 6: Analysis of all the results using bioinformatics tools, statistical methods etc.

Stage 7: Final project report writing and submission of thesis.