Repetitive DNA Constitutes a Major Component of Eukaryotic Genomes



Repetitive DNA constitutes a major component of eukaryotic genomes. Almost 45% of the human genome is composed of repeated sequences (International Human Genome Sequencing Consortium). These repeated sequences are an important element in the evolution of complex organisms, and many of them are known to play an important role in the evolution of novel protein domains. It is therefore important to understand the origin of such repetitive sequences and their genetic roles.

Most repeat sequences in DNA fall into two major groups: tandemly repeated sequences and interspersed repeats. Interspersed repeats are inactive copies of transposable elements. They comprise several subclasses: short interspersed nuclear elements (SINEs), long interspersed nuclear elements (LINEs), DNA transposons and transposable elements with long terminal repeats (LTR elements). Each class of repeat has certain unique characteristics. More information about the role of a repeat element can be obtained when its insertion is recent, because the element is less likely to have undergone extensive mutation. To understand that role, it is generally assumed that the role an element plays today is similar to the one it has played since its origin or since the time of its exaptation. The older an element is, the more difficult it is to identify and to understand its role, because mutation and other processes have extensively changed its sequence over time.

Mammalian-wide interspersed repeats (MIRs) are a class of repeats belonging to the SINEs. MIR is considered a tRNA-derived SINE because of its sequence similarity to tRNA. It is estimated that MIRs were introduced into the genome of an ancestor of mammals about 130 million years ago. They constitute about 0.4-1% of the mammalian genome, and there are approximately 368,000 copies of MIR in the human genome. MIR possibly originated in the Mesozoic era and was active in the ancestors of mammals and birds. It stopped being duplicated before the divergence of placental mammals, which possibly allowed the amplification and spread of many specific repeat sequences (Murnane J.P., Morales F.J., 1995).

MIR elements were originally observed in hepatoma cells infected with hepatitis B virus, when the regions flanking the site of viral integration were sequenced. In this experiment a conserved 70 bp sequence was observed that did not share its properties with any known class of repeats. This class of repeat was subsequently discovered in many human genes and occasionally in rodents, located either in introns or in the 5' or 3' flanking regions (Donehower L.A., 1989).

MIR and SINE family of repeats

SINEs are interspersed repeats of about 100-300 bp that are found in most vertebrates and invertebrates. MIR, a ubiquitous sequence present in all placental mammals, shows several of the characteristics of SINEs, as concluded by Arian F.A. Smit (1995) after aligning over 80 sequences with similarity to MIR. A consensus sequence of 260 bp was identified, and it is expected to have been transcriptionally active originally. It has consensus RNA polymerase III A and B boxes and an AT-rich 3' end, both characteristic of a typical SINE. The third major characteristic, direct flanking repeats, is likely to have become unrecognisable because the MIR fragments have diverged greatly from the consensus. The 80 bp at the 5' end of MIR appear to be similar to tRNA. MIR copies are typically truncated at either or both ends, while the core region is conserved. This could be due to an incomplete integration process, or to the central region being better conserved than the terminal sequences, which have a higher content of mutagenic CpG sites. MIR thus appears to be a fusion product of a tRNA-derived SINE and an unrelated sequence.

Distribution of MIR in the human genome

Mammalian-wide interspersed repeats are found in many groups of mammals, including placental mammals, marsupials and monotremes. MIR is considered to be the most ancient SINE family. It is believed that amplification of MIR elements stopped in the ancestors of placental mammals.

The human genome is considered to be a mosaic of DNA segments called isochores, which are segments of DNA with a uniform GC content. Giorgio Matassi (1998) described the distribution of MIR in the human genome in terms of these isochoric regions. On this basis the human genome is divided into four families of isochores, L, H1, H2 and H3, each characterised by a different GC content and gene density.
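
The idea of assigning a DNA segment to an isochore family from its GC content can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the GC cut-offs used here are assumed, approximate values that do not come from this essay. In practice isochores are assessed over long genomic windows rather than short strings.

```python
def gc_content(seq: str) -> float:
    """Return the GC percentage of a DNA sequence, ignoring non-ACGT symbols."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


# Assumed, illustrative upper GC% bounds per family (not taken from the essay).
ISOCHORE_UPPER_BOUNDS = [("L", 41.0), ("H1", 46.0), ("H2", 53.0)]


def isochore_family(gc_percent: float) -> str:
    """Map a GC percentage to L, H1, H2 or H3 using the assumed cut-offs."""
    for name, upper in ISOCHORE_UPPER_BOUNDS:
        if gc_percent < upper:
            return name
    return "H3"


if __name__ == "__main__":
    window = "ATGCGCGCATATGCGC"  # toy sequence, far shorter than a real window
    gc = gc_content(window)
    print(f"GC = {gc:.1f}% -> family {isochore_family(gc)}")
```

Keeping the cut-offs in one table makes it easy to substitute whichever thresholds a given study uses.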

The majority of MIR repeats were found in the L isochores, which have the lowest GC content and gene density. MIR density, however, was highest in the H2 isochores, which have a comparatively higher GC content. The MIR elements found in the human genome are assumed to be the result of retroposition events followed by further evolutionary changes such as insertions or deletions. It is important to note that these sequences are still conserved at many positions, as they are stably integrated repeats.

MIR elements are found at different densities in different isochoric regions: most frequently in the L isochores and least frequently in the H3 isochores. This indicates a restriction on the mobility of MIR elements between isochoric regions. One presumed reason for the lower abundance of MIR in H3 is poor integration there, as intergenic regions and introns are smallest in the H3 isochores. In addition, the intergenic sequences in H3 are rich in GC and in 3' untranslated regions, which could have regulatory roles. Integration of MIR in this part of the genome might therefore disrupt gene function, which could explain why it is avoided there.
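
The distinction between the number of MIR copies in an isochore family and the density of MIR in that family can be made concrete with a small calculation. The sketch below uses hypothetical toy numbers, chosen only to show the arithmetic and not measured values, to illustrate how a large family can hold the most copies while a smaller family has the highest density.

```python
# Hypothetical toy numbers, for illustration only (not measured data).
toy_isochores = {
    "L": {"size_mb": 2000.0, "mir_copies": 200_000},
    "H1": {"size_mb": 700.0, "mir_copies": 90_000},
    "H2": {"size_mb": 350.0, "mir_copies": 60_000},
    "H3": {"size_mb": 100.0, "mir_copies": 8_000},
}


def density_per_mb(table):
    """Return MIR copies per megabase for each isochore family."""
    return {name: row["mir_copies"] / row["size_mb"] for name, row in table.items()}


densities = density_per_mb(toy_isochores)
most_copies = max(toy_isochores, key=lambda name: toy_isochores[name]["mir_copies"])
highest_density = max(densities, key=densities.get)
lowest_density = min(densities, key=densities.get)

if __name__ == "__main__":
    for name, value in densities.items():
        print(f"{name}: {value:.1f} copies per Mb")
    print("most copies:", most_copies)
    print("highest density:", highest_density)
    print("lowest density:", lowest_density)
```

With these toy values the L family holds the most copies simply because it is the largest, while the density ranking is determined by copies per unit length.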

Proposed role of MIR in the gene evolution

There is much evidence that transposable elements bring novelty to protein-coding genes by adding new exons to ancient genes. Exaptation of transposable elements, exon duplication and de novo exonisation from intronic regions can each add a new exon to an evolutionarily ancient gene. Exonisation of the primate-specific Alu element has been studied in detail, and a similar process of MIR exonisation has been speculated. MIR is now known to play an important role in the expression of several mammalian genes through one of the mechanisms described below.

It may provide alternative splice sites and hence generate more than one transcript from the same gene. It may provide a poly A signal for transcripts. It may introduce a new exon into an existing protein-coding gene and hence novel domains into the protein. If it lies within an intron, some of its nucleotides may be included in the transcript, adding further protein-coding information.

Studies of two genes, insulin-like growth factor 1 (IGF1) and dendrin, point to a possible role of MIR integration in gene control and evolution. They are good examples of how a MIR element can be included in an mRNA sequence and thus become a functional part of a gene (Hughes D., 2000).

In IGF1 the 3' UTR sequence is about 400 nucleotides long. It contains a core MIR sequence of 89 bp that is conserved among all the species studied. The IGF1 gene is transcribed and processed into multiple mRNAs. The 3' UTR of this IGF1 transcript is encoded by a single large exon and consists of many blocks of conserved sequence of about 326 bp, which include the MIR. This transcript was found in humans, sheep and other vertebrates but not in fish or chickens, which led to two conclusions: the MIR insertion occurred early in mammalian evolution, and MIR integration resulted in the evolution of a new IGF1 transcript.

Similar observations were made for the dendrin gene, where the presence of two MIR elements was confirmed with RepeatMasker. Although the function of the peptide encoded by this gene is not known, analysis of its sequence drew attention to a predicted novel role for MIR.

MIR can be inserted into DNA on either the sense or the antisense strand, and accordingly may be included in transcripts in either sense or antisense orientation. Such transcripts can form a heteroduplex, which also indicates the importance of the conserved sequences. If such a heteroduplex offers an advantage, for example in post-transcriptional regulation, the flanking sequences may converge to increase sequence specificity and strengthen binding between the sense and antisense strands. This hypothesis has not yet been tested. It could be one of the roles played by MIR elements inserted in this pattern, but extensive research is needed, particularly if this pattern of insertion is observed in more gene sequences.
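
Why copies in opposite orientations can pair with each other follows from base complementarity, and can be sketched as below. The sequence is a hypothetical 12-nucleotide stand-in and not a real MIR core, the function names are illustrative, and DNA letters are used for simplicity (in RNA, T would be U).

```python
# Translation table for Watson-Crick complements (DNA alphabet).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def can_pair(sense: str, antisense: str) -> bool:
    """True if the two strands are perfect reverse complements of each other."""
    return reverse_complement(sense.upper()) == antisense.upper()


# Hypothetical stand-in for a conserved core; not a real MIR sequence.
toy_core = "ACGTTGCAAGGC"
antisense_copy = reverse_complement(toy_core)

if __name__ == "__main__":
    print("sense copy:    ", toy_core)
    print("antisense copy:", antisense_copy)
    print("can pair:", can_pair(toy_core, antisense_copy))
```

A perfect-match test is the simplest possible model. Diverged copies would pair only partially, which is why conservation of the core matters for this hypothesis.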

There are examples of genes in which MIR elements were incorporated into the mRNA through the creation of alternative splice sites. Some of these observations were made in human genes, such as the acetylcholine receptor gene. This gene gained an additional exon during the course of evolution because a new splice site arose within the conserved core region of the consensus sequence. In addition, as a result of alternative splicing, ~26 bp of the MIR element has been incorporated into the coding region of this protein (Murnane J.P., Morales F.J., 1995).

The 3' end of the MIR sequence is homologous to four cDNA sequences. Two of these are human beta tubulin and sheep follitropin receptor; the other two were not well characterised. The 3' end of the MIR sequence contains a poly A signal that is conserved in all four cDNAs, so the MIR sequence is used as the poly A signal for these genes. These are good examples of how MIR sequences can play a role in protein evolution by contributing a poly A signal.

There are indications that exonisation of MIR in humans is still going on. Exonisation of MIR can take place at any time during the course of evolution and need not have occurred before the divergence of rodents from primates. For example, exonisation of MIR in the gene ARNTL occurred before the divergence. In contrast, MIR exonisation in the gene TTLL6 created a splice site in a testis-specific exon. This exon is observed only in humans, which indicates that the splicing signal was acquired only recently during human evolution (Lin L., 2009). It also shows how MIR can become part of a transcript expressed in a tissue-specific manner and could have played an important role by providing alternative splice sites.

One unusual observation is that the central core region of MIR is much more conserved than its flanking regions; a possible reason is that mammalian cells have found a use for these sequences.

MIR and Diseases

There are indications that MIR elements are present in some genes that are important in disease, which has led to speculation about the role of MIR in this respect. The genes concerned are involved directly or indirectly in disease conditions. There are many examples, and some of them are mentioned here.

Presenilin 2 (PSEN2, Website 1, OMIM ID +600759), a gene associated with Alzheimer's disease, is known to contain a MIR element. Transglutaminase 2 (TGM2, Website 2, OMIM ID *190196), which is associated with many diseases such as celiac disease and Huntington disease, also contains a MIR element. Many other genes, such as angiotensin converting enzyme 2 (ACE2, OMIM ID *300335) and synaptogyrin 1 (SYNGR1, OMIM ID *603925), also contain MIR elements.

The role of MIR in chronic granulomatous disease (CGD) is quite clear. In CGD, phagocytes cannot generate microbicidal products such as reactive oxygen metabolites. In the reported case, this is due to a novel mutation within intron 6 of the CYBB gene that activates a cryptic exon, leading to its inclusion in CYBB mRNA. This exon is also included in mRNA in many tissues outside the pathological condition. The region included in normally expressed CYBB mRNA shows a single ORF and hence has coding potential. The cryptic exon lies in a region with high similarity to MIR. No splice sites or donor sites were recognised in the homologous sequences of non-primate mammals and primates, in which some insertions and deletions were also observed. It is therefore speculated that mutations in the MIR enabled this DNA to be spliced into mRNA, resulting in CGD (Andreas R. et al., 2006).

The wild-type p53 gene, a known tumour suppressor gene, also has a MIR element in its 3' UTR, which could be involved in translational activation by promoting polyadenylation of the mRNA. Further investigation, for example through targeted mutation of this region, might reveal a novel mechanism of p53 regulation (A. M. D'Erchia, 1999).

Interspersed repeat elements are thought to play an important role in unequal recombination events during meiotic crossover, which result in mutations. Such recombination events give rise to many genetic disorders, and it has recently been shown that they may lead to carcinogenesis as well (Nystrom-Lahti et al., 1995). MIR may play a similar role and could thereby be associated with the cause of disease.

Residual movement of MIR

It was proposed that MIR elements stopped being amplified before the divergence of rodents from primates. If so, the common ancestor of rodents and primates must have carried MIR. This was studied by comparing the human and rodent genomes.

If all transposition had taken place after the divergence, no copy in one species would have an orthologous counterpart in the other, but this is not completely true. If MIR activity stopped before the divergence of primates and rodents, and the copies evolved neutrally without selection pressure, there should be no detectable similarity between these repeat elements in mouse and human, because rodents evolve faster than humans. Yet the sequence divergence of intragenomic MIR copies from their mammalian consensus sequence was only 20%. This implies that MIR copies in mouse and human are under strong negative selection.

Also, at least 44% of human MIR sequences do not have an orthologue in the murine genome, and 16% of murine MIR sequences do not have a human orthologue. This indicates that some copies arose after the split of rodents from primates. It therefore suggests that the activity of MIR elements did not stop completely after the divergence of primates and rodents and that some residual movement may have continued. This may not be true, but it should become clear once this kind of comparative study is carried out in other related genomes (Silva J.C., 2009).
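
A figure such as the percentage divergence of a repeat copy from its consensus is, in its simplest form, the share of mismatched positions in a pairwise alignment. The sketch below shows that calculation under stated assumptions: the two sequences are already aligned to equal length, gap positions are skipped, and no correction for multiple substitutions is applied. The sequences are hypothetical toy strings and the function name is illustrative; this is not the method used in the cited study.

```python
def percent_divergence(copy: str, consensus: str) -> float:
    """Percentage of mismatches over ungapped aligned positions."""
    if len(copy) != len(consensus):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(copy.upper(), consensus.upper()):
        if a == "-" or b == "-":
            continue  # skip alignment gaps
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped aligned positions to compare")
    return 100.0 * mismatches / compared


if __name__ == "__main__":
    toy_consensus = "ACGTACGTTT"  # hypothetical consensus fragment
    toy_copy = "ACGTACGTAC"       # hypothetical diverged copy
    print(f"{percent_divergence(toy_copy, toy_consensus):.1f}% divergence")
```

Here two of ten compared positions differ, giving 20% divergence for the toy pair.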