Retrotransposon Display And Its Potential Applications Biology Essay

Pan troglodytes, the common chimpanzee, is our closest living relative and shares about 98% genetic identity with Homo sapiens (Goodman 1999). The common ancestor of humans and chimpanzees is believed to have walked the earth about six million years ago, after which divergence and speciation gave rise to the modern common chimpanzees, bonobos and humans (Lee et al. 2007). Common chimpanzees, like humans, are social animals and live in communities of 6-150 individuals, ruled by a single alpha male (Wroblewski et al. 2009). Although they are quadrupedal, they can walk short distances in an upright, bipedal manner (Oates et al., 2008). Their similarity to humans makes them particularly interesting and suitable subjects for genetic and evolutionary research.

These animals occupy a diverse range of habitats, including humid evergreen forests, deciduous forests, mosaic woodlands and the savannah woodlands of west and central Africa (Oates et al., 2008). Due to large-scale deforestation, habitat destruction caused by farmland encroachment, poaching for bush meat and other hunting activities, the illegal trade of juveniles for the entertainment industry, and disease such as outbreaks of the Ebola virus (Oates et al., 2008), the chimpanzee population has declined dramatically over the past fifty years. The world chimpanzee population was estimated to be 300,000 in the 2003 census (Oates et al., 2007). The geographical distribution of wild common chimpanzees, divided between the four subspecies (Pan troglodytes verus, Pan troglodytes troglodytes, Pan troglodytes vellerosus and Pan troglodytes schweinfurthii), is confined to west and parts of central Africa (Figure 1.1 shows the African countries that are home to the wild common chimpanzee). This rapid decline in numbers resulted in chimpanzees being declared an endangered species in the wild in 1996 (Oates et al., 2006).

To prevent extinction and maintain genetic diversity in the species, genetic analysis and selective breeding of captive individuals are indispensable. For the purpose of conservation, extensive research is being put into the development of genetic marker systems. The next section gives an overview of some of the established systems and their applications.

Figure 1.1: Worldwide distribution of chimpanzees. Their distribution is limited to west and central Africa. The countries where they are found are highlighted, along with the chimpanzee subspecies native to each region. Source: Oates et al., 2008.

Genetic markers for conservation

A number of genetic markers have been developed and used successfully in various aspects of genetic analysis. Two types of marker that have been used in conservation studies are microsatellites and mitochondrial DNA (mtDNA).

Microsatellites

Microsatellites are among the most commonly used marker systems for assessing inter-specific and intra-specific genetic similarities and differences. They are simple tandem repeats of a 1-6 bp unit, and alleles at a locus differ in the number of times the unit is repeated. These co-dominant alleles are inherited in a simple Mendelian pattern and are multi-allelic in populations. Because of this diversity they are a natural choice for studying genetic relatedness, disease susceptibility and phylogeny (Wan et al. 2004). Human short tandem repeats (STRs), typed with PCR-based assays, were successfully employed by Ely et al. (1998) for paternity testing and for identifying individuals (for example in poaching cases or for captive breeding programmes) in West African chimpanzee populations. Another study, by Becquet et al. (2007), typed 310 microsatellite loci in 84 common chimpanzee and bonobo individuals in order to shed light on inter-specific differences.
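The definition above (a 1-6 bp unit repeated in tandem, with alleles differing in repeat number) can be illustrated with a minimal Python sketch. The function name, the minimum repeat threshold and the two example alleles are assumptions made for illustration only; they are not taken from any of the cited studies, and real genotyping relies on PCR fragment sizing rather than sequence scanning.

```python
import re

def find_microsatellites(seq, min_unit=1, max_unit=6, min_repeats=4):
    """Return (start, unit, copies) for each perfect tandem repeat in seq."""
    seq = seq.upper()
    # The lazy quantifier tries the shortest unit first, so "CACACACA" is
    # reported with unit "CA" rather than "CACA". The backreference \1 must
    # then follow at least (min_repeats - 1) more times.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits

# Two hypothetical alleles at one locus, differing only in repeat number.
allele_a = "GGT" + "CA" * 5 + "TT"
allele_b = "GGT" + "CA" * 7 + "TT"
print(find_microsatellites(allele_a))
print(find_microsatellites(allele_b))
```

The two alleles share the same flanking sequence and repeat unit but differ in copy number, which is the property that makes such loci multi-allelic.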

Mitochondrial DNA or mtDNA

mtDNA in animals is a circular DNA molecule without introns. Because it is maternally inherited and has high sequence divergence (a result of the lack of proofreading during replication), mtDNA is suitable as a genetic marker. This marker system has been particularly important in evolutionary studies and was the marker system used by ____________ to differentiate between the four African chimpanzee subspecies (Becquet et al. 2007).

Restriction fragment length polymorphism (RFLP), variable number of tandem repeats (VNTR), random amplified polymorphic DNA (RAPD) and amplified fragment length polymorphism (AFLP) are a few other examples of the range of assays that can be employed in conservation through marker-assisted breeding programmes, genetic mapping and genome fingerprinting (Yuan et al. 2000, Powell et al. 1996).

The development of genetic markers for chimpanzee conservation genetics is an expanding area, and one potential source of informative markers is chimpanzee-specific polymorphic retrotransposon insertions. In common with other transposon-based markers, L1 retrotransposon insertions are easily typed, stable, heritable markers for which the ancestral state is known (Xing et al. 2007). Because of these properties, L1 retrotransposons are beginning to be investigated for their utility in chimpanzee conservation genetics.

Transposable elements

Transposable elements, or mobile genetic elements, were first described in 1950 by Barbara McClintock on the basis of her work on the maize plant, for which she was later awarded the Nobel Prize (McClintock 1950). Subsequent work revealed these elements to be abundant in most eukaryotic genomes, and Drosophila melanogaster was used as the model organism for much of the early investigation in this area. Transposable elements are essentially DNA sequences that have the ability to move around or duplicate in genomes (Xing et al. 2007). Depending on their mechanism of mobility, either "copy and paste" or "cut and paste", they are classified into retrotransposons (class I elements) and DNA transposons (class II elements), respectively (Xing et al. 2007). Both classes, however, require enzymes that catalyse transposition, typically encoded within the DNA sequence of a transposable element, for successful transposition (Xing et al. 2007).

DNA transposons: these elements constitute ~3% of the human genome (Pace & Feschotte 2007) and have been the less studied of the two classes owing to their relatively recent discovery in mammals (Pace & Feschotte 2007). DNA transposons operate by a "cut and paste" mechanism, in which the encoded transposase first excises the element, then nicks a different region of the genome, into which the element is subsequently integrated (Pace & Feschotte 2007). At the site of insertion, 3-40 bp of identical sequence, arranged as direct repeats, are generated at either end of the transposon; these are called target site duplications (Xing et al. 2007). A typical, full length DNA transposon is depicted in Figure 1.2. Eukaryotic DNA transposons are similar in structure to bacterial transposons (Ostertag & Kazazian 2001a). Research on these elements has revealed them to have been inactive in primate genomes for at least 40 million years (Pace & Feschotte 2007, Xing et al. 2007, Ostertag & Kazazian 2001, Lander et al. 2001, Shi et al. 2007), meaning they are unlikely to be informative polymorphic markers.

Figure 1.2: A full length DNA transposon with its 5′ and 3′ ends indicated. D represents target site duplications, A shows inverted terminal repeats and T represents the transposon sequence within which the two open reading frames (ORFs) are present. DNA transposons do not have a 3′ poly A tail. Adapted from Xing et al. 2007.


Retrotransposons: these elements use a "copy and paste" mechanism for transposition. An mRNA intermediate is synthesised and reverse transcribed, and the resulting cDNA copy is integrated at another location in the host genome (the precise mechanism is discussed in subsequent sections). As with DNA transposons, target site duplications are produced on either side of retrotransposons, but they may or may not include long terminal repeat sequences (LTRs) (Gogvadze & Buzdin 2009). On this basis, they can be further classified as LTR-containing retrotransposons (endogenous retroviral elements and tyrosine recombinase retrotransposons are examples of this class) and non-LTR retrotransposons, of which LINEs or long interspersed nuclear elements are an example (Gogvadze & Buzdin 2009). Retrotransposons may transpose using enzymes encoded by open reading frames (ORFs) present in their own sequence (for example LINEs), or may use the enzymes encoded by other retrotransposons for their movement in the genome (for example SINEs, or short interspersed nuclear elements); they are accordingly sub-classified as autonomous and non-autonomous retrotransposons, respectively (Xing et al. 2007). Recent studies, however, have shown that even autonomous retrotransposons may require a few host proteins, such as DNA repair enzymes, for successful transposition (Ostertag & Kazazian 2001a).

Together, DNA transposons and retrotransposons account for a highly variable but often substantial fraction of most eukaryotic genomes, which contributes to the lack of correlation between genome size and organismal complexity known as the C-value paradox (Gregory 2001). Though the vast majority of these elements are transpositionally inactive, there still exist some competent retrotransposons that seem to be expanding currently in human and non-human primate genomes (Xing et al. 2007). Approximately 17% of the human genome is made up of LINEs, of which the LINE 1 or L1 family represents some of the youngest retrotransposons, including subfamilies that are currently involved in transposition (the Ta subfamily) (Mathews et al. 2003, Boissinot et al. 2000). The human genome is estimated to include at least 500,000 copies of L1 elements (Xing et al. 2007). As LINE 1 elements are also involved in the mobilisation of other interspersed repeats (e.g. SINEs, such as Alu), more than 40% of the human genome has been propagated by these sequences (Pickeral et al. 2000, Goodier et al. 2000). Given that these elements are present in such large numbers in the genome, with some of them retaining the ability to transpose, it is not surprising that L1 retrotransposition events have been associated with disease-causing mutations. Diseases caused by L1 insertion into important genes include a type of colon cancer caused by transposition into the APC tumour suppressor gene (Miki et al. 1992) and haemophilia resulting from insertion of an L1 into the factor VIII gene (Shi et al. 2007, Kazazian et al. 1988). However, disease-causing insertion events are rare, as several factors control L1 activity (Shi et al. 2007).

Discovery of L1 retrotransposons

DNA reassociation kinetics, or Cot analysis, of the mammalian genome gave the first indication of the presence of L1 elements (Craig et al.). In this investigation L1s appeared as components of mammalian genomes that re-annealed rapidly during Cot analysis (Craig et al., Britten & Kohne 1968). Subsequently, genomic DNA was digested with restriction enzymes, and the results showed discrete bands from repeated parts of the genome superimposed upon the otherwise uniform pattern formed by single-copy sequences (Craig et al.). Analysis of these repeats by DNA sequencing and hybridisation with labelled probes showed that the restriction fragment repeats created by BamHI in mouse and by KpnI in humans were L1 elements (Craig et al., Singer 1982).

Initially, LINE 1 elements were referred to as long interspersed repeated segments (Craig et al., Singer 1990) or interspersed repetitive DNA elements (Craig et al., Jagadeeswaran et al. 1981), but they are now called long interspersed nuclear elements, LINE 1s or L1s (Craig et al.). The nomenclature of species-specific L1s also includes the genus and species of the organism under study (Craig et al.). For example, an L1 element present in humans (Homo sapiens) is written as L1Hs and that of a mouse (Mus domesticus) is written as L1Md (Craig et al.). The same nomenclature applies to the proteins encoded by these L1s, with the abbreviation preceding the protein name; for example, the protein encoded by ORF1 in humans is called L1HsORF1p and that of the mouse is called L1MdORF1p (Craig et al.). This nomenclature can also be applied to the other LINE elements present in mammalian genomes, such as LINE 2 and LINE 3 (Craig et al.).

Structure of an active L1 retrotransposon

An intact, full length, retrotransposition-competent L1 (Figure 1.3) is typically 6 kilobases (kb) in length and contains two open reading frames, ORF1 and ORF2 (Scott et al. 1987). These ORFs encode the proteins required for retrotransposition (Feng et al. 1996). ORF1 encodes a 338 amino acid nucleic acid binding protein (Holmes et al. 1994, Mathias et al. 1991, Hohjoh & Singer 1996), while ORF2 encodes both an endonuclease and a reverse transcriptase (Mathias et al. 1991, Hohjoh & Singer 1996, Cost et al. 2002). Studies on murine L1s show that the ORF1 protein may also have nucleic acid chaperone activity, as it was shown to aid annealing of complementary strands and to stabilise hybrids (Cost et al. 2002, Martin & Bushman 2001). Between the two ORFs is an inter-genic spacer, approximately 66 bp in length (Lee et al. 2007, Kazazian & Moran 1998). The 5′ and 3′ ends of these elements contain untranslated regions (UTRs) (Ostertag & Kazazian 2001a). Within the 5′ UTR is an internal promoter, and the 3′ end terminates in a polyadenylation signal (AATAAA) followed by a poly A tail (Ostertag & Kazazian 2001a). Retrotransposition likely occurs by target site primed reverse transcription, or TPRT (Cost et al. 2002, Feng et al. 1996, Luan & Eickbush 1995, Luan et al. 1993).

Figure 1.3: An intact, full length L1 retrotransposon. ORF1 and ORF2 are the two open reading frames that encode the nucleic acid binding protein and the endonuclease and reverse transcriptase, respectively. S represents the inter-genic spacer. The arrow head pointing upwards indicates the 5′ promoter region. D represents the target site duplications generated during transposition. The approximate size of the element is 6 kb.

Figure adapted from Badge et al. 2003

Target site primed reverse transcription

Studies conducted on R2 elements of Bombyx mori (Luan et al., 1993) showed that in vitro retrotransposition likely occurs by a process called target site primed reverse transcription (Luan et al. 1993, Ostertag & Kazazian 2001b). The internal promoter in the 5′ UTR of an intact, full length retrotransposon initiates transcription, giving rise to an mRNA molecule (Ostertag & Kazazian 2001b). This mRNA molecule, containing the two ORFs, is translated to produce the nucleic acid binding protein (ORF1) and the enzymes (endonuclease and reverse transcriptase, ORF2) that are essential for retrotransposition. The phenomenon in which the proteins encoded by an mRNA molecule in turn act on the same mRNA is called cis-preference (Cost et al. 2002, Wei et al. 2001). The steps involved in this process are as follows: the endonuclease of the element first nicks a single strand at the target site, and the mRNA molecule then anneals to the single-stranded break (Ostertag & Kazazian 2001b). The specificity of the endonuclease is believed to depend on several sequence and structural parameters (Martin & Bushman 2001). Studies have shown that the phosphodiester bonds of TnAn sites are best suited to nicking by the endonuclease (Cost et al. 2002, Martin & Bushman 2001).

The reverse transcriptase then initiates reverse transcription using the mRNA as a template and the free 3′ hydroxyl group at the point of annealing as a primer for elongation. This leads to the formation of an mRNA-cDNA duplex (Ostertag & Kazazian 2001b). Simultaneously, the second strand is nicked at the target site (Ostertag & Kazazian 2001b). When this process is complete, the overhangs produced by the endonuclease anneal to the duplex and the newly synthesised segment is integrated (Ostertag & Kazazian 2001b). The mRNA strand is then removed and second strand cDNA synthesis takes place (Ostertag & Kazazian 2001b). Exactly how second strand synthesis takes place is still unknown (Cost et al. 2002). This process usually results in the formation of target site duplications at either end of the new insertion, which is sometimes 5′ truncated (usually attributed to premature termination of reverse transcription) (Cost et al. 2002).
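The outcome of this process, an insertion flanked by target site duplications and sometimes 5′ truncated, can be pictured with a toy Python model. This is only a sketch of the end product on one strand, not of the enzymatic mechanism; the function name, the parameter names and the example sequences are all hypothetical.

```python
def insert_with_tsd(target, element, first_nick, tsd_len, truncate_5prime=0):
    """Toy model of the product of an insertion with staggered nicks.

    target          : sequence of the empty target site (one strand, 5'->3')
    element         : full length sequence of the inserting element
    first_nick      : index in target where the duplicated stretch begins
    tsd_len         : distance between the two staggered nicks; this
                      stretch of target ends up on both sides of the insert
    truncate_5prime : number of bases missing from the 5' end of the insert
    """
    if not 0 <= first_nick <= len(target) - tsd_len:
        raise ValueError("nick positions fall outside the target sequence")
    if not 0 <= truncate_5prime <= len(element):
        raise ValueError("truncation exceeds element length")
    tsd = target[first_nick:first_nick + tsd_len]
    inserted = element[truncate_5prime:]
    # The stretch between the two nicks is copied, so it flanks the insert.
    product = target[:first_nick + tsd_len] + inserted + target[first_nick:]
    return product, tsd

# Hypothetical sequences chosen only to make the duplication easy to see.
target = "GGCCTTAAAAGGCC"
element = "ACGTACGTAAAA"
product, tsd = insert_with_tsd(target, element, first_nick=4, tsd_len=6,
                               truncate_5prime=4)
print(tsd)
print(product)
```

Because the length of the duplicated stretch depends on where the two nicks fall, two independent insertions near the same site would be expected to carry different duplications.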

Figure 3: The mechanism of retrotransposition by target site primed reverse transcription. Adapted from Ostertag & Kazazian 2001b.

Potential application

Due to their nature, L1s can be inherited only by vertical transfer, i.e. from parents to their offspring. This, along with their ability to transpose in genomes, can provide individuals with a unique set of L1 loci. Their individual specificity, together with their heritable nature, qualifies them as markers for identifying genetic similarities or differences within and across populations (Boissinot et al. 2000, Batzer et al. 1994). The use of L1s in population studies and evolutionary biology has two advantages over classic genetic markers such as single nucleotide polymorphisms (SNPs), microsatellites and restriction fragment length polymorphisms (RFLPs). Firstly, the probability of a novel L1 insertion occurring at the same locus in two individuals by mere chance is negligible, so an insertion shared between individuals indicates genetic relatedness, i.e. identity by descent (Boissinot et al. 2000, Batzer et al. 1994). Even if such an event did occur, it would be possible to distinguish the two insertions by the differences in length of the target site duplications generated independently by each element (Xing et al. 2007, Conley et al. 2005). In contrast, VNTR and RFLP marker alleles have a higher probability of arising several times in populations and are likely to be identical by state rather than solely by descent (Xing et al. 2007, Batzer et al. 1994). Secondly, the ancestral state is the absence of the L1 insertion, so the presence of an L1 insertion is a definite pointer to a mutational event (Boissinot et al. 2000, Batzer et al. 1994).
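As a rough illustration of how presence or absence data of this kind could be compared between individuals, the following Python sketch scores each pair by the proportion of insertions they share and lists the insertions private to each individual. The individual names, locus labels and the choice of the Jaccard index are assumptions made for this example; they do not come from the cited studies.

```python
from itertools import combinations

def pairwise_sharing(genotypes):
    """Jaccard index of shared insertions for every pair of individuals.

    genotypes maps an individual's name to the set of loci at which an
    L1 insertion is present (absence is the ancestral state).
    """
    result = {}
    for a, b in combinations(sorted(genotypes), 2):
        union = genotypes[a] | genotypes[b]
        shared = genotypes[a] & genotypes[b]
        result[(a, b)] = len(shared) / len(union) if union else 0.0
    return result

def private_insertions(genotypes):
    """Insertions found in exactly one individual of the sample."""
    private = {}
    for name, loci in genotypes.items():
        others = set()
        for other_name, other_loci in genotypes.items():
            if other_name != name:
                others |= other_loci
        private[name] = loci - others
    return private

# Hypothetical genotypes for three individuals.
genotypes = {
    "chimp_A": {"locus1", "locus2", "locus3"},
    "chimp_B": {"locus2", "locus3", "locus4"},
    "chimp_C": {"locus5"},
}
print(pairwise_sharing(genotypes))
print(private_insertions(genotypes))
```

Under the identity by descent argument above, a high sharing score between two individuals would be read as evidence of relatedness, while private insertions are candidates for individual-specific markers.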

It has been shown that only two particular L1 subfamilies, called L1HsTa and L1Hs preTa, have retained the ability to move in a human cell culture assay (Brouha et al. 2003). While the Ta subfamily is dominant in terms of activity (determined by cell culture assays) and responsibility for disease-causing insertions, the preTa subfamily has caused one case of disease, and recently one preTa element was demonstrated to be active in cell culture (Beck et al., 2010). Within these subfamilies, the bulk of retrotransposition is caused by a small number of elements (Brouha et al. 2003). L1s can be divided into subfamilies based on their relative age, and subfamilies that are present in a group of closely related species are older than those restricted to a single species (Ovchinnikov et al. 2002). This is mainly because most insertions are neutral (i.e. neither harmful nor beneficial) and so become fixed in the population over a time that is proportional to the effective population size (Ovchinnikov et al. 2002, Boissinot et al. 2001). Genomic dimorphism is another means of estimating the age of an element. A new insertion will always be polymorphic; as evolutionary time passes the insertion becomes fixed in a population and may subsequently accumulate mutations that disrupt its ORFs (Ovchinnikov et al. 2002). For the purpose of evolutionary analysis, shared sequence variants (SSVs) that can be traced at each step of evolution can also be used as an indication of the age of the element (Ovchinnikov et al. 2002).

The genetic similarities between humans and chimpanzees make it possible to extend these studies to chimpanzee genomes with the prospect of finding comparable results. A recent study demonstrated that L1 activity in chimpanzees is likely far greater than in humans (Lee et al. 2007). Two separate, putatively active subfamilies have been identified in chimpanzees, L1Pt1 and L1Pt2, in contrast to the single highly active L1HsTa subfamily of humans (Lee et al. 2007). Khan et al. (2006) state that more than one active subfamily can exist in a genome only if their 5′ UTRs are significantly unrelated, but when these sequences (L1Pt1, L1Pt2 and L1HsTa) were compared with each other and with the L1Pt1 promoter, very little difference was found between the three (Lee et al. 2007). This might indicate that, although the two subfamilies seem to have co-existed in the chimpanzee genome for many millions of years, L1Pt2 has an advantage over L1Pt1 (Lee et al. 2007). Two factors support this possibility: firstly, L1Pt2 exists at twice the copy number of L1Pt1, with most of the L1Pt2 elements being relatively young; and secondly, intact ORFs have so far been detected only in L1Pt2 elements (Lee et al. 2007). This may indicate that the emergence of a new subfamily compromises the activity of older ones (Lee et al. 2007). The chimpanzee genome could be used as a model to study this phenomenon, in comparison to the human genome.

There is an important caveat to inferences drawn from chimpanzee genomic sequence data. The vast majority of chimpanzee DNA sequence data has been generated from a single male chimpanzee (Clint) using the whole genome shotgun (WGS) approach (Chimpanzee Sequencing and Analysis Consortium 2005) and assembled using the human genome as a reference. It is therefore unlikely to be a reliable source of information for identifying chimpanzee-specific L1 insertions, as the misassembly of chimpanzee-specific repeat sequences will produce systematic under-representation of these elements.

Aims and objectives

The main focus of this project is to increase our knowledge of chimpanzee-specific L1 insertion loci by optimising an existing chimpanzee-specific L1 display system, derived from ATLAS (Badge et al., 2003), to identify individual-specific L1 insertions. Finding an intact, full length insertion would indicate its young age and hence increase the likelihood of the element being active. Further, genotyping these insertions (by PCR and Southern hybridisation) will validate their insertion state, i.e. their presence or absence, in the chimpanzee individuals screened by L1 display. If enough data can be generated from genotyping these individual-specific insertions, they could be used in chimpanzee conservation as a form of L1 DNA fingerprinting.

The long term aims of the project include the cloning of potentially intact, full length L1 insertions and their testing for retrotransposition in cell culture assay systems. The individual-specific nature of the genotyping data could also aid in assessing genetic similarities and differences in the captive study population (common chimpanzees housed at Twycross Zoo). This may give some scope for the application of L1 genotyping in breeding programmes at Twycross Zoo. Extensions of these studies may also aid evolutionary biology by helping to understand the patterns of divergence between great ape species such as orang-utans, gorillas, chimpanzees, bonobos and human beings.

Experimental approach

Amplification typing of LINE 1 active subfamilies, or ATLAS, developed by Richard M. Badge, Reid S. Alisch and John V. Moran, is a method of selectively amplifying a segment (either 3′ or 5′) of human-specific L1 insertions together with part of the flanking genomic DNA (Badge et al. 2003). This is achieved by combining suppression PCR with L1 display (Badge et al. 2003). The system shows high reproducibility and is effective in the identification and study of L1 insertions, which is valuable because genome sequence databases do not provide complete coverage of polymorphic, repetitive DNA sequences (Badge et al. 2003). Recently, a chimpanzee-specific variant of ATLAS was developed by modifying the established ATLAS protocol to target sequence variants specific to full length, chimpanzee-specific L1 elements (Badge et al. 2003).

Principle of ATLAS: As shown in Figure 1.4b, genomic DNA from one individual is digested with a restriction enzyme that has a known recognition sequence. This fragments the genomic DNA such that some fragments contain part of an L1 element and its flanking DNA. A GC-rich DNA linker is ligated to both ends of each fragment. After denaturation, intra-molecular annealing of the linker ends generates single-stranded molecules with the appearance of a frying pan. In this way, amplification of the other products of ligation is suppressed during PCR. The PCR primers used are L1 specific and linker specific. Annealing and extension of the L1-specific primer within the frying pan relaxes the structure, forming linear products with only one linker-containing end. Exponential amplification can now take place. The next stage involves radiolabelling of the amplified PCR products using a 32P-labelled oligonucleotide in a linear PCR, followed by fractionation on a denaturing polyacrylamide gel. Autoradiography shows bands that correspond to various L1 elements and their flanking genomic DNA (Badge et al. 2003). Desired fragments of DNA, excised from the polyacrylamide gel, can be sequenced for further analysis.
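The first steps of this principle can be sketched in silico. The Python example below cuts a sequence at every occurrence of a recognition site and keeps only the fragments that contain an L1 primer binding sequence, reporting their lengths as a stand-in for bands on a gel. It deliberately omits linker ligation and suppression PCR, and the recognition site, the primer sequence and the toy genome are hypothetical values chosen for illustration rather than those used in ATLAS.

```python
def digest(seq, site, cut_offset=0):
    """Cut seq at every occurrence of site; return the fragments in order.

    cut_offset is the position of the cut relative to the start of the
    recognition site (0 cuts immediately before the site).
    """
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        if cut > start:
            fragments.append(seq[start:cut])
            start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments

def display_fragments(fragments, l1_primer_site):
    """Keep fragments that an L1-specific primer could anneal to."""
    return [(len(f), f) for f in fragments if l1_primer_site in f]

# Hypothetical values: a four-base recognition site and a short tag
# standing in for the L1-specific primer binding sequence.
SITE = "GATC"
L1_TAG = "ACGTTGCA"
genome = "TTTT" + SITE + "CC" + L1_TAG + "AAAA" + SITE + "GGGG"

fragments = digest(genome, SITE)
for length, fragment in display_fragments(fragments, L1_TAG):
    print(length, fragment)
```

In this simplified picture the length of each displayed fragment depends on the distance from the L1 sequence to the nearest recognition site in the flanking DNA, which is why different insertions would be expected to give bands of different sizes.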

Figure 1.4: Part a shows the structure of an intact L1 and part b illustrates the principle of ATLAS. The first step is digestion with a restriction enzyme, followed by ligation of the fragments to a GC-rich linker. Denaturation and intra-molecular annealing give the appearance of a frying pan. Suppression PCR helps prevent fragments that do not contain L1 from amplifying. The final step involves linear amplification with a labelled primer. The amplified and labelled products are displayed by polyacrylamide gel electrophoresis. Source: Badge et al. 2003.