The discovery of transposons


Since the discovery of transposons by Nobel Prize winner Barbara McClintock in 1950, there was little development in the area until the last few decades. Advances in technology have since made it possible to detect and discover new transposons. Many new computational methods have been developed that have helped in understanding the biology of transposons and the significant effect they have had on human evolution. There is also increasing evidence that transposons can be used in gene therapy, owing to their ability to "jump".


Transposons are a class of genetic elements that have the ability to "jump" from one location to another within a genome. They are called jumping genes; however, they are always maintained at integrated sites in the genome (Encyclopaedia Britannica, 2009). Transposons were first discovered in maize by Barbara McClintock in 1950, during an experiment designed to reveal the genetic composition of the short arm of chromosome 9, in which she observed a high frequency of variegation. McClintock also found that these spontaneous translocations were not random with respect to their points of breakage and fusion (McClintock, 1950). Since McClintock's discovery, transposons have been identified in all kingdoms and have been divided into three classes:

  • Class 1: Retrotransposons
  • Class 2: DNA Transposons
  • Class 3: Miniature Inverted-repeat Transposable Elements (MITEs)

Retrotransposons are a unique group of transposable elements (TEs) that use a "copy and paste" mechanism (Encyclopaedia Britannica, 2009). A copy is inserted into the genome while the original is left in place, forming repetitive copies of DNA and leading to the vast spread of TEs in the genome. The mechanism of retrotransposition is simple: first the transposable DNA is transcribed into RNA, and then the RNA is converted back into DNA and inserted into the genome by target-primed reverse transcription (TPRT) (Lewin, 2004).

DNA transposons are segments of DNA that move from one DNA site to another by a "cut and paste" mechanism. Most encode enzymes, called transposases, that cleave the ends of the transposon, freeing it from its initial location in the genome, and that also cleave the target site (Lewin, 2004). DNA transposons can be further divided into 3 groups: cut and paste, Mavericks and Helitrons (Fig 1).

MITEs are characterised by their short length; however, their mechanism of transposition is not well understood. Thousands of MITEs have been discovered in Oryza sativa (cultivated rice), Caenorhabditis (a type of nematode) and other organisms. MITEs differ from other transposons in that they do not encode proteins and most insert into euchromatin, suggesting that they may play a role in genetic regulation (Encyclopaedia Britannica, 2009).

Although transposons were discovered in the 1940s, only recently have we begun to understand how they function in the genetic world. The completion of the human genome sequence has shown that almost half of the human genome is derived from the action of TEs (Bohne et al, 2008) (fig 3). Considering that only 1.5% of the human genome is protein coding, the contribution of TEs is even more remarkable. DNA transposons make up 2.8% of the genome; however, they are no longer active in humans. They were active in early primate evolution, 37 million years ago (Cordaux and Batzer, 2009). Retrotransposons are currently active and can be sub-divided into two groups, characterised by the presence or absence of long terminal repeats (LTRs) (fig 1). The LTR elements in humans are human endogenous retroviruses (HERVs), which were inserted into the genome approximately 25 million years ago. HERVs and related elements make up around 8% of the human genome; however, their activity has now nearly ceased. The majority of currently active TEs are non-LTR retrotransposons, which include long interspersed element 1 (LINE-1 or L1), Alu and SVA elements, collectively making up a third of the human genome (Cordaux and Batzer, 2009).

Transposable elements can initiate gene rearrangement directly or indirectly:

  • Transposition itself may cause deletion, inversion or the movement of host sequence to a new location.
  • Transposons also function as "portable regions of homology", serving as substrates for cellular recombination systems. As a result, two copies of a transposon at different locations act as sites for reciprocal recombination, causing deletion, insertion, inversion or translocation (Lewin, 2004).

Initially transposons were thought to be selfish DNA, interested only in self-propagation, and were compared to parasites. This was due to their ability to inactivate important genes or become a burden on the cellular system. However, transposons can also be advantageous, as they cause genetic rearrangements that may increase the chances of survival and so carry the active transposon to the next generation. If a transposon does convey a selective advantage upon the genome, it must be indirect (Lewin, 2004). All transposons have an inbuilt mechanism that limits their activity. A standard transposon is flanked by inverted repeats and generates direct repeats of a short sequence at the target site. There are two main types of transposons. The simplest is an insertion sequence (IS), consisting of flanking inverted terminal repeats and a sequence that encodes transposase. Composite transposons, the other type, have flanking IS elements that provide transposase activity and a sequence in between that carries markers such as drug resistance (Lewin, 2004). Transposase is an enzyme that recognises the ends of the transposon and links them to the target site.

Transposition can occur via a replicative or non-replicative mechanism, determined by the order of events and the nature of the connection between target and transposon. Both share a common step, in which staggered nicks are made at the target site, one on each strand, a fixed distance of 5 to 9 bases apart (Lewin, 2004). The transposon is inserted by joining it to the protruding strands. Replicative transposition proceeds by generating a cointegrate, which is the fusion of the donor and target replicons. The cointegrate has two copies of the transposon, lying between the original replicons. Recombination between the two copies regenerates the original donor replicon, while the recipient has gained a copy of the transposon. This reaction is catalysed by a resolvase encoded by the transposon, which provides the site-specific recombination function. An example of transposons that use this mechanism is the TnA family (Lewin, 2004).
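The link between the staggered nick and the direct repeats that flank an inserted transposon can be made concrete with a small sketch. This is a toy model on a single strand written as a string, not a simulation of the enzymology; the function name insert_with_tsd and its parameters are hypothetical and chosen only for illustration.

```python
def insert_with_tsd(target: str, transposon: str, cut_pos: int, stagger: int = 5) -> str:
    """Insert `transposon` into `target` at a staggered cut.

    The two strands are assumed to be nicked `stagger` bases apart. After the
    single-stranded gaps are filled in, the bases between the two nicks are
    present on both sides of the element as a short direct repeat.
    """
    if not 5 <= stagger <= 9:
        raise ValueError("the essay quotes a stagger of 5 to 9 bases")
    if not 0 <= cut_pos <= len(target) - stagger:
        raise ValueError("staggered cut must lie inside the target")
    left = target[:cut_pos]
    duplicated = target[cut_pos:cut_pos + stagger]  # bases between the nicks
    right = target[cut_pos + stagger:]
    return left + duplicated + transposon + duplicated + right


# Lower case marks the inserted element; CGTAC is duplicated on both sides.
print(insert_with_tsd("AAACGTACTTT", "ggcc", cut_pos=3, stagger=5))
# AAACGTACggccCGTACTTT
```

The length of the flanking repeat equals the stagger, which is why the size of the target site duplication is characteristic of each transposon family.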

Non-replicative transposition occurs when the transposon is moved directly from the donor site to the target site by a "cut and paste" mechanism (Feschotte and Pritham, 2007). This mechanism is usually carried out by a single transposase enzyme, which separates the transposon from the flanking DNA. A nucleoprotein complex containing transposase is required for cleavage of the transposon ends, nicking of the target site and connection of the transposon ends to the staggered nicks. The loss of the transposon from the donor site leaves a double-stranded break, whose fate is unclear (Lewin, 2004).

Somatic variation in plants, especially in maize, highlights the consequences of the existence and activity of transposons. This variation is due to the transposition of "controlling elements" close to genes with visible effects on phenotype (Fedoroff, 2009). Also, as maize displays clonal development, transposition events can be visualised by clonal analysis. For a transposition to have a visible effect, it must occur in heterozygotes and alter the expression of alleles, so that descendants display new phenotypes (Fedoroff, 2009). Each family of transposons in maize has autonomous elements, which are able to catalyse transposition, and non-autonomous elements, which have lost that capacity (Lewin, 2004).

In this essay I will discuss the current methods used to investigate transposons, along with the effect transposons have had on evolution and their potential use in gene therapy.

Investigative methods for transposons:

Methods to discover and detect transposable elements have come a long way since the discovery by Barbara McClintock. The growing importance of transposons has led to the development of a number of innovative computational techniques, such as de novo, homology-based, structure-based and comparative genomic methods.

De novo methods use the intrinsic repetition of mobile DNA in the genome to discover new TEs, without any prior knowledge of their structure or similarity to other TE sequences (Bergman and Quesneville, 2007). This is referred to as de novo repeat discovery and is heavily dependent on the sequencing and assembly strategy, as it uses assembled sequence data. Often the best approach is to find pairs of similar sequences at different locations in a self-comparison of the genome and then cluster the pairs to obtain repeat families (Bergman and Quesneville, 2007). However, the method is not specific to TEs, so it also finds repeats produced by other processes, such as tandem repeats, segmental duplications and satellites. The main downfall of this strategy is its inability to detect TE families with low copy numbers or families that are entirely composed of non-overlapping fragments. Once the repeats are detected, there is the question of distinguishing TEs from other repeats and identifying distinct families, which is very difficult to achieve given the complexities of TE biology. Repeats are initially detected with conventional searches, such as suffix trees or pairwise similarity searches. The pairs then need to be clustered and the non-TE repeats filtered out. An example is the PILER de novo repeat discovery system (Bergman and Quesneville, 2007). Another approach that could be used is homology based.
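The self-comparison step can be illustrated with a deliberately simple sketch that looks for exact k-mers occurring more than once and merges overlapping hits into repeat regions. This is an assumption-laden toy, not how PILER or any published tool works: real methods tolerate mismatches, cluster the regions into families and filter out non-TE repeats. The function names and the choice of k are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def repeated_kmers(genome: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer that occurs more than once to its start positions."""
    positions = defaultdict(list)
    for i in range(len(genome) - k + 1):
        positions[genome[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


def repeat_regions(genome: str, k: int) -> List[Tuple[int, int]]:
    """Merge overlapping repeated k-mers into half-open (start, end) regions."""
    starts = sorted(p for pos in repeated_kmers(genome, k).values() for p in pos)
    regions: List[Tuple[int, int]] = []
    for s in starts:
        if regions and s <= regions[-1][1]:
            # Hit overlaps or touches the previous region: extend it.
            regions[-1] = (regions[-1][0], max(regions[-1][1], s + k))
        else:
            regions.append((s, s + k))
    return regions


# Two copies of ACGTGCA separated by unrelated sequence.
genome = "TTT" + "ACGTGCA" + "GGGGG" + "ACGTGCA" + "CC"
print(repeat_regions(genome, k=5))  # [(3, 10), (15, 22)]
```

Note that nothing in this procedure knows what a transposon is, which is exactly why de novo methods also report tandem repeats, segmental duplications and satellites.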

Homology-based methods utilize prior knowledge of TE sequences, which gives them a distinct advantage over de novo methods. They detect new TE families through homology to known TE coding sequences (Bergman and Quesneville, 2007) and are therefore more likely to detect authentic TEs, even those present as a single copy. However, homology-based methods are biased towards detecting TEs from previously identified families and are not capable of identifying certain classes of TEs that are composed of non-coding sequences, such as MITEs. Homology-based detection methods are usually applied to assembled genome sequences (Bergman and Quesneville, 2007).
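A minimal sketch of the homology idea is to slide a known TE sequence along the genome and report windows that differ from it at only a few positions. This is an illustrative simplification under stated assumptions: it works at the nucleotide level with a Hamming distance and no gaps, whereas real homology searches typically use alignment tools and often compare at the protein level. The function names and the mismatch threshold are hypothetical.

```python
from typing import List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_homologs(genome: str, known_te: str, max_mismatches: int) -> List[Tuple[int, int]]:
    """Return (start, mismatches) for every window within the mismatch limit."""
    n = len(known_te)
    hits = []
    for i in range(len(genome) - n + 1):
        d = hamming(genome[i:i + n], known_te)
        if d <= max_mismatches:
            hits.append((i, d))
    return hits


# One exact copy at position 4 and one diverged copy (T -> A) at position 16.
genome = "CCCCACGTTGCACCCCACGATGCACC"
hits = find_homologs(genome, "ACGTTGCA", max_mismatches=1)
print(hits)
```

Even a single diverged copy is found because the search starts from a known sequence, which also shows the bias: a family with no known relative is never used as a query and so is never detected.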

TEs can also be detected by structure-based methods, which use prior knowledge of the structural features of transposons, such as LTRs, whereas de novo methods rely on the repeated sequences that transposition leaves behind. Structure-based methods resemble homology-based methods in that they can detect low-copy TEs and have high specificity, while also being less biased. They require specific models that must be designed for every type of TE under consideration, which limits their use (Bergman and Quesneville, 2007). Some TEs have a more robust structure and so are more easily detected. LTR_STRUC was the first such method developed and uses a heuristic seed-and-extend strategy (McCarthy and McDonald, 2003). However, a major fault in this system is that only TEs lying within the same contig can be detected. LTR_par is another structure-based method for detecting LTR retrotransposons. LTR_par allows degenerate nucleotides in its alphabet, which resolves the problems that LTR_STRUC faced. There are many other structure-based tools, such as SmaRTFinder and LTR_FINDER, as well as approaches based on hidden Markov models (HMMs). Structure-based discovery methods have also been successful in identifying subclasses of MITEs (Bergman and Quesneville, 2007).
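The seed-and-extend idea behind structure-based LTR detection can be sketched as follows: find short exact seeds that occur twice at a plausible distance, then extend each pair to the right for as long as the two copies agree. This is only a toy illustration of the general strategy, not the algorithm used by LTR_STRUC, LTR_par or any other published tool; the function name, seed length and distance limits are all hypothetical.

```python
from collections import defaultdict
from typing import List, Tuple


def find_direct_repeat_pairs(seq: str, seed_len: int = 4,
                             min_dist: int = 8, max_dist: int = 60) -> List[Tuple[int, int, int]]:
    """Return (left_start, right_start, length) for candidate flanking repeats."""
    seeds = defaultdict(list)
    for i in range(len(seq) - seed_len + 1):
        seeds[seq[i:i + seed_len]].append(i)

    pairs = []
    for starts in seeds.values():
        for idx, a in enumerate(starts):
            for b in starts[idx + 1:]:
                if not min_dist <= b - a <= max_dist:
                    continue
                # Skip seeds that could be extended leftwards; the pair will
                # be reported once, from its leftmost seed.
                if a > 0 and seq[a - 1] == seq[b - 1]:
                    continue
                length = seed_len
                # Extend rightwards while both copies agree and do not overlap.
                while (b + length < len(seq) and a + length < b
                       and seq[a + length] == seq[b + length]):
                    length += 1
                pairs.append((a, b, length))
    return sorted(pairs)


# A 6-base repeat (TGTAGC) flanking a placeholder internal region.
seq = "CC" + "TGTAGC" + "AAAAAAAAAA" + "TGTAGC" + "GG"
print(find_direct_repeat_pairs(seq))  # [(2, 18, 6)]
```

Because both copies must be present in the same input string, the sketch also shows why a method of this kind cannot find an element whose two ends lie in different contigs.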

Another way to detect new TEs is by comparative genomic methods. These utilize the fact that transposition creates a large insertion, which can be detected in a multiple sequence alignment. Comparative methods are advantageous as they help detect new TE families and date them chronologically (Bergman and Quesneville, 2007). The usability of this approach depends on the activity of the TEs: if the insertions are ancestral, and so shared by all the genomes compared, no TEs will be detected. The approach is also reliant on the quality of the whole-genome alignment, which can be compromised by the use of draft genomes and will be poor in TE-rich regions. However, the method is justified by the recent explosion of genome sequences for multiple closely related species (Bergman and Quesneville, 2007).
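In an alignment, an insertion in one species appears as a long run of gap characters in the other. The sketch below scans a pairwise alignment for such runs; it assumes the alignment has already been computed and is given as two equal-length strings with "-" for gaps. The function name and the minimum length are hypothetical, and a real pipeline would work on multiple alignments and then check the inserted sequence for TE features.

```python
import re
from typing import List, Tuple


def candidate_insertions(aligned_a: str, aligned_b: str,
                         min_len: int = 5) -> List[Tuple[int, int, str]]:
    """Find runs of at least `min_len` gap columns in `aligned_b`.

    Returns (start_column, end_column, sequence_in_a) for each run, i.e. the
    stretch present in genome A but absent from genome B.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    hits = []
    for match in re.finditer("-{%d,}" % min_len, aligned_b):
        hits.append((match.start(), match.end(), aligned_a[match.start():match.end()]))
    return hits


# Genome A carries a 9-base insertion; the single-base gap is ignored.
a = "ACGTT" + "TGACCAGTC" + "AAC" + "G" + "GT"
b = "ACGTT" + "---------" + "AAC" + "-" + "GT"
print(candidate_insertions(a, b))  # [(5, 14, 'TGACCAGTC')]
```

An ancestral insertion shared by both genomes produces no gap at all in this comparison, which is the limitation described above.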

When it comes to detecting TEs, all the above methods can be used, but they require further steps as they show low sensitivity. The detection process begins by producing a reference set of TE sequences using the methods mentioned above (Bergman and Quesneville, 2007). This is followed by identification of a consensus sequence and classification of the TE type. There are 2 main aims of detecting TEs in a genome:

  1. To mask TEs as a pre-processing step for bioinformatic tasks (such as gene finding or alignment).
  2. To study the TEs directly in order to make inferences about TE biology (Bergman and Quesneville, 2007).

These aims are commonly incorporated in tools such as Censor and RepeatMasker. These approaches obtain better results but are too computationally intensive to run routinely (Bergman and Quesneville, 2007).

The growing importance of TEs and their implications has led to the development of many new sophisticated techniques. Studies have shown that no single method is more effective than the others. Therefore, to obtain better results, it is a good idea to combine results from multiple independent techniques (Bergman and Quesneville, 2007).
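One simple way to combine independent techniques is a vote: keep only the parts of the genome that at least a chosen number of methods annotate as TE. The sketch below is one possible scheme under that assumption and is not taken from the cited review; the method names, coordinates and the min_support parameter are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open (start, end)


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def consensus_regions(calls_by_method: Dict[str, List[Interval]],
                      min_support: int = 2) -> List[Interval]:
    """Return regions annotated by at least `min_support` methods."""
    events = []
    for intervals in calls_by_method.values():
        # Merge within a method first so that each method votes only once.
        for start, end in merge(intervals):
            events.append((start, 1))
            events.append((end, -1))
    events.sort()  # at equal positions, ends (-1) are processed before starts

    regions, depth, open_at = [], 0, None
    for pos, delta in events:
        depth += delta
        if depth >= min_support and open_at is None:
            open_at = pos
        elif depth < min_support and open_at is not None:
            regions.append((open_at, pos))
            open_at = None
    return merge(regions)


calls = {
    "de_novo": [(10, 50), (100, 140)],
    "homology": [(20, 60)],
    "structure": [(30, 55), (200, 230)],
}
print(consensus_regions(calls, min_support=2))  # [(20, 55)]
```

Raising min_support trades sensitivity for specificity: with the same hypothetical calls, requiring all three methods keeps only (30, 50), while accepting a single method keeps every call.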

Impact of transposons on evolution:

Over the past tens of millions of years, the number of non-LTR retrotransposons has increased enormously, which has had a substantial effect on the evolution of the human genome. A key notion that helps in understanding this evolution is that L1, Alu and SVA elements can be divided into subfamilies of different ages that are arranged in a hierarchy (Cordaux and Batzer, 2009), indicating that the subfamilies show ongoing linear, sequential evolution. An example of this phenomenon is that all L1 retrotransposons of the past 40 million years are derived from a single lineage (Kahn et al, 2006). Similar patterns have been shown for Alu and SVA. These observations suggest that there must be only a few elements, called master genes, that are responsible for retrotransposition (Cordaux and Batzer, 2009). A study analysing over 200 Alu subfamilies found 80-100 retrotransposition-competent L1 copies, of which 6, called hot L1s, were responsible for the majority of activity (Brouha, 2003) (fig 3). Another crucial characteristic of non-LTR retrotransposons is their ability to stay active over millions of years. The current theory suggests the existence of stealth drivers, which stay in the genome with low or no activity while conserving the ability to mobilise and generate new copies at a high rate (Cordaux and Batzer, 2009).

Non-LTR retrotransposons have been active and accumulating for millions of years and have had a significant impact on evolution by affecting the structure and function of the primate genome. It is believed that the current rate of L1 insertion is approximately 1 in 20 births based on disease-causing de novo insertions, but 1 in 200 births based on genome comparisons (Cordaux et al, 2006 and Xing et al, 2009). This difference seems to be due to the underlying assumptions of the methods used. There seems to be no such bias for Alu element insertion, while the rate of SVA insertion is 1 in 900 births based on genome comparisons (Xing et al, 2009). These amplification rates have not been constant over time. The ongoing accumulation has led to significant inter-individual variation. Human-specific retrotransposon insertions are often polymorphic at orthologous loci and are highly informative genetic markers (Cordaux and Batzer, 2009).

Genomic instability:

Retrotransposons can generate genomic instability in many ways. Insertional mutagenesis causes instability when transposons insert into protein-coding or regulatory regions, potentially influencing genetic evolution. A number of heritable diseases are caused by de novo L1, Alu and SVA insertions, such as haemophilia, cystic fibrosis and Apert syndrome. It is suspected that ~0.3% of mutations are due to non-LTR retrotransposons, and they seem to be biased towards the X chromosome (Belancio et al, 2008).

DNA double-stranded breaks (DSBs) occur when DNA undergoes "cut and paste" transposition and lead to genomic instability, but to what extent is unknown. DSBs are generated by the L1 ORF2 protein, which has endonuclease activity (Gasior et al, 2006). DSBs can lead to considerable genomic instability as they are highly mutagenic and prone to recombination. Non-LTR retrotransposons also have the ability to generate microsatellites at different loci in the genome. Microsatellites contain simple repetitive sequences, such as homopolymeric tracts, which are susceptible to mutation by nucleotide substitution and replication slippage (Cordaux and Batzer, 2009). They tend to be found in Alu elements. Alu elements have also been shown to undergo gene conversion, which is the non-reciprocal transfer of information between homologous sequences (Roy et al, 2006). As Alu elements make up more than 10% of the human genome, gene conversion may have a significant impact on the nucleotide diversity of our genome. All these events cause genomic instability, which leads to potential change in the human genome and therefore has an effect on evolution.

Genomic Rearrangement:

Genomic rearrangement caused by transposition occurs in 3 main ways: insertion-mediated deletion, ectopic recombination and transduction of flanking sequences. The insertion of L1 and Alu elements sometimes leads to deletion of the sequence adjacent to the insertion. The deletion is usually short and has been shown to occur naturally when L1 and Alu elements are inserted (2% and 3% respectively) (Callinan et al, 2005). This can lead to the deletion of important sequences; for example, pyruvate dehydrogenase complex deficiency is the result of an L1 insertion-mediated deletion in the component X gene (Mine et al, 2007). Ectopic recombination is recombination between non-allelic homologous elements, which occurs after insertion because of the high copy number (Cordaux and Batzer, 2009). It can lead to deletion, insertion or even inversion. It has been demonstrated over many years that Alu recombination-mediated deletions occur in humans and have been responsible for more than 70 cases of cancer and genetic disorders (Callinan and Batzer, 2006). L1 and SVA elements also occasionally carry upstream or downstream flanking genomic sequences along when they copy themselves. 3' transduction occurs through the use of an alternative polyadenylation signal in the downstream flanking sequence, while 5' transduction occurs when a promoter upstream of the retrotransposon is used to initiate transcription. Transduction-mediated gene transfer occurs via this process.

Epigenetic regulation of genes and VDJ recombination:

As explained earlier, over time many transposons have been "silenced", in other words have lost the ability to transpose. This is because transposons have over time developed an epigenetic mechanism that prevents the expression and mobility of transposable elements. The ability to silence themselves has also given DNA transposons and retrotransposons the capacity to regulate nearby genes. McClintock first noticed this in maize, where the insertion of non-autonomous elements affected gene expression. She also found that adding autonomous and non-autonomous elements could turn genes on and off respectively (Slotkin and Martienssen, 2007). As L1, Alu and SVA elements are often found within or near genes, the formation of heterochromatin can activate or repress the transcription of adjacent genes (Cordaux and Batzer, 2009). It is believed that the ability of humans to produce a massive diversity of antibodies originated from transposons, as the epigenetic mechanism that controls VDJ recombination is very similar to that of TE regulation. However, it is still unknown whether this origin contributed to the epigenetic regulation of rearrangement (Slotkin and Martienssen, 2007).

Gene Therapy:

Gene therapy has been shown to be effective in treating patients with a wide range of diseases. Effective gene therapy requires a number of qualities: a strong delivery system to the target cell, long-term gene expression and minimal risk of secondary effects. Over time gene therapy has made many advances, but there have been some setbacks and issues raised about its safety. The capacity of DNA transposons to move directly as DNA makes them a very appealing tool for gene therapy. A transposase supplied in trans can act on any DNA sequence that is flanked by the terminal repeat sequences normally found at the ends of transposons. Therefore, to convert a transposon into a gene delivery tool, a dual system has to be developed, consisting of an expression plasmid encoding the transposase and a donor plasmid containing the DNA to be integrated, flanked by terminal repeats (VandenDriessche et al, 2009). The transposase aids in the excision and reinsertion of the gene of interest by binding to the terminal repeats (fig 7). Effective gene therapy required a transposon with efficient activity in mammalian cells, which was achieved by the reconstruction of a synthetic active Tc1/mariner-type transposon called Sleeping Beauty (SB) (VandenDriessche et al, 2009). Two other transposon systems have also shown potential for gene therapy, Tol2 and piggyBac (PB).

SB was reconstructed from ancestral elements found in salmon, which had probably become inactive during evolution (Dupuy et al, 2006). SB showed ten-fold higher activity than any other transposon, though it suffered from overproduction inhibition, meaning that activity decreased if there was excess transposase (VandenDriessche et al, 2009). The revival of SB had many implications in the genetic world, but its activity was still not robust enough for gene therapy. A new generation of transposases has been engineered by high-throughput transposase screening combined with a DNA shuffling strategy, which produced a library of mutant transposase genes. By combining the 6 most hyperactive SB variants, a transposase called SB100X was created, which has 100 times the activity of the original SB (Mates et al, 2009). However, its mechanism is not well understood.

Tol2 is another transposon, which is naturally occurring and active in a wide variety of vertebrates, including humans (Kawakami et al, 2007). It can transfer genes of up to 11 kb without loss of transposition activity (VandenDriessche et al, 2009). It therefore has the potential to carry fairly large DNA inserts, and it does not suffer from overproduction inhibition (Wu et al, 2006). An advantage of Tol2 is that it creates single-copy insertions without causing gross rearrangements around the insertion site (VandenDriessche et al, 2009). However, it has relatively low transposition activity compared with the PB and hyperactive SB systems.

PB was initially recognized when it transposed from its host insect into the baculovirus genome, as discovered by Fraser MJ (cited by VandenDriessche et al, 2009), and has recently also shown the ability to transpose in humans and mice. PB is more active than the original SB and Tol2 but less active than SB100X. The activity of PB can be increased by codon-optimization strategies (Cadinnanos and Bradley, 2007). Another possibility is to increase PB's activity using the same method as was used for SB.

The use of transposons as gene therapy delivery tools may help overcome the manufacturing and regulatory difficulties faced with viral vectors. Non-viral vectors are also likely to be less immunogenic than viral vectors. SB and PB can also integrate large transgenes of up to 14 kb (VandenDriessche et al, 2009). However, it has been shown that the larger the transgene, the less efficient transposition is. Flanking the transgene with 2 complete SB transposons greatly improves the efficiency (Zayad et al, 2004). The 3 transposon systems have different properties, and which one is chosen depends on the aim of the gene therapy. So far SB and PB have shown great potential, which led to the first-in-man gene therapy trial using transposons in 2008 (Singh et al, 2007).

The use of transposons to insert genes of interest into the genome with stable integration is a very attractive prospect. On the other hand, the drawback is that it may lead to insertional oncogenesis and genotoxicity. SB shows no integration bias, as its integration is random, but there is still a genotoxic risk as intragenic integration may occur. To use transposons for gene therapy effectively, a transposon needs to be designed that can perform targeted transposition into the human genome safely. According to VandenDriessche et al (2009), this could be possible by "engineering chimeric transposases fused to heterologous site specific DNA-binding domains". Another issue that could arise is continuous expression of the transposase causing uncontrolled transposition, adding to the genotoxic risk. All these issues need to be taken into account and resolved prior to the use of transposons in gene therapy.


The ability to analyse transposons has given us a greater understanding of their biology and the roles they have in evolution; however, what we know is still the tip of the iceberg. Although many mechanisms of transposons are well understood, a number have yet to be understood. The development of new computational techniques that combine different methods should uncover new information that will further our understanding of transposons. I believe the use of transposons as non-viral vectors in gene therapy will be of great clinical significance, once some of the issues I mentioned above are addressed. Over time TEs have accumulated and now make up 45% of the human genome. Transposons have had an enormous impact on the human genome, even though it may be indirect, as they have the ability to cause genomic instability, rearrange genes and affect gene expression. The discovery of transposons has highlighted a whole new dimension of genetics, as it shows that DNA can be mobile. Further research in the area will provide crucial evidence about human evolution and how transposons can be manipulated for clinical use.