Comparative Genomic Analysis Of Asteraceae Biology Essay


Little is known about the origins of spliceosomal introns, the relative rates and mechanisms of intron loss and gain, or the role of intron changes in gene evolution. In the present study, intron size changes were scored and the conservation of intron positions was determined for 144 single-copy orthologous genes from thirteen species belonging to four major subfamilies of the composite family of flowering plants. The gene markers, known as the conserved orthologous set (COS), were identified from expressed sequence tags (ESTs) in the Compositae Genome Project (CGP) database. Nine intron losses and no intron gains were found in the Asteraceae, and a single intron gain was found in Arabidopsis thaliana, which was used as an outgroup. Intron loss rates were not related to the degree of diversification within the four composite subfamilies. The intron deletions were precise, without residual intronic nucleotides or lost exonic sequences, and most deletions were close to the 3' end of a gene. The missing introns were significantly shorter than the average intron length, and occurred primarily within highly expressed genes. These findings support the mechanism of intron loss through reverse transcription and demonstrate that putative intron-exon boundaries determined from multiple alignments of conserved orthologous sequences over extensive evolutionary distances are a useful predictor of introns.

Two major evolutionary processes have resulted in rapid changes in the C-value - the amount of DNA contained in the haploid nucleus - in different groups of organisms. Large scale duplications, such as segmental duplications, aneuploidy, or polyploidy, increased C-values in many species, relative to diploid ancestors. This process was sometimes followed by genome streamlining, in which some duplicated regions of the genome became non-functional and were lost. The second process that impacted genome size is transposon invasion and expansion. Because introns are less constrained and have higher rates of evolution than the coding sequences of a genome, intron changes might be expected to be correlated with various species characteristics, particularly if transposons drive genome expansion, and streamlining in polyploids drives genome reduction. This hypothesis was tested by comparing intron lengths in the diploid species, Lactuca sativa (lettuce, 2n = 18), a likely tetraploid, Helianthus annuus (sunflower, 2n = 34), and other composite species that represent the diverse characteristics of the family. Intron length polymorphisms were independent of genome size, ploidy level, population size, and generation time, but were closely correlated with phylogenetic relatedness. The results also indicated that Helianthus annuus is not an ancient polyploid, but the result of chromosome break, fusion, and rearrangement events. Overall, results of the research reported in this dissertation demonstrated that introns provide a useful tool for studying evolutionary processes in closely related species.


I am deeply grateful to my advisor, Dr. Richard Kesseli, whose encouragement, guidance, and support throughout the research and writing enabled me to develop a deeper understanding of the subject and made this dissertation possible. I would also like to thank the past and present members of my committee, Kenneth Kleene, Adán Colón-Carmona, Linda Huang, and Ying Tan, for providing me with information resources and valuable advice. I especially appreciate Steven Ackerman, who helped me solve many technical problems.

I am indebted to James Allen for growing the experimental plants, and to David Weisman for his help in bioinformatics and computer techniques. I am obliged to many colleagues in the Compositae Genome Project, who provided direct or indirect support. I also thank Maria Mahoney, Bonnie Campbell, Marcia Kazmierczak, Alexa MacPherson, Charles King, and Laney Digiovanni for assistance and support.

Many thanks to past and present members of Dr. Kesseli's lab, including Lee Timms, Rony Barbara, Jonna Grimsby, Melinda Gammon, Dina Tsirelson, Stuart Morey, Trudi Gulick, Anastasia Mozharova, Tomas Zavada, Mike Williams, and Selina Imboywa. I particularly thank the undergraduates who worked with me during the past six years: Timothy Menz, Tam Huynh, Brian Rothschild, Tiffany Ynosencio, Lan Huynh, Eric Reed and Man Lok Yu. This dissertation would not have been possible without their unremitting efforts.

Finally, I thank my parents, who have always provided spiritual, moral, financial, and material support. A big "Thank you!" to my sister, Ming-Ching Chang, and brother-in-law, Gennan Chen, for supporting me throughout my years in America and for taking care of my everyday problems. I also thank my little angel and princess, Winnie Chang, for giving her father the morale boost and motivation to complete this dissertation. A very special thank you to my beloved soul mate, Ling-Yi Lin, for accompanying me through these years, encouraging me to pursue my academic goals, and supporting every step of my life and work. I would not have been able to achieve this dissertation without her.




The 1977 discovery of the spliceosomal introns in eukaryotic genes, and the subsequent description of corresponding splicing mechanisms that put exons together, were among the most fundamental, amazing, and puzzling discoveries in biology (Chow, Gelinas, Broker, & Roberts, 1977; Berget, Moore, & Sharp, 1977). Spliceosomal introns (group III) are sequences that interrupt nuclear coding sequences in eukaryotes. They are removed from RNA transcripts by a complicated protein-RNA complex, called the spliceosome.

In 1978, American biochemist Walter Gilbert coined the phrase "introns and exons" while he was predicting the role of introns in gene evolution. According to Gilbert, introns play a key role in exon shuffling and alternative splicing. He states that "the notion of the cistron must be replaced by that of a transcription unit containing regions which will be lost from the mature messenger - which I suggest we call introns (for intragenic regions) - alternating with regions which will be expressed - exons." Introns occupy a large proportion of the non-coding portion of genomic DNA. Transcription of genomic DNA generates a precursor mRNA (pre-mRNA).

The pre-mRNA comprises four major parts: the 5' untranslated region (5'-UTR), the exon that is protein coding, the intron that is non-coding, and the 3' untranslated region (3'-UTR). The 5' and 3'-UTRs are extensions of the introns; however, the processing of these 5' and 3'-UTRs differs from that of the intron located between two protein-coding exons. This intron is referred to as the in-frame intron. The in-frame intron is lengthy and was once thought to be a junk fragment during the course of transcription (Lin, Miller, & Ying, 2006). However, it was determined that the distribution of intron lengths is utilized for large-scale sequencing (Irimia & Roy, 2008).

The sequences of the spliceosomal introns are quasi-random and in general, they do not contain open reading frames (ORFs) (Roy & Gilbert, 2006). On the contrary, group I and group II introns from a variety of sources have long ORFs. Through the process of reverse transcription, these ORFs encode proteins to facilitate the propagation of introns to the sites that are currently devoid of introns. Spliceosomes are the macromolecular enzymes that remove the introns from the primary transcripts. Spliceosomes can remove all introns from a primary transcript, resulting in the formation of either mRNA that is fully spliced or various mRNAs from a gene (Garcia-Blanco, 2003). The spliceosomes are composed of five RNAs, and the length of the spliceosomes varies with the species. The process of removing the spliceosomal introns is associated with several steps of the transcription processes. Even though there are considerable differences between the spliceosomal introns and other introns, some similarities exist, indicating the possibility of an evolutionary relationship. The number of spliceosomal introns varies greatly within the eukaryotic species.

The causes and timing of the spliceosome's evolution are of interest in the study of genomic evolution. Little is known about the mechanisms for the loss or gain of introns, though researchers have been able to trace the origins of a few introns (Roy & Gilbert, 2006). Previously, introns were believed to have no specific function and were treated as selfish DNA, until intronic functions in the cell were progressively uncovered. With the recent discoveries of non-coding RNAs demonstrating a variety of functions, it can be assumed that introns may hold more functional regions to be discovered. One example of such a discovery is the realization that U14 small nuclear RNA (snRNA) is encoded in an intron of the hsc70 heat shock gene in mouse (Liu & Maxwell, 1990). Thus, introns bear regulatory elements and possess functions that play important roles in the genome.

Trans-splicing, in which two different mRNA copies recombine via non-coding intron-like sequences to form a single mRNA, plays a pivotal role in protein diversity in eukaryotes (Fedorova & Fedorov, 2003). It has been observed that introns tend to be longer in chromosomal regions where meiotic crossing over is less frequent, and thus they increase the recombination rate between exons (Comeron & Kreitman, 2000). These introns facilitate exon shuffling, a process that is extremely important in evolution. Introns are also known to facilitate the export of mRNA from the nucleus to the cytoplasm and to stabilize the mRNA in the cytoplasm (Fedorova & Fedorov, 2003).

Introns mediate gene regulation via alternative splicing. Long conserved sequences are not required for these regulatory roles. This allows introns to evolve more freely and more quickly than exons, a feature that makes them an important tool for evolutionary studies. This characteristic is extremely useful in polymorphism studies and nuclear DNA marker development. A primary difference between splicing in plants and splicing in animals is the recognition of the splice site. In plants, intronic sequences are significantly longer than exons and contain more adenines and thymines (Morello & Breviario, 2008). Additionally, the expression of genes in response to environmental stimuli is under the control of intronic regulatory elements during plant growth and contour development (Morello & Breviario, 2008). The splicing requirements for dicots differ from those of monocots, and understanding these requirements is important when working on plant species because it gives a detailed view of the complexity of plants.

The mechanisms by which introns are inserted into and deleted from gene loci are not well understood. Intron density differs greatly among organisms, and the evolutionary history of spliceosomal introns has been the subject of prolonged debate in the study of eukaryotic evolution (Roy & Gilbert, 2006).

Origin of introns

Several questions were raised with regard to the function and origin of introns after the discovery of these sequences. According to Gilbert (1978), introns had their existence in ancestral genes, and their utility lies in assembling the first genes. However, despite some evidence for this hypothesis, such as the relationship between the exon structure and the domain structure of the proteins, the argument has no theoretical basis (Demetrius, 1988). There might be a split in the structure of the ancestral genes and the relics from the primordial assembly of genes that might have been preserved in the guise of the exons through recombination in introns (Doolittle, 1978).

The origin of the spliceosomal introns is a long-standing unsolved mystery of molecular biology. After an extensive study over several years, researchers were able to identify the origin of introns in only two cases. The first is a short interspersed nucleotide element, or SINE, insertion that resulted in a new intron embedded in the exonic sequence of rice CatA, which is a catalase. In this case, the intron arose in the ancestral genome of rice after its evolutionary divergence from the other ancestral cereals such as wheat, barley, and oat (Iwamoto, Maekawa, Saito, Higo, & Higo, 1998; Roy, 2004).

In the other case where the origin of introns was observed, two midge globin genes acquired introns via the gene conversion with an intron containing paralog. The intron that was found in the globin gene is interpreted as evidence for the three-intron and the four-exon structure of the ancient globin gene (Hankeln, Friedl, Ebersberger, Martin, & Schmidt, 1997; Roy, 2004). However, most of the eukaryotes have multiple introns per gene that require a large number of gains of introns throughout the evolution of the eukaryotes (Roy, 2004). It is assumed that the differences in the sizes of population and the rate of mutations correlate with the density distributions of introns among different species. Plant organelle genomes (i.e., chloroplasts and mitochondria) often contain a great number of introns. This finding matches Lynch's hypothesis (2002) that introns were only able to reproduce after multicellular organisms' emergence, followed by a reduction in effective population size, due to the required threshold of mutation rate and population size.

It has been observed from the complete genomic sequences of diverse phylogenetic groups that there is an increase in the complexity of genomes from simple prokaryotes to multicellular eukaryotes. These sequences also reveal an abrupt increase in the number of spliceosomal introns. These modifications are passive emergences in response to both the reduction in the sizes of populations and an increase in the size of the organism (Lynch & Conery, 2003). The evolution of introns comprises two components: one neutral and the other selection driven. The neutral background component may correspond to the balanced evolution mode, in which rates of intron gain and loss are in balance. In the selection-driven component, the evolution of introns is marked by a significant increase in gain and a significant decrease in loss, which results in evolutionarily conserved genes having a greater density of introns than genes that evolve at a faster pace (Carmel, Rogozin, Wolf, & Koonin, 2007).

Research reports observe that genes with high expression evolve at slower rates, and that the level of a gene's expression primarily determines the rate of evolution of both coding and non-coding DNA sequence. The number of translational events experienced by a gene is the determinant of expression and evolutionary rate (Drummond, Raval, & Wilke, 2006). It has been shown that introns affect the expression of various genes at various levels, including mRNA export, stability, and efficiency of translation (Nott, Meislin, & Moore, 2003). Introns and the spliceosomes that carry out splicing were present in the ancestors of living eukaryotes. Splicing is a fundamental aspect of all eukaryotes and may have evolved before the last common ancestor of the living eukaryotes. There might have been a considerable period between the first eukaryote and this eukaryotic ancestor, so it is not possible to ascertain the origin of the spliceosome (Collins & Penny, 2005).

Theories of intron origin

Little is known about the genomic architecture of introns, though they proliferate widely in living organisms. Evidence which has recently emerged through large-scale genomic sequencing projects and functional analysis of mRNA-processing events supports the idea that spliceosomal introns were not only present in early eukaryotes but also diverged into a minimum of two eukaryotic classes in the early stages of evolution (Lynch, 2002). The modern debate centers on important issues of evolution, such as the introns-early versus introns-late argument, rates of intron gain, intron loss and parallel gain, the 'mini gene' hypothesis, and the 'protein-splice site' hypothesis (Nguyen, Yoshihama, & Kenmochi, 2005).

The introns early theory proposes that ancient introns were present in the last universal common ancestor (LUCA) of both prokaryotes and eukaryotes (Gilbert, 1978). It is also postulated that the earliest genetic elements encoded small domains, similar in length to typical modern exons, which recombined to facilitate protein evolution via non-coding intronic sequences present in some of these elements (Roy, 2003).

During subsequent evolutionary history, introns underwent divergent evolutionary courses in these different lineages: in prokaryote lineages they were erased, but in eukaryotes introns were retained, accompanied with the spliceosome (Gilbert, 1978). The loss of introns in prokaryotes has been described as "genome streamlining" (Roy, 2003). According to Roy's streamlining hypothesis, efficiency in replication is the goal of such pressure in the evolution of prokaryotes, and thus non-essential parts of the genomes, such as introns, were eliminated.

The introns late theory assumes that spliceosomal introns evolved from self-splicing introns in ancient eukaryotes. These group II introns were present in the mitochondrial organelles derived from endosymbionts, and invaded previously undivided genes and genomes without introns. After that, the spliceosome appeared in evolution to splice out non-coding introns from the transcript (Cavalier-Smith, 1991). This theory is based on functional and structural similarities between self-splicing group II introns and spliceosomal introns. In both types of introns, the 5' end of the intron is joined to an adenine near the 3' end, forming a lariat (lasso) structure that is then spliced out (Newman, 1997).

Further, self-splicing group II introns appear to be found only in eubacteria (Bonen & Vogel, 2001) and in the eukaryotic organellar genomes thought to be derived from eubacteria (e.g., mitochondrial genomes) (Gray, 1999). In addition, genes in the organelles were transferred to the nucleus on a large scale (Gray, Burger, & Lang, 1999), which may have resulted in the eukaryotic nucleus being invaded by group II introns.

The introns early theory predicts a steady loss of introns during eukaryotic evolution. Although the research cannot provide a definitive answer, Nguyen, Yoshihama, and Kenmochi (2005) report that intron densities in eukaryotes do not tend to decrease. During the period of intron evolution between the last common ancestor of the eukaryotes and the crown ancestor there is evidence of both a decrease and an increase in the number of introns. The debate between the 'introns early' theory and the 'introns late' theory continues, even though strong support has been provided in favor of the introns late theory. For example, a novel maximum likelihood method revealed no decrease in the density of introns in seven eukaryotes since the crown ancestor (Nguyen, Yoshihama, & Kenmochi, 2005).

There is evidence that intron positions are not conserved between the 25 cytoplasmic ribosomal protein (CRP) genes of archaeal origin and the mitochondrial ribosomal protein genes of bacterial origin. These are supposed to have diverged at the LUCA (Nguyen, Yoshihama, & Kenmochi, 2006). Intron positions have been broadly used for evolutionary studies because of their conservation throughout evolution. Ribosomes are vital components of the translational machinery of cellular life, and they are highly conserved throughout the evolutionary process. This has allowed researchers to compare ribosomal proteins across distantly divergent species.

The mitochondrial ribosomal proteins (MRP) are reported as having evolved from bacteria, as there is a considerable homology between the MRP and the bacterial proteins, while the CRP have evolved independently. MRP genes are passed to the nuclear genome, which contains spliceosomal introns. In this situation, it is possible to determine the existence of the ancestral spliceosomal introns through comparisons of MRP and CRP intron-exon structures.

Yoshihama, Nakao, Nguyen, and Kenmochi (2006) support the introns late theory, as they observed no clear conservation of intron positions between the CRP and MRP genes. This indicates that spliceosomal introns were absent in the last common ancestor of the genes coding for the MRP and CRP.

Palmer and Logsdon (1991) have reported evidence of restricted phylogenetic intron distribution. They support the view that introns are inserted late in eukaryotic evolution and that there is no role for exon shuffling in primordial gene assembly. The construction of the first genes is associated with exon shuffling, and the intron position is influenced by different phylogenetic distribution dynamics.

Statistical analysis performed by de Souza, Long, Klein, Roy, Lin, and Gilbert (1998) supports the view that the intron/exon structure of genes is a consequence of the assembly of the first genes through exon shuffling during the early stages of evolution. Some experiments have suggested that introns were gained at proto-splice sites during the evolution of eukaryotes. The conserved sequence that flanks the introns may also have been used as a site for intron gain during evolution (Dibb & Newman, 1989).

One clear example of spliceosomal intron insertion is observed in the U2 and U6 small nuclear genes of certain species of yeast. However, the coding sequences that flank these introns are random and the example suggests no proto-splice site for actual intron insertion. Evidence of an excess of the symmetric exons - whether existing in the modern or the ancient conserved genes - supports the key role of exon shuffling in the early and late evolutionary stages prior to the divergence of eukaryotes and prokaryotes (Long, de Souza, Rosenberg, & Gilbert, 1998). The intron-exon structures have an intricate history, and evolved along different stages. The ancient introns prefer to be in protein module boundaries, as they are involved in exon shuffling. Modern introns are inserted at proto-splice sites.

Fedorov, Cao, Saxonov, de Souza, Roy, and Gilbert (2001) observed that both ancient and modern introns are distributed in the ancient conserved regions and the non-ancient conserved samples of the genes. An excess of phase 0 intron positions in the boundary region of the modules of the ancient proteins, common to eukaryotes and prokaryotes, supports the concept that introns are used in constructing ancient genes through exon shuffling of the modules during early evolution (Fedorov, Cao, Roy, & Gilbert, 2003).
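The notions of intron phase and symmetric exons used above can be made concrete with a short calculation. The following is a minimal sketch, assuming the usual definition that an intron's phase is the number of coding bases preceding it, modulo 3, and that an internal exon is symmetric when its two flanking introns share the same phase. The function names and the example exon lengths are hypothetical and are not taken from the studies cited.

```python
def intron_phases(coding_exon_lengths):
    """Return the phase of each intron, given the coding length of each exon.

    Assumes the first exon length is counted from the start codon, so the
    running total is the number of coding bases upstream of each intron.
    Phase 0: intron lies between two codons.
    Phase 1 or 2: intron splits a codon after its first or second base.
    """
    phases = []
    coding_bases = 0
    for length in coding_exon_lengths[:-1]:  # no intron follows the last exon
        coding_bases += length
        phases.append(coding_bases % 3)
    return phases


def symmetric_internal_exons(coding_exon_lengths):
    """Return indices of internal exons flanked by introns of equal phase."""
    phases = intron_phases(coding_exon_lengths)
    return [
        i
        for i in range(1, len(coding_exon_lengths) - 1)
        if phases[i - 1] == phases[i]
    ]


# Hypothetical gene with four coding exons (lengths in nucleotides).
example_lengths = [90, 120, 100, 50]
print(intron_phases(example_lengths))             # [0, 0, 1]
print(symmetric_internal_exons(example_lengths))  # [1]
```

A symmetric exon has a length that is a multiple of three, so it can be inserted, duplicated, or removed without shifting the downstream reading frame, which is why an excess of such exons is read as a signature of exon shuffling.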

There is now general agreement that spliceosomal introns and the spliceosome already existed in the most recent common ancestor of living eukaryotes, and that little of the original exon/intron boundary distribution remains, due to rapid intron turnover (Wolf, Kondrashov, & Koonin, 2001). Different lineages also show disparate patterns of intron loss and gain; intron sliding is rare, and most cases involve relocation by only one base (Rogozin, Lyons-Weiler, & Koonin, 2000; Sakharkar, Tan, & de Souza, 2001).

A new theory of intron origin was proposed in 1998 by Jeffares, Poole, and Penny. Their 'introns first' theory posits that both introns and the spliceosome already existed in the RNA world. A candidate RNA relic must be so essential to metabolism that not only do various systems rely on it, but it could not be replaced by proteins. Such an RNA is also both catalytic and prevalent in different organisms, as a result of being a remnant of the RNA world.

Among manifold RNA molecules, small nucleolar RNAs (snoRNAs) are necessary to guide chemical modifications of ribosomal RNAs (rRNAs). As a result, one must conclude that these antedate the origin of genetically encoded proteins. Because snoRNA genes are often encoded by protein introns which respond to stimulus (e.g., heat-shock proteins) and ribosome synthesis, these introns must at least precede the protein-coding exons that surround them. It is possible that one or more mechanisms for removing introns were forming at the time.

The origin of spliceosomes in RNA is consistent with the assembly of ribonucleoproteins (RNPs), due to it being crucial to snoRNA maturation, which plays an important role in the processing of rRNAs from spliced introns. To date, proponents of all hypotheses regarding the origin of introns lack sufficient evidence to refute all the other theories, and the controversy continues.

Intron function

It is predicted that spliceosomal introns have no special or general function, and their non-existence in prokaryotes and massive losses in the eukaryotic lineages suggests that introns have no essential function (Roy & Irimia, 2008). The presence of introns in eukaryotes has several shortcomings, including a waste of time and energy during gene expression while transcribing long intronic fragments of pre-mRNA molecules, and possible errors in ordinary splicing due to long introns containing abundant false splicing sites (pseudo-exons). Hence, introns must be endowed with some functional gains to counteract these drawbacks.

Several important functions of introns have been unveiled that counter the perception of introns as 'selfish' or non-functional. Roy and Irimia (2008) report that very short intronic sequences are active in genomic stability, in chromatin structure, and in promoting recombination. In addition, the evolution of the spliceosomal introns by the removal of restrictions on internal genome sequences might allow introns to drift with minimal constraints. It is reasonable to predict that any beneficial mutational change might be retained for its positive selection value.

Information within intronic sequences has been seen as a better predictor of molecular evolution than that of protein. For example, alternative splicing increases the diversity of proteins by excising different sets of introns from a single gene to yield diversified protein products. Although small, the sequences involved in alternative splicing bear many information signatures, such as highly conserved base distribution. Regulated splicing can also serve some biological function, which better explains intron retention in reduced eukaryotic genes. The last common ancestor of modern eukaryotes had complex spliceosomes and a large number of introns; those introns might have had degenerate sequences (Roy & Irimia, 2008).

Irimia, Rukov, Penny, and Roy (2007) state that the ancient genes of modern organisms have high levels of alternative splicing, and that these genes might have had many introns in their common ancestors, which is an important requirement for alternative splicing. Interpretation of the experimental results of Irimia, Rukov, Penny, and Roy (2007) reveals that alternative splicing might have appeared before the rise of multicellular organisms, and that alternative splicing might have played an important role in the biological functions of ancient unicellular organisms.

A previous study found that, in yeast, the secondary structure of introns coordinates splice site pairing and thus avoids incorrect alternative splicing by promoting the inclusion of internal exons during splicing. Very long introns in vertebrate organisms may require these secondary structures to accomplish splicing and to boost the evolution of alternative splicing (Howe & Ares, 1997). There are five common alternative splicing modes: alternative 5'-splice sites, alternative 3'-splice sites, exon skipping, mutually exclusive exons, and retained introns (for reviews, see Black 2003).

Introns have regulatory sequences and regulate ribosomal protein transcription in a highly coordinated way, and it has been reported that intronic sequences might indirectly participate in the transcription of some ribosomal proteins that are overexpressed (Zhang, Vingron, & Ropcke, 2008). Liu, He, Amasino, and Chen (2004), on the basis of their research in the plant species Arabidopsis, report that transposable elements (TEs) inserted in introns have a role in evolution and gene expression.

Ion channel gradients and membrane properties in hippocampal neurons were found to be influenced by introns (Bell et al., 2008; Tsirigos & Rigoutsos, 2008). The second intron of the human nestin gene has been reported to bear an evolutionarily conserved region that directs gene expression to central nervous system (CNS) progenitor cells as well as to early neural crest cells (Lothian & Lendahl, 1997).

Likewise, in order to express the human apolipoprotein B gene in liver, this gene's second intron is crucial (Brooks et al., 1994). Cenik, Derti, Mellor, Berriz, and Roth (2010) also reveal that introns within the 5'UTR of human genes enhance the expression of some genes in a length-dependent manner, where transcriptional regulatory function is performed by some intron sequence motifs.

Positive regulatory sequences, or 'enhancers,' along with negative regulatory sequences, or 'repressors,' have also been reported in the introns of many human genes. Introns play a regulatory role in processing primary transcription by the modulation of the mRNA splicing, or influence the stability of the mRNA through RNA-RNA, RNA-DNA, or RNA-Protein interactions (Cooper, 1999).

Introns contain several types of non-coding, but functional RNA sequences. The snoRNAs and microRNAs (miRNAs) play a key role in a range of processes including ribosome biogenesis and gene regulation. SnoRNAs are located inside introns, and are produced during post-splicing processing of intronic RNA (Hüttenhofer, Brosius, & Bachellerie, 2002). The snoRNAs may also modify the other RNA targets including snRNAs of the spliceosome, and they also function as a regulator of the alternative splicing of mRNA (Hoeppner, Simon, White, Jeffares, & Poole, 2009), and in guiding pseudouridylation and methylation processes in pre-rRNA through complementary guide sequence pairing with rRNAs (Maden & Hughes, 1997).

MiRNAs are frequently found inside introns (Bartel, 2004). They are short RNA molecules capable of "silencing" genes by binding complementary sequences in the 3'UTR of one or more target mRNAs. In vertebrates, 40-70% of miRNAs are expressed from introns of both coding and non-coding transcripts. Intronic miRNAs are less common in the protein-coding genes of worms and flies, at 15% and 39%, respectively (Griffiths-Jones, Saini, van Dongen, & Enright, 2008).
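The binding of a miRNA to complementary sequences in a 3'UTR can be illustrated with a simple string search. The sketch below is a deliberate simplification: it assumes that targeting is approximated by perfect complementarity to the miRNA "seed" (nucleotides 2-8, a common convention that is not stated in the text above), and the miRNA and 3'UTR sequences are invented for illustration only. Real target prediction considers many additional factors.

```python
# Watson-Crick complements for RNA bases.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence (5'->3')."""
    return rna.translate(COMPLEMENT)[::-1]


def seed_match_sites(mirna, utr, seed_start=1, seed_end=8):
    """Return 0-based positions in `utr` that pair perfectly with the seed.

    The seed is taken as mirna[seed_start:seed_end], i.e. nucleotides 2-8
    with the default arguments (an assumption of this sketch).
    """
    target = reverse_complement(mirna[seed_start:seed_end])
    sites = []
    pos = utr.find(target)
    while pos != -1:
        sites.append(pos)
        pos = utr.find(target, pos + 1)  # allow overlapping matches
    return sites


# Hypothetical sequences, written 5'->3'.
toy_mirna = "UACGGAUCCAGUUCAGCAUGGA"
toy_utr = "AAAGAUCCGUAAACCCGAUCCGUUU"
print(seed_match_sites(toy_mirna, toy_utr))  # [3, 16]
```

Because a match needs only seven complementary bases in this simplified model, a single miRNA can plausibly find sites in many different 3'UTRs, consistent with the statement that one miRNA may act on more than one target mRNA.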

Hoeppner, Simon, White, Jeffares, and Poole (2009) report that intronic snoRNAs and miRNAs are significantly more stable than their intergenic counterparts. Less than half of the snoRNAs and miRNAs remain in the same location throughout evolution, a fact that may confer an advantage. Some miRNAs are known to arise from the insertion of a transposon into a functional intronic sequence and to proliferate through the machinery of the host gene. MiRNAs resulting from this mechanism were reported to affect various biological systems in a vast number of life-forms (Ying, 2008). Moreover, spliceosomal introns may contain many functional elements that are as yet undiscovered.

Intron gain and loss as a phylogenetic tool

The phylogenetic tools currently used for systematic work across plant species consist of chloroplast DNA (cpDNA) sequences, which are conserved in all plants and rarely recombine (Kim & Jansen, 1995), and the internal transcribed spacer (ITS), which is the DNA sequence between ribosomal RNA (rRNA) genes (Baldwin, 1992). It is commonly known that non-coding cpDNA fragments such as intergenic spacers and introns evolve more quickly than coding regions. These non-coding regions are less constrained by specific functions and can accumulate more indels and nucleotide substitutions than coding sequences (Clegg, Gaut, Learn, & Morton, 1994).

Because of this difference, non-coding cpDNA regions (e.g., rps16, rpoC1) are widely used for phylogenetic studies of recently diverged taxa. Chloroplast genes, such as rbcL, are also used to infer plant phylogenies. However, rbcL has been reported to vary too little in some cases to resolve relationships between closely related species (Gielly & Taberlet, 1994). The ITS sequences are removed from the precursor RNA transcript during the production of mature rRNA molecules. Because they are highly variable and simple to amplify, they have been widely sequenced and widely used in phylogenetic studies.

A record of the evolutionary history of a species may be preserved in intron sequences and positions, which are therefore likely to contain valuable phylogenetic information. Non-coding DNA segments, such as introns, are subject to few functional constraints, so the patterns of intron insertion and deletion may reflect evolutionary patterns within and among species. Moreover, intron sequence data sets can be obtained readily from map-poor organisms with PCR-based methods, and intron positions can be determined using standard sequence alignment tools. Thus, introns may provide a useful complement to cpDNA- and ITS-based phylogenies among poorly studied groups.

Until recently, there have been two primary approaches to exploiting introns as phylogenetic markers, each addressing a distinct evolutionary scale. In the first, intron sequences, which evolve faster than coding sequences, are used to resolve relationships between related species (Slade, Moritz, Heideman, & Hale, 1993). Intron sequences appear to tolerate mutations and indels fairly easily, so those that evolve neutrally can be a key tool for resolving phylogenetic problems. Protein-coding sequences may experience too little change to generate a sufficient signal, and intronic sequences provide a relatively simple alternative for studying recent evolutionary events.

The second approach treats introns as slowly evolving characters, which provide useful information even though changes in them are scarce (Irimia & Roy, 2008). Introns maintain their positions in most lineages over long evolutionary periods. The series of intron positions and phases along a gene thus records important information about the organization and evolution of genes. Like other chromosomal rearrangements, such as inversions and translocations, intron gains and losses are rare. The rates of intron gain and loss have varied substantially during the evolution of eukaryotes (Roy & Penny, 2006). Roy and Gilbert (2005) report that the more common model for intron deletion is reverse transcription of mRNA into a cDNA copy, followed by reinsertion of that copy into the genome. Multiple contiguous introns in a gene are frequently excised during the process. Therefore, intron positions preserve much information pertaining to genome structure and deep evolutionary history, and provide a useful phylogenetic tool for evaluating evolutionary relationships.

For instance, Venkatesh, Ning, and Brenner (1999) used intron loss and gain patterns to group species into clades. Intron sliding, although it involves shifts of only a few bases (as mentioned previously), is another feature that can be used for phylogenetic analysis. Large-scale alignments in various lineages are truly rare and thus represent informative genomic changes. However, recurrent intron loss or gain at the same position, which has been reported at some intron loss/gain hot spots (Krzywinski & Besansky, 2002), may mislead the analysis and result in an incorrect phylogenetic classification.
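The idea of grouping taxa by shared intron losses can be illustrated with a small sketch. The taxa, intron positions, and presence/absence scores below are entirely hypothetical and are not taken from any of the cited studies; the code simply flags positions where each state occurs in at least two taxa (the usual condition for a binary character to be parsimony-informative) and lists the taxa that share an absence. It assumes that a shared absence reflects a single loss, which is exactly the assumption that recurrent loss at a hot spot would violate.

```python
# Hypothetical presence (1) / absence (0) scores for three intron positions
# in five made-up taxa. None of these values come from the study.
intron_matrix = {
    "taxon_A": [1, 1, 0],
    "taxon_B": [1, 1, 0],
    "taxon_C": [1, 0, 1],
    "taxon_D": [1, 0, 1],
    "outgroup": [1, 1, 1],
}


def informative_positions(matrix):
    """Return indices of intron positions where both states (present and
    absent) are each observed in at least two taxa."""
    taxa = list(matrix)
    n_positions = len(matrix[taxa[0]])
    informative = []
    for i in range(n_positions):
        present = sum(matrix[t][i] for t in taxa)
        absent = len(taxa) - present
        if present >= 2 and absent >= 2:
            informative.append(i)
    return informative


def taxa_lacking_intron(matrix, position):
    """Return the sorted names of taxa in which the intron is absent."""
    return sorted(t for t, states in matrix.items() if states[position] == 0)


for pos in informative_positions(intron_matrix):
    print(pos, taxa_lacking_intron(intron_matrix, pos))
```

In this toy data set, position 0 is uninformative because every taxon retains the intron, while positions 1 and 2 each suggest a candidate grouping. Real analyses would place such characters on a tree rather than read groups off directly.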


The present investigation took the family Asteraceae as its primary subject. Asteraceae is the largest and most diverse family of flowering plants, containing approximately 1620 genera and 23,600 species. Roughly one in every 10 species of flowering plants is a member of the Asteraceae. The family is characterized by a compound inflorescence that gives the appearance of a single composite flower. Many species of the family are domesticated, and over 40 species are important to people, including food (lettuce, chicory), oil (sunflower, safflower), medicinal (echinacea, chamomile), and ornamental crops (chrysanthemum, dahlia, zinnia, and marigold) (Michelmore et al., 2003).

Eight of the twenty most invasive plants in the U.S. are composites (e.g., thistles, knapweeds, and dandelions), causing an estimated annual cost of more than $120 billion (Pimentel, Zuniga, & Morrison, 2005). The vast majority of Asteraceae members are herbs, though some tropical members are shrubs and trees (Stevens, 2001). The family occupies diverse habitats and has given rise to a cosmopolitan array of taxa. Although Asteraceae is a relatively young family, originating in the mid-Eocene epoch, it has undergone remarkable diversification during the last 40 million years (Bowers, Chapman, Rong, & Paterson, 2003).

The adaptive nature and the size of the Asteraceae family are important factors that have stimulated research into its systematics and evolution. In terms of molecular characterization, however, this family has lagged behind others (Kesseli & Michelmore, 1997). Until 2000, Asteraceae was poorly studied and had few genetic resources. Since the launch of the Compositae Genome Project (CGP) in 2000, over a million expressed sequence tag (EST) sequences from 18 species across over 30 tribes in the family have been added to NCBI's database (Laitinen et al., 2005). Evolutionary genomic analyses of the Asteraceae are now practical because these large EST collections cover more phylogenetically diverse taxa than any other plant data bank. This is largely due to the advent of new techniques and new markers, which have greatly increased the amount of available sequence data. Asteraceae classification has recently been re-evaluated (Panero & Funk, 2002), and now includes 11 subfamilies and 35 tribes. Funk et al. (2005) constructed a supertree for the family that illustrates current thinking about the relationships among the major Asteraceae tribes and subfamilies.

Asteraceae provides some of the best documented examples of the hybrid evolution of plants (Rieseberg, 2003). Based on molecular evidence, the subfamily Barnadesioideae is considered basal to the Asteraceae (Urtubey & Stuessy, 2001) and thus a sister group to the rest of the family. Barnadesioideae, which is monophyletic and includes a single tribe, the Barnadesieae, contains less than 1% of the species in the family (Funk et al., 2005). The subfamily Asteroideae contains 65% of the family's species. Members of the subfamily Cichorioideae show key variations in their morphological and molecular characteristics (Funk et al., 2005). The largest subfamily, Asteroideae, was sorted into three supertribes, Asterodae, Helianthodae, and Senecionodae, on the basis of molecular investigations (Robinson, 2005). The other large subfamilies are Mutisioideae and Carduoideae, with more than 2000 species each. Studies by Panero and Funk (2008) revealed some novel clades in Asteraceae systematics, which include the subfamilies Carduoideae, Gochnatioideae, Hecastocleidoideae, Mutisioideae, Pertyoideae, Stifftioideae, and Wunderlichioideae.

Characterization of intron dynamics in the Asteraceae using universal markers

Analysis and comparison of plant genomes may provide insights into the genome evolution of both closely and distantly related species. Many aspects of comparative genomics, systematics, evolutionary biology, and phylogenetics depend on validated sets of orthologous genes, even though orthologous markers are difficult to establish (Wu, Mueller, Crouzillat, Pétiard, & Tanksley, 2006). The ability to detect single-copy orthologous genes has become a great advantage for comparative plant genomics and for studies of evolutionary divergence among plant families. To characterize intron changes within the Asteraceae, orthologous genes - genes that have diverged after a speciation event - must be compared. Previous studies demonstrated the usefulness of orthologous gene comparisons for PCR-based identification of conserved syntenies in birds, mammals, and insects (Lyons et al., 1997; Smith et al., 2000; Chambers et al., 2003). Paralogs, by contrast, are duplicated copies within a genome that arise through polyploidization or through tandem duplication.

Whole-genome duplication, as well as extensive local duplication and rearrangement, is a hallmark of plant evolutionary histories (Adams & Wendel, 2005). To avoid the complications inherent in comparing paralogous copies, the study was confined to a conserved orthologous set (COS), defined as a collection of genes that are evolutionarily conserved and single-copy throughout plant evolution regardless of segmental duplication and ploidy. If paralogs are mistaken for orthologs, the inferred genetic map order may appear to differ between species when it does not.

For this reason, it is vital in comparative mapping studies to design DNA markers from single-copy, highly conserved genes, because such markers can be used to identify chromosomal rearrangements and the degree of synteny. The present study used "universal" PCR primers designed for conserved regions within exons that flank one or several introns (Chapman, Chang, Weisman, Kesseli, & Burke, 2007). The results indicated that these loci are phylogenetically informative and thus can be used to resolve the evolutionary relationships between taxa within the composite family.

Three underlying features of intron evolution make this strategy feasible for identifying orthologous targets. First, exons are relatively conserved among related species. For example, the exon distributions of orthologous genes and the intron positions and phases show a concordance of more than 98% between Sorghum species and rice (Paterson et al., 2009). Second, the presence of introns does not readily change over long evolutionary time, and intron positions have been shown to be largely conserved across diverse eukaryotic lineages. Thus, the approximate position of an intron between two consecutive exons can be predicted from related species for which detailed genomic maps exist. Preliminary data show that the intron-exon boundaries in Arabidopsis genomic sequences are extremely useful for predicting introns in the distantly related Asteraceae (Roy, Fedorov, & Gilbert, 2003). Third, introns evolve faster than exons in the same gene, and are more diverse and polymorphic as well (Guo, Wang, Keightley, & Fan, 2007). Hence, the most beneficial approach is to develop COS markers from genes that contain (potential) introns. Once developed, these COS markers can be used for putative intronic sequence walking, as well as for linking related genomes within the family and with other sequenced model species.
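As a minimal sketch of the second point, intron positions and phases can be projected onto a coding sequence from the exon lengths of a reference gene model. The function names and all numbers below are hypothetical illustrations, not the pipeline used in this study. The sketch assumes that the exon lengths count coding bases only and start at the first base of the start codon, so that phase is the position modulo 3 (phase 0 falls between codons, phases 1 and 2 fall after the first and second base of a codon). A second helper estimates intron length as the difference between a genomic PCR product and the corresponding cDNA product amplified with the same exon-anchored primers.

```python
def predicted_intron_sites(exon_lengths):
    """Given the coding-exon lengths (bp) of a reference gene in order,
    return a (position, phase) pair for each intron.

    position = number of coding bases that precede the intron
    phase    = position % 3 (assumes exon 1 starts at the start codon)
    """
    sites = []
    position = 0
    # The last exon is not followed by an intron, so it is skipped.
    for length in exon_lengths[:-1]:
        position += length
        sites.append((position, position % 3))
    return sites


def inferred_intron_length(genomic_amplicon_bp, cdna_amplicon_bp):
    """Estimate total intron length between two exon-anchored primers as the
    size difference between the genomic and cDNA amplicons. A difference of
    zero is what a precise intron loss would produce."""
    return genomic_amplicon_bp - cdna_amplicon_bp


# Hypothetical reference gene with three coding exons of 120, 85 and 200 bp.
print(predicted_intron_sites([120, 85, 200]))
# Hypothetical amplicon sizes in base pairs.
print(inferred_intron_length(950, 400))
```

If the genomic and cDNA amplicons from a species are the same size at a site where the reference predicts an intron, that locus becomes a candidate intron loss to confirm by sequencing.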

There have recently been many large-scale analyses of intron evolution and dynamics across gene families and species, particularly in fully sequenced model organisms such as Drosophila, Arabidopsis, rice, and fungi (Parsch, 2003; Knowles & McLysaght, 2006; Lin, Zhu, Silva, Gu, & Buell, 2006; Nielsen, Friedman, Birren, Burge, & Galagan, 2004). Bayer and Starr (1998) studied the utility of the chloroplast trnL intron and the trnL/trnF intergenic spacer for resolving phylogenetic relationships among the tribes of Asteraceae, and found many phylogenetically informative indels in the region. However, studies of spliceosomal intron dynamics in the Asteraceae remain extremely rare, owing to the lack of whole-genome sequences and nuclear DNA markers. In the present study (Chapters 2 and 3), I attempt to address part of this gap by examining intron-genome size relationships and patterns of intron loss and gain in this family using universal COS markers.