Comparative genomic analysis of the Asteraceae



The 1977 discovery of the spliceosomal introns in eukaryotic genes, and the subsequent description of corresponding splicing mechanisms that put exons together, were among the most fundamental, amazing, and puzzling discoveries in biology (Chow, Gelinas, Broker, & Roberts, 1977; Berget, Moore, & Sharp, 1977). Spliceosomal introns (group III) are sequences that interrupt nuclear coding sequences in eukaryotes, and are removed from RNA transcripts by a complicated protein-RNA complex, called the spliceosome. However, the mechanisms by which introns are inserted and deleted from gene loci are not well understood. Intron density differs greatly among organisms, and the evolutionary history of spliceosomal introns remains one of the most hotly debated topics in eukaryotic evolution (Roy & Gilbert, 2006).

Origin of Introns

Two main theories, introns early and introns late, have been proposed to account for the origin of introns. The introns early theory proposes that introns were present in the last universal common ancestor (LUCA) of prokaryotes and eukaryotes (Gilbert, 1978). More specifically, it is postulated that the earliest genetic elements encoded small domains, similar in length to typical modern exons, which recombined via non-coding intronic sequences present in some of these elements to facilitate protein evolution (Roy, 2003). During subsequent evolutionary history, the introns suffered different fates in the different lineages: they were lost in prokaryote lineages, but maintained in eukaryotes as introns by the appearance of the spliceosome (Gilbert, 1978). The loss of introns in prokaryotes has been explained as "genome streamlining" (Roy, 2003). According to the streamlining hypothesis, the main pressure in the evolution of prokaryotes had been maximization of the replication rate, resulting in elimination of non-essential parts of the genomes. Introns would not survive under such intense negative selection.

The introns late theory proposes that spliceosomal introns arose in the first eukaryotes from self-splicing introns. These group II introns, present in endosymbiont-derived organelles such as mitochondria, invaded previously undivided genes and intron-less genomes, and the spliceosome evolved as a way of removing them (Cavalier-Smith, 1991). The argument that self-splicing introns gave rise to spliceosomal introns and their spliceosomes is based on functional and structural similarities between self-splicing group II introns and spliceosomal introns. In both types of introns, the 5' end becomes bound to an adenine near the 3' end, forming a lasso (lariat) structure that is excised (Newman, 1997). Furthermore, the group II introns appear to be phylogenetically limited to eubacteria (Bonen & Vogel, 2001), and the organellar genomes of eukaryotes, such as mitochondrial genomes, are thought to share common ancestors with various eubacteria (Gray, 1999). Organellar genes were transferred to the nucleus on a large scale (Gray, Burger, & Lang, 1999), which may have allowed group II intron-like elements to invade the eukaryotic nucleus.

The development of high-throughput genomics in the late 1990s marked a turning point in the debate over the origin of introns, because of the greatly enhanced analytical power and comparative analyses of spliceosomal protein sequences across the major eukaryotic groups (Collins & Penny, 2005). Introns early and introns late supporters now agree that both the spliceosome and spliceosomal introns originated long before the most recent common ancestor of living eukaryotes, and that little of the original exon/intron boundary distribution is left because of rapid intron turnover (Wolf, Kondrashov, & Koonin, 2001). In addition, different lineages show disparate patterns of intron loss and gain; intron sliding is a rare event, and most cases involve relocation by only a single base (Rogozin, Lyons-Weiler, & Koonin, 2000; Sakharkar, Tan, & de Souza, 2001).

Nevertheless, questions remain concerning the origin of introns. Whether spliceosomal introns were present in the LUCA, or arose from bacterial group II introns after these self-splicing introns invaded the nucleus, remains unresolved and calls for further study. A new theory has recently emerged, the "introns first" theory, which proposes that introns and the spliceosome are remnants from the RNA world (Jeffares, Poole, & Penny, 1998). This hypothesis is based on the observation that putatively ancient, small nucleolar RNA (snoRNA) genes are often encoded by introns. RNAs were the only catalysts for the assembly of an all-RNA ribosome before the emergence of proteins, and snoRNAs must have been used for the assembly of the primitive ribosome as it evolved towards full protein-producing capacity (Poole, Jeffares, & Penny, 1999). Hence, the introns that contain snoRNAs must predate the protein-coding exons that surround them. To date, proponents of all three hypotheses about the origin of introns lack sufficient evidence to refute the other theories, and the controversy continues.

Intron Functions

The presence of introns in eukaryotes has several disadvantages, including

waste of time and energy during gene expression on polymerizing extra-long intronic segments of pre-mRNA molecules; and

potential errors in normal splicing, as long introns contain numerous false splicing sites, called pseudo exons.

Some benefits must be associated with introns to compensate for these disadvantages. Several important functions of introns have been uncovered that counter the concept of introns as selfish, non-functional genomic elements.

Alternative splicing of pre-mRNA, due to the existence of introns in a gene, is a prominent mechanism for generating protein diversity. The exons of pre-mRNA are reconnected in multiple ways during splicing of introns, resulting in different mRNAs that can be translated into different protein isoforms. This allows a single gene to code for multiple proteins. Five basic modes of alternative splicing are generally recognized: alternative 5'-splice sites, alternative 3'-splice sites, exon skipping, mutually exclusive exons, and retained introns (Black, 2003).

Introns contain several types of non-coding, but functional, RNA sequences. snoRNAs are located inside introns, and are produced during post-splicing processing of intronic RNA (Hüttenhofer, Brosius, & Bachellerie, 2002). snoRNAs guide the process of pseudouridylation and methylation in pre-rRNA by complementary pairing of their guide sequences with rRNAs (Maden & Hughes, 1997). Another type of non-coding RNA, microRNA (miRNA), is also frequently found inside introns (Bartel, 2004). miRNAs are short nucleotide RNA sequences that bind to complementary sequences in the 3'UTR of multiple target mRNAs, usually resulting in their silencing. In vertebrates, 40-70% of miRNAs appear to be expressed from introns of protein- and non-coding transcripts. Intronic miRNAs are less common in worms and flies, 15% and 39%, respectively, in protein coding genes (Griffiths-Jones, Saini, van Dongen, & Enright, 2008).

Intronic sequences were found to possess numerous elements that regulate gene expression. For example, the second intron of the human nestin gene contains an evolutionarily conserved region directing gene expression to central nervous system (CNS) progenitor cells and to early neural crest cells (Lothian & Lendahl, 1997). Likewise, in order to express the human apolipoprotein B gene in liver, the second intron of this gene is essential (Brooks et al., 1994).

Intron Gain and Loss as Phylogenetic Tools

The phylogenetic tools that are typically used for systematic work in all species groups currently consist of

chloroplast DNA (cpDNA) sequences, which are conserved in all plants and rarely recombine (Kim & Jansen, 1995); and

the internal transcribed spacer (ITS), which is the DNA sequence between ribosomal RNA genes (Baldwin, 1992).

Intron sequences and positions provide a record of the evolutionary history of a species or group of species and, therefore, may also contain valuable phylogenetic information. Non-coding DNA segments, such as introns, lack functional constraints. Therefore, the patterns of insertion and deletion of introns should reflect evolutionary patterns within and among species. Moreover, obtaining intron sequence data sets from map-poor organisms is easy and fast with PCR-based methods, and intron positions can be easily pinpointed using various sequence alignment programs/tools. Thus, introns may provide a useful complement to cpDNA- and ITS-based phylogenies for poorly studied groups.

To date, two phylogenetic strategies have used introns as markers, at quite distinct evolutionary scales. Intron sequences, which evolve more rapidly than coding sequences, have been used to resolve relationships between closely related species (Slade, Moritz, Heideman, & Hale, 1993). Protein sequences may experience too little change to yield sufficient signal, and intronic sequences provide a relatively simple alternative for studying recent evolutionary changes. On the other hand, intron loss and gain have proved to be very slowly evolving characters in most lineages studied to date. Like other chromosomal rearrangements, such as inversions and translocations, intron gains and losses are rare events. Therefore, intron positions retain a large amount of information about genome structure and deep evolutionary history, and provide a useful phylogenetic tool for evaluating distant evolutionary relationships. For example, Venkatesh, Ning, and Brenner (1999) used a few intron loss and gain patterns to group species into clades.

Study System

Universal Markers

To characterize intron changes within the family, orthologous genes must be compared. Previous studies demonstrated the usefulness of orthologous gene comparisons for PCR-based identification of conserved syntenies in birds, mammals, and insects (Lyons et al., 1997; Smith et al., 2000; Chambers et al., 2003). Whole genome duplications, as well as extensive local duplications and rearrangements, are hallmarks of plant evolutionary histories (Adams & Wendel, 2005). To avoid the complications of comparing paralogous copies, the proposed study will be confined to the conserved orthologous set (COS). These are unique, single-copy genes, distributed evenly over the genome regardless of segmental duplication regions and ploidy, that are conserved across evolutionarily divergent species (Fulton, Van der Hoeven, Eannetta, & Tanksley, 2002). The study will use "universal" PCR primers designed for conserved regions within exons that flank one or several introns (Chapman, Chang, Weisman, Kesseli, & Burke, 2007). Three underlying features of intron evolution make this a feasible strategy for identifying orthologous targets. First, exons are relatively conserved among related species (Paterson et al., 2009). Second, intron positions do not readily change and are thought to be highly conserved across species, even over long evolutionary time. Thus, the approximate position of an intron between two consecutive exons can be predicted from related species with detailed genome maps. Preliminary data showed that the presence of introns in Arabidopsis thaliana is a surprisingly good predictor of introns in distantly related Asteraceae (Roy, Fedorov, & Gilbert, 2003). Third, introns evolve faster than exons in the same gene, and are more diverse and polymorphic than the exons (Guo, Wang, Keightley, & Fan, 2007).

Generation of primers

Lettuce and sunflower belong to the two most distant branches in the phylogenetic tree of Asteraceae. If primers can be found that work for both lettuce and sunflower, they will likely work for most other species that fall between them phylogenetically. The selected species of Asteraceae will be compared to Arabidopsis, which was selected as the comparative model organism for two reasons. First, its genome is sequenced and available from The Arabidopsis Information Resource (TAIR). Second, both Arabidopsis and species in the Asteraceae are dicotyledons. Over 1,300 lettuce/sunflower/Arabidopsis COS alignment trios were obtained by screening the expressed sequence tag (EST) data for lettuce and sunflower from the Compositae Genome Project Database (CGPDB) against the Arabidopsis genome. To identify potential primer regions, introns in composites were assumed to be located in similar regions to those in Arabidopsis. An intron-annotation tool was written in Python; it took the Arabidopsis sequence from each of the 1,343 putative COS triplet alignments, BLASTed it against an Arabidopsis genomic database that contained introns, and regenerated the COS groups with the Arabidopsis introns annotated (Figure 1). The intron-annotated COS groupings were then submitted to the primer design program PriFi (Fredslund, Schauser, Madsen, Sandal, & Stougaard, 2005), which generated 232 sets of primers that are located within exons and amplify across at least one intervening intron. The COS loci provide PCR-format gene markers that can be used to construct gene maps and identify species-specific genes. The present study will use them to investigate intron evolution within the Asteraceae family (Figure 2).
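The intron-annotation step can be illustrated with a minimal Python sketch. It is not the tool used in the study: it assumes that the genomic coordinates of each exon are already known (for example, parsed from a BLAST alignment of the cDNA against the genome), and the function name annotate_introns, the marker character, and the toy sequence are hypothetical.

```python
def annotate_introns(cdna, exon_blocks, marker="|"):
    """Insert a marker into a cDNA sequence at each predicted intron site.

    exon_blocks: list of (genomic_start, genomic_end) tuples, 1-based and
    inclusive, in transcript order on the forward strand (an assumption).
    Returns the annotated cDNA and the list of intron lengths in bp.
    """
    annotated = []
    intron_lengths = []
    cdna_pos = 0
    for i, (start, end) in enumerate(exon_blocks):
        exon_len = end - start + 1
        annotated.append(cdna[cdna_pos:cdna_pos + exon_len])
        cdna_pos += exon_len
        if i + 1 < len(exon_blocks):
            next_start = exon_blocks[i + 1][0]
            intron_lengths.append(next_start - end - 1)
    if cdna_pos != len(cdna):
        raise ValueError("exon blocks do not cover the cDNA exactly")
    return marker.join(annotated), intron_lengths


# Toy example: three exons of 4, 3, and 5 bp separated by two introns.
toy_cdna = "ATGGCCTTTAAA"
toy_exons = [(1, 4), (105, 107), (258, 262)]
annotated, introns = annotate_introns(toy_cdna, toy_exons)
```

Marking the predicted intron sites in the cDNA, rather than storing them separately, keeps the annotation attached to the sequence, so a primer design step can require that a primer pair flank at least one marker.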

Figure 1. A flowchart of the universal primer design. Arabidopsis sequences from the triplet alignments were BLASTed with the Arabidopsis genomic DNA sequence in the TAIR, using the intron-annotation tool. A marker was inserted at the site of the predicted intron. Intron-annotated triplet sequences were subsequently subjected to PriFi, which designed putative universal primer pairs.

Figure 2. The number of successful primer pairs per species (from Chapman, Chang et al., 2007). A phylogenetic tree representing the relationships between the eight taxa is shown below the graph. The results reveal phylogenetic relationships among the species and demonstrate the feasibility and sensitivity of the universal primers.

Experimental Species

The proposed research focuses on the Asteraceae, the composites. Asteraceae is the largest family of flowering plants; nearly one in ten plant species in the world is a member. Many composites have been domesticated, including over forty agriculturally and economically important species, such as lettuce and sunflower (Kesseli & Michelmore, 1996). Eight of the twenty most invasive plants in the U.S. are composites (e.g., thistles, knapweeds, and dandelions), and control of and damage by weeds incur an estimated annual cost of more than $120 billion (Pimentel, Zuniga, & Morrison, 2005). Although the Asteraceae is a relatively young family, originating in the mid Eocene, it has undergone remarkable diversification during the last 40 million years, with respect to the 250 million year history of flowering plants (Bowers, Chapman, Rong, & Paterson, 2003). The family now has 20,000 - 30,000 species and has successfully adapted to nearly every type of terrestrial habitat (Funk et al., 2005). Despite this diversity, the genetics of individual members of the Asteraceae have not been extensively studied.

The classification of the Asteraceae has been recently re-evaluated (Panero & Funk, 2002), and now includes 11 subfamilies and 35 tribes (Figure 3). The target species selected for the proposed study are lettuce (Lactuca sativa), sunflower (Helianthus annuus), trevo (Dasyphyllum diacanthoides), gerbera daisy (Gerbera jamesonii), safflower (Carthamus tinctorius), spotted knapweed (Centaurea maculosa), yellow starthistle (Centaurea solstitialis), chicory (Cichorium intybus), dandelion (Taraxacum officinale), curry plant (Helichrysum italicum), eastern silver aster (Symphyotrichum concolor), candle plant (Senecio articulatus), sunchoke (Helianthus tuberosus), and prairie blazing star (Liatris pycnostachya). These 14 taxa reside in five different subfamilies: Asteroideae, Cichorioideae, Carduoideae, and Mutisioideae, which together account for 99% of the species diversity of the family, and Barnadesioideae, which includes a single tribe, the Barnadesieae. Based on molecular evidence, the subfamily Barnadesioideae is considered to be basal to the Asteraceae (Urtubey & Stuessy, 2001). Therefore, species in this subfamily may display a pattern of intron evolution that is distinct from patterns in other clades.

Figure 3. Phylogenetic tree of Asteraceae showing the relationships among major clades of the family (adapted from Funk et al., 2005, p.355). The fourteen DNA samples selected for the proposed study (indicated by red arrows) include members of five subfamilies.

Proposed Research

A: Investigations of intron size, genome size, and ploidy level in the Asteraceae


Genomes exhibit a remarkable range of sizes in both plants and animals (Bennett & Leitch, 2005; Gregory et al., 2007). Across broad phylogenies, genome size may be correlated with intron size, because unconstrained regions, such as introns, have evolved at higher rates than coding regions (McLysaght, Enright, Skrabanek, & Wolfe, 2000). Analysis of 199 introns in 22 orthologous genes, for example, showed that intron size in pufferfish (Fugu) was eight times smaller on average than in humans, which is consistent with the smaller total genome in pufferfish (McLysaght et al., 2000). Intron size is also correlated with genome size in more recently diverged lineages, such as Drosophila (Moriyama, Petrov, & Hartl, 1998). However, some species, such as cotton (Gossypium), show that intron sizes in plants may remain remarkably static, despite mechanisms that greatly expand or shrink other genomic components (Wendel, Cronn, Alvarez, Liu, Small, & Senchina, 2002).

Polyploidy in plants also has some unexpected consequences with respect to genomic characteristics. Polyploids might be expected to have larger C-values than diploids. The C-value is the amount of DNA contained within a haploid nucleus, and should increase in direct proportion to ploidy level. This expectation holds true in synthetic polyploids and newly formed polyploids (Pires et al., 2004). However, extensive genomic rearrangements, including gene loss, often accompany the onset of polyploidization (Levy & Feldman, 2003). Studies of maize showed that about half of all duplicated genes have been lost in the 11 million or so years since the polyploidy event that gave rise to the progenitor of maize (Lai et al., 2004). Studies of Arabidopsis showed that polyploidy is followed by a genome-wide removal of some redundant genomic material (Ku, Vision, Liu, & Tanksley, 2000; Ziolkowski, Blanc, & Sadowski, 2003). Differential gene loss, i.e., loss of some duplicates but not others, following polyploidy is responsible for much of the diversity in genome size among closely related plants (Paterson, Bowers, Peterson, Estill, & Chapman, 2003). The proposed research will examine the interrelationships among intron size, genome size, and ploidy within the composite family.

Question 1: Why don't genome sizes correspond to ploidy changes within the Asteraceae?

The influence of genome size on intron size may be confounded by numerous other unstudied covariables if the divergence time between the investigated species is too large. More valuable information is likely to be gained from closely related taxa that vary in genome size but share recent evolutionary history and a broad suite of life-history features, such as plants in the Asteraceae family. Lettuce (2n = 18) is believed to be diploid, whereas sunflower (2n = 34) is thought to be an ancient tetraploid (Solbrig, 1977), yet their genome sizes are similar (1C = 2.65 pg and 3.65 pg, respectively). Three hypotheses, drawn from a literature review, are proposed to explain this paradox.

Hypothesis 1a: The lettuce genome has expanded via "junk" DNA. The inter- and intra-genic sequences (introns) in lettuce might be bigger than those in sunflower, which leads to "genomic upsizing" in lettuce.

Hypothesis 1b: Sunflower has lost most of its additional DNA following polyploidy formation, whereas the genome size for lettuce remains constant. Introns in sunflower would have decreased in size.

Hypothesis 1c: Lettuce and sunflower have the same ploidy, and the difference in chromosome number reflects chromosome breaks, fusions, and rearrangements, or invasion by transposable elements.

Junk DNA is assumed to have an equal chance of being added to the introns or to the spaces between genes. If there is more junk DNA in lettuce than in sunflower, it will be in the introns as much as in the intergenic spaces, and thus the tendency can be measured by intron length alone. Sunchoke (2n = 102; 1C = 12.55 pg), which is closely related to the common sunflower, is hexaploid and has undergone a recent polyploidy event. If Hypothesis 1b holds, sunflower is expected to have lost more sequence than sunchoke. This could be due to the process known as diploidization, in which old polyploids tend to be more diploid-like than newly formed polyploids (Soltis, Soltis, & Tate, 2003). Additionally, Leitch and Bennett (2004), comparing diploids and polyploids in the Angiosperm DNA C-values database, found that genome size tended to decrease with increasing ploidy level. Alternatively, if Hypothesis 1a is true, lettuce has gained non-genic DNA sequences, which would result in comparable C-values at the diploid and tetraploid levels.
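The paradox can be made concrete with simple arithmetic. The sketch below assumes, naively, that 1C DNA content scales in direct proportion to ploidy with no subsequent loss or gain; the function name and the choice of reference species are illustrative assumptions, and the C-values are those quoted above.

```python
def expected_1c(reference_1c_pg, ploidy_ratio):
    """Expected 1C value (pg) if DNA content scaled linearly with ploidy."""
    return reference_1c_pg * ploidy_ratio


lettuce_1c, sunflower_1c, sunchoke_1c = 2.65, 3.65, 12.55  # pg, from the text

# Sunflower treated as a tetraploid relative to diploid lettuce (ratio 2).
sunflower_expected = expected_1c(lettuce_1c, 2)
# Sunchoke (2n = 102) has three times the chromosome number of sunflower (2n = 34).
sunchoke_expected = expected_1c(sunflower_1c, 102 / 34)

# Observed / expected: below 1 means less DNA than simple scaling predicts.
sunflower_ratio = sunflower_1c / sunflower_expected
sunchoke_ratio = sunchoke_1c / sunchoke_expected
```

Under this naive expectation, sunflower carries roughly 69% of the DNA predicted for an undiminished tetraploid derived from a lettuce-sized genome. That shortfall is what Hypothesis 1b would produce, although the calculation alone cannot distinguish it from Hypotheses 1a or 1c.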

Question 2: Do rare species have an accumulation of "junk" compared to invasive species?

Previous studies have noted that rare and endangered species, which usually have reduced population sizes, have larger genomes than more common, invasive species (Vinogradov, 2003). One possible explanation is that increases in genome size result from deleterious mutations that become fixed by drift in small populations, a non-adaptive process (Lynch & Conery, 2003).

Hypothesis: Widespread and invasive species have shorter introns compared with rare, endangered species.

Question 3: Do rapid life cycle annuals have smaller or fewer introns than long lived perennials?

The durations of mitosis and of meiosis are both positively correlated with genome size (Van't Hof & Sparrow, 1963; Bennett, 1971). Accordingly, it is generally assumed that species with a short minimum generation time have a shorter mean cell cycle time and mean meiotic duration, and a lower mean genome size as well as shorter introns, than species with a long mean minimum generation time (Bennett, 1972).

Hypothesis: Because the life cycles of annual species are shorter than those of perennial species, annuals are selected to have smaller introns than perennials.


The lengths of 144 introns collected from 13 different species across the entire Asteraceae family were compared. Figure 4 shows the comparisons of average intron lengths among these species. Species within a subfamily are apt to possess significantly similar mean intron sizes. Species in the subfamilies Mutisioideae and Carduoideae tend to have much longer introns than do those in Cichorioideae, while the intron lengths of species in Asteroideae are in-between.

Figure 4. Histogram of the average amplified intron size of each species. Members of the same subfamily share the same color (gray: Mutisioideae; blue: Carduoideae; green: Cichorioideae; yellow: Asteroideae). Phylogenetic relation is clearly shown in the average intron size across tested subfamilies.

Pairwise comparisons were used to detect variation in intron size for each universal marker between species, as shown in Figure 5. There is a clear, biased reduction in intron size across the genome in paired comparisons of species from different subfamilies (Figure 5a). In contrast, ploidy, genome size, and other features within subfamilies do not create biased patterns of intron change (Figure 5b & c). Intron sizes of species within a subfamily but in different tribes show an unbiased but slightly dispersed pattern (Figure 5b), whereas those in the same tribe have nearly identical intron lengths (Figure 5c).




Figure 5. Example scatter plots of pairwise species comparisons of intron size. (a) Inter-subfamily comparisons. (b) Intra-subfamily comparisons between distinct tribes. (c) Intra-tribe comparisons. The red line has a slope of 1, indicating no change in intron size; the black line is the line that best fits the data.

With respect to the primary issue of whether genome size, intron size, and ploidy level are correlated within the Asteraceae, the data show unequivocally that these genomic features are uncoupled. Variation in intron size among the study species is independent of variation in genome size and ploidy. Genome size and ploidy level vary greatly within subfamilies, but intron size does not. Phylogenetic signal (species relatedness) seems to be the best predictor of intron size changes among species. Intron size similarity is highest in the intra-subfamily scatter plots. Additionally, it is possible that sunflower is not a tetraploid, as has been believed, and that its chromosome number increased by chromosome breakage rather than genome duplication, which would indicate that ploidy levels cannot always be inferred from chromosome numbers.

B: Patterns of intron loss and gain in the Asteraceae


Introns are under less selection pressure than exons, so intronic sequences have a higher rate of loss and gain than exons. Recent studies concluded that differences in intron densities among families are due to different histories of intron dynamics; that is, some groups of organisms have gained many introns, while others have lost many introns (Rogozin, Wolf, Sorokin, Mirkin, & Koonin, 2003). Two main models for the loss of introns have been proposed (Roy & Gilbert, 2006). The classical model is recombination of a genomic sequence with a reverse-transcribed copy of mRNA. The genomic deletion model involves deletion of an intronic sequence from the genomic DNA. In the case of intron gain, five major mechanisms have been proposed (ibid.). They include 1) insertion of a reverse-transcribed intron into a new position; 2) insertion of a transposon; 3) tandem duplication of an exon; 4) intron transfer between paralogs, through recombination; and 5) insertion of a self-splicing group II intron via reverse splicing.

Genome-wide comparisons of closely related species of eukaryotes indicated that intron losses have prevailed over gains during recent evolution (Roy et al., 2003). Studies of mammals, fungi, and parasitic protists found, in each case, less than a dozen total gains among thousands of genes over tens of millions of years (Roy et al., 2003). The majority of results from studies on a wide variety of plants, from enslaved algae to vascular plants, found an excess of intron loss over intron gain (Roy & Penny, 2007). For example, much more loss than gain was found in rice (Lin, Zhu, Silva, Gu, & Buell, 2006). The single exception to date is the long lineage leading from the plant-animal ancestor to Arabidopsis, which showed more intron gain than loss (Rogozin et al., 2003). Another study also reported more intron gains than losses in recent, segmentally duplicated pairs of Arabidopsis genes (Knowles & McLysaght, 2006). However, Gilson et al. (2006) found large-scale intron conservation over long evolutionary distances within green algae, and between green algae and Arabidopsis.

No previous comparative studies examined patterns of intron loss and gain within the Asteraceae family. Different patterns in distinct clades may provide valuable information about phylogenetic branching. Therefore, the proposed research will address three questions concerning the dynamics of intron evolution in the Asteraceae.

Question 1: How prevalent are intron loss and gain, and how does intron size change across the composite family?

Hypothesis 1a: Intron loss has occurred more frequently than intron gain, as has been found in most other plant families.

Hypothesis 1b: Low rates of intron loss and expansion of intron number are characteristic of the composite family.

Hypothesis 1c: The very diverse and recently evolved Asteraceae have a different pattern of intron loss and gain than most other plant families, with more frequent changes in intron number and size.

Hypothesis 1a is based not only on the results of previous plant studies, but also on the fact that intron removal reduces processing times for mRNAs, including the transcription and splicing times for introns. For example, rapidly reproducing organisms, such as Asteraceae, tend to have fewer or shorter introns than organisms with long life cycles, due to selection for genes that can produce proteins quickly in response to external stimuli (Jeffares, Mourier, & Penny, 2006).

Hypothesis 1b speculates Asteraceae experience relatively high rates of intron gain for several possible reasons.

Many land plants have very high numbers of mobile elements, which could be advantageous for intron gain (Roy, 2004).

The only documented cases of new intron origin are found in land plants (Iwamoto, Maekawa, Saito, Higo, & Higo, 1998), suggesting a high potential for intron gain.

Many land plants have long generation times (Jeffares et al., 2006), and therefore, might experience weaker selection against the inefficiencies associated with introns.

Effective population sizes of some land plants are small relative to most other lineages (Lynch & Conery, 2003); if new inserted introns are slightly deleterious, they might be expected to accumulate more rapidly in these plants.

Question 2: Do species-rich taxa (tribes and sub-families) have more intron changes than species-poor taxa?

The subfamily Asteroideae contains 20 tribes and about 65% of the species in the Asteraceae, whereas other subfamilies, such as Mutisioideae, have only one tribe and ca. 3% of the species in the family. Such contrasts in species richness may be reflected in differences in intron change.

Hypothesis: Species-rich groups show more diversity in intron size than species-poor groups.

Question 3: Are there biased patterns of intron change among the different species and groups?

Continuing from the previous question, the family Asteraceae is the most diverse of all plant families and has evolved rapidly (Funk et al., 2005), which may cause biased patterns of intron variation among different groups and/or species within the family.

Hypothesis: There are biased patterns of intron size change across the genome in some taxa.


To investigate the roles of both gain and loss in intron evolution, 144 universal markers flanking one or more potential introns were screened against 13 composite species. Twelve intron loss and gain events were found: 1 gain and 11 losses (Figure 6). Gains and losses are rare (12/144) but clearly evident. Two introns are absent from all composite species: one is inferred to be a gain in Arabidopsis, because the intronic sequence is also absent in the outgroup species V. vinifera, whereas the other is inferred to be a loss, because the intron position is conserved in V. vinifera. Of the remaining 10 intron loss events within the Asteraceae, five occurred in the subfamily Cichorioideae, three in Carduoideae, and two in Asteroideae. The data show that intron loss has occurred more frequently than intron gain across the composite family (11:0, respectively), but they reveal no relationship between the species richness of a taxon and the number of intron changes. Asteroideae, which contains 20 tribes and about 70% of the species in Asteraceae, has only two intron loss events, whereas Cichorioideae, which comprises 14% of the diversity of the whole family, has the most, with five. Intriguingly, this biased pattern of intron change may help to explain why Cichorioideae has the shortest mean intron lengths, and suggests that genome streamlining or other selection pressures are acting on this subfamily. All of these markers will provide valuable benchmarks for constructing phylogenetic relationships within the family.

Figure 6. Phylogenetic tree of the Asteraceae and intron gain and loss patterns (adapted from Funk et al., 2005, p.355). A green arrow indicates a raw intron gain in the corresponding species; a red arrow indicates a raw intron loss in the corresponding branch. Where present, the number following "x" beside an arrow indicates the number of intron gain or loss events that occurred in the specified tribe.

Materials and Methods

Molecular Techniques

Genomic DNA samples were isolated from leaves of all target species of the Asteraceae using the FastDNA® Kit following the manufacturer's protocol. Of the 232 universal primer pairs previously generated in our project, the 144 with a higher amplification success rate (> 50%; Chapman et al., 2007), along with 48 additional primer sets designed with Primer3 from the remaining 1,111 COS triplet alignments, were tested for amplification in the composite species. PCR was conducted in a total reaction volume of 15 μl containing 150 ng of genomic DNA, 1X Green GoTaq® Reaction Buffer, 0.6 μM of each primer, 1.5 mM MgCl2, 0.2 mM dNTP, 1X CES (Ralser et al., 2006), and 0.75 units of GoTaq® Flexi DNA Polymerase. Touchdown PCR was carried out as follows: an initial step at 95℃ for 4 min; a 1-min spike to 97℃; 15 cycles of denaturation at 95℃ for 30 s, annealing for 45 s starting at 65℃ and decreasing by 1℃ per cycle, and extension at 72℃ for 1 min; 25 cycles of 95℃ for 30 s, 50℃ for 45 s, and 72℃ for 1 min; and a final extension step at 72℃ for 10 min. PCR products were separated on a 2% agarose gel in 1X TBE buffer. The gel was photographed and PCR band size was measured using the Molecular Imager® Gel Doc™ XR System and the software package Quantity One, respectively.
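The touchdown program can be written out cycle by cycle as a quick check on the annealing temperatures. The following Python sketch assumes that the first touchdown cycle anneals at 65℃ and that each later touchdown cycle is 1℃ lower, so the fifteenth anneals at 51℃ before the fixed 50℃ phase begins; the function and parameter names are illustrative.

```python
def touchdown_annealing_schedule(start_c=65, touchdown_cycles=15,
                                 step_c=1, final_c=50, final_cycles=25):
    """Return the annealing temperature (deg C) for every PCR cycle."""
    # Touchdown phase: one step lower on each successive cycle.
    touchdown = [start_c - step_c * i for i in range(touchdown_cycles)]
    # Fixed phase: constant annealing temperature.
    return touchdown + [final_c] * final_cycles


schedule = touchdown_annealing_schedule()
```

Under that reading, the program runs 40 amplification cycles in total, and the touchdown phase ends one degree above the fixed annealing temperature, so there is no jump between the two phases.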

Statistical Analysis

The amplicons are quantified by comparing their sizes. The intron lengths of the species of interest will be analyzed by paired-sample t tests at a significance level of α = .05. If a significant difference in intron size is detected between two paired species, phylogenetic comparisons within the family and/or between composites and related families (e.g., Arabidopsis) will be needed to determine whether introns in a species have increased or decreased in size. Regression analysis is then performed, and Pearson's correlation coefficient (r) between the two species is used in the validity analysis. Scatter plots and regression lines are used to describe the data and to show the trend of paired intron sizes (Figure 7).
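A minimal sketch of these calculations, using only the Python standard library, is given below. The intron lengths are made-up values for illustration, and the function names are hypothetical. The sketch returns the paired t statistic only; judging significance at α = .05 still requires comparing it with the critical value of the t distribution with n - 1 degrees of freedom, from a table or a statistics package.

```python
import math


def paired_t_statistic(x, y):
    """t statistic for paired samples (differences x[i] - y[i]), df = n - 1."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    return mean_d / math.sqrt(var_d / n)


def regression_and_r(x, y):
    """Least-squares slope and intercept of y on x, plus Pearson's r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / math.sqrt(sxx * syy)


# Hypothetical intron lengths (bp) for the same markers in two species.
species_a = [110, 220, 330]
species_b = [100, 200, 300]
t_stat = paired_t_statistic(species_a, species_b)
slope, intercept, r = regression_and_r(species_a, species_b)
```

A regression slope below 1 with a high r, as in this toy data, would correspond to a consistent, proportional reduction in intron size in the second species, whereas a slope near 1 would indicate no systematic change.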

Figure 7. Flowchart of procedure to model the relationship between intron lengths of two different composite species.

Phylogeny Analysis

Using the Arabidopsis genes as the outgroup, an intron loss is inferred when the intron is present in the putative Arabidopsis ortholog but absent from the composite species of interest, that is, when the PCR product size of the composite sample, determined by Quantity One, is equal or close to the size of the intronless Arabidopsis cDNA. A composite species with a much larger intron than that of the putative Arabidopsis ortholog could indicate the presence or gain of a second intron in that species. If an intron is absent from all tested composite species but present in Arabidopsis, the event is inferred to be a loss or a gain according to its consistency with a second outgroup, grape (Vitis vinifera) (Figure 8), for which a full genome sequence is available. In both cases, further sequencing of the PCR products of the suspected gene loci, using the ExoI/SAP purification protocol or MinElute gel extraction kits (Qiagen) and an ABI Prism® 310 Genetic Analyzer, may be required.
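The size-based decision rule can be sketched in Python as follows. The thresholds tolerance_bp and gain_factor, the function name, and the category labels are hypothetical choices made for illustration; the text specifies only "equal or close to" and "much larger", and gel-based size estimates are imprecise, so any call made this way would still need confirmation by sequencing.

```python
def classify_intron(amplicon_bp, cdna_bp, genomic_bp,
                    tolerance_bp=50, gain_factor=2.0):
    """Classify one marker in one species from its PCR product size.

    cdna_bp: expected intronless size, from the Arabidopsis cDNA.
    genomic_bp: expected size from Arabidopsis genomic DNA (intron included).
    tolerance_bp and gain_factor are hypothetical thresholds.
    """
    # Product about as short as the intronless cDNA: intron probably absent.
    if abs(amplicon_bp - cdna_bp) <= tolerance_bp:
        return "putative loss"
    # Product far longer than the Arabidopsis genomic size: maybe a second intron.
    if amplicon_bp >= gain_factor * genomic_bp:
        return "possible additional intron"
    return "intron present"


calls = [
    classify_intron(310, cdna_bp=300, genomic_bp=800),
    classify_intron(1700, cdna_bp=300, genomic_bp=800),
    classify_intron(750, cdna_bp=300, genomic_bp=800),
]
```

Whether a putative loss shared by every composite species is really a loss in the family or a gain in Arabidopsis cannot be decided by this rule; that step depends on the comparison with V. vinifera.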

Figure 8. Phylogenetic relationships between the Asteraceae and the other outgroups, Arabidopsis and V. vinifera.