Chloroplasts are essential organelles that perform photosynthesis in plant cells and other eukaryotic organisms. It is generally accepted that chloroplasts originated as endosymbiotic cyanobacteria (Martin & Kowallik, 1999). Chloroplasts have maintained an independent genome that encodes photosynthetic proteins and some housekeeping functions. Compared with free-living cyanobacteria, chloroplast genomes are considerably reduced, but the retained parts still show similarities to the cyanobacterial genome. The chloroplast genome (cpDNA) is a homogeneous, circular, double-stranded DNA molecule of around 110-200 kb. It contains 30 to 50 RNA genes and a variable number of protein-coding genes: about 100 in terrestrial plants and green algae and 150-200 in nongreen algae (Sugiura, 1995). Because they are nonrecombinant and uniparentally inherited, chloroplast genomes are useful tools for evolutionary studies.
Although chloroplast genomes have potential for evolutionary studies, detecting polymorphism at the population level is difficult because of the low level of substitutions. Reconstructions of plastid evolution with traditional biomolecular methods have therefore proven particularly difficult in practice (Martin et al. 1998; Sugiura et al. 1998). The availability of complete genomes has made it possible to develop more advanced methods for phylogenetic and evolutionary analyses. De Las Rivas et al. (2002) applied an approach based on the quantitative analysis of COGs (clusters of orthologous proteins).
The previous approaches are based on the alignment of homologous sequences, and much of the information in these data sets (such as alignment gaps) is lost. Qi et al. (2004) developed a new method for analysing complete genome sequences that is based on composition vectors and does not require sequence alignment. This method was applied in the study of Chu et al. (2004).
2. Brief introduction of methods
In the study of De Las Rivas et al. (2002), a total of 17 fully sequenced cpDNAs were downloaded from NCBI: 8 land plants, 3 green algae, 1 Euglenophyta, 2 Rhodophyta, 1 Bacillariophyta, 1 Glaucocystophyceae and 1 Cryptophyta. The complete proteomes of two nonphotosynthetic parasites were also included because of their functional and evolutionary similarities with cpDNA. The first step of the analysis is functional annotation. The open reading frames (ORFs) were annotated with a computer program that finds pairs of orthologs in two different genomes (Febrega et al. 2001). From these results a binary matrix of orthologous chloroplast proteins was constructed (1 for the presence of an ortholog and 0 for its absence). The 101-type matrix proved to be the best. Factor analysis (FA) was applied in the comparative studies to find an underlying orthogonal factor model. Once FA had identified the optimal dimensionality, the distances between each pair of genomes were computed. Phylogenetic trees were generated with the neighbor-joining method, and the confidence of each branch was estimated by jackknife bootstrap analysis using 1000 replicates. The frequency of each branch of the original tree and the distribution of trees were generated with the CONSENSE program in the PHYLIP package. Functionally linked proteins tend to co-evolve, showing correlated patterns of presence or absence across genomes. This form of co-evolution can therefore be detected in a way similar to the one used to create the X-matrix (Pellegrini et al. 1999).
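The presence/absence matrix and the idea of correlated profiles described above can be illustrated with a minimal Python sketch. It is not the pipeline of De Las Rivas et al. (2002): the genome names and ortholog identifiers are hypothetical toy data, the factor-analysis step is omitted, and a plain Hamming distance (the fraction of orthologs on which two genomes differ) is swapped in as the simplest possible distance on binary profiles.

```python
from itertools import combinations

def presence_matrix(genomes):
    """Build a binary matrix: one row per genome, one column per ortholog."""
    names = sorted(genomes)
    orthologs = sorted(set().union(*genomes.values()))
    rows = [[1 if o in genomes[name] else 0 for o in orthologs]
            for name in names]
    return names, orthologs, rows

def hamming_distance(row_a, row_b):
    """Fraction of orthologs present in one genome but absent in the other."""
    return sum(a != b for a, b in zip(row_a, row_b)) / len(row_a)

def distance_matrix(rows):
    """Symmetric matrix of pairwise distances between genome rows."""
    n = len(rows)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = hamming_distance(rows[i], rows[j])
    return dist

def shared_profiles(orthologs, rows):
    """Group orthologs whose presence/absence pattern across genomes is identical."""
    groups = {}
    for col, ortholog in enumerate(orthologs):
        profile = tuple(row[col] for row in rows)
        groups.setdefault(profile, []).append(ortholog)
    return {p: g for p, g in groups.items() if len(g) > 1}

# Hypothetical toy input: genome name -> set of ortholog identifiers it encodes.
toy_genomes = {
    "genome_A": {"orth1", "orth2", "orth3", "orth4"},
    "genome_B": {"orth1", "orth2", "orth3", "orth4"},
    "genome_C": {"orth1", "orth2", "orth5"},
    "genome_D": {"orth1"},
}

names, orthologs, rows = presence_matrix(toy_genomes)
dist = distance_matrix(rows)
linked = shared_profiles(orthologs, rows)

for name, row in zip(names, rows):
    print(name, row)
print(dist)
print(linked)
```

In this toy data set orth3 and orth4 share the same profile, which is the kind of signal used to propose that two proteins are functionally linked. The resulting distance matrix is the sort of input a neighbor-joining program would take.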
In the study of Chu et al. (2004), the complete sequences of 21 chloroplast genomes, 2 archaeal genomes, 8 eubacterial genomes and 3 eukaryotic genomes were retrieved from the NCBI database. Their analysis involves several steps, the first of which is the construction of composition vectors and a distance matrix. Protein sequences were regarded as symbolic sequences over the 20 amino acids, and the observed frequency p(α1α2...αK) of each K-string was calculated. Mutations create a random background at the molecular level, whereas selection determines the direction of evolution. To emphasize the selective diversification of sequence composition, the random background is subtracted from the observed frequencies; this step is crucial before a cross-correlation analysis is performed. The expected frequency q(α1α2...αK) of a string can be predicted with a Markov model (Brendel et al. 1986). The shaping role of selective evolution, X(α1α2...αK), is then calculated from p and q, and the values X(α1α2...αK) are used as the components of the composition vector of a genome. Once composition vectors X and Y have been constructed for two genomes, the correlation C(X, Y) between them is calculated and the distance D(X, Y) between the genomes is derived from it. A distance matrix for all genomes is then generated and used to construct phylogenetic trees. Three different distance methods (Fitch-Margoliash, neighbor-joining and minimum evolution) were used to construct the trees. The parameters K = 4 and K = 5 were used to obtain a stable tree topology. Bootstrapping was performed to give statistical support to the trees.
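The composition-vector steps described above can be sketched in a few lines of Python. The text does not spell out the formulas, so the ones used here are assumptions based on the commonly cited formulation of the method: the Markov prediction q(α1...αK) = p(α1...αK-1) p(α2...αK) / p(α2...αK-1), the component X = (p - q) / q, the correlation C as the cosine of the angle between two vectors, and the distance D = (1 - C) / 2. The two one-protein "genomes" are hypothetical toy sequences, and K = 3 is used only because the sequences are so short.

```python
from collections import Counter
from math import sqrt

def string_frequencies(proteins, k):
    """Observed frequency of every k-string over all proteins of a genome."""
    counts = Counter()
    for seq in proteins:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()} if total else {}

def composition_vector(proteins, k):
    """Sparse composition vector {K-string: (p - q) / q}; requires k >= 3."""
    p_k = string_frequencies(proteins, k)
    p_k1 = string_frequencies(proteins, k - 1)
    p_k2 = string_frequencies(proteins, k - 2)
    # Index the (K-1)-strings by their first K-2 letters so that overlapping
    # pairs (u, v) with u[1:] == v[:-1] can be joined into candidate K-strings.
    by_prefix = {}
    for v in p_k1:
        by_prefix.setdefault(v[:-1], []).append(v)
    vector = {}
    for u in p_k1:
        for v in by_prefix.get(u[1:], []):
            word = u + v[-1]
            # Assumed Markov prediction of the random background.
            q = p_k1[u] * p_k1[v] / p_k2[u[1:]]
            x = (p_k.get(word, 0.0) - q) / q
            if x != 0.0:
                vector[word] = x
    return vector

def correlation(x, y):
    """Cosine of the angle between two sparse composition vectors."""
    dot = sum(value * y.get(word, 0.0) for word, value in x.items())
    norm_x = sqrt(sum(v * v for v in x.values()))
    norm_y = sqrt(sum(v * v for v in y.values()))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0

def distance(x, y):
    """Assumed distance D = (1 - C) / 2, which lies between 0 and 1."""
    return (1.0 - correlation(x, y)) / 2.0

# Hypothetical toy "genomes", each a list of protein sequences.
toy_a = ["MKVLMKAL"]
toy_b = ["MKALMRVL"]

vec_a = composition_vector(toy_a, 3)
vec_b = composition_vector(toy_b, 3)
d_ab = distance(vec_a, vec_b)
print(d_ab)
```

Computing this distance for every pair of genomes gives the distance matrix that is passed to the tree-building methods. Storing the vector as a dictionary keeps only the nonzero components, which matters because the number of possible K-strings grows as 20^K.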
The topologies of the trees generated by the three distance methods in the study of Chu et al. (2004) are similar. In general, the chloroplasts are separated into two main clades. One corresponds to the green plants sensu lato (chlorophytes s.l.) and consists of all taxa with chlorophyte chloroplasts, whether of primary or secondary endosymbiotic origin. The other includes the glaucophyte Cyanophora and the rhodophytes s.l., which comprise the rhodophytes (red algae) and their symbiotic derivatives. Compared with the results of De Las Rivas et al. (2002), some resemblances can be identified. Both studies showed a close relationship between Cyanophora and rhodophytes s.l. Both analyses also place the euglenophyte Euglena at the base of chlorophytes s.l., so the tree topologies are consistent with each other. The ME and NJ trees of Chu et al. (2004) and the tree of De Las Rivas et al. (2002) consistently grouped Mesostigma with Nephroselmis as prasinophytes. Unexpectedly, both the NJ and FM trees of Chu et al. (2004) showed an alternative topology in which the angiosperms are more closely related to Marchantia (liverwort) and Psilotum (psilophyte) than to Pinus (conifer); De Las Rivas et al. (2002) reported the same topology.
The tree generated by simple correlation analysis of complete chloroplast genomes is consistent with the generally accepted view of the origin of chloroplasts and with the phylogenetic relationships among different taxa of photosynthetic organisms established by traditional analyses, which suggests that the results are reliable. In addition, because multiple sequence alignment (MSA) is not necessary, the substantial problems associated with this complicated procedure can be avoided. Simple correlation analysis of complete genomes therefore appears to be an effective new method for phylogenetic reconstruction and evolutionary analysis.