Phylogenetic analysis of Physaraceae myxomycetes using SSU rDNA variable regions (V1-V10)
Myxomycetes comprise more than 850 species classified into five taxonomic orders: Echinosteliales, Liceales, Physarales, Stemonitales and Trichiales (Everhart and Keller 2008). Physaraceae, one of the families in the order Physarales, comprises the genera Badhamia, Fuligo, Physarella, Physarum, Physarida (unclassified) and Protophysarum. Morphological characters such as the structure of fruiting bodies, lime deposition in the capillitium and sporocarp color are used to identify these species (Stephenson and Stempen 1994; Fiore-Donno et al. 2005; 2008; Takahashi and Hada 2010). However, the morphological species concept does not reflect species relationships and life history, because various species share similar morphological features (Clark 2000); this has motivated the use of molecular characters. Recently, SSU rRNA and elongation factor 1 alpha (EF-1α) gene sequences have been used to resolve the phylogenetic relationships of the myxomycetes (Fiore-Donno et al. 2005). Physaraceae have been resolved at the genus and species level using small subunit (SSU) and large subunit (LSU) rRNA gene sequences. However, these analyses were based on conserved regions of the rRNA genes and/or excluded highly variable sequences within the variable regions (Fiore-Donno et al. 2008; Fiore-Donno et al. 2010; Fiore-Donno et al. 2012; Nandipati et al. 2012), which indicates that little attention was given to the variable regions owing to the lack of secondary structure information.
In this study we predicted the secondary structures of ten variable regions of the SSU rRNA gene to elucidate the phylogenetic relationships of Physaraceae. The low substitution rates of the highly conserved regions of rRNAs make them well suited for establishing distant phylogenetic relationships (Hillis and Dixon 1991; Van de Peer and De Wachter 1997). In contrast, variable rRNA regions show higher substitution rates and also contain indels (i.e., insertions and deletions). Secondary RNA structure information therefore has to be taken into account to optimize sequence alignments; once aligned, these features make the variable regions powerful markers for deducing close relationships, i.e., at the genus or species level (Mai and Coleman 1997; Hwang et al. 2000; Wikmark 2007a; Wiemers et al. 2009). Our study examined the ten variable regions and the conserved regions of the SSU rRNA gene independently. We first predicted the consensus RNA secondary structures of the corresponding regions from 45 Physaraceae isolates using Mfold and refined them by manual adjustment. The predicted secondary structure information of the variable and conserved regions was then used to construct a robust Physaraceae phylogeny.
Materials and Methods
Data collection, secondary structure prediction and sequence alignment
A total of 45 myxomycete isolates belonging to the genera Badhamia, Fuligo, Physarella, Physarum, Physarida (unclassified) and Protophysarum were used; their small subunit ribosomal RNA (SSU rRNA) gene sequences, retrieved from GenBank in April 2014, are listed in Table 1. The conserved (C1-C11) and variable (V1-V10; Figure 1) regions of the SSU rRNA gene in Physarum polycephalum were inferred from the predicted SSU rRNA structure of Saccharomyces cerevisiae (Gutell 1993; Cannone et al. 2002). The secondary structures of the ten variable regions were constructed with Mfold under default conditions, followed by manual refinement (Zuker 2003). All sequences were unambiguously aligned based on the consensus secondary structure provided in this paper using BioEdit version 7.2.5 (Hall 1999).
Five datasets were used to address the phylogenetic relationships: three containing individual SSU rDNA regions, (I) V1-V9 (699 nucleotides, nt), (II) C1-C10 (980 nt) and (III) V1-V10 (804 nt), and two concatenated datasets, (IV) C1-C10/V1-V9 (1679 nt) and (V) C1-C11/V1-V10 (1789 nt). The evolutionary model for each dataset was chosen based on Akaike's information criterion (AIC) using jModelTest version 2.1.4 (Darriba et al. 2012). The chosen models were TIM2+I+G (V1-V9 and C1-C10), TVM+I+G (V1-V10), TrN+I+G (C1-C10/V1-V9) and TIM1+I+G (C1-C11/V1-V10). These models were used to construct Maximum Likelihood (ML) phylogenetic trees using PhyML 3.0 (Guindon and Gascuel 2010). The robustness of the Neighbor Joining (NJ, Jukes-Cantor model, MEGA 6 software, Tamura et al. 2013) and ML (GTR model) tree topologies was tested using bootstrap analysis with 1,000 replicates. For all datasets the GTR+I+G model was used to calculate the Bayesian posterior probabilities (BPP) using MrBayes version 3.2 (Ronquist et al. 2012). Two independent runs of 1,000,000 generations each were carried out, sampling every 100 generations. The posterior probabilities at internal nodes were generated from the 15,000 trees remaining after the initial 25% of trees were discarded.
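The two pieces of arithmetic in this paragraph, AIC-based model choice and the number of trees kept after burn-in, can be sketched as follows. This is an illustrative sketch only: the function names and the log-likelihood and parameter values are hypothetical and are not taken from the study, and the tree count ignores the starting state that a sampler may also record.

```python
def aic(log_likelihood, n_free_params):
    """Akaike information criterion: AIC = 2k - 2 ln L (lower is better)."""
    return 2 * n_free_params - 2 * log_likelihood


def best_by_aic(candidates):
    """Return the model name with the lowest AIC.

    candidates maps a model name to (log_likelihood, n_free_params).
    """
    return min(candidates, key=lambda name: aic(*candidates[name]))


def retained_trees(generations, sample_every, runs, burnin_fraction):
    """Number of sampled trees left after discarding the burn-in of each run."""
    sampled_per_run = generations // sample_every
    discarded_per_run = int(sampled_per_run * burnin_fraction)
    return (sampled_per_run - discarded_per_run) * runs


# Hypothetical log-likelihoods and parameter counts, for illustration only.
models = {"JC": (-5200.0, 87), "GTR+I+G": (-4900.0, 97)}
print(best_by_aic(models))

# Settings reported in the text: 1,000,000 generations, sampling every 100,
# two runs, first 25% discarded.
print(retained_trees(1_000_000, 100, 2, 0.25))
```

With the settings reported above, each run yields 10,000 sampled trees, of which 7,500 are kept, giving 15,000 trees across the two runs.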
Secondary structure features of V1-V10
In this study, nine variable regions (V1 and V3-V10) were identified based on the models from the European SSU RNA database (Van de Peer et al. 2000). Because of its high G+C content (48–82%), helix 8-1 (H 8-1) was assigned as variable region V2; this helix has been reported in only a few eukaryotic taxa (Figure 2; Johansen et al. 1988; Neefs et al. 1993; Gillespie et al. 2005). The percentage identity at each nucleotide position, the length range (nt) and the % G+C content of each variable region in the consensus secondary structure are shown in Figure 2. The genus Physarum shows the greatest length variation [V3 (137–185 nt); V5 (274–445 nt)] and the highest % G+C content [V2 (48–82%)], whereas V7 (22–25 nt; 27–39%) shows the least length variation and the lowest % G+C content in all genera. Furthermore, V2, V3, V5, V8 and V10 show high average pairwise distances (0.311–0.466), V4, V6 and V9 show moderate values (0.087–0.178), and V1 and V7 show the lowest values (0.047 and 0.010, respectively) (Table 2). The average pairwise distance for the conserved regions (C1-C10) is 0.034.
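As a rough illustration of how the per-region statistics above could be computed, the sketch below calculates % G+C content and an average pairwise distance from aligned sequences. The text does not state which distance measure was used, so the uncorrected p-distance with pairwise deletion of gaps is an assumption here, and all function names are hypothetical.

```python
from itertools import combinations

NUCLEOTIDES = set("ACGTU")


def gc_percent(seq):
    """% G+C over unambiguous nucleotides; gaps and other symbols are ignored."""
    bases = [b for b in seq.upper() if b in NUCLEOTIDES]
    if not bases:
        return 0.0
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or ambiguity code are skipped
    (pairwise deletion; an assumption, not the study's stated method).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in NUCLEOTIDES or y not in NUCLEOTIDES:
            continue
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def mean_pairwise_distance(aligned_seqs):
    """Average p-distance over all pairs of aligned sequences."""
    pairs = list(combinations(aligned_seqs, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Toy aligned region, for illustration only.
region = ["ACGT", "ACGA", "ACGT"]
print(round(mean_pairwise_distance(region), 3))
print(round(gc_percent("GGCC-AT"), 1))
```

Computing the statistic per region rather than over the whole gene is what allows regions such as V7 and V5 to be ranked by variability.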
The V5 region shows high length variation, with high variability in the 5' half and low variability at the 3' end, which makes its structure particularly challenging to predict. The V5 secondary structure proposed for P. polycephalum (Wuyts et al. 2000) was used to predict the Physaraceae V5 region. The predicted consensus secondary structure consists of helices E23-1, 4 and 8 as long-range interactions, E23-2, 5, 7 and 12 as hairpin structures, and E23-9 to 12, 13 and 14 as two pseudoknots. The sequences of helices E23-8 to 14 can be adequately aligned among the Physaraceae isolates.
Single regions (variable and conserved) analyses
Although full-length SSU rRNA gene sequences have been used in previous phylogenetic studies, none of these studies analyzed the variable or conserved region sequences independently (Fiore-Donno et al. 2012; Nandipati et al. 2012). To elucidate the phylogenetic relationships of Physaraceae using the variable regions, which have high substitution rates (Fiore-Donno et al. 2012), and to compare them with the conserved regions, we analyzed both types of region independently (datasets I, II and III) and used the consensus secondary structure information for phylogenetic analysis (Figure 2).
Among the 45 isolates selected for phylogenetic analysis, the V10 region is only partially available or not available at all for 16 isolates. Thus, datasets I, II and IV include all 45 isolates, whereas datasets III and V include 29 isolates, with the same taxa in both (Table 1).
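The isolate counts above follow from keeping only the taxa that have data for every region in a dataset. A minimal sketch of such a concatenation step is given below; the function name, the data layout (one dictionary per region mapping taxon name to aligned sequence) and the toy sequences are assumptions for illustration, not the procedure used in the study.

```python
def concatenate_regions(regions):
    """Concatenate per-region alignments, keeping only taxa present in all regions.

    regions is a list of dicts, each mapping taxon name -> aligned sequence.
    Within a region, all sequences must have the same aligned length.
    """
    if not regions:
        return {}
    for region in regions:
        if len({len(seq) for seq in region.values()}) > 1:
            raise ValueError("sequences within a region must have equal length")
    # Taxa missing from any region (e.g. isolates without V10) are dropped.
    shared = set(regions[0])
    for region in regions[1:]:
        shared &= set(region)
    return {
        taxon: "".join(region[taxon] for region in regions)
        for taxon in sorted(shared)
    }


# Toy example: taxon "C" lacks the second region and is excluded.
first = {"A": "ACGT", "B": "AC-T", "C": "ACGA"}
second = {"A": "GG", "B": "GC"}
print(concatenate_regions([first, second]))
```

Dropping incomplete taxa keeps every dataset free of missing regions, at the cost of a smaller taxon sample for the datasets that include V10.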
Dataset I (V1-V9) and dataset II (C1-C10) represent alignments of 699 and 980 nt positions, respectively, and each includes 45 isolates, which clustered into 10 clades (i.e., Clade 1-10, Figure 3A and B). The ML trees based on datasets I and II show similar topologies, with minor differences in the placement of three taxa. Physarum oblonga (represented by one taxon) is sister to clade 2 in dataset I and to clade 8 in dataset II. Similarly, Protophysarum pholoiogenum is sister to clades 7 and 8 in dataset I and to clade 10 in dataset II, and Physaridae W2i is sister to clade 4 in dataset I and to clade 9 in dataset II, with low support (-/59 ML, ≥ 0.53 BPP). In both datasets I and II, clades 1, 4 and 5 consist of intermingled species of Badhamia and Physarum, and clade 9 consists of intermingled species of Physarum and Physaridae Sp. G1a, whereas clades 2, 3, 6, 7 and 10 consist of species of Physarum and clade 8 consists of species of Fuligo (Figure 3A and B). Clades 1, 2, 4, 5, 6, 7, 9 and 10 in dataset I and clades 2, 6, 7 and 9 in dataset II were well separated with high support (≥ 95% NJ, ML and ≥ 0.99 BPP). Clade 3 has moderate support in dataset I (≥ 78% ML, 0.99 BPP) but low support in dataset II (0.62 BPP). Clade 8 shows low support (≥ 0.75 BPP) in both datasets (Figure 3A and B). In dataset II, clade 4 has moderate support (≥ 78% NJ, ML and 0.93 BPP), as does clade 5 (60% NJ, ≥ 0.99 BPP), whereas clade 10 has low support (52% NJ, 0.61 BPP) and clade 1 has no support (Figure 3B).
To test whether adding a few nucleotides could improve the resolution of the Physaraceae phylogeny, we added the V10 region sequence to dataset I, giving a total alignment of 804 nt (dataset III). Figure 3C shows the ML tree based on dataset III, which includes 29 isolates clustered into 9 clades (i.e., Clade 1-9) and has a topology similar to that of dataset I, with minor differences (Figure 3A). Phy. oblonga and Pro. pholoiogenum are sister to clade 1 without any support, and Physaridae W2i is sister to clades 3 and 5 with moderate support (61% NJ and 0.74 BPP). Clades 1, 2, 5-7 and 9 were well separated with high support (≥ 85% NJ, ML and ≥ 0.81 BPP); clade 3 had moderate support (≥ 61% ML and 1.0 BPP), as did clade 4 (93% NJ and 1.0 BPP), and clade 8 had low support (0.88 BPP).
In summary, the ML phylogenies based on datasets I and III (Figure 3A and C) showed well-supported clades relative to that based on dataset II (Figure 3B). Clades 3, 5 and 6 show some topological differences and are moderately supported by ML in datasets I and II (Figure 3A and B) and by BPP in dataset III (Figure 3C). Clade 8 has low support in all three datasets, and the placements of Phy. oblonga, Pro. pholoiogenum and Physaridae W2i as sister taxa differ among the three datasets (Figure 3A, B and C).
Combined regions (variable + conserved) analyses
To improve the resolution of the Physaraceae phylogeny, we concatenated the individual datasets into two final datasets with total alignments of 1679 nt (dataset IV; V1-V9/C1-C10; 45 isolates) and 1789 nt (dataset V; C1-C11/V1-V10; 29 isolates). Figure 4A shows the ML tree based on dataset IV, which is in overall agreement with those of datasets I and II (Figure 3A and B), except for minor topological differences within clade 3 in dataset I and clades 5, 6 and 7 in dataset II. Topological differences were also noticed in the placement of three taxa. Phy. oblonga is sister to clade 8 in dataset II and to clade 2 in dataset IV, without any support (Figure 3B and 4A). Similarly, Pro. pholoiogenum is sister to clades 7 and 8 in dataset I, to clade 10 in dataset II and to clade 8 in dataset IV (Figure 3A, 3B and 4A). Physaridae W2i is sister to clades 3 and 4 in datasets I and IV with moderate support (69/- NJ, 86/59 ML, 0.69/0.53 BPP; Figure 3A and 4A). Clades 1, 2, 4, 5, 6, 9 and 10 are well supported (≥ 93% NJ, ML and 0.99 BPP) and clade 3 is moderately supported (≥ 67% NJ, ML and 0.66 BPP) in dataset IV (Figure 4A), in overall agreement with dataset I except for clade 7 (100% NJ, 99% ML, 0.66 BPP; Figure 3A). Clade 8 has low support (0.52 BPP), in agreement with datasets I and II (0.75 and 0.95 BPP, respectively; Figure 3A and B).
Similarly, Figure 4B shows the ML tree based on dataset V, which is in overall agreement with those of datasets II and III (Figure 3B and C), except for minor topological differences within clades 1, 3 and 5 in dataset II and clade 7 in dataset III. In dataset II, Phy. oblonga is sister to clade 8 and Pro. pholoiogenum is sister to clade 10, both without any support, and Physaridae W2i is sister to clade 9 with low support (0.82 BPP; Figure 3B). In dataset V, clades 1, 2, 4, 5, 7 and 9 were well separated with high support (≥ 77% NJ, ML and 1.0 BPP) and clade 3 had moderate support (≥ 67% NJ, ML and 1.0 BPP), in overall agreement with dataset III (Figure 3C). Clade 8 had low support (0.88 BPP), in agreement with datasets II and III (0.95 and 0.88 BPP, respectively; Figure 3B and C).
In summary, the ML phylogenies based on datasets IV and V (Figure 4A and B) are in overall agreement with those based on the individual datasets I and III (Figure 3A and C), but minor topological differences were noticed within clade 3 in datasets I and IV (Figure 3A and 4A), clade 5 in datasets II, IV and V (Figure 3B, 4A and 4B), and clades 6 and 7 in datasets II and IV (Figure 3B and 4A). Clade 8 has low support in datasets IV and V, as in the individual datasets I, II and III (Figure 3A-C). The placements of Phy. oblonga, Pro. pholoiogenum and Physaridae W2i as sister taxa in datasets IV and V differ from those in dataset II (Figure 3B), whereas for Pro. pholoiogenum the relevant comparison is between datasets I and IV (Figure 3A and 4A), and for Phy. oblonga between datasets III and V (Figure 3C and 4B).
Features of variable regions
We found that some information may be lost when characters are aligned unambiguously, particularly in species with high length variation in the V3, V5 and V10 regions (43–171 nt) and moderate length variation in the V1, V2, V6 and V8 regions (17–19 nt). Nevertheless, the variable regions provide sufficient informative characters and can be used as phylogenetic markers for Physaraceae, in agreement with the conclusions of phylogenetic studies of various other organisms (Hwang et al. 2000; Löhne et al. 2007; Dunthorn et al. 2012; Jang-Seu 2012). In addition, the high length variation and % G+C content of the genus Physarum can be used as a distinguishing feature. The conservation of helix 8-1 in particular organisms has been proposed to reflect independent origins (Neefs et al. 1993; see Sims et al. 1999). We noticed that helices E23-8 to 14 can be aligned easily based on the secondary structure, in contrast to what has been described for other eukaryotes (Wuyts et al. 2000).
Variable regions as molecular marker in Physaraceae phylogeny
Several gene sequences, such as SSU rRNA and elongation factor 1 alpha (EF-1α), have been applied to study the evolutionary relationships of the myxomycetes (Fiore-Donno et al. 2005; Fiore-Donno et al. 2010). In recent studies the Physaraceae phylogeny was resolved based on SSU and LSU rRNA genes. However, these studies excluded highly variable sequences within the variable regions (Fiore-Donno et al. 2012; Nandipati et al. 2012). In the present study, the secondary structure information of the variable regions was taken into consideration to elucidate the Physaraceae phylogeny, and concatenated datasets were used to improve its resolution. ML trees based on datasets I and III, I and IV, and III and V showed well-supported and similar topologies at the clade level, with only minor differences. In the trees based on all five datasets, species of Physarum and Badhamia are intermingled, and the tree topologies based on datasets I and III (i.e., 699 and 804 nt) are in general agreement with previous studies in which full-length SSU rRNA (1566 nt) and SSU/LSU (2522 nt) phylogenies were used (Nandipati et al. 2012; Fiore-Donno et al. 2008, 2012). Our findings show that the individual datasets based on V1-V9 and V1-V10 can serve as good molecular markers compared with the dataset based on conserved regions, and that the predicted secondary structures of the variable regions contain sufficient information to better resolve the Physaraceae phylogeny. On the other hand, the best support at the basal nodes is obtained by adding conserved region sequences. This is the first study in which independent datasets consisting of ten variable regions have been used to construct a Physaraceae phylogeny.
From our analysis we conclude that the predicted secondary structures of the SSU variable regions, with alignments of 699 or 804 nt, contain sufficient phylogenetic information and have better resolving power than the conserved regions (980 nt).