Procellariiformes is the clade of birds that includes albatrosses, petrels and their relatives (Stearns and Hoekstra, 2005). Albatrosses belong to the family Diomedeidae; these birds are long-lived and are a charismatic component of the oceanic avifauna (Chambers et al., 2009). Albatrosses are large birds and therefore breed less frequently (Stearns and Hoekstra, 2005). Their intrinsically slow rate of reproduction results in a monogamous pair raising only one chick every one or two years (Chambers et al., 2009). Albatrosses face many threats, such as pollution of the marine environment and the activities of international fishing fleets (Chambers et al., 2009). These problems are further compounded by the loss or disruption of breeding pairs (Chambers et al., 2009) and breeding sites (Stearns and Hoekstra, 2005).
Genomic research has played a significant role in addressing questions of adaptation, speciation and population genetics (Backström et al., 2008). As a result, it is thought that a "stable evidence-based taxonomy is a critical requirement for the effective future conservation of the albatrosses" (Chambers et al., 2009). As stated by Zink and Barrowclough (2008), the understanding of the evolutionary history of populations has been dramatically enhanced by the acquisition of molecular data, which reveal the distribution of genetic variation within and among populations.
This variation between populations is the result of speciation. Speciation may occur through four main evolutionary processes as stated by Mila et al. (2010), namely "micro-allopatric isolation of ancestral populations into at least three disjunct demographic units, which has resulted in divergence in neutral genomic markers; morphological differentiation along altitudinal gradients through local adaptation; the appearance of distinct morphs which now exist (or co-exist) in different distributions; and the maintenance of these patterns in place over time, through a combination of restricted dispersal and pre and/or post-zygotic mechanisms of reproductive isolation". All four evolutionary processes may even operate together. Once two taxa permanently cease to exchange genes (Chambers et al., 2009), genetic, morphological and behavioral divergence will occur and eventually result in reproductive isolation. From this one can observe the patterns of divergence and structure in populations (Mila et al., 2010).
Mitochondrial DNA cytochrome b (cyt b) is a gene target generally used for inter-specific levels of study, meaning comparison of species in the same genus or the same family (Castresana, 2002). It is furthermore among the most extensively sequenced genes to date across the vertebrates (Johns and Avise, 1998). Cytochrome b is a large protein-coding gene with a fast evolutionary rate of change (Nunn et al., 1996). Despite this fast rate, it shows no indication of insertions or deletions (Nunn and Stanley, 1998), as these would result in loss of function of the gene.
It was previously thought that the family Diomedeidae was composed of 14 taxa (Nunn et al., 1996); however, recent phylogenetic analysis has shown that Diomedeidae consists of 24 taxa (Chambers et al., 2009). Phylogenetic relationships can be inferred through various discrete methods. The most common is maximum parsimony (Nunn et al., 1996), which attempts to find the phylogenetic tree that requires the fewest changes, ignoring branch length information (Omland et al., 2008). Maximum likelihood methods with bootstrapping (Nunn et al., 1996) do use branch lengths, so that a higher probability of change along a branch represents a longer span of time (Omland et al., 2008). Lastly, Bayesian phylogenetic analysis is similar to maximum likelihood but incorporates more components of the real uncertainty inherent in character mapping methods (Omland et al., 2008).
Using molecular data, one can examine global geographic range size (Webb and Gaston, 2000). It has been found that avian range sizes are not static but rather change over evolutionary time (Webb and Gaston, 2000). It has further been found that range-size transformations are not random in birds, with the exception of taxa such as island endemics and some threatened species (Webb and Gaston, 2000). "In general, range sizes appear to expand relatively rapidly post speciation; subsequently, and perhaps more gradually, they then decline as species age" (Webb and Gaston, 2000).
The purpose of this study is to determine the correlation between the geographical distribution of the family Diomedeidae, its phylogenetic tree and the 'IUCN red list of threatened species' classification. Furthermore, this correlation is used to aid in the conservation of the species by identifying which species have lower genetic diversity (Abbott and Double, 2003) due to a recent diversification.
Materials and method
Ingroup, outgroup and target gene selection
The resources used in this study were MEGA version 5 (Tamura et al., 2011) and GenBank (www.ncbi.nlm.nih.gov), a genetic database that can be searched for previously published genetic sequences.
The study species were the members of the family Diomedeidae, which included the genera Diomedea, Phoebastria, Phoebetria and Thalassarche (Table 1). The outgroup selected was Grus antigone. It was decided that the study would be conducted at family level, as at this level the relationships between the various members of the family Diomedeidae can easily be visualized. The cytochrome b gene was judged suitable for this investigation because it is a good mitochondrial DNA (mtDNA) target for inter-specific levels of study. Cytochrome b is a coding gene and is of a fixed length.
Constructing a data set
MEGA version 5 (Tamura et al., 2011) was opened and a new alignment was created. The GenBank database was then searched for the study species of interest and the corresponding target gene, cytochrome b; the selected entry should ideally be greater than 300 nucleotides in length. Once the reference sequence was found and inserted into MEGA 5 (Tamura et al., 2011), a nucleotide-nucleotide BLAST search was performed, with the Blastn option set to 'other' rather than human genome sequences. Through this process, closely related, homologous GenBank entries were obtained and used to compile the ingroup dataset. Once all the ingroup sequences were compiled, sequence alignment was performed and the ends of the sequences for which there were no homologous data were trimmed off.
Sequence statistics, model selection and phenetic analyses
Sequence statistics, model selection and phenetic analysis were performed next. The aligned sequences of the family Diomedeidae and the outgroup Grus antigone were analyzed, and statistics such as variable sites (V), parsimony-informative sites (Pi), nucleotide composition (i.e. average A, T, C and G values) and the R-statistic, which is the transition/transversion ratio, were recorded. Next, the overall mean distance (p-distance) was calculated as a measure of sequence divergence between taxa, in other words, of the amount of nucleotide evolution.
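The statistics named above can be illustrated with a short Python sketch. It is not the MEGA 5 implementation; the function names and the toy alignment are hypothetical, and ambiguous characters are simply skipped (an assumption that mimics pairwise deletion).

```python
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def column_states(alignment, i):
    """Return the unambiguous bases found at column i of the alignment."""
    return [seq[i] for seq in alignment if seq[i] in "ACGT"]

def variable_sites(alignment):
    """Count columns that contain more than one kind of base (V)."""
    n = len(alignment[0])
    return sum(1 for i in range(n) if len(set(column_states(alignment, i))) > 1)

def parsimony_informative_sites(alignment):
    """Count columns where at least two bases each occur in at least two sequences (Pi)."""
    count = 0
    for i in range(len(alignment[0])):
        states = column_states(alignment, i)
        shared = [b for b in set(states) if states.count(b) >= 2]
        if len(shared) >= 2:
            count += 1
    return count

def p_distance(a, b):
    """Proportion of differing sites between two sequences, skipping ambiguous positions."""
    pairs = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    return sum(1 for x, y in pairs if x != y) / len(pairs)

def mean_p_distance(alignment):
    """Overall mean p-distance across all sequence pairs."""
    dists = [p_distance(a, b) for a, b in combinations(alignment, 2)]
    return sum(dists) / len(dists)

def ts_tv_counts(a, b):
    """Count transitions (purine<->purine, pyrimidine<->pyrimidine) and transversions."""
    ts = tv = 0
    for x, y in zip(a, b):
        if x == y or x not in "ACGT" or y not in "ACGT":
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            ts += 1
        else:
            tv += 1
    return ts, tv

# Hypothetical toy alignment, for illustration only (not the study data).
toy = ["ACGTAC", "ACGTAT", "ACATGT", "ACACGC"]
print(variable_sites(toy), parsimony_informative_sites(toy), round(mean_p_distance(toy), 3))
```

In the toy alignment the fourth column varies in only one sequence, so it counts as variable but not as parsimony-informative, which is the distinction between V and Pi.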
On the basis of the collected data, a sequence evolution model for the phenetic analysis was chosen by performing a hierarchical likelihood model selection test. A 'Neighbor-joining' tree was constructed with pairwise deletion as the analysis preference and between 1000 and 100 000 bootstrap replications. The phylogenetic construction was repeated for a 'Minimum Evolution' tree with the same analysis preference and the same range of bootstrap replications.
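The idea behind bootstrapping can be sketched in Python as follows. This is a simplified illustration, not MEGA's procedure: alignment columns are resampled with replacement, and a user-supplied check is run on each pseudo-replicate. The helper names and the stand-in check (`zero_one_closest`) are hypothetical; in a real analysis the check would rebuild the tree and ask whether a given clade is recovered.

```python
import random

def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement, keeping the alignment length."""
    n_sites = len(alignment[0])
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[i] for i in cols) for seq in alignment]

def bootstrap_support(alignment, supports_group, replicates=1000, seed=0):
    """Fraction of pseudo-replicates in which the supplied check holds."""
    rng = random.Random(seed)
    hits = sum(
        1 for _ in range(replicates)
        if supports_group(bootstrap_replicate(alignment, rng))
    )
    return hits / replicates

def hamming(a, b):
    """Number of differing positions between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))

def zero_one_closest(aln):
    """Stand-in for a clade check: is sequence 0 closer to sequence 1 than to sequence 2?"""
    return hamming(aln[0], aln[1]) < hamming(aln[0], aln[2])

# Hypothetical toy alignment, for illustration only.
toy = ["ACGTACGTAC", "ACGTACGTAT", "ACATGCGTTT"]
print(bootstrap_support(toy, zero_one_closest, replicates=1000))
```

A support value close to 1 means the grouping is recovered in nearly all resampled datasets; more replications narrow the sampling error of that estimate but do not change the underlying signal.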
Parsimony analysis, likelihood analysis, rate of heterogeneity and molecular clock
Parsimony (cladistic) analysis, likelihood analysis, tests for rate heterogeneity (Tajima's 3-taxon test and the likelihood molecular clock test) and the imposition of a molecular clock were performed. A 'Maximum Parsimony' tree was constructed with pairwise deletion and between 100 and 1000 bootstrap replications. The consistency index (CI), retention index (RI), rescaled consistency index (RCI) and 'tree length' values were recorded as measures of homoplasy. A 'Maximum Likelihood' tree was constructed with pairwise deletion and between 100 and 1000 bootstrap replications. The log likelihood (log L) scores and final number of trees obtained with this score were recorded. After inferring a 'Neighbor-joining' tree with pairwise deletion and the nucleotide model set to 'p-distance', Tajima's relative rate test was performed to decide whether a molecular clock could be imposed. The 'evolutionary rate' value was then set to 0.02 (2% sequence divergence per million years) and a UPGMA (Unweighted Pair Group Method with Arithmetic Mean) tree was generated.
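As a worked illustration of how a clock rate turns divergence into time, the following Python sketch divides a pairwise sequence divergence by the rate of 0.02 per million years. It assumes the rate applies to pairwise divergence, as the phrase "2% sequence divergence per million years" suggests; the function name and the example value of 0.05 are hypothetical, and this is not a description of MEGA's internal calibration.

```python
def divergence_time_my(pairwise_divergence, rate_per_my=0.02):
    """Time since divergence in million years, assuming a strict clock.

    pairwise_divergence: proportion of sites that differ between two taxa.
    rate_per_my: pairwise divergence accumulated per million years (assumed 0.02).
    """
    if rate_per_my <= 0:
        raise ValueError("rate_per_my must be positive")
    return pairwise_divergence / rate_per_my

# Hypothetical example: 5% pairwise divergence at 2% per million years.
print(round(divergence_time_my(0.05), 2))
```

Under this assumption, every additional 2% of pairwise divergence adds one million years to the estimated age of the split.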
The cytochrome b dataset was 1008 nucleotides in length and comprised 25 taxa. Of these 1008 nucleotides, 279 sites were variable (V) and 195 sites were parsimony-informative (Pi). Directly from the dataset, it was empirically determined that the average base frequencies were T = 25.3%, C = 34.1%, A = 27.8% and G = 12.8% (Table 2). The R-statistic, which is the transition/transversion ratio, was 5.32 and the overall mean distance or 'd' statistic was 0.078 (7.8%).
The best-fit model of sequence evolution selected under the AIC (Akaike Information Criterion = 7468.845) for the dataset was GTR (General Time Reversible) + G (gamma distribution, shape parameter = 0.21; +I not applicable) in MEGA 5 (Nei and Kumar, 2000; Tamura et al., 2011) (Table 4). Under the GTR+G model of sequence evolution, the average base frequencies were estimated to be T = 25.3%, C = 34.1%, A = 27.8% and G = 12.8%. The R-statistic, which is the transition/transversion ratio, was 9.38 and the overall mean distance or 'd' statistic was 0.086 (8.6%).
The Minimum Evolution tree (Figure 2) and the Neighbor-joining tree (Figure 1, Appendix) are almost indistinguishable, so only the Minimum Evolution tree is included in the results (Figure 2). It was selected because in this method the sum of all the branch length estimates (S) is computed for all topologies, and the topology with the smallest S value is considered optimal and selected as the best tree.
A bootstrap of 10 000 replications was performed in order to determine the confidence in the nodes of the Minimum Evolution tree (Figure 2) of Diomedeidae (Felsenstein, 1985). It must be stated that MEGA 5 (Tamura et al., 2011) automatically reduced the requested 100 000 bootstrap replications to 10 000. A single best tree with an SBL (Sum of Branch Lengths) of 0.45873726 was constructed for the Minimum Evolution tree (Figure 2). In this tree Grus antigone falls on a separate branch from the ingroup, the family Diomedeidae, as expected of an outgroup.
The next tree created was the Maximum Parsimony tree (Figure 3). Like the Minimum Evolution tree, the Maximum Parsimony tree places the outgroup Grus antigone on a separate branch from the ingroup, the family Diomedeidae. The homoplasy indices recorded for the Maximum Parsimony tree were as follows: the consistency index (CI) is 0.663848, the retention index (RI) is 0.851679, and the rescaled consistency index (RC) is 0.565385 for all sites and 0.501764 for parsimony-informative sites.
Maximum Likelihood incorporates explicit models of sequence evolution and allows statistical tests of evolutionary hypotheses. The likelihood of observing the given sequence dataset under a specific substitution model is maximized for each topology, and the topology that gives the highest log likelihood value, here -3697.77 (Figure 4), is selected as the final tree.
The equality of evolutionary rates was tested between sequences A (U48950 Phoebastria nigripes) and B (AF076091 Thalassarche carteri), with sequence C (FJ769854 Grus antigone) used as an outgroup in Tajima's relative rate test (Table 5). The 'clock test' using Tajima's relative rate test had a P value of 0.05371, so the null hypothesis of equal rates between lineages was not rejected. With a P value greater than 0.05, the molecular clock (Table 7) was imposed and a UPGMA (Unweighted Pair Group Method with Arithmetic Mean) tree was created (Figure 7). The evolutionary rate was set to 0.02 (2% sequence divergence per million years), as specified for cytochrome b (Figure 7).
According to this tree, Grus antigone and the Diomedeidae species last shared a common ancestor 2.5735 million years ago (Figure 7). In the second molecular clock test, the likelihood test for rate heterogeneity, the null hypothesis of equal rates was rejected; under this test the molecular clock could therefore not be imposed.
Figure 2: Minimum Evolution tree. The evolutionary history of the taxa can be inferred from the Minimum Evolution tree. The tree was obtained using 1008 nucleotides corresponding to the cytochrome b gene region. The Minimum Evolution algorithm with the Tamura-Nei model of sequence evolution was used to infer the phylogeny (the Tamura-Nei model being taken as the nearest equivalent of the GTR+G model), with nodal support assessed by 10 000 bootstrap replications. The ME tree was searched using the Close-Neighbor-Interchange (CNI) algorithm at a search level of 0. The Neighbor-joining algorithm was used to generate the initial tree. The analysis involved 25 nucleotide sequences, with all ambiguous positions removed for each sequence pair. The optimal tree has a Sum of Branch Lengths (SBL) of 0.45873726. The tree was rooted with the outgroup Grus antigone.
Figure 3: Maximum Parsimony analysis of taxa. The evolutionary history of the taxa can be inferred from the Maximum Parsimony tree. The consensus Maximum Parsimony tree was obtained from the six most parsimonious trees using a homologous dataset of 1008 nucleotides of the mitochondrial cytochrome b gene. Branches corresponding to partitions reproduced in fewer than 50% of trees are collapsed. The consistency index (CI) is 0.663848, the retention index (RI) is 0.851679 and the composite index (RC) is 0.565385 for all sites and 0.501764 for parsimony-informative sites. The Maximum Parsimony tree was obtained using the Close-Neighbor-Interchange algorithm with search level 1, in which the initial trees were obtained by the random addition of sequences (10 replicates). The analysis involved 25 nucleotide sequences, with all ambiguous positions removed for each sequence pair.
Figure 4: Molecular Phylogenetic analysis by Maximum Likelihood method
The evolutionary history of the taxa can be inferred from the Maximum Likelihood tree, based on the dataset-specific model. The consensus Maximum Likelihood tree was obtained using 1008 nucleotides corresponding to the cytochrome b gene region. The tree with the highest log likelihood (-3734.5355) is shown above. A discrete Gamma distribution was used to model evolutionary rate differences among sites (5 categories; +G, parameter = 0.5341).
Figure 7: Unweighted Pair Group Method with Arithmetic Mean (UPGMA) tree with native distribution of the members of the family Diomedeidae
The molecular clock test was performed by comparing the Maximum Likelihood value for the given topology with and without the molecular clock constraints. Differences in evolutionary rates among sites were modeled using a discrete Gamma (G) distribution. The null hypothesis of equal evolutionary rate throughout the tree was rejected at a 5% significance level (P < 0.05). The 'clock test' performed using Tajima's relative rate test had a P value of 0.05371. The evolutionary rate was set to 0.02 (2% sequence divergence per million years), as specified for cytochrome b.
The dataset of 24 species of Diomedeidae and the outgroup Grus antigone contains 279 variable sites. Therefore, 27.7% of the 1008 aligned sites of the cytochrome b gene (Nunn et al., 1996) are variable, indicating that genetic diversity is low in this group. The accuracy of tree construction decreases as the proportion of variable sites increases (Grievink et al., 2010). Furthermore, as genetic variation increases, in other words as the number of variable sites increases, there is an increase in the rate of evolutionary change (Stearns and Hoekstra, 2005). It is understood that as the genetic diversity of a species increases, so does the stability of the species (Bininda-Emonds et al., 1999). Of the 279 variable sites, 195 were parsimony-informative, meaning that at each of these sites at least two different nucleotides each occur in at least two taxa, so that the site can favor one tree topology over another. The most parsimonious tree is the one that explains these 195 sites with the fewest changes.
The overall mean uncorrected pairwise distance (p-distance) is 7.8%, indicating that sequence divergence between taxa is low. This indicates that members of the family Diomedeidae are closely related (Kholodova, 2009). The best-fit model of sequence evolution selected under the AIC (Akaike Information Criterion = 7468.845) for the dataset was GTR (General Time Reversible) + G (gamma distribution) in MEGA 5 (Tamura et al., 2011). The GTR+G model of sequence evolution was selected because of the unequal base frequencies; the average base frequencies were estimated to be A = 27.8%, T = 25.3%, C = 34.1% and G = 12.8%. This indicates a slight bias towards A+T (53.1%) and a marked deficit of G. The estimated value of the transition/transversion bias (R) is 5.38, indicating that transitions outnumber transversions (Tamura et al., 2011). The gamma shape parameter was 0.21 and a proportion of evolutionarily invariable sites (+I) was not applicable.
A bootstrap analysis with 10 000 replications was performed on the 'Neighbor-joining' tree and the 'Minimum Evolution' tree. This method of resampling sites with replacement indicated that there is a high estimate of confidence in both trees. The heuristic search method used for Minimum Evolution was the Close-Neighbor-Interchange (CNI). The maximum parsimony analysis produced six equally parsimonious trees, and a consensus tree was obtained with a 'cut-off' value of 50%. In order to determine whether there is homoplasy, the consistency index needs to be calculated: if the CI is not equal to 1 there is homoplasy, and as the CI value decreases the homoplasy increases. The consistency index [CI] [excluding uninformative characters] is 0.663848 and the retention index [RI] is 0.851679 (Nunn and Stanley, 1998). The CI indicates that there is homoplasy, while the RI, which expresses the observed homoplasy relative to the maximum possible homoplasy, indicates that most of the similarity is retained as shared derived characters. The rescaled consistency index (RCI) is 0.565385 for all the sites. The search method used for Maximum Parsimony was the Close-Neighbor-Interchange (CNI) on random trees. The Maximum Parsimony tree is only partly resolved, with Diomedea exulans, Thalassarche steadi and Thalassarche cauta cauta sharing a soft polytomy.
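The relationship between these indices can be made explicit with a short Python sketch using the standard textbook definitions, which are assumed here rather than taken from the MEGA documentation: CI = m/s, RI = (g - s)/(g - m) and RC = CI × RI, where m is the minimum possible number of steps, s the observed number of steps on the tree and g the maximum possible number of steps. The function names are hypothetical.

```python
def consistency_index(min_steps, observed_steps):
    """CI = m / s; equals 1 when there is no homoplasy."""
    return min_steps / observed_steps

def retention_index(min_steps, observed_steps, max_steps):
    """RI = (g - s) / (g - m); fraction of potential synapomorphy retained on the tree."""
    return (max_steps - observed_steps) / (max_steps - min_steps)

def rescaled_consistency_index(ci, ri):
    """RC = CI * RI."""
    return ci * ri

# Check the reported values against each other: CI * RI should give the reported RC.
rc = rescaled_consistency_index(0.663848, 0.851679)
print(round(rc, 6))
```

Multiplying the reported CI and RI gives a value that agrees with the reported rescaled consistency index of 0.565385 to within rounding, so the three figures are internally consistent.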
The final 'Maximum Likelihood' tree, the tree with the most probable evolutionary outcome, was the tree with the highest log likelihood, which was -3697.77. Tajima's relative rate test was performed to determine the equality of evolutionary rates between sequences A (U48950 Phoebastria nigripes) and B (AF076091 Thalassarche carteri), with sequence C (FJ769854 Grus antigone) used as an outgroup. The χ² test statistic was 3.72 (P = 0.05371). Because the P value is higher than 0.05, the null hypothesis of equal rates between lineages is not rejected. It was found that 819 sites were identical in all 3 sequences and that 7 sites were divergent in all 3 sequences. It was further found that Phoebastria nigripes had 39 unique differences, Thalassarche carteri had 58 unique differences and Grus antigone had 85 unique differences.
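The test statistic can be recomputed from the site counts reported above. The sketch below assumes the usual form of Tajima's relative rate test, chi-square = (m1 - m2)² / (m1 + m2) with one degree of freedom, where m1 and m2 are the numbers of sites unique to the two ingroup sequences; the function name is hypothetical and the P value is obtained from the complementary error function rather than from MEGA.

```python
import math

def tajima_relative_rate(m1, m2):
    """Return (chi-square, P) for Tajima's relative rate test with 1 degree of freedom.

    m1, m2: numbers of sites at which only sequence A or only sequence B
    differs from the other two sequences.
    """
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Upper tail of a chi-square distribution with 1 df: P = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p

# Unique differences reported for Phoebastria nigripes (39) and Thalassarche carteri (58).
chi2, p = tajima_relative_rate(39, 58)
print(f"chi-square = {chi2:.2f}, P = {p:.4f}")

# The five site categories should add up to the alignment length of 1008.
print(819 + 7 + 39 + 58 + 85)
```

With these counts the statistic and P value come out close to the reported 3.72 and 0.05371, and the five site categories sum to the 1008 aligned nucleotides.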
As the null hypothesis of the first molecular clock test was not rejected, a UPGMA tree could be inferred. From this tree it can be seen that Grus antigone and the family Diomedeidae shared a common ancestor 2.5735 million years ago. In the UPGMA tree the terminal nodes of Phoebastria albatrus, Phoebastria irrorata, Phoebastria immutabilis and Phoebastria nigripes share a soft polytomy, indicating that with regard to divergence time the tree is only partly resolved.
All species of the family Diomedeidae range from 'near threatened' to 'critically endangered'. By combining the distribution with the UPGMA tree, one can broadly deduce that as a species' native range decreases, the level of concern of its red list category increases. This can be seen in critically endangered species such as Diomedea dabbenena (located only in Argentina, Brazil, Namibia, Saint Helena, South Africa and Uruguay), Diomedea amsterdamensis (Amsterdam Island) and Phoebastria irrorata (Chile, Colombia, Ecuador (Galápagos) and Peru). Thus species that do not have a broad distribution have a greater probability of going extinct, due to an increased likelihood of geographical isolation or a decrease in gene flow, through which populations accumulate genetic, behavioral or morphological differences. It is stated that species range sizes appear to expand relatively rapidly post speciation, followed by a more gradual decline in range as species age (Webb and Gaston, 2000). It can be noted from the UPGMA tree that, more often than not, species which diverged long ago are of greater concern in the 'IUCN red list of threatened species'. However, this is not a firm correlation, owing to the many anthropogenic factors (e.g. fisheries, habitat intrusion) and global factors (e.g. global warming, nest and egg predation) influencing these species.
In conclusion, there is a correlation between the geographical distribution and the phylogenetic relationships of the family Diomedeidae; however, one cannot easily determine a correlation between the 'IUCN red list of threatened species' classification and either geographical distribution or phylogenetic relationship.