Genetic Diversity And Population Genetics Biology Essay



Cystic echinococcosis, an infection with the metacestode of the dog tapeworm Echinococcus granulosus, is a global public health problem that affects humans and ungulate animals [1]. The life cycle of E. granulosus involves dogs and other carnivores as definitive hosts and livestock as intermediate hosts [2]. Eggs shed in the feces of the definitive host are ingested by the intermediate host, where they develop into the metacestode stage and establish hydatid cysts. The infection in livestock is usually asymptomatic and is detected during post-mortem examination at slaughterhouses, yet it causes economic loss through condemnation of infected organs [3].

E. granulosus is a complex of distinct strains with different host affinities. To date, 10 different genotypes have been described by molecular genetic analysis, and the genotypic variation closely follows the biological and phenotypic characteristics of the parasite. It has been proposed that E. granulosus should be split into different species: E. granulosus sensu stricto (genotypes G1-G3), E. equinus (genotype G4), E. ortleppi (genotype G5), E. canadensis (genotypes G6-G10) and E. felidis (lion strain) [4,5,6]. The strain variation of E. granulosus reflects differences in life cycle pattern and host range; thus knowledge of the genetic diversity and population genetics of this parasite is of immense public health importance. Mitochondrial DNA has proved useful in differentiating all the genotypes of E. granulosus and is an important genetic marker for studying the population genetic structure of this parasite, as it is haploid, non-recombining, multicopy, rapidly evolving and maternally inherited [7].

In Western India, 4 different genotypes (G1, G2, G3 and G5) were reported in different intermediate hosts such as cattle, buffaloes, pigs and sheep [8]. In West Bengal, the G1, G2 and G3 genotypes were found to infect livestock. Apart from the report by Singh et al., 2012 [9] from Ludhiana (North India), who found the G1 and G3 genotypes in 10 isolates of E. granulosus, an extensive study of the parasite's genotypes based on a large number of isolates and covering large endemic geographical areas is lacking.

The aim of the present study was to genotype North Indian animal isolates of E. granulosus by sequencing the mitochondrial cox1 gene. The results were then compared with nucleotide sequences of this parasite from other geographical regions to study its genetic variability and population genetics.

Materials and Methods

Sample collection

Hydatid cysts were collected from 2009 to 2012 from 4 different geographical areas in North India. Cyst samples were obtained from 74 freshly slaughtered and heavily infected sheep at slaughterhouses located in Chandigarh (n=66), Shimla, Himachal Pradesh (n=3) and Srinagar, Jammu and Kashmir (n=5). Seven hydatid cysts were kindly provided by Guru Angad Dev University, Ludhiana, Punjab, North India. Thus isolates from 81 animals were analysed. The cysts were removed from the carcass and transported under refrigerated conditions to the Department of Parasitology, PGIMER, for further processing. The intact hydatid cysts were separated and washed with distilled water. The cyst from each animal was considered one isolate.

Molecular analysis

The cyst samples were washed thrice in PBS to remove ethanol, and genomic DNA was extracted from each sample with the QIAamp DNA Mini Kit (Qiagen, Hilden, Germany) according to the manufacturer's instructions.

For molecular identification, PCR amplification of the cox1 gene was performed using primers and PCR conditions described previously [10], with minor changes. Briefly, amplification was performed in a final volume of 50 μl containing 2 μl DNA, 0.2 mM premixed dNTP solution, 10 pmol of each primer, 1x PCR buffer and 1 U of Taq DNA polymerase. The amplification program comprised an initial denaturation step at 95 °C for 5 min, followed by 38 cycles of denaturation (95 °C for 50 s), annealing (57 °C for 50 s) and extension (72 °C for 1 min), and a final extension at 72 °C for 10 min. After agarose gel electrophoresis (1.5%), PCR products were purified and sequenced.
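As a quick check of the reaction set-up described above, the following sketch works out the final primer concentration and the total programmed hold time of the thermocycler. The function names are illustrative only, and ramping time between temperatures is ignored, so the real run would take longer.

```python
# Thermocycler program as stated in the text (hold times only, ramping ignored).
INITIAL_DENATURATION_S = 5 * 60
CYCLES = 38
CYCLE_STEPS_S = {"denaturation": 50, "annealing": 50, "extension": 60}
FINAL_EXTENSION_S = 10 * 60


def total_hold_time_s():
    """Sum of all programmed hold times, in seconds."""
    per_cycle = sum(CYCLE_STEPS_S.values())
    return INITIAL_DENATURATION_S + CYCLES * per_cycle + FINAL_EXTENSION_S


def primer_concentration_um(pmol, volume_ul):
    """Final concentration in micromolar: pmol per microlitre equals micromol per litre."""
    return pmol / volume_ul


if __name__ == "__main__":
    print("hold time (min):", total_hold_time_s() / 60)
    print("primer concentration (uM):", primer_concentration_um(10, 50))
```

With 10 pmol of each primer in 50 μl, each primer is present at 0.2 μM.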

Phylogenetic analysis

Sequences of E. granulosus sensu stricto populations deposited in GenBank from India and other South Asian, East Asian, European, Middle Eastern, African and South American countries (including China and Nepal) were retrieved from the National Center for Biotechnology Information and compared with the sequences of the isolates used in the present study (Table 1). Nucleotide sequence analysis was performed with BLAST sequence algorithms and sequences were aligned using ClustalW [11]. Genetic distances were calculated using Kimura two-parameter distance estimates, and samples were clustered using PhyML [12] as part of SeaView v. 4.2.4 [13].
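The Kimura two-parameter distance mentioned above can be illustrated with a short sketch. It assumes two already aligned sequences of equal length and skips any site that is not A, C, G or T in both sequences; the function name is hypothetical and this is not the SeaView or PhyML implementation. With P the proportion of transitions and Q the proportion of transversions, the distance is d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q).

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError for saturated pairs where the argument is <= 0
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


if __name__ == "__main__":
    print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))
```

For closely related haplotypes that differ at only a few sites, as in this data set, the corrected distance stays close to the raw proportion of differing sites.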


Gene genealogies

Haplotypes were identified and their networks constructed based on the parsimony criterion [14] using TCS version 1.2 software [15]. The network estimation was run at a 95% probability limit. Haplotype network analysis is useful for intraspecific data because it reveals multiple connections between haplotypes and indicates possible missing mutational connections.

Population genetic analysis

For population genetic analysis these sequences were grouped into 7 populations: South Asia, East Asia, Middle East, Europe, Africa, South America and Australia. Population diversity indices such as the number of segregating sites (S), number of haplotypes (h), haplotype diversity (Hd), nucleotide diversity (π) and average number of pairwise nucleotide differences within populations (K) were estimated using DnaSP 4.5 software [16]. The neutrality indices Tajima's D [17] and Fu's Fs [18] were calculated for each population with the population genetics package Arlequin 3.1 [19].
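The diversity indices listed above can be sketched in a few lines. The example below assumes gap-free aligned sequences of equal length and uses the standard estimators: haplotype diversity Hd = n/(n-1) * (1 - sum of squared haplotype frequencies), K as the mean number of pairwise differences, and nucleotide diversity as K divided by the alignment length. The function name and toy data are hypothetical, and DnaSP's handling of gaps and missing data is not reproduced.

```python
from collections import Counter
from itertools import combinations


def diversity_indices(seqs):
    """Return S, h, Hd, K and pi for a list of aligned, gap-free sequences."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must have equal length")

    counts = Counter(seqs)  # identical sequences collapse into one haplotype
    h = len(counts)
    hd = n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))

    # Segregating sites: alignment columns with more than one state.
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)

    # Mean number of differences over all n(n-1)/2 sequence pairs.
    total_diff = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in combinations(seqs, 2)
    )
    k = total_diff / (n * (n - 1) / 2)

    return {"S": s, "h": h, "Hd": hd, "K": k, "pi": k / length}


if __name__ == "__main__":
    toy = ["AAAA", "AAAA", "AAAT", "AATT"]  # toy data, not from the study
    print(diversity_indices(toy))
```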

Pairwise genetic differentiation between all populations was estimated by calculating Wright's F-statistic (Fst) and the corresponding gene flow estimate (Nm). In addition, the average number of pairwise nucleotide differences (Kxy), nucleotide substitutions per site (Dxy) and net nucleotide substitutions per site (Da) between populations were calculated with DnaSP.


Results

Amplification of the cox1 gene with the JB3/JB4.5 primers yielded a PCR product of 446 bp. The nucleotide sequences of all 81 isolates from North India were aligned with reference sequences of each genotype within E. granulosus retrieved from GenBank. In total, 3 genotypes of E. granulosus were found: buffalo strain (G3 genotype, n=58), sheep strain (G1 genotype, n=22) and Tasmanian sheep strain (G2 genotype, n=1). The sequences of the haplotypes found in this study were deposited in GenBank with accession numbers JX854022-34 and KC422644-45. Sequences of these isolates, along with those retrieved from GenBank, were used to construct a phylogenetic tree (Fig 1). A total of 73 sequence variants (named Hap 1-Hap 73) were grouped into two main clades. Clade I comprises the G1 genotype and its microvariants, and Clade II comprises the G3 genotype and its microvariants. Sequence variant Hap 49 served as the connecting link between these two clades.

Gene/allele genealogy

The genealogical relationships among the cox1 sequences estimated by TCS software revealed two lineages. The first lineage clustered sequences from South Asia (12.62%, n=13), the Middle East (45.26%, n=43), Europe (39.21%, n=28), South America (54.9%, n=28), East Asia (49.05%, n=26) and Africa (42.10%, n=4). The second lineage clustered sequences from South Asia (56.3%, n=58), the Middle East (11.5%, n=11), Europe (11.76%, n=6), South America (9.80%, n=5), East Asia (1.88%, n=1), Africa (5.2%, n=1) and Australia (20%, n=1). Thus the haplotypes in both lineages had a wide geographical distribution: haplotypes of the first lineage were predominant in the Middle Eastern, European and South American populations, whereas haplotypes of the second lineage were predominant in the South Asian population.

Nucleotide polymorphism

A total of 73 haplotypes were found in 376 sequences: 20 in South Asia, 14 in East Asia, 17 in Europe, 25 in Middle East, 10 in Africa, 13 in South America and 5 in Australia. Along the 341 bp reference alignment, only nucleotide substitutions were detected; no insertions or deletions were found. 11 point mutations were noted and 23 were parsimony informative sites.

Diversity indexes

Population genetic indices were calculated using the nucleotide data of the cox1 gene from India and its neighboring countries (Table 2). The haplotype diversity (Hd) for all 376 sequences was 0.803 ± 0.016 SD. The average number of nucleotide differences (k) was 1.82761 and the nucleotide diversity (π) was 0.00536 ± 0.00023. The haplotype and nucleotide diversity indices were highest in the Australian population, followed by the African population, and lowest in the South Asian population. The neutrality indices, Tajima's D and Fu's Fs, were negative in all populations. The D value was significantly negative in the South Asian, European and South American populations, whereas the Fs value was significantly negative in the 5 populations other than the African and Australian ones.
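The reported values of k and π are consistent with the usual relation π = k / L, where L is the number of sites in the alignment (341 bp in this study). The short check below is only a sketch of that arithmetic; the function name is illustrative and it does not reproduce how DnaSP treats gaps or missing data.

```python
def nucleotide_diversity(k, n_sites):
    """Nucleotide diversity as mean pairwise differences per site."""
    return k / n_sites


if __name__ == "__main__":
    # k and the alignment length are taken from the text above.
    pi = nucleotide_diversity(1.82761, 341)
    print(round(pi, 5))
```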

Inter-population nucleotide differences (Kxy) and the average number of nucleotide substitutions per site between populations (Dxy) varied from 1.36 and 0.00399 (East Asia and South America) to 2.8 and 0.00821 (Africa and Australia), respectively (Table 3). Pairwise genetic distance (Fst) between these populations varied from -0.00206 with an infinite Nm value (between Europe and Middle East) to 0.37828 with Nm = 0.82176 (between South Asia and South America) (Table 4). The Fst value between Europe and Middle East and the Gst value between Europe and South America were negative, indicating no differentiation at these loci [20]. When the South Asian population was compared with the other populations, Fst ranged from 0.10273 to 0.37828 and Nm from 0.82176 to 4.36727, indicating that these populations are differentiated, with low gene flow. The Middle Eastern population, in comparison with the other populations, showed very low genetic differentiation (Gst 0.00209 to 0.10137, Fst -0.00206 to 0.24707) with very high gene flow (Nm 1.52370 to infinite). The negative Fst value and infinite Nm value between the Middle Eastern and European populations indicate that these populations behave as one population with a very high degree of gene flow. Furthermore, all pairs of populations showed significant pairwise genetic distance except Europe and Middle East, South Asia and Australia, and South America and Australia.
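The Fst and Nm pairs quoted above are numerically consistent with the island-model relation for haploid, uniparentally inherited markers, Nm = (1 - Fst) / (2 Fst), and the Kxy and Dxy pairs with Dxy = Kxy / L for the 341 bp alignment. The sketch below reproduces these checks. That DnaSP used exactly these estimators is an assumption inferred from the numbers, and treating non-positive Fst as infinite Nm simply mirrors how the text reports it.

```python
import math


def nm_from_fst_haploid(fst):
    """Gene flow estimate for a haploid marker: Nm = (1 - Fst) / (2 * Fst)."""
    if fst <= 0:
        return math.inf  # no measurable differentiation
    return (1 - fst) / (2 * fst)


def dxy_from_kxy(kxy, n_sites):
    """Average substitutions per site between two populations."""
    return kxy / n_sites


if __name__ == "__main__":
    for fst in (0.37828, 0.10273, 0.24707, -0.00206):
        print(fst, nm_from_fst_haploid(fst))
    for kxy in (1.36, 2.8):
        print(kxy, dxy_from_kxy(kxy, 341))
```

Values of Nm below about 1, as between South Asia and South America, are conventionally read as gene flow too low to prevent divergence by drift.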


Discussion

Pednekar et al. [8] from Western India reported four genotypes of E. granulosus, namely the sheep strain (G1), Tasmanian sheep strain (G2), Indian buffalo strain (G3) and cattle strain (G5), in livestock in Maharashtra and adjoining areas. The predominant genotype was G3 (63%), present in all species of livestock, followed by G5 (19.56%), G1 (13%) and G2 (4.34%). In Ludhiana (North India), only 2 genotypes, the buffalo strain (G3) and the common sheep strain (G1), were found to infect livestock [9,21].

In the present study, 3 genotypes of E. granulosus were found to infect livestock: buffalo strain (G3 genotype), sheep strain (G1 genotype) and Tasmanian sheep strain (G2 genotype). In concordance with earlier studies of livestock (cattle, buffalo, pig and sheep) in India, the buffalo strain (71.6%) was the predominant genotype. The second most common genotype was the sheep strain, found in 27.16% of isolates. The G2 genotype was found in only one isolate, from Srinagar, Kashmir (North India), which is similar to the finding in Western India [8]. In contrast to the results of the present study, G1 has been reported as the dominant genotype in other countries: for example 95.74% in China [22], 87.5% in Iran [23], 77.4% in Southern Brazil (with 11.11% of the G3 genotype) [24], 71.59% in Italy (with 27.8% prevalence of the G3 genotype) [25], 66% in Turkey [26] and 55.8% in Pakistan (with 44.11% prevalence of G3) [27]. These data suggest that, moving from the Middle East to Europe, South America and South Asia, the prevalence of the G3 genotype increases until it becomes predominant in South Asia, whereas in East Asia the G1 genotype is again predominant.

To date, very few studies have explored in depth the population genetic structure of E. granulosus [28,29,30,31]. These studies have shown the cox1 gene to be a promising candidate for revealing the population genetics of E. granulosus. In the present study, E. granulosus sensu stricto populations from wide geographical areas were analysed to examine the genetic diversity of the parasite. For this purpose, only sequences of the E. granulosus sensu stricto complex were retrieved from the GenBank/EMBL/DDBJ international databases, as data for other genotypes are scarce in GenBank.

Despite the wide distributional range, the inter-population comparisons (Kxy, Dxy, Gst and Fst) support a low to moderate level of genetic differentiation between these populations. The populations of Europe (EU), the Middle East (ME) and South America (SAM) showed low divergence and share the most common haplotypes. The EU and ME populations are very closely related to each other, as indicated by a very low and non-significant Fst value. Gene flow (Nm) between them was also found to be very high (Table 2). The South Asian (SA) population is the most differentiated, with very low gene flow to and from the other populations. This result could be related to the presence of G3 as the predominant genotype in this population.

Despite high haplotype diversity, the low nucleotide diversity values suggest small differences between haplotypes. This is also demonstrated by the haplotype network, in which most haplotypes differ by single nucleotides (Figure 2). The combination of high haplotype diversity and low nucleotide diversity, as observed in the present study, can be a signature of rapid population expansion from a small effective population size [2]. A number of statistical tests have been developed to test the selective neutrality of nucleotide variability, and they are used to detect such population growth [32]. These tests are based on the distribution of pairwise differences between nucleotide sequences within populations. In this study, we used two tests that are commonly used to detect population expansion and that differ slightly in their approach. Tajima's D test [17] is based on a comparison of the allelic frequencies of segregating nucleotide sites. A positive value indicates a bias towards intermediate-frequency alleles, whereas a negative value indicates an excess of rare alleles, the latter being a signature of recent population expansion. Fu's Fs test [18] is based on the distribution of alleles or haplotypes, and here too negative values can indicate an excess number of alleles, as would be expected from a recent population expansion or from genetic hitchhiking. In this study, Tajima's D was negative for all populations; however, only three populations (South Asia, Europe and South America) differed significantly from neutrality. Fu's Fs test gave significantly negative values for all populations except Africa and Australia, for which the values were negative but not significant (Table 3). The overall negative values of both neutrality tests indicate an excess of rare mutations in the populations, which can imply recent population expansion.
Further analysis including additional neutral nuclear DNA markers could provide a more complete perspective on the population genetic structure of these populations.
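To make the logic of Tajima's D concrete, the sketch below implements the standard formula from the number of sequences (n), the number of segregating sites (S) and the mean number of pairwise differences (k). D compares k with S / a1, where a1 is the sum of 1/i for i from 1 to n-1, so an excess of rare variants (k smaller than S / a1) gives a negative value. The function name and the example input are illustrative and are not values from this study, and the significance testing that Arlequin performs is not included.

```python
import math


def tajimas_d(n, s, k):
    """Tajima's D from sample size n, segregating sites s and mean pairwise differences k."""
    if n < 4:
        raise ValueError("need at least four sequences")
    if s == 0:
        raise ValueError("D is undefined without segregating sites")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between the two estimators of theta, scaled by its standard deviation.
    return (k - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    # Hypothetical input: 10 sequences, 16 segregating sites, k = 3.888889.
    print(tajimas_d(10, 16, 3.888889))
```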

The interpretation of demographic expansion correlates well with the widely observed patterns of sheep domestication, which started around 12,500 B.C. Genetic and archaeological evidence suggests that domestication of sheep occurred first in Southwest Asia (Middle East) and then spread successfully into Europe, Africa and the rest of Asia [33]. Initially, 70 sheep were brought to Australia from the Cape of Good Hope in 1788, and the next shipment was of 30 sheep from Calcutta and Ireland in 1793 [34]. The results of the present study suggest that the parasite, along with its intermediate host, was introduced into Europe and Africa from the Middle East and then into South America, Australia and other parts of Asia. Recently, a similar hypothesis regarding dispersal of the parasite was proposed for European, South American and Middle Eastern populations [30,31].

In the present study, the haplotype network showed that all haplotypes of E. granulosus sensu stricto appear to have descended from a common ancestral haplotype (Hap1) of the G1 genotype, which is widely distributed across geographical areas. Interestingly, the nucleotide sequence of this haplotype (Hap1) was 100% identical to the haplotypes previously described as predominant in Europe (EG1: JF513058) [30], China and Peru (G01: AB491414) [29], and Iran and Jordan (EG01) [31]. The haplotype Hap11, which was the second most common haplotype in the Middle East, China, Europe and South America, is predominant in South Asia. In the dendrogram analysis, Hap 49 appeared to be the connecting link between the G1 and G3 genotypes; similarly, in the haplotype network the G3 genotype and its microvariants appeared to have originated from the EU11 (Hap 49) haplotype.

In conclusion, the present study reveals high genetic diversity within populations of E. granulosus, but relatively low to moderate genetic differentiation among the populations. Low genetic differentiation has also been reported in different populations of Taenia solium [34]. The observed patterns of genetic diversity within and between the populations are likely caused by population expansion after the introduction of a founder haplotype. Finally, we support the view that it is important to connect molecular epidemiology with evolutionary biology, so that population genetic and phylogenetic analyses can add considerable value to the characterization of strains and species of pathogens [35].