Microsatellite Analysis Of Chloroquine Resistance Associated Alleles Biology Essay

Published: Last Edited:

This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.


Chloroquine resistance (CQR) in Plasmodium falciparum is associated with point mutations in Plasmodium falciparum chloroquine resistant transporter gene (pfcrt) in chromosome 7. The geographic origin of chloroquine resistance is associated with amino acid variations at position 72-76 and microsatellite haplotypes flanking this gene. Mutant SVMNT haplotype with PNG origin was reported to spread in many localities of India probably through SEA. In order to understand the gene flow of the SVMNT haplotype across India, seven microsatellite loci which are under chloroquine selection and three neutral polymorphic microsatellite loci were used. Further, the impact of transmission intensity on the genetic structure among 13 populations across India was also tested. The genetic admixture analysis and F-statistics detected genetically distinct groups in accordance to transmission intensity and the probable use of chloroquine. The distinct geographical groups identified ; Northeastern-Eastern-Island group (Assam, West Bengal, Orissa, Jharkhand, Chhattisgarh and Car Nicobar island), the Northwestern group (Uttar Pradesh, Rajasthan and Gujarat), the Central group (Madhya Pradesh and Maharashtra), the Southwestern group (Goa and Tamil Nadu). A large genetic break between the northeastern-eastern group and southwestern group was observed. A pattern of significant isolation by distance was observed in low transmission areas (P = 0.003, r = 0.49, N = 83, Mantel test), indicating restricted gene flow among the populations in accordance with geographical distances. The data also indicated gene flow within all the four groups. A significant genetic differentiation was observed between P. falciparum populations at high and low transmission areas and they have not differentiated substantially by genetic drift. Overall, the study suggests that transmission intensity can be an efficient driver for genetic differentiation in both neutral and adaptive loci at the geographic scale of India.


Positive natural selection contributes to the fixation of beneficial alleles in a population. During this evolutionary process, it is not uncommon to observe a parallel increase in the frequency of specific neutral alleles, which are linked to these beneficial alleles by virtue of proximity. This carriage of neutral alleles during positive natural selection is called genetic hitchhiking (Smith, Haigh, 1974). Genetic hitchhiking typically results in a reduction of neutral variation in the population through a phenomenon called a selective sweep. The pattern of genetic hitchhiking largely depends on two factors: (i) the frequency of random mating within a population and (ii) the presence of geographic structure (genetically isolated by distance) in the population. The geographic structure of a population might simply delay the propagation of beneficial mutations by limited migration of parasites across different geographic space. Perhaps, the spread of beneficial mutations over a certain geographic distance results in weaker hitchhiking (Barton, 2000).

In Plasmodium falciparum, several genetic studies have investigated genetic hitchhiking around mutated genes thought to confer resistance to antimalarials like pyrimethamine (Nair et al., 2003; Roper et al., 2004), sulphadoxine (Alam et al., 2011; Vinayak et al., 2010) and chloroquine (Chen et al., 2005; Mehlotra et al., 2005; Wootton et al., 2002) . A few other studies have also established that geographic structure might impact the extent of a selective sweep associated with this drug resistance (McCollum et al., 2007; Mu et al., 2005; Pearce et al., 2005). For chloroquine resistance, these studies have found: (i) multiple foci for the origin of chloroquine resistance, one in Asia spreading to Africa, two in South America, one in Papua New Guinea (PNG), and one in the Philippines; (ii) Southeast Asia and Africa share the same mutant allele CVIET (amino acids from 72 to 76 with mutations underlined) in Plasmodium falciparum chloroquine resistant transporter gene (pfcrt), suggesting the role of human migration in conjunction with the spread of chloroquine resistance; and (iii) mutant SVMNT haplotype characterizes the South American type, which is prevalent throughout South America and the Pacific region. More recent studies from India have examined the genetic variation within microsatellite loci flanking pfcrt gene and reported the invasion of the PNG type (SVMNT) throughout India (Lumb et al., 2012; Mixson-Hayden et al., 2010). Expansion of the same mutant type into Pakistan has also been reported (Rawasia et al., 2012). However, studies connecting this genetic footprint with the existing population structure in India have not yet been made. In absence of any knowledge of population structure of P. falciparum in India, the gene flow of the SVMNT mutant across India has not yet been explored, nor has the spread of neutral alleles around the pfcrt gene across India been investigated. Understanding the patterns of a selective sweep in a population may reveal a hidden geographic structure within the population (Kim, Maruki, 2011).

Analyzing the genetic variation within neutral microsatellite loci provides an effective means to determine parasite population structure, can help to begin unraveling any potential links between the epidemiological pattern and genetic structure of P. falciparum (Abdel-Muhsin et al., 2004; Anderson et al., 2000). Studies also inferred that P. falciparum populations are poorly described as a single random mating population model and it is likely to observe distinct clusters of population (Anderson et al., 2000; Mu et al., 2005). Malaria exhibits a great epidemiological diversity in India, in terms of pathogen prevalence and distribution of Plasmodium species, transmission intensity, vector distribution, and ecology of vector breeding sites. The presence of local parasite population structure has been suggested to be a factor of exposure-related immunity, identifiable through differing seroprevalance frequencies to P.falciparum surface antigens, even among two closely spaced localities (Biswas et al., 2008), in contradictory identical genetic structure has been reported between two distant localities i.e, eastern and northeastern India (Joshi et al., 2007). Previous work has shown an association between the genetic diversity within the pfcrt gene and the various transmission intensity of India (Mallick et al., 2012). Moreover, there is limited information available on how the genetic diversity of P. falciparum is structured within the Indian population and it has yet to be established whether or not variability within malaria epidemiological factors across different geographic locations provides a basis for the evolutionary plasticity and local adaptation. However, these diverse epidemiological conditions and limited genetic studies involving surface antigens and drug resistance are impetus/stimulating to suspect a population structure in India.

This study aims to assess the genetic diversity and population structure of P. falciparum subpopulations, which have, over time, sustained different levels of chloroquine selection across India. Testing the correlation between transmission intensity and population structure, and analyzing the footprints of selection on a genomic level, may facilitate a deeper understanding of how drug utilization shapes the structure of P. falciparum in India. This study uses putative neutral microsatellite loci and pfcrt-flanking microsatellite loci assumed to be under chloroquine selection. Population structure analyses and hierarchical F-statistics revealed substantial geographical structure of P. falciparum populations within India and with strong parasite genetic differentiation between high transmission areas (HTA) and low transmission areas (LTA). Here, ultimately the spread of hitchhiking associated with chloroquine resistance across India is highlighted.

Materials and methods

Ethical Considerations

The Ethics Committee of National Institute of Malaria Research (NIMR) approved this study protocol. All individuals or the parents/guardians of children gave written informed consent before inclusion in the study.

Selection of genomic DNA

Genomic DNA samples (N = 287) isolated from P.falciparum were selected from an earlier study, which focused on sequencing pfcrt (Mallick et al., 2012). Additionally, genomic DNA was isolated from blood samples collected in Car Nicobar (CN) (N = 12) and Madhya Pradesh (MP) (N = 17) during years 2002 and 2005, respectively. DNA was extracted from these 29 samples using Qiagen® Blood Mini Kit (Hilden, Germany), successfully amplified by PCR, and sequenced to determine pfcrt haplotypes. All samples (N = 316) were then screened for multi-clonal infections by genotyping the single-copy antigenic genes merozoite surface protein-1 (msp-1) and msp-2 as described earlier (Snounou et al., 1999). From these genotyping methods, 213 single-clone isolates were identified. To ensure that all isolates were single-clones, the 213 isolates underwent microsatellite genotyping at 3 loci (TAA60, TAA80, and ARA2) as described earlier (Anderson et al., 1999). Samples having more than one allele at any of the 3 loci were excluded from further investigation. Finally, a total of 188 single-clone P. falciparum isolates were selected from the following locations in the India: Assam (ASM) (N = 13), Orissa (ORS) (N = 25), West Bengal (WBG) (N = 13), Jharkhand (JHK) (N = 17), Chhattisgarh (CHG) (N = 12), Car Nicobar (CN) (N = 7), Maharashtra (MAH) (N = 9), Tamil Nadu (TN) (N = 18), Goa (GOA) (N = 15), Gujarat (GUJ) (N = 15), Rajasthan (RAJ) (N = 21), Uttar Pradesh (UP) (N = 14), Madhya Pradesh (MP) (N = 9). All isolates were collected from microscopically positive P. falciparum patients between 2002 and 2006. Eight laboratory DNA isolates obtained from MR4 were included to represent varied geographic origins.

Microsatellite typing

Seven Microsatellite loci flanking the pfcrt gene, B5M97 (-24 kb), B5M77 (-20 kb), 2E10 (-5 kb), 9B12 (+1 kb), PE12A (+6 kb), 2H4 (+22 kb), and PE14F (+106 kb), were amplified as previously described (Wootton et al., 2002). All primary and nested PCR's were performed in a volume of 20 ml with 0.75 U of Taq DNA polymerase, 2.5 mM magnesium chloride, 0.25 mM each dNTP s and 0.1 mM of each primers. The primary PCR product was diluted 1:100 for the following nested PCR. For all nested PCR reactions the 5' end of the primers were modified with one of the following fluorophores: FAM, HEX or TET. Following PCR amplification the length variation was measured commercially at The Centre for Genomic Application, New Delhi, India. Genotypes were analyzed using ABI Peak Scanner v1.0 software. To check on scoring error two reference strains were amplified and used to calibrate allele sizes.

Microsatellite analysis

Genetic Diversity

The genetic variation in terms of expected heterozygosity (He) for each microsatellite locus was calculated using formula He = [n/ (n-1)] [1-∑pi2], where n is number of P. falciparum isolates genotyped for a particular locus and pi is the frequency of the ith allele. The sampling variance of He was calculated as [2(n-1)/n3] (2(n-2) [∑pi3 - (∑pi2)2]) (Nash et al., 2005). Significant differences between the mean He of any two groups was assessed using a Mann-Whitney test in GraphPad Prism 5 (GraphPad Software, Inc). Allele diversity was measured in terms of the mean number of distinct alleles (allelic richness) and the mean number of private alleles (private allelic richness) using the ADZE program (Szpiech et al., 2008). The program uses a rarefaction method (Kalinowski, 2004) to correct the sample size factor across whole population.

Linkage Disequilibrium

LIAN version 3.5 (Haubold, Hudson, 2000) was used to analyze the evidence of multilocus linkage disequilibrium (LD) in neutral and pfcrt-flanking microsatellite loci of various pfcrt haplotypes (CVIET and SVMNT). The standardized index of association (IAS) was calculated for the entire multilocus haplotype and also within each unique multilocus haplotype (obtained from Arlequin software) to account for any bias resulting from the presence of dominant haplotypes. The null hypothesis of complete linkage equilibrium (IAS = 0) was tested by Monte-Carlo simulations using 10,000 random permutations of the data. Due to the small sample size of some geographic regions, the LD was calculated by considering various pfcrt haplotypes, rather than by location.

Population structure

STRUCTURE 2.3.1 software (Pritchard et al., 2000) was used to assign isolates to subpopulations (K) according to allele frequencies at each locus. Here, the software was used to test how microsatellite haplotypes clustered according to the geographic origins of isolates. For this admixture model, the burn-in period was set at 50,000 steps, followed by 100,000 Markov chain iterations. Five independent runs were performed for each predefined K-value (K = 1-8). These analyses were done for two different combinations of microsatellite loci; (1) in the neutral loci TAA60, TAA80, ARA2 and (2) in pfcrt-flanking loci B5M97, B5M77, 2E10, 9B12, PE12A, 2H4, PE14F. STRUCTURE software assigns probabilities of relatedness for each individual haplotype (or sample) into genetically distinct clusters; these probabilities are then used to infer the membership to specific clusters (or subpopulations) (Rosenberg et al., 2003).

Population Differentiation

The degree of population differentiation among all regions was measured by pairwise FST values (Weir B, 1984). The FST values and its significance was calculated using 10,000 permutation tests in Arlequin software v3.0 (Excoffier et al., 2005). Nm values calculated as Nm = 1/2[(1/FST) - 1], were estimated to account for the degree of migration between the subpopulations. An analysis of molecular variance (AMOVA) (Excoffier et al., 1992) was used to determine statistical differences between the clusters obtained by STRUCTURE and also in different geographic region (Northern, Northeastern, Eastern, Western, Central, Southern and Island). A Mantel test (Mantel, 1967) was performed to asses the correlation between geographical and genetic distances (isolation by distance-pattern) of all populations, using Arlequin software(Excoffier et al., 2005).

Phylogenetic relationship

A median joining network was constructed using unique multilocus haplotypes at pfcrt-flanking microsatellite loci in NETWORK version (<http://www.Fluxus

engineering.com/sharenet.htm>), to illustrate the genetic relationship between the various pfcrt haplotypes found among different regions of India.


Of the 188 single clone isolates, 169 isolates successfully amplified for all seven microsatellite loci flanking pfcrt gene. Based on the prevalence of falciparum malaria and the transmission intensity, the study sites were classified into two groups; high transmission areas (HTA) and low transmission areas (LTA).

Allele diversity in microsatellite markers

All microsatellite loci studied were polymorphic and the number of alleles per locus ranged from 5 (at PE12a) to 20 (at 2H4). Genetic diversity was computed using two sets of loci; one set consisting of 3 putatively neutral loci and the other set of adaptive loci consisting of seven pfcrt-flanking microsatellite loci thought to be under chloroquine selective pressure. The level of allelic diversity was extensive between geographic locations (Table 1). The mean number of alleles per locus (NA) at neutral loci indicated higher diversity in HTA, ranging from 3.7 to 6.0, in comparison with LTA, which ranged from 2.7 to 4.7 alleles per locus. A similar trend was observed in the pfcrt-flanking microsatellite loci, where HTA ranged of 2.4 to 4.6, in comparison with LTA, which ranged from 1.4 to 2.7 alleles per locus. There was a significant reduction (P = 0.0107, unpaired t-test) in mean number of alleles per locus at pfcrt-flanking microsatellite markers when compared to putative neutral loci.

The number of alleles per locus radically depended on sample size. To negate this potential bias, the sample size was standardized to the smallest sample size across populations (CN; N = 7) using ADZE software. The observed allelic richness "AR" (mean number of distinct alleles) and private allelic richness "PR" (number of alleles private to the population) are presented in Table 2. Even after standardizing the data, the results were consistent, indicating that HTA have higher AR than LTA in both neutral and pfcrt-flanking microsatellite loci. The only exception was MP (from LTA), which showed intermediate values for all measures of diversity. Higher allelic diversity was observed within the neutral loci, when compared to the flanking loci of the different pfcrt haplotypes. There was significant reduction (P < 0.0001, t-test) in allelic diversity amongst the resistant haplotypes in comparison to wild haplotypes.

Expected Heterozygosity and trend of selective sweeps around pfcrt haplotype

The extent of genetic diversity in terms of expected heterozygosity was calculated for each locus. However, patterns of a selective sweep were tested by only considering the pfcrt-flanking microsatellite loci. The mean number of distinct alleles (AR) and mean heterozygosity (He) for the wild pfcrt haplotype (CVMNK) were significantly higher (AR = 6.0 ± 0.54; He = 0.81 ± 0.07), compared to resistant haplotype CVIET (AR = 2.92±0.33; He = 0.53±0.06; P < 0.01, Mann-Whitney test) and resistant haplotype SVMNT (AR = 2.10±0.21; He = 0.32±0.04; P < 0.004, Mann-Whitney test). This observation is consistent with the fact that these resistant haplotypes are under drug pressure as indicated by the decline in values of AR and He. A reduction in allelic variation was observed among the microsatellite loci between -24 kb upstream to +22 kb downstream of the pfcrt gene. The "valley" of reduced variation for SVMNT haplotype was much deeper and broader than CVIET haplotype (Fig1). Further, a reduction in allelic variation was comparatively less extreme in HTA than in LTA. This reduction was especially pronounced between -24kb to +22kb in LTA, whereas the reduction observed in HTA was between -20 kb to +6 kb. The immediate proximal loci to mutant pfcrt gene showed modest variation in HTA. The reduction of expected heterozygosity in SVMNT haplotype (N=82) from LTA was He= 0.24 ± 0.04 in comparison to He= 0.45 ± 0.06 of SVMNT haplotype (N=40) from HTA. Regional distribution of expected heterozygosity (He) is presented in Table 3.

The effects of clonal expansion are expected to extend to a genomic level, rather than to a few loci under selection, which may show decreased variation. To investigate the possibility of clonal expansion in prevalence of pfcrt haplotypes, AR and He were measured within 3 neutral loci on chromosomes 5, 11, and 13. , The mean He was significantly higher for the 3 neutral loci, when compared to the 7 pfcrt-flanking loci (P < 0.009, Mann Whitney), in all geographic regions. The mean He at neutral loci was comparable between wild and resistant pfcrt haplotype (Table 3). Among the LTA, except in MP, the reduction in valley was both wider and more symmetrical around pfcrt gene for SVMNT. The reduction in genetic variation at both pfcrt-flanking and neutral loci observed in GOA, TN and UP could be an indication of clonal expansion.

Unique microsatellite haplotypes among pfcrt haplotypes

Analysis of unique flanking microsatellite haplotypes associated with pfcrt-mutant alleles and identifying these among different locations can provide inferences about the dispersal of chloroquine resistance (CQR) across India. Five pfcrt-flanking microsatellite loci (-24 kb, -20 kb, -5 kb, +1 kb and +6 kb) were used to construct the unique multilocus haplotypes. The two farthest microsatellite loci downstream 2H4 (+22kb) and PE14F (+106kb) were excluded due to high He, which may indicate non-selective properties. A total of 54 unique multilocus haplotypes were observed amongst all the pfcrt gene alleles (Table 4). Among these 54 multilocus haplotypes, 16 were associated with wild CVMNK (N=16), 18 with mutant CVIET (N=31), and 20 with mutant SVMNT (N=122). Two multilocus haplotypes were not restricted to a specific pfcrt haplotype: (i) CK3 was shared between wild CVMNK and resistant CVIET and (ii) haplotype CK17 was shared between wild CVMNK and resistant SVMNT. None of the multilocus haplotypes were shared between two resistant allele types.

ST1 was the predominant multilocus haplotype associated with the SVMNT and found in all surveyed regions. The other frequent haplotypes associated with SVMNT allele like ST2, ST16 were also shared between both transmission areas. The multilocus haplotypes associated with CVIET were restricted to HTA. The rest of multilocus haplotypes generally show regional bias in there occurrence and the frequencies of these multilocus haplotype were found to vary from region to region.

Linkage disequilibrium

Favorable mutations leading a genetic sweep generally give rise to a reduction in genetic variation and increase the linkage disequilibrium (LD) around the target of selection (Anderson, 2004). Thus, in order to have an assessment of selection, LD was measured as a standardized association index (IAS) between multiple loci flanking pfcrt haplotypes. There was no significant LD (IAS = ­0.0219, P = 0.687, Monte Carlo simulation) around wild CVMNK haplotype, whereas significant LD was observed around resistant haplotypes. The IAS values for CVIET and SVMNT were 0.13 (P < 0.0001, Monte Carlo simulation) and 0.14 (P < 0.0001, Monte Carlo simulation) respectively. Multilocus LD at 7 and 5 pfcrt-flanking microsatellite loci is illustrated in Table 5.

Population structure

The algorithm underlying STRUCTURE detects the uppermost level of population structure in terms of number of subpopulations (K) represented in the entire population. A K = 2 had the highest clusteredness value (k=0.58) over five simulations for the 3 neutral loci, while a K = 3 had highest clusteredness value (k=0.85) over five simulations for the 7 pfcrt-flanking loci.

The clustering pattern of the neutral microsatellite loci obtained with K = 2 (shown in Fig 2a) indicates discrete clustering of P. falciparum genotypes into high transmission and low transmission areas with limited admixture in both regions. The clusters of genotype were significantly associated with transmission intensity (high versus low) (P < 0.0001, Fisher exact) i.e Neutral cluster1 was associated with HTA and Neutral cluster2 associated with LTA. The population structure of P. falciparum according to ancestry at K = 2 is shown in Fig 2b. Neutral cluster1 (red) consists of 78 isolates, 70 (89.74%) of which were collected in HTA, while Neutral cluster2 (green) cluster consists of 91 isolates, 75 (82.41%) of which were collected in LTA.

The clustering pattern of the 7 pfcrt-flanking microsatellite loci shows a much higher level of admixture in HTA; while in LTA, Flanking cluster1 (red) of K = 3 is the predominant cluster, with a complete absence of Flanking cluster2 (green) of K = 3 (Fig 3a). Flanking cluster1 was significantly associated with LTA (P < 0.001, Fisher exact) and SVMNT haplotype (P < 0.001, Fisher exact). Flanking cluster2 was associated with CVIET haplotype (P < 0.001, Fisher exact) and observed only in Eastern, Northeastern India, and CN Island, which was not unexpected due to the proximity to South East Asia (SEA). However, Flanking cluster3 (blue) of K = 3 was significantly associated with the wild-type CVMNK, as well as with the HTA (P < 0.001, Fisher exact) (Fig 3b). Both the SVMNT and CVIET haplotypes appeared in Flanking cluster3, and seem to have evolved from the wild type through some historical event. By superimposing the two analyses, it is clear that Flanking cluster2 and 3 were significantly associated with Neutral cluster1 (P < 0.001, Fisher exact), while Flanking cluster1 was significantly associated with Neutral cluster2 (P < 0.001, Fisher exact).

Population differentiation

Using the three neutral loci, pairwise FST values were calculated among all subpopulations.

No population differentiation occurred between ORS-CN (FST = 0.01, P = 0.363) or between ASM-MAH (FST = 0.05, P =0.069). Population differentiation occurred between ASM-CHG (FST = 0.07, P < 0.04), and up to FST = 0.44 between CHG-TN (P < 0.0001) (Table 6). It seems ASM has a limited genetic similarity to CHG, while CHG and TN were most genetically separated. When pairwise FST values were calculated with seven pfcrt-flanking loci it ranged from 0.06 between RAJ-TN (P < 0.03), and up to 0.46 between CN-GUJ/TN (P < 0.0001)(Table 6), indicating RAJ population has limited genetic similarity to TN and while CN-GUJ/TN has strongest genetic differentiation between them. While, no population differentiation occurred between GOA-TN (FST = 0.02, P = 0.241) or between ORS-CHG (FST = 0.06, P = 0.067).

Nm values (Slatkin, 1987) determined to estimate the gene flow at neutral loci between different geographic regions to indicate the level of migration between parasite population. The Nm values were fairly erratic for non-significant FST values (i.e when P ≥ 0.05), albeit the overall values indicate high gene flow between populations. The Nm values for significant FST values at neutral loci ranged from 0.64 between CHG-TN to Nm = 6.45 between ASM and CHG, indicating low migration between CHG-TN and high migration between ASM and CHG. However, Nm estimates for pfcrt-flanking microsatellite loci indicated the flow of CQR between the subpopulation. Migration of pfcrt-flanking loci was shown to be highest between RAJ-TN (Nm = 7.54) and the lowest was between CN-GUJ/TN (Nm=0.59). The Nm values at neutral and pfcrt-flanking loci are illustrated in Table 7.

Interpreting FST and Nm values

The significant pairwise FST and Nm values were interpreted to generate a pattern of gene flow in relation to CQR across India. Pairwise FST values were categorically represented into a range of genetic differentiation; 0.0 to 0.05 indicates minor genetic differentiation, 0.05 to 0.15 indicates moderate genetic differentiation, 0.15 to 0.25 indicates high genetic differentiation, and values above 0.25 indicate strong genetic differentiation among the subpopulations (Hartl DL, 1997; Wright, 1978). The estimated level of migration (Nm) was interpreted with the criteria that a Nm > 1.0 indicates sufficient gene flow between subpopulations and Nm > 4.0 is an indicative of panmaxia (high gene flow) (Hartl DL, 1988). Using these guidelines, the pairwise FST and Nm values highlighted four distinct groups of populations in this sample data. The first group can be classified as the Northeast-East-Island (ASM, WBG, ORS, JHK, CHG and CN), second from the Northwest (U.P, RAJ and GUJ), third from the Southwest (GOA and TN) and fourth from the Central (MAH, M.P).

No population differentiation occurred between subpopulations the comprising Northeast-East-Island group (FST ≤ 0.05, P > 0.05), except WBG, which showed moderate levels of differentiation. Additionally, this group had low to high levels of differentiation with other groups (Table 8a). The Southwest group showed high genetic differentiation with Northwest and Central group. A strong genetic differentiation was observed between Northeast-East-Island group and Southwest group. All four groups showed significant genetic differentiation (FST ≥ 0.04, P < 0.005) and high gene flow between them, i.e., the pairwise FST ranges from 0.041 to 0.25 and Nm ranges from 1.47 to 11.54 (Table 8a,b). While analyzing the genetic differentiation and geneflow at pfcrt-flanking loci, the central group appears to be genetically related with all the groups inferring interaction with all group of populations, which can be explained by the geographical location i.e., central part of India and migration patterns of chloroquine resistant parasite across the country over time. Such evidence of strong genetic differentiation between different groups of subpopulation and high gene flow within the group of subpopulation permits the development of a hypothetical route of gene flow in relation to chloroquine resistance among Indian P falciparum (Fig 4).

Next, an AMOVA was used to analyze the neutral and pfcrt-flanking loci amongst the seven geographic locations (Northeastern, Eastern, Central, Northern, Western, Southern and Island). Additionally, an AMOVA was also used to analyze these loci between the two groups according to transmission intensity (HTA and LTA).

AMOVA based on neutral loci showed that variation among the subpopulations constituted 17.51% of the total variation when no grouping of subpopulations was inferred in the AMOVA. While grouping of subpopulations by geographical region, analysis indicated 7.70% variation among populations. Analysis of the two clusters detected with STRUCTURE resulted in 17.26% variation among clusters. Furthermore, the new groups defined by Nm and FST values show 13.17% variation among the four groups and 7.14% of variation among subpopulations within groups. Separate AMOVAs for each new group defined by Nm and FST values was also performed, in which the variation between subpopulations ranged from 4.51% to 13.58% (Table 9a).

AMOVA based on pfcrt-flanking loci, showed that variation among subpopulation constituted 18.48% of the total variation when no grouping of subpopulations was inferred in the AMOVA. When grouping of subpopulations was implemented for seven different geographical regions, there was 13.06% variation among populations. When the same analysis was used on three clusters detected with STRUCTURE, high variation (36.77%) was detected between clusters. Furthermore, while AMOVA was analyzed for four groups defined by interpretation of Nm and FST, it results into 15.32% variation between these four groups and show 6.38% of variation between subpopulations within the group. Separate AMOVA of each above four groups was also performed, in which the variation between subpopulations ranged from 0.30% to 7.57% (Table 9b). AMOVA results showed that almost all the genetic variation was found to be within groups or clusters. The analysis of four groups noticed little genetic differentiation among subpopulation within the same geographic region (FSC =0.08 (<0.0001)). However, substantial differentiation noticed among populations (FCT =0.13 (0.0001)). In fact, differences among populations of different regions is possible reason for all the differences observed among all populations (FST =0.20 (<0.0001)). These results support the division of the populations into four groups, as done by pairwise FST and Nm values.

Isolation by distance (IBD)

In the Mantel test, significant positive correlation between FST / (1- FST) and the natural log of geographic distance was detected across all the populations for both neutral (Mantel test, P = 0.03, r = 0.31, N =169) and pfcrt-flanking loci (P = 0.001, r = 0.39, N = 169) (Fig 5); however, different levels of genetic diversity at HTA and LTA may have skewed this data. To offset this potential bias, the analysis was repeated for each transmission area, separately. The division of populations resulted in non-significant Mantel probabilities and a decrease in the correlation coefficient for pfcrt-flanking loci (in HTA; r = 0.26, P = 0.10 and in LTA; r = 0.28, P = 0.18). In the case of the neutral loci, LTA show a significant correlation coefficient (r = 0.50, P = 0.03, N = 83); however, a decrease in the correlation coefficient (r = 0.15, P = 0.23, N = 86) and a non-significant Mantel probability was observed in HTA.

Phylogenetic relationship among individual multi-locus haplotypes

The median joining network permits visualization of mutation paths undergone between the pfcrt haplotype, which helps to explain the evolutionary relationship between the haplotypes observed in the sample set. When compiled into haplotypes, microsatellite loci around pfcrt haplotype generate two clusters according to the pfcrt haplotypes. However, these two pfcrt haplotypes are not separated by a high frequency of ancestral divisions (nodes). Haplotype ST1 (pfcrt-flanking microsatellite haplotype) was found to be associated with the cluster predominately occupied by the SVMNT haplotype (as indicated by the red box in Fig6). Due to the position on the phylogenetic network and close linkage to many other pfcrt-flanking haplotypes (ST5, 7,10,16 and 18) which are found in all locations, it is likely that these infections (describe by haplotypes) are of recent origin. Comparatively, the cluster associated with the CVIET haplotype shows an equal distribution between microsatellite haplotypes, which are linked with more historical networks, indicating an increased number of mutations. The wild type CVMNK appears more closely related to the CVIET haplotype than that of the SVMNT haplotype. While few SVMNT haplotypes were also closely related with wild type CVMNK those were found in HTA.


Results from this study provide important information pertaining to how chloroquine usage and transmission intensity impacts the population structure of the Indian P. falciparum parasites.

Parasite diversity across Indian P. falciparum populations

P. falciparum populations from high transmission areas (HTA) generally report higher genetic variation than those from low transmission areas (LTA) (Anderson et al., 2000) and maintains multiclonal infections (i.e., multiple genotypes per infected person) in such areas (Paul et al., 1998). Here, genetic diversity within both neutral microsatellite loci and pfcrt-flanking microsatellite loci varies according to transmission intensity. The extensive microsatellite diversity within HTA corroborates earlier studies by (Garg et al., 2007; Joshi, 2003), which demonstrated extensive polymorphism in surface antigens akin to high level of multiple infection in HTA. Higher values for many measures of genetic diversity (AR, PR and He) within HTA indicate high rate of recombination and less inbreeding in parasite population (Conway et al., 1999). Furthermore, a reduction in all measures of genetic variation was observed at pfcrt-flanking loci of resistant pfcrt haplotypes, when compared with the neutral loci, suggesting that there is selection around the pfcrt gene.

Trend of chloroquine selection

There was a significant reduction in genetic diversity (He) within the pfcrt-flanking regions of the mutant pfcrt alleles relative to wild type alleles, which indicates chloroquine selective pressure on the pfcrt gene. Reduced width of selective sweep (pfcrt-flanking region with reduced variation) around resistant CVIET allele in comparison to resistant SVMNT allele is possibly due to predominance of resistant CVIET allele and higher recombination rate within HTA. Approximately two fold higher He values around SVMNT alleles in HTA reiterates higher recombination events in populations at HTA and also a varied strength of selection at different geographic region cannot be ignored. The next possible reason may be varied time points of colonization of different resistant alleles at different regions. The CVIET was possibly introduced earlier than SVMNT in India and may have evolved multiple times due to recombination events adding new alleles to population under HTA epidemiology. When reduction of heterozygosity was assessed at different locations, it provides a selective sweep around -20kb to +6kb in all sites under high transmission, whereas locations under low transmission epidemiology showed wider selective sweep around ±24kb.The exception in LTA was MP, where considerably higher amount of He was observed. The fitness of resistant alleles may also play a role in hitchhiking, as the data set show fixation of resistant SVMNT at LTA in comparison to HTA where resistant alleles had still not reached fixation. This difference in relative fitness of pfcrt alleles at different epidemiology may contribute to the observed differential hitchhiking and to the difference in transit time (spread to fixation) of resistant allele. The data revealed that the molecular basis of resistance differs with change in transmission intensities, which concurs a recent report (Lumb et al., 2012) and also likely to translate the varied CQR level at different locations. In a selective sweep, reduction of heterozygosity was also accompanied with large regions of elevated linkage disequilibrium around resistant gene. The observed LD value supports above narrower selection valley as it got reduced with addition of a distant loci at upstream (+106kb) onwards. The decline in LD is more rapid around CVIET (IAS =0.08) in HTA than around SVMNT (IAS =0.10) in LTA, consistent with weaker selection event at HTA as discussed above. Hence, we clearly visualize a pattern of different genotypes occurred at different regions (broadly HTA and LTA), which in turn raises a possibility of detecting geographical structure in relation to chloroquine resistance among Indian P.falciparum.

Inference of Population structure among Indian P. falciparum

Three neutral loci located on three different chromosomes were used to identify the population structure of Indian P. falciparum. After STRUCTURE analysis, two discrete clusters associated with transmission intensity were identified, where Neutral cluster1 was significantly comprised of infections from HTA and Neutral cluster2 was significantly comprised of infections from LTA. Furthermore, there was minimal admixture between HTA and LTA, which supports earlier reports of the association between the genetic structure of P. falciparum and patterns of transmission (Anderson et al., 2000). Thus, the STRUCTURE results corroborate the differential measurements of genetic diversity at HTA and LTA. Further pairwise FST analysis confirmed the population differentiation provided by STRUCTURE, estimating substantial levels of genetic differentiation between populations from HTA and LTA (Table 6). This genetic differentiation could be due to genetic drift; however, the observed value of Nm ≥ 1 between all of the subpopulations does not support genetic drift (Slatkin, 1987).

Low population differentiation was detected between falciparum subpopulations within HTA except WBG, which has moderate differentiation when compared with other regions. Corroborating earlier reports of genetically similar surface antigens (MSP1 and MSP2 ) between Assam and Eastern India (Joshi et al., 2007), indicate a pattern of current or historical migration due to their geographic proximity. CN and Eastern India were also found to be genetically similar, indicating that there is considerable parasite gene flow between mainland and island parasite populations, which supports previously reported data based on surface antigen (AMA1) (Garg et al., 2007). In LTA, MP show low genetic differentiation with RAJ, GOA and TN (FST = 0.05, P > 0.094). (RAJ-GUJ-UP) and (GOA-TN) showed high genetic differentiation between them. Thus, Southwest (GOA=TN) part of India have most genetically isolated population from rest of India except with MP. The three-level hierarchical analysis of genetic differentiation (AMOVA) supported the division of Indian P. falciparum population to four geographical groups, as maximum variations observed were within the subpopulation rather than among subpopulations of a particular geographical group. A significant positive correlation between genetic variation and geographic distance was obtained by Mantel test on all the populations. This indicates a genetic isolation by geographic distance in falciparum population of India.

Geographical distance, alone, does not adequately explain the divergent parasite populations across India, especially given populations like ASM-MAH, MAH-CN, and ORS-CN are greater than 1200 km apart and showed no genetic differentiation. However, the extend analysis at LTA and HTA revealed a significant isolation by distance (IBD) at LTA only, and no such correlation between genetic distances and geographic distances in HTA. This again supports our above division of Indian P. falciparum population to four geographical groups. Further mantel test of all populations excluding the Southwest group revealed no correlation between genetic distances and geographic distances in both HTA and LTA. Probably the genetic differentiation of Southwest region with all other group is core responsible for IBD observation in whole dataset and even in LTA alone.

Spread of hitchhiking in Indian P. falciparum population

Collectively, the results (STRUCTURE, FST, AMOVA and IBD) indicate that the subpopulations have become differentiated from one another. This deviation from single random mating population may effect the genetic hitchhiking around a beneficial mutations accrued in population. We observed interesting pattern while comparing pairwise FST between neutral loci and pfcrt-flanking loci (table8) at the four patches of population inferred above; Southwest and Northeast-East-Island show highly differentiated populations at both neutral and pfcrt-flanking loci indicating hetrogenous population and low gene flow; Central and Northeast-East-Island or Northwest show low genetic differentiation at both neutral and pfcrt-flanking loci indicating homogenous population and continuous gene flow; Central and Southwest show moderate differentiation at both neutral and pfcrt-flanking loci indicating a limited gene flow between populations; Southwest and Northwest show high genetic differentiation at neutral loci but low genetic differentiation at pfcrt-flanking loci indicating that evolutionary process (i.e., strength of selection) is faster than rate of migration. This could be due to population subdivision which causes several important modifications in strength and pattern of a selective sweep. Studies (Bierne, 2010; Slatkin. M, Wiehe. T, 1998) showed that in a subdivided population, genetic hitchhiking can introduce population differentiation (large FST) in an initially homogeneous population (small FST) and the similar pattern is displayed between Northwest and Northeast-East-Island group. The gradient of heterozygosity may be used along the pathway of a sweep to infer the geographical movement of the beneficial mutation as it leaves weaker signature of selection while spreading across distant subpopulations after its introduction in first population i.e., heterozygosity is lowest in subpopulation where beneficial mutation is introduced and subsequently increased with the spread across neighboring demes (Kim, Maruki, 2011). Eventually, our data also show a gradient of heterozygosity in pfcrt-flanking loci from Southwest to Northeast-East-Island part of India and this raises thought about probable introduction of SVMNT allele in southern India and then progress to other parts of India.

Dispersal/Probable migratory route of chloroquine resistant haplotype

The STRUCTURE analysis of pfcrt-flanking loci showed 3 discrete clusters associated with the different pfcrt haplotypes (CVIET, SVMNT, and CVMNK). This in turn reflects the observation of varied level of selective sweep associated to different pfcrt haplotypes could be due to different chloroquine pressure at various regions. The pairwise FST estimate show high genetic differentiation between East-Island (ORS, JHK, CHG, CN), with that of Northwest (UP, RAJ, GUJ) and strong differentiation between East-Island and Southwest (GOA&TN), indicating a limited migration of resistant alleles between East-Island with that of Northwest and Southwest. In turn, the Northeast region may have indeed, been a major migratory route of the resistant alleles (CVIET& SVMNT) likely originating from the Southeast Asia region and then spreading into Eastern India and consequently to other parts of India. While Central (MP, MAH) group show moderate genetic differentiation at pfcrt-flanking loci with Northeast-East-Island, Northwest and southwest.

Admittedly, more neutral loci may be needed to make concrete conclusions about geographic structure in Indian P.falciparum, but even with these three neutral loci, there is evidence of a genetic structure that is strongly linked with the patterns of malaria transmission. Additionally, resistant alleles are also differentially structured; probably due to above found geographic structure or varied chloroquine usage in these locations. While determining F-statistics estimations, a small amount of genetic exchange between populations is enough to prevent the accumulation of large genetic differences between them. Thus, the significant strong genetic differentiation found between Southwest and Northeast-East-Island part of India implies limitations in direct dispersal of resistant alleles and supports the movement of resistant alleles via Central India.

Human migration between malaria endemic regions plays an important role in the movement of resistant alleles in distant populations (Hume et al., 2003; Lqbal et al., 2002). For example, two recent studies based on pfcrt-flanking microsatellite markers (Mixson-Hayden et al., 2010; Rawasia et al., 2012) reported that the resistant SVMNT found in India and Pakistan migrated from PNG. Evidence for this was also found by clusters of neutral microsatellite markers that were earlier reported between PNG (Pacific region) and Thailand (Southeast Asia) (Anderson et al., 2000). In this study, the microsatellite profile of pfcrt-flanking loci associated with the resistant allele SVMNT and CVIET were similar to that of microsatellite profile reported from PNG and SEA, respectively (Mixson-Hayden et al., 2010; Wootton et al., 2002). A recent study (Awasthi et al., 2011) examining the allele frequency of pfcrt haplotypes (CVMNK, CVIET and SVMNT) postulates a probable route of migration for different pfcrt haplotypes in India and reported that the mutant SVMNT allele arrived India from PNG via SEA. In presence of population differentiation and gene flow data inferring a geographic structure in Indian P.falciparum, we postulate another probable route for dispersal of mutant SVMNT across India through Sri Lanka or southern India (fig 4).

Human migration related to labour and tourisim had been associated with prevalence of chloroquine resistance in India, particularly labourers travelling from eastern part of India to the western states (Sethi et al., 1990; Sharma, Sharma, 1988; Sharma, 2000). The study site in TN (Rameswarum Island) has been associated with continuous human migration with Sri Lanka and being a holy place the study site also receives a large number of pilgrims from all over India. In addition a large number of tourists stay here before or after visiting Sri Lanka. These human migrations within this region have likely helped to facilitate the spread of chloroquine resistant parasites between both country and throughout India (Rajagopalan et al., 1986). This, in turn, may have given chance for immigration of SVMNT from Sri Lanka to India as a recent study reported ubiquitous appearance of SVMNT in Sri Lanka (Zhang et al., 2011) and we also observed similar frequency of SVMNT from this study site also. The predominance of similar mosquito vector (Anopheles culicifacies; sibling B,E) in both locations (Surendran et al., 2000) provide an added foundation for this hypothesis.

It is well known about human migration between southern India, Sri Lanka and Indonesia for trading and the cultural contacts was much greater than hitherto been imagined (Karafet et al., 2005). Similarly, Pakistan shares international borders with Rajasthan and Gujarat, which reports the fixation of SVMNT allele with similar pfcrt-flanking loci in both countries. Where as the countries sharing border with northeast India like Nepal, Bangladesh and Myanmar reports higher frequency of CVIET (Banjara et al., 2011; Kawai et al., 2011; Mohapatra et al., 2005). It is worth putting the South Asia-Southeast Asia migration corridor in perspective. Thus, we here, infer immigration of SVMNT allele from PNG to India via Sri Lanka or south India. At last the results of median joining NETWORK construction reveal two distinct clusters of resistant alleles CVIET and SVMNT, indicating accumulation of resistant alleles on a distinct genetic background. However, clustering of multilocus haplotype in SVMNT supports a recent origin than that of CVIET cluster which seems to have undergone a large number of mutational steps over time. The maintenance of diversity over time in HTA suggests a large effective population size. Thus, we observed a strong geographic structure governed by transmission intensity. It is possible that multiple selective sweeps have occurred independently in different geographic locations due to local adaptation.


The molecular analyses presented here suggest strong genetic subdivision of P. falciparum between high and low transmission regions of India, possibly related to contrasting chloroquine utilization in each epidemiological type. Within each regional group, low levels of parasite genetic differentiation were observed across India. The clarity of the similarities in population structure observed by the two groups of markers (neutral and pfcrt-flanking) may be reflecting the fact that the drug resistance does maintain the circulation in population structure. Eventually we elucidate that Indian P. falciparum population at least consists of four patches of geographical subpopulations circulating with each other.


PKM was supported by grant 5D43TW007884-05 "Promotion of Plasmodium Research and Training in India" from the Fogarty International Center. The study was supported by NIMR (ICMR), New Delhi and partly by the same above grant "D43TW007884". PLS was supported by a Fogarty International Center/NIH U.S. Global Health Postdoctoral Scientist Fellowship 3D43TW007884-03S1. The content is solely the responsibility of the authors and does not necessarily represent the official views of the Fogarty International Center or the National Institutes of Health or NIMR. We sincerely thank all the staff members of NIMR, Delhi and the respective field units (LIST OUT THE FIELD UNITS AND THE OICS), for their help, cooperation and support during the study.