One third of the global population is infected with Mycobacterium tuberculosis, but only a small proportion develops tuberculosis (TB). Global estimates of mortality due to TB range from 1.1 to 1.7 million deaths among HIV-negative people, with an additional 0.45-0.62 million among HIV-positive people (61,62). Other diseases such as HIV/AIDS (25) and diabetes mellitus (34) have a negative effect on people with latent TB infection and can raise the risk of developing active TB from 1 in 10 to 1 in 3 (52,61,62).
The advent of molecular typing techniques has improved our understanding of the pathogen that causes TB disease, Mycobacterium tuberculosis. This review looks at the genetics, transmission and geographic distribution of the ancestral Mycobacterium tuberculosis lineages mainly responsible for tuberculosis in the Indian sub-continent and the Far East. It focuses on the ancestral strains on account of their huge contribution to the global tuberculosis burden and the relative dearth of information on them. More emphasis is placed on strain families other than the Beijing family, which has already been described in detail in other published reports and reviews (38).
The members of the Mycobacterium tuberculosis Complex (MtbC) are closely related to each other and consist of the species Mycobacterium tuberculosis (Mtb), Mycobacterium africanum, Mycobacterium bovis, Mycobacterium canettii, Mycobacterium microti, Mycobacterium pinnipedii, dassie bacillus, oryx bacillus and Mycobacterium caprae. The MtbC members evolved from a common progenitor (28,30,63) and share 99.9% similarity in their genomes. This is the result of the bacterium expanding clonally after an evolutionary bottleneck some 15,000-20,000 years ago, which led to speciation (50). More recent work estimates that speciation occurred 20,000-35,000 years ago from a common ancestor termed M. prototuberculosis (27,63). Following speciation from M. prototuberculosis, different strains of Mtb evolved, as is evident from the genotyping methods and cluster analyses applied to Mtb (7,19,21).
GENETIC MARKERS AND DESCRIPTION OF LINEAGES
A number of genotyping methods have been used in epidemiological, phylogenetic and evolutionary studies to study strain variation and distribution. These methods are based on genomic changes, which can be either single-base changes involving synonymous or non-synonymous single nucleotide polymorphisms (SNPs) or large sequence changes involving insertions, duplications and deletions (indels). Synonymous SNPs (sSNPs) do not result in amino acid changes, whilst non-synonymous SNPs (nsSNPs) do (19,50). Each method based on these genomic changes has its own merits and disadvantages, and a combination of methods is often used in order to increase the amount of information gained (28,39,41,49). The genotyping methods commonly used in TB research include IS6110 insertion element based Restriction Fragment Length Polymorphism (RFLP), SNP analysis, Large Sequence Polymorphism (LSP) analysis, spoligotyping and Mycobacterial Interspersed Repetitive Units-Variable Number Tandem Repeat (MIRU-VNTR) analysis (19,21,35,50,59).
RFLP is still considered the gold standard for genotyping Mtb. It is based on detecting the number and location of copies of the insertion sequence (IS) IS6110 in the genome (41,59). The marker has a relatively fast molecular clock and can differentiate isolates in epidemiological studies. The technique does not, however, have good resolution for isolates with fewer than 5 copies of the IS element in their genomes (28,48).
Fig 1: Example of representative IS6110-based RFLP image. Isolates represented by lanes 3, 5, 6, 9, and 10 have the same pattern and were epidemiologically linked. Lane S shows the CDC molecular weight standard (http://www.cdc.gov/tb/programs/genotyping/images/IS6110.gif)
Spoligotyping is a PCR-based technique that analyses the Direct Repeat (DR) locus for the presence or absence of conserved spacer sequences between the repeat sequences (35). Strain-specific deletions of spacers at specific positions within the DR locus result in an unambiguous 43-digit spoligotype pattern for each strain, based on the presence or absence of each spacer. The spacers are numbered 1-43 and each spacer has a unique number within this range. Spoligotype strain family names are based on these patterns, and a database of the patterns has been created and named SpolDB4/SITVIT (SITVIT is an update of SpolDB4 (3)) (9,33). Two or more strains in the database with the same spoligotype pattern constitute a "shared type" (ST), and a number is assigned to each pattern (9). A group of STs with similar patterns constitutes a clade or strain family. Spoligotyping is particularly useful for identifying the strain families present in a particular geographical setting and has been used in multiple studies, as is evidenced by the SpolDB4/SITVIT database (17,33,36). A drawback of the technique is that it is subject to convergent evolution: the same spacers can be deleted at different times in different strains, so that two different strains show similar patterns (19,21).
Fig 2: Example of spoligotype pattern of clustered M. tuberculosis strains. a, as identified in SpolDB4.0; SIT, spoligotype international type; Nb, number of isolates (as a percentage of total M. tuberculosis strains in example); c, filled boxes represent positive hybridization while empty boxes represent absence of spacers; d, label defining the lineage/sub-lineage; ND, not yet determined in SpolDB4.0. (http://www.biomedcentral.com/1471-2334/8/101/figure/F1?highres=y)
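The idea of a spoligotype as a 43-position presence/absence pattern, and of a shared pattern as one seen in two or more isolates, can be sketched in Python. This is only an illustration under stated assumptions: the isolate names are hypothetical, patterns are written as strings of "1" (spacer present) and "0" (spacer absent), and the function names are not taken from SpolDB4/SITVIT or any published tool.

```python
from collections import defaultdict

N_SPACERS = 43  # spacers in the DR locus are numbered 1-43


def pattern_from_deleted(deleted):
    """Build a 43-character pattern: '1' = spacer present, '0' = spacer absent."""
    deleted = set(deleted)
    if not all(1 <= s <= N_SPACERS for s in deleted):
        raise ValueError("spacer numbers must lie between 1 and 43")
    return "".join("0" if s in deleted else "1" for s in range(1, N_SPACERS + 1))


def shared_patterns(isolates):
    """Group isolate names by pattern and keep patterns seen in two or more isolates."""
    groups = defaultdict(list)
    for name, pattern in isolates.items():
        groups[pattern].append(name)
    return {p: sorted(names) for p, names in groups.items() if len(names) >= 2}


# Hypothetical isolates, for illustration only.
isolates = {
    "isolate_A": pattern_from_deleted([33, 34]),
    "isolate_B": pattern_from_deleted([33, 34]),
    "isolate_C": pattern_from_deleted(range(1, 35)),
}
clusters = shared_patterns(isolates)
```

Because of convergent evolution, isolates that fall into the same group here are not necessarily the same strain; a second marker is needed to confirm that.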
The Mycobacterial Interspersed Repetitive Units/Variable Number Tandem Repeats (MIRU-VNTR) technique is based on variation in the number of micro-satellite repeat sequences at different loci within the bacterial genome (22,42,56). These loci are scattered throughout the genome of M. tuberculosis, and different typing systems have been developed that use different sets and numbers of loci (42,55). The combination of repeat numbers across the loci describes a strain.
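A MIRU-VNTR profile can be thought of as a mapping from locus to repeat count, and two isolates can be compared locus by locus. The sketch below assumes hypothetical locus names and repeat counts; it does not follow any particular published locus set.

```python
def loci_differing(profile_a, profile_b):
    """Return the sorted list of loci at which two MIRU-VNTR profiles differ."""
    if profile_a.keys() != profile_b.keys():
        raise ValueError("profiles must be typed at the same loci")
    return sorted(locus for locus in profile_a if profile_a[locus] != profile_b[locus])


# Hypothetical profiles: locus name -> number of tandem repeats.
strain_1 = {"locus_02": 2, "locus_10": 4, "locus_24": 2, "locus_40": 3}
strain_2 = {"locus_02": 2, "locus_10": 5, "locus_24": 1, "locus_40": 3}

different = loci_differing(strain_1, strain_2)
```

Two isolates with no differing loci share a MIRU-VNTR type under the chosen locus set; using more loci generally gives finer discrimination.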
Large variable regions due to insertions, duplications and deletions give rise to regions of difference (RD) among M. tuberculosis strain families (6). LSP analyses make use of specific DNA sequences that have been deleted over time in specific strains or clades, and the ones that coincide with unique events are particularly useful in phylogeny and evolutionary studies (21). For example, one particular deletion termed "M. tuberculosis specific deletion 1 (TbD1)" is used to classify strains into modern (TbD1-) or ancestral (TbD1+) types based on whether the deletion is present or absent (6).
sSNPs are thought to represent changes that have not resulted from selective pressure and can thus be used as markers to show evolutionary relationships among different strains (26). From an evolutionary point of view, M. tuberculosis strains can be grouped into 3 Principal Genetic Groups (PGGs) based on a combination of polymorphisms in the gyrA and katG genes (50). Group 1 (PGG1) has the combination katG codon 463 CTG (Leu) and gyrA codon 95 ACC (Thr); group 2 (PGG2) has katG codon 463 CGG (Arg) and gyrA codon 95 ACC (Thr); and group 3 (PGG3) has katG codon 463 CGG (Arg) and gyrA codon 95 AGC (Ser) (50). PGG1 is the most ancestral, followed by PGG2, with PGG3 being the most recently evolved.
M. tuberculosis strains have been differentiated using all of these techniques, and phylogenies based on SNPs, Regions of Difference (RDs), RFLP, MIRU types and spoligotypes have been constructed (5,9,19,23).
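The grouping rule just described is a simple lookup on two codons. A minimal sketch, assuming the codons have already been read from sequence data (the function and variable names are illustrative only):

```python
# Codon combinations as given in the text: (katG codon 463, gyrA codon 95).
PGG_BY_CODONS = {
    ("CTG", "ACC"): "PGG1",  # katG Leu, gyrA Thr
    ("CGG", "ACC"): "PGG2",  # katG Arg, gyrA Thr
    ("CGG", "AGC"): "PGG3",  # katG Arg, gyrA Ser
}


def principal_genetic_group(katg_463, gyra_95):
    """Return 'PGG1', 'PGG2' or 'PGG3', or None for any other codon combination."""
    return PGG_BY_CODONS.get((katg_463.upper(), gyra_95.upper()))


group = principal_genetic_group("ctg", "acc")
```

Returning None for an unlisted combination keeps the sketch from guessing a group the text does not define.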
ORIGIN, DISSEMINATION AND ASSOCIATION WITH HOST OF TUBERCULOSIS
It is thought that, through various stages of evolution driven by stressful environments and genetic drift among other things, the tubercle bacillus became an obligate parasite of the early hominids of the Horn of Africa (27,31,63). These early hominids lived in small groups and survived by hunting and gathering. The bacterium therefore had a limited host range, confined to the small, scattered hominid groups (29).
Considering that the genomes of MtbC members are 99.9% similar, with the exception of M. canettii, the members must have arisen through clonal expansion from a successful progenitor (15,29). This subsequently led to speciation and to the distinct host ranges of the MtbC members (6,50). It has been hypothesised that early hominids migrated out of Africa into Asia, followed by a migration back into Africa and on to the rest of the world (31,63). As man dispersed globally and evolved in response to the different environments he encountered (12), the obligate tubercle bacillus co-evolved with him (27,31). The migration of humans into the southern parts of the Indian sub-continent spread with time to the northern parts of India, Europe and East Asia, establishing what we now know as the modern lineages of Mtb (31). The advent of farming increased population density compared with the small hunter-gatherer groups of the early hominids (63). This resulted in an increase in infection by the bacterium, coupled with the associated spreading out of people in densely populated areas (31).
The continued relationship of the tubercle bacillus with its human host resulted in a stable association between host and organism. This is evident in the geographical distribution of the bacterium, which correlates with earlier human migrations, or origin of birth, both in the near and distant past (2,23). Studies have revealed that certain strains of M. tuberculosis are more adapted to cause disease in people of a particular origin of birth. An example is the association found between a polymorphism in the human TLR2 gene and disease caused by the Beijing strain (13).
Fig 3: Distribution of M. tuberculosis strains, adapted from Gagneux et al. 2006. Legend: M. tuberculosis 1, Indo-Oceanic; M. tuberculosis 2, East Asian; M. tuberculosis 3, East-African Indian; M. tuberculosis 4, Euro-American; M. tuberculosis 5, MANU.
PRINCIPAL GENETIC GROUP 1 MEMBERS
As mentioned earlier, PGG1 members are the most ancestral of the 3 PGGs and comprise 4 strain families under the spoligotyping classification system. These families are MANU, East African Indian (EAI), Central Asian (CAS) and W-Beijing. The clades are predominant in specific geographical areas, in particular the Indian sub-continent and East Asia (10,47,48,53). To our knowledge, no author has attempted to review all the literature on the ancestral Mtb strains as a group. Furthermore, with the exception of the Beijing family, there is a dearth of information on the individual strain families in PGG1. This is especially evident for the MANU family. Considering that the Indian sub-continent and the Far East contribute more than a third of the global TB burden and that the PGG1 strains are predominant in these regions (61,62), an understanding of these strain families is an important step in the global control of the disease (49).
Fig 4: Estimated epidemiological burden of TB, 2008 (Adapted from WHO. Global tuberculosis control: a short update to the 2009 report). AFR = Africa Region; AMR = American Region; EMR = Eastern Mediterranean Region; SEAR = South East Asia Region; WPR = Western Pacific Region
MANU (Part of the Indo-Oceanic lineage according to Gagneux et al. 2006)
The MANU lineage was first described in India in a study that incorporated spoligotyping (47). The name derives from a Hindu mythological figure held to be the world's first king and father of the human race (47). It is described as an ancestral strain on account of being TbD1+ (20) as well as having most of the spacers of the DR locus present (i.e. a complete or near-complete spoligotype). MANU has been divided into 3 sub-lineages according to the SpolDB4 database (see Fig. 5). MANU 1 has a single spacer missing at position 34, MANU 2 has spacers deleted at positions 33 and 34, and MANU 3 has 3 spacers deleted at positions 34, 35 and 36 (9).
Family    Shared Type No.    Spoligotype Pattern
MANU 1    100     ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■□■■■■■■■■■
MANU 2    54      ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■□□■■■■■■■■■
MANU 3    1378    ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■□□□■■■■■■■
Fig 5: MANU Spoligotypes (9). ■, spacer present; □, spacer absent.
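The three MANU sub-lineage definitions can be written as sets of absent spacers and matched against a pattern. This is a sketch that assumes a pattern is supplied as a 43-character string of "1" (present) and "0" (absent) and that only exact matches count; the names are illustrative and variants of the patterns are deliberately not handled.

```python
# Absent spacers for each sub-lineage, as described in the text.
MANU_DELETIONS = {
    "MANU 1": frozenset({34}),
    "MANU 2": frozenset({33, 34}),
    "MANU 3": frozenset({34, 35, 36}),
}


def manu_sublineage(pattern):
    """Return the MANU sub-lineage whose deletions match exactly, else None."""
    if len(pattern) != 43 or set(pattern) - {"0", "1"}:
        raise ValueError("pattern must be 43 characters of '0' and '1'")
    absent = frozenset(i + 1 for i, c in enumerate(pattern) if c == "0")
    for name, deletions in MANU_DELETIONS.items():
        if absent == deletions:
            return name
    return None


manu2_pattern = "1" * 32 + "00" + "1" * 9  # spacers 33 and 34 absent
result = manu_sublineage(manu2_pattern)
```

A match on the spoligotype alone does not confirm the family, since other markers such as TbD1 status are also part of its description.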
The strain family can also be considered part of the Indo-Oceanic lineage, with the RD239 deletion being present in all MANU strains (23). It is also likely to have 2 alleles at locus 24 in MIRU-VNTR analysis, as it has been shown that TbD1+ isolates have 2 alleles at this locus (53). It should, however, be noted that the geographical distribution of MANU does not follow the typical distribution from which the Indo-Oceanic lineage takes its name (23).
In a recent study in Egypt, a single isolate of a MANU ancestor harbouring all 43 spacers was isolated in addition to a variant of the same (30). The absence of deletion analysis in the Egyptian study can however raise doubt as to whether the strain identified as the MANU ancestor was indeed such. This is because homoplasy can exist for spoligotypes with a full set or few deletions of spacers (21). Beijing strains with a full set of spacers have been observed and these were classified as Beijing based on the deletion of RD105 (a marker for all Beijing strains) and being TbD1-. In the absence of such deletion analysis and basing classification only on spoligotyping patterns, there is a chance that the identified MANU ancestor could be a Beijing strain ancestor.
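The reasoning above, that a full 43-spacer spoligotype cannot be assigned without deletion analysis, can be sketched as a small decision function. It assumes two test results are available as booleans, or None when the test was not done; the wording of the returned labels is illustrative.

```python
def interpret_full_spoligotype(tbd1_region_present, rd105_deleted):
    """Interpret an isolate that has all 43 spacers.

    tbd1_region_present: True for TbD1+, False for TbD1-, None if untested.
    rd105_deleted: True if RD105 is deleted, False if intact, None if untested.
    """
    if tbd1_region_present is None or rd105_deleted is None:
        return "unresolved: deletion analysis needed"
    if rd105_deleted and not tbd1_region_present:
        return "consistent with an ancestral Beijing strain"
    if tbd1_region_present and not rd105_deleted:
        return "consistent with a MANU ancestor"
    return "unresolved: marker combination not described in the text"


egyptian_case = interpret_full_spoligotype(None, None)
```

The labels say "consistent with" on purpose: the two markers narrow the possibilities but do not by themselves prove family membership.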
IS6110 Copy Numbers
To our knowledge, the number of IS6110 elements in the MANU family has not been described. This needs to be done as it might provide greater insight into whether IS6110 can better discriminate isolates in epidemiological studies.
Geographical Spread
MANU strains have been reported in different countries, but only in Egypt and the city of Delhi was the family found to be the predominant strain (16,30,45,47). The countries in which it has been found include India, Saudi Arabia, Madagascar, Tanzania, Tunisia, and the USA (3,16,18). A few isolates have also been recovered from South Africa, and these have mainly been MANU 2 strains (33). Spain has contributed the highest number of MANU 3 isolates in the SITVIT database (33). These nevertheless make up a small proportion of all MANU records in the database.
Host Pathogen Association and Transmission
As mentioned above, there are very few geographic locations where MANU is the predominant strain. Egypt is the clearest example, and it is there that an association between host and pathogen can be observed (30). It is possible that the strain has been in Egypt for a long time. Mtb strains have been found in mummies and were in the past termed M. africanum-like (30). It would be interesting to analyse these strains further to see how close they are to the MANU strain family.
Studies have shown that strains can be associated with age in a given area, as was shown in India for the EAI and CAS lineages (48). From such studies it can be inferred whether a strain is emerging in an area through transmission among the younger population or is predominantly reactivating in the older population.
Transmission of the strain family was established in the Egyptian study and was evident from the clustering and high proportion of MANU observed (30). No data analysis was however done on how these strains segregated according to age cohorts. This would have given an idea on whether the predominance of MANU was due to reactivation in the older population or active transmission in the younger, or possibly a combination of both. A table showing distribution amongst age group was shown but this was a composite of all isolates in the area which makes it difficult to ascertain the transmission of a single family within the studied community. A similar reporting of results was done in India with respect to age where MANU was found to be a predominant strain in the city of Delhi (47).
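The per-family age analysis described as missing can be sketched as follows. Everything in the example is hypothetical: the records are invented for illustration, and the 45-year cutoff is an assumption chosen only to separate a younger from an older band.

```python
from collections import Counter


def family_share_by_age(records, family, cutoff=45):
    """Fraction of isolates in each age band that belong to `family`.

    records: iterable of (patient_age, strain_family) pairs.
    """
    totals, hits = Counter(), Counter()
    for age, fam in records:
        band = "younger" if age < cutoff else "older"
        totals[band] += 1
        if fam == family:
            hits[band] += 1
    return {band: hits[band] / totals[band] for band in totals}


# Hypothetical isolates: (patient age, strain family).
records = [
    (23, "MANU"), (31, "MANU"), (38, "EAI"), (29, "CAS"),
    (52, "MANU"), (60, "CAS"), (67, "EAI"), (71, "EAI"),
]
shares = family_share_by_age(records, "MANU")
```

A higher share in the younger band would point towards ongoing transmission, and a higher share in the older band towards reactivation, although a real analysis would also need clustering data and a significance test.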
The MANU 2 sub-lineage (ST54) was observed to be the most prevalent of this family, as has been reported in different studies (3,30,47) and as is also seen in the entries in the SITVIT database (33). A number of variants of MANU 2 were also observed in Egypt, and variant types were more numerous in this sub-lineage than in either MANU 1 or MANU 3. These had not been submitted to the SITVIT database at the time the Egyptian study was published (30).
Fig 6: MANU 2 Variant Spoligotypes in Egypt (30)
In the absence of a second genotyping method besides spoligotyping to describe MANU clustering, it is difficult to ascertain whether there is transmission of a particular strain. This is due to the fact that spoligotyping can overestimate clustering of some lineages as is the case for Beijing (21,60). More research in this regard needs to be done on MANU.
East African Indian (EAI) (Part of the Indo-Oceanic lineage according to Gagneux et al.)
The East African Indian (EAI) clade is divided into 9 sub-lineages having common deletions at spacers 29-32 and spacer 34 (9). EAI_5 is the most ancestral of the spoligotypes of the clade with only the above mentioned spacers deleted.
Family             Shared Type No.    Spoligotype Pattern
EAI_5              236     ■■■■■■■■■■■■■■■■■■■■■■■■■■■■□□□□■□■■■■■■■■■
EAI1_SOM           48      ■■■■■■■■■■■■■■■■■■■■■■■■■■■■□□□□■□■■■■■□■■■
EAI2_Manilla       19      ■■□■■■■■■■■■■■■■■■■□□■■■■■■■□□□□■□■■■■■■■■■
EAI2_Nonthaburi    89      ■■□■■■■□□□□□□□□□□□□□□□□□□■■■□□□□■□■■■■■■■■■
EAI3_IND           11      ■□□■■■■■■■■■■■■■■■■■■■■■■■■■□□□□■□■■□□□■■■■
EAI4_VNM           139     ■■■■■■■■■■■■■■■■■■■■■■■■■□□■□□□□■□■■■■■■■■■
EAI6_BGD1          591     ■■■■■■■■■■■■■■■■■■■■■□■■■■■■□□□□■□■■■■■■■■■
EAI6_BGD1          1898    ■■■■■■■■■■■■■■■■■■■■■■■■□□□□□□□□□□■■□■■■■■■
EAI8_MDG           109     ■□□■■■■■■■■■■■■■■■□■■■■■■■■■□□□□■□■■■■■■■■■
Fig 7: EAI Spoligotypes (9). ■, spacer present; □, spacer absent.
The other sub-classes exhibit further deletions from the EAI_5 type in different numbers and positions. RD analysis shows that the lineage also has RD239 deleted (shared with MANU) and falls into the Indo-Oceanic lineage under the LSP classification system of Gagneux et al. (23). Additionally, the lineage is TbD1+ and has 2 alleles at locus 24 in MIRU-VNTR analysis (53). The lineage can thus be considered to be of an ancestral nature (6). The group also belongs to PGG1 based on SNPs in the katG and gyrA genes (50). Taken as a whole, the lineage is part of an ancestral sub-set of PGG1 closely related to MANU.
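The relationship between EAI_5 and the other sub-lineages can be made concrete by listing the spacers a pattern lacks beyond the common EAI deletions (spacers 29-32 and 34). This sketch assumes patterns are 43-character strings of "1" (present) and "0" (absent); the example pattern encodes the EAI1-SOM row of Fig 7, and the function names are illustrative.

```python
EAI_CORE = frozenset({29, 30, 31, 32, 34})  # deletions common to the EAI clade


def absent_spacers(pattern):
    """Return the set of spacer numbers (1-43) marked absent in a pattern."""
    if len(pattern) != 43 or set(pattern) - {"0", "1"}:
        raise ValueError("pattern must be 43 characters of '0' and '1'")
    return frozenset(i + 1 for i, c in enumerate(pattern) if c == "0")


def extra_deletions_vs_eai5(pattern):
    """Spacers absent in addition to the EAI core, or None without the core."""
    absent = absent_spacers(pattern)
    if not EAI_CORE <= absent:
        return None
    return sorted(absent - EAI_CORE)


eai5 = "1" * 28 + "0000" + "1" + "0" + "1" * 9
eai1_som = "1" * 28 + "0000" + "1" + "0" + "1" * 5 + "0" + "1" * 3
extra = extra_deletions_vs_eai5(eai1_som)
```

An empty list corresponds to the EAI_5 type itself, which is why it is treated in the text as the most ancestral spoligotype of the clade.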
IS6110 Copy Numbers
The EAI strain family has been characterised by low IS6110 copy numbers in India (14). The same is seen in a number of other countries, including Singapore and Madagascar (17,53). It was observed in India that, whereas IS6110-based RFLP had shortfalls in discriminating between isolates with low IS6110 copy numbers, spoligotyping was highly efficient (41). MIRU genotyping is also highly efficient at discriminating low-copy IS6110 strains, as demonstrated in a study in Singapore (53). In the south of India, where up to 80% of TB cases were due to this strain family having low copies of IS6110, the ST of the family was ST11. On the other hand, ST89 and ST292 in Yangon, Myanmar were observed to have IS6110 copy numbers ≥7 (43). In Madagascar, where ST109 was found, the ST had low copies of IS6110 (17). Taken together, the EAI strain family generally has low IS6110 copy numbers; high copy numbers are the exception and were observed only for ST89 and ST292.
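The practical consequence of copy number for the choice of typing method can be sketched as a simple rule. The threshold of 5 copies comes from the RFLP description earlier in the text; reducing the choice to a single cutoff is a simplifying assumption, and the function names are illustrative.

```python
LOW_COPY_THRESHOLD = 5  # RFLP resolves poorly below 5 IS6110 copies


def rflp_is_informative(is6110_copies):
    """True when the IS6110 copy number is high enough for RFLP to discriminate."""
    if is6110_copies < 0:
        raise ValueError("copy number cannot be negative")
    return is6110_copies >= LOW_COPY_THRESHOLD


def suggested_methods(is6110_copies):
    """Suggest typing methods for an isolate, given its IS6110 copy number."""
    if rflp_is_informative(is6110_copies):
        return ["IS6110-RFLP"]
    return ["spoligotyping", "MIRU-VNTR"]


low_copy_choice = suggested_methods(1)
high_copy_choice = suggested_methods(7)
```

In practice methods are often combined, so the returned list is best read as a starting point and not as an exclusive choice.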
Geographical Spread and Transmission
The Indo-Oceanic lineage is predominant in South East Asia, East Africa and some parts of Europe and Oceania (9,45) (see Fig. 3). The sub-lineages have different frequencies in different geographical locations. The highest numbers of EAI3_IND (ST11), for example, have been reported in the Indian sub-continent and South East Asia. In India, this sub-lineage is found in the southern parts of the country and is responsible for the majority of TB cases in that region (45,47). Some areas, on the other hand, have a mixture of predominant sub-lineages, each composed of different STs. This scenario is evident in Bangladesh (44), Myanmar (43) and Madagascar (17).
Host Pathogen Association and Transmission
India provides the classic case of host-pathogen association for the EAI lineage. The strain is prevalent amongst people in the south of India as opposed to the north. This has been deduced to result from the people in the north having a different origin, and a different introduction to the bacterium, compared with the people in the south (4,46,48). EAI has also shown an association with people in the northwest of Madagascar, who are believed to have originated in South East Asia, where the strain is prevalent (18). Additionally, the strain family is associated with people older than 46 years and with people who had never taken TB treatment before (4,47,48). As mentioned earlier, different strain types are predominant in different geographical areas, and in the south of India EAI3_IND (ST11) was the main causative agent of disease. Contrasting the Indian situation where the older people of society were associated with disease, Myanmar had most people older than 45 years having active disease (43). It should be borne in mind, though, that this statistic was given for the combined transmission of Beijing and EAI. Additionally, the EAI STs transmitting were different from the ones found in South India.
Central Asian (CAS) (East African Indian lineage according to Gagneux et al. 2006)
The Central Asian (CAS) lineage is also known as the East African Indian lineage under the LSP-based naming system of Gagneux et al. (24). Under the spoligotyping description, the clade has spacers 4-7 and 23-34 deleted (9,47) (see Fig. 8).
Four sub-lineages are described in the SpolDB4 database, with deletions as indicated in Fig. 8.
Strain          Shared Type No.    Spoligotype Pattern
CAS1_Delhi      26     ■■■□□□□■■■■■■■■■■■■■■■□□□□□□□□□□□□■■■■■■■■■
CAS1_Kili       21     ■■■□□□□■■□■■■■■■■■■□□□□□□□□□□□□□□□□■■■■■■■■
CAS1_variant    25     ■■■□□□□■■■■■■■■■■■■■■■□□□□□□□□□□□□■■□□■■■■■
CAS2            288    ■■■□□□□■■■■■■■■■■■■■■■□□□□□□□□□□□□■■■■■■■■■
Fig 8: CAS Spoligotypes (9). ■, spacer present; □, spacer absent.
The CAS clade is TbD1- (6) and has a characteristic RD750 deletion (23). An sSNP in the rpoB gene, T→G at position 2646, is specific for CAS (5). Additionally, a study in India found that a silent C→T mutation in codon 65 of pncA is lineage specific for CAS strains. However, this was not observed for 2 isolates of ST26 out of a total of 16 isolates studied (51), indicating that there might be some CAS strains that do not harbour this mutation.
IS6110 Copy Numbers
CAS members generally have greater than 5 IS6110 copies as is evidenced from the major areas where the lineage is found (4,45,47,51,57). This would thus make IS6110 an appropriate typing method for distinguishing strains for epidemiological studies.
Geographical Spread
Information from the SITVIT database (3) shows that most isolates of CAS come from the Indian sub-continent. Studies have also indicated that the CAS lineage is predominant in India, Pakistan, and Central Asia (9,45,51). The sub-lineage found in the Indian sub-continent is typically CAS1_Delhi (ST26), which is prominent in Northern India and Pakistan (45,51,57). CAS1_Kili (ST21) is prominent in Tanzania (36), whilst CAS2 (ST288) is prominent in Madagascar (18). Other geographical areas with a large proportion of the strain include the Middle East (3). Furthermore, areas that have had large populations originating from the Indian sub-continent, Middle East and Central Asia also have a significant proportion of their TB burden attributed to this lineage (8,9,45,47). These include areas where the movement of people occurred in the distant past as well as the near past, such as Africa, Europe and the United States (3,47,48).
Host Pathogen Association and Transmission
The classic case of host-pathogen association for the CAS lineage is best presented in India, where the North and South differ in the origin of their human populations (4). People in the south of the country have been less associated with the CAS lineage, whereas those in the North have been positively associated (4,45,47). The prominence of CAS in Madagascar and Saudi Arabia can also be attributed to the movement of people from the sub-continent and Central Asia in the distant and relatively near past (3,17). The situation is also seen in the western world, where the lineage is associated with people of Asian origin even when they were born in the USA (2). Taken together, these observations suggest that the human host and the pathogen have co-evolved and established a transmission relationship, as reported in different studies by others (32). Studies from India indicated that the CAS strain family was mostly transmitting in the age group below 45 years, in contrast to EAI, which affected the older population (47,48). ST26 was the predominant shared type responsible for CAS transmission in the Indian sub-continent (47,48) and, in general, particular shared types contributed greatly to CAS transmission where CAS was a major strain. In Tanzania the major strain type was CAS1_Kili (ST21).
Beijing (East Asian lineage according to Gagneux et al. 2006)
The classic spoligotype pattern of the Beijing lineage is characterised by the deletion of spacers 1-34 (10) (see Fig. 9). All strains in this lineage have the RD105 deletion and most often the RD207 deletion (which leads to the classic spoligotype pattern through the loss of spacers 1-34) (21). The few strains found to have the RD105 deletion but not the RD207 deletion are thought to be ancestral forms of the strain, without the classical deletion of spacers 1-34 in the DR region (21). The ancestral forms include strains with the full complement of 43 spacers as well as those with only 1 spacer missing (21) (see Fig. 10).
Strain     Shared Type No.    Spoligotype Pattern
Beijing    1    □□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□■■■■■■■■■
Fig 9: Classical Beijing Spoligotypes (9). ■, spacer present; □, spacer absent.
Strain Spoligotype Pattern
Fig 10: Ancient Beijing Spoligotypes (21)
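The distinction between classical and ancestral Beijing forms rests on two deletions, which can be sketched as follows. The sketch assumes RD105 and RD207 status are available as booleans and that a spoligotype is a 43-character string of "1" (present) and "0" (absent); the names and labels are illustrative.

```python
CLASSICAL_BEIJING_PATTERN = "0" * 34 + "1" * 9  # spacers 1-34 absent


def beijing_form(rd105_deleted, rd207_deleted):
    """Classify an isolate from its RD105 and RD207 deletion status."""
    if not rd105_deleted:
        return "not Beijing"  # RD105 is deleted in all Beijing strains
    return "classical Beijing" if rd207_deleted else "ancestral Beijing"


def has_classical_pattern(pattern):
    """True when the spoligotype is the classical pattern lacking spacers 1-34."""
    return pattern == CLASSICAL_BEIJING_PATTERN


form = beijing_form(rd105_deleted=True, rd207_deleted=False)
```

Keeping the deletion test separate from the pattern test reflects the point made in the text: ancestral Beijing strains carry RD105 without showing the classical spoligotype.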
IS6110 Copy Numbers
The Beijing family is characterised by high copy numbers of IS6110 (37,38). High discrimination of strains is achieved through the use of RFLP, which is of value in epidemiological studies (54).
Geographical Spread
The lineage is ubiquitous in distribution, with particular predominance in China, Central Asia, the Far East and Eurasia (9,40,43,44). The worldwide predominance of the strain varies by geographical region, with East Asia having the highest prevalence (1) (see Fig. 11). The family is emerging and becoming predominant in different countries, as highlighted in a study of a South African community where the family showed exponential growth (58). This was in contrast to the other families in the target area, which showed decreased or only slightly increased numbers.
Fig 11: Global Spread of Beijing with colour intensities depicting intensity of spread (Adapted from (9))
Host Pathogen Association and Transmission
The Beijing lineage is thought to have originated in China and is associated with people of Chinese origin, both in the near and distant past (5,19,43,53). This comprises populations that originated directly from China as well as those from secondary countries that had populations of Chinese immigrants (40). The frequency with which people of Chinese origin in Singapore are infected with Mtb is an example of the association of Beijing with the Chinese host type (53). The lineage has also been shown to be associated with people of a young age and can thus be regarded as an emerging strain (17,43,47,48). Additionally, the strain is associated with hypervirulence, drug resistance and the ability of drug-resistant strains to transmit without the loss of fitness that usually accompanies the acquisition of drug resistance (11,51).
Mycobacterium tuberculosis, the causative agent of tuberculosis in man, is part of the MtbC, whose members are 99.9% similar in their genomes with the exception of M. canettii. This is the result of clonal expansion following an evolutionary bottleneck some 20,000-35,000 years ago, with subsequent speciation. In spite of this, the Mtb genome is diverse enough that different strain families exhibit specific characteristics when analysed with different genotyping tools.
Furthermore, Mtb strain families co-evolved with their human host in different environments and geographical locations. As different human populations evolved differently in diverse geographical locations, Mtb strain families had formed stable associations with them that resulted in different expressions of disease both at transmission level and virulence. This has been demonstrated in separate geographical locations and implications exist for this in the global control of tuberculosis.
The Indian sub-continent and the Far East contribute extensively to the global tuberculosis burden, and the strains that are predominant in the area are the evolutionarily older PGG1 members. However, little information exists about these strains and, to our knowledge, no one has attempted to provide an overview of these ancestral strain families. Of significance in this area was the observation that specific Shared Types are predominant in transmission and that there was an association between these and host populations, which encompassed the origin of the host. Association also existed at age cohort level and was exhibited differently for different strain types. EAI, for example, was associated with the older generation and with the southern parts of India. This was in contrast to CAS, which was more associated with younger cohorts and was thus seen as an emerging strain in India.
The discriminatory power of the various genotyping methods was different for some STs within strain families. EAI ST11 for example had low IS6110 copy numbers which makes RFLP a less appropriate genotyping tool in areas where ST11 is predominant. On the other hand, EAI ST89 and 292 had high IS6110 copy numbers making this technique relevant in areas where these types are predominant. It was noted that strain families harbour specific deletions or SNPs that differentiate them from each other. These could be used fruitfully in the future to differentiate strain families (See table 1).
Table 1: Selected Genotyping Markers for PGG1 Members
Column headings: MIRU 24 (2 copies); SNP lineage name (5); LSP lineage name (23)
Entry: +ve (assumed)
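The family-specific markers described in the sections above can be gathered into a small lookup. The sketch below is not a reconstruction of Table 1; it only collects the TbD1 status and characteristic RD deletion stated in the text for each family, and the dictionary layout and function name are illustrative assumptions.

```python
# Markers as stated in the text: TbD1 status and a characteristic RD deletion.
FAMILY_MARKERS = {
    "MANU": {"tbd1_present": True, "rd_deletion": "RD239"},
    "EAI": {"tbd1_present": True, "rd_deletion": "RD239"},
    "CAS": {"tbd1_present": False, "rd_deletion": "RD750"},
    "Beijing": {"tbd1_present": False, "rd_deletion": "RD105"},
}


def candidate_families(tbd1_present, rd_deletion):
    """Return the PGG1 families whose markers match the observed results."""
    return sorted(
        name
        for name, markers in FAMILY_MARKERS.items()
        if markers["tbd1_present"] == tbd1_present
        and markers["rd_deletion"] == rd_deletion
    )


ancestral_candidates = candidate_families(True, "RD239")
```

MANU and EAI share both markers, so a TbD1+ isolate with the RD239 deletion still needs its spoligotype to tell the two families apart.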
Looking holistically at global TB control, the Indian sub-continent and the Far East make a huge contribution to the global TB burden through predominantly ancestral strains. The strains associate with different host populations based on origin and, in some cases, age. However, Beijing is spreading globally at an alarming rate, and this is a concern because it suggests that the strain has adapted to be better at causing disease. Considering this, and that the world is becoming more interconnected, it is imperative that appropriate genotyping tools be used in specific geographical areas to aid decision making on the transmission and control of strains. Such tools are of use in determining whether disease in an individual is due to reactivation or to recently acquired infection, as well as whether an epidemic is spreading.
Fast and effective genotyping methods are necessary in the developing world where the threat of TB is greatest in conjunction with HIV/AIDS. Cost and reliability of methods would be important factors in this case as well as the transfer of knowledge of these methods to these areas. Furthermore, particular predominant strains in these areas should also be considered when developing new vaccines. This is particularly important in the case of PGG1 in the Indian sub-continent and the Far East bearing in mind the contribution that these strains make to the global TB burden. It is interesting to note that most research (diagnostic and vaccine) is done with the modern (Euro-American) strains and that the ancestral strains are totally neglected (apart from Beijing).
Much more genotyping and other research work needs to be done on the potential evolutionary position that the MANU strain may have in the MtbC phylogenetic tree. One might ask whether MANU gave rise to certain strain families and what characteristics were lost or gained in this evolution. This could contribute to the overall understanding of the evolution of the bacillus as well as its biology. More work also needs to be done on the EAI and CAS strains given their predominance in the global TB burden.
In conclusion, studies have demonstrated that although the MtbC is 99.9% similar in its genome, diversity exists among the strain families, which has implications for transmission and association with the human host. Appropriate markers or genotyping methods are important in the understanding and control of tuberculosis, and where these are deficient, appropriate tools need to be developed. In addition, host-pathogen phylo-geographic association needs to be considered when developing vaccines, and high-burden TB countries need to be considered in this regard. In the Indian sub-continent and the Far East, PGG1 members are the predominant strains, and this should be considered when making important global decisions.