The evolution of microbial genomes is greatly influenced by horizontal gene transfer: large blocks of horizontally acquired foreign sequences, often encoding virulence determinants, occur in the chromosomes of pathogenic bacteria. Design-Island, a program developed in our laboratory, was applied to four completely sequenced Vibrio cholerae genomes, namely V. cholerae classical O395, V. cholerae El Tor N16961, V. cholerae MJ1236 (a toxigenic O1 El Tor Inaba strain from Matlab, Bangladesh, 1994 that represents the "Matlab variant" of El Tor) and V. cholerae M66-2 (a pre-7th pandemic isolate), in order to identify the horizontally acquired regions. The results marked out regions with the potential to harbor new virulence factors. In addition, comparative genome analysis revealed distinct regions unique to each of the V. cholerae genomes.
Studies of microbial evolution have shown that not only clonal divergence and periodic selection but also gene exchange plays a vital role in the process of evolution. Together with gene loss and other genomic alterations, gene acquisition by horizontal gene transfer has an important role in the adaptive evolution of prokaryotes. In 1990, it was first observed that large blocks of horizontally acquired foreign sequences occur in chromosomes of pathogenic bacteria, and that those regions are highly correlated with pathogenicity [1-3]. Some of these blocks were observed to possess a gene for a specific recombinase and sequences characteristic of integration sites, both of which are typical features of mobile elements. Others, in spite of being foreign in nature, lacked insertion sequences, recombinase genes and specific att sites, and might have contained only fragments of mobility genes. In the latter case, the mobility sequences were predicted to have been lost in the course of evolution after their integration into the bacterial genome. Subsequently, all foreign gene blocks present in pathogenic as well as non-pathogenic prokaryotic genomes have been collectively named genomic islands (GIs) in the literature [4, 5].
These gene blocks determine various accessory functions, e.g., secondary metabolic activities, antibiotic resistance, symbiosis and other special functions related to survival in harsh environmental conditions. These foreign DNA blocks were expected to be associated with the virulence of pathogenic bacteria, and hence the first of these blocks that were shown to carry virulence genes of pathogenic bacteria were named pathogenicity islands (PAIs).
Properties of PAIs
PAIs carry genes encoding one or more virulence factors: adhesins, toxins, invasins, etc.
They are located on the bacterial chromosome or may be part of a plasmid.
Their Guanine+Cytosine (G+C) content often differs from that of the rest of the genome.
They are flanked by direct repeats.
PAIs are associated with tRNA genes, which act as target sites for the integration of DNA.
PAIs carry functional genes, e.g. integrase, transposase, or parts of insertion sequences.
They represent unstable DNA regions, as they may move from one tRNA locus to another.
Pathogenicity islands (PAIs) are a distinct class of genomic islands acquired by horizontal transfer. They are incorporated in the genomes of pathogenic microorganisms but are usually absent from those of non-pathogenic organisms of the same or closely related species. They usually occupy relatively large genomic regions, ranging from 10 to 200 kb, and encode genes that contribute to the virulence of the respective pathogen. The known characteristic features of PAIs are presented in Box 1.
The genus Vibrio, belonging to the family Vibrionaceae, includes several pathogens of humans and fish. The most notable member of this family is Vibrio cholerae, the etiological agent of epidemic cholera, which causes a severe and sometimes lethal diarrheal disease. Vibrio cholerae, a Gram-negative, flagellated, comma-shaped bacterium of the Gammaproteobacteria, colonizes the mucosal surface of the human small intestine, causing the diarrheal disease cholera. V. cholerae strains are classified into O1 and non-O1 serogroups. V. cholerae O1 has two major biotypes, classical and El Tor, and there are numerous other serogroups. There have been seven major pandemics between 1817 and today. Six were attributed to the classical biotype, while the 7th, which started in 1961, is associated with the El Tor biotype. Vibrio cholerae O395 is an O1 strain of the classical biotype and the Ogawa serotype. Vibrio cholerae O1 biovar El Tor str. N16961 is an epidemic strain of Vibrio cholerae isolated in 1971 in Bangladesh and is distinguished from the classical biotype by hemolysin production. Vibrio cholerae M66-2 was isolated from the 1937 cholera outbreak in the Makassar area of Indonesia and is considered to be a pre-7th pandemic isolate. It is well known that the genome organization of these Vibrios [9-11] and of several other organisms of related Vibrio species is generally very similar. Genetic studies suggest that extracellular proteins released by the bacteria mediate the pathogenicity of this organism; the infection itself is noninvasive. In this organism, the two major virulence factors, cholera toxin (CT) and the toxin-coregulated pilus (TCP), have been reported to be encoded on mobile genetic elements. The ctxAB genes are encoded on a filamentous bacteriophage, CTXΦ.
TCP, an essential colonization factor, was originally designated as part of a pathogenicity island named the Vibrio pathogenicity island (VPI), but this island has more recently been proposed to be the genome of a filamentous phage, VPIΦ. In this context, the present study was designed to identify new PAIs using Design-Island [ref] in four completely sequenced genomes of V. cholerae, namely V. cholerae classical O395, V. cholerae El Tor N16961, V. cholerae MJ1236 (a toxigenic O1 El Tor Inaba strain from Matlab, Bangladesh, 1994 that represents the "Matlab variant" of El Tor) and V. cholerae M66-2 (a pre-7th pandemic isolate), and to compare the shared and unique horizontally acquired regions among these four strains.
Materials and Methods
Acquisition of Sequences
Four different strains of Vibrio cholerae with complete genome sequences were considered for the present study. The chromosomal sequences of all these organisms were downloaded from the ftp server of NCBI (http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi).
Design-Island is an unsupervised method that uses Monte-Carlo statistical tests based on randomly selected segments of a chromosome. It detects segments of bacterial genomes that are parts of GIs. It searches for islands in a prokaryotic chromosome using a probing window that slides over the entire chromosome and also varies in size. For a given size and a given position of the probing window, the segment of the chromosome captured by the window is compared with the rest of the chromosome by means of statistical tests. The outcome of each such test is a P-value between 0 and 1, where a low P-value indicates a significant difference between the segment captured by the probing window and the rest of the chromosome.
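The window-versus-rest comparison can be illustrated with a minimal Python sketch. The text does not specify which statistical tests are used, so G+C fraction is used here purely as a stand-in statistic; the function names, the default of 200 random samples and the add-one correction are all assumptions made for illustration, not details of the actual program.

```python
import random

def gc_fraction(seq):
    """Fraction of G or C bases in a sequence (stand-in statistic, an assumption)."""
    return sum(base in "GC" for base in seq) / len(seq)

def monte_carlo_p_value(chromosome, start, size, n_samples=200, rng=None):
    """Empirical two-sided P-value for the window chromosome[start:start + size].

    The window's statistic is compared with the same statistic computed on
    randomly placed segments of equal size that do not overlap the window.
    """
    rng = rng or random.Random(0)
    end = start + size
    window_stat = gc_fraction(chromosome[start:end])
    # Start positions whose segment lies entirely outside the probing window.
    candidates = [s for s in range(len(chromosome) - size + 1)
                  if s + size <= start or s >= end]
    samples = [gc_fraction(chromosome[s:s + size])
               for s in (rng.choice(candidates) for _ in range(n_samples))]
    centre = sum(samples) / n_samples
    observed = abs(window_stat - centre)
    # Count random segments at least as far from the centre as the window.
    extreme = sum(abs(x - centre) >= observed for x in samples)
    # Add-one correction keeps the estimate strictly above 0.
    return (extreme + 1) / (n_samples + 1)
```

On a toy chromosome with a short G+C-rich insert, a window placed on the insert should receive a low P-value, while a window placed on the background should not.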
The Design-Island algorithm runs in two phases: a first phase and a refinement phase. In the first phase, it identifies islands at different locations of the chromosome and, to determine the extent of those islands, carries out statistical analysis using a probing window. This leads to the identification of 'putative GIs' of varying sizes and locations in the chromosome, each identifiable by its P-values. The 'putative GIs' obtained in the first phase are always larger than they should be because of the presence of many 'false positives' (i.e., segments of the genome that are statistically detected as GIs but are not biologically parts of any true island). To reduce the false positives and increase the specificity of the method, a refinement phase is implemented, which takes random samples of genomic segments excluding the regions detected in the first phase. The statistical analysis in the refinement phase is very similar to that used in the first phase. The P-values are generated using Monte-Carlo tests carried out at variable locations of a probing window of fixed size. The 'putative GIs' identified in the first phase are thus refined into smaller segments containing horizontally acquired genes. The results obtained were tabulated using customized Perl scripts.
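The sampling step of the refinement phase, in which random segments are drawn only from regions outside the first-phase 'putative GIs', can be sketched as follows. This is an illustrative Python sketch under stated assumptions: coordinates are taken to be 0-based and half-open, and the function names and the weighting of stretches by their number of valid start positions are choices made here, not details taken from Design-Island.

```python
import random

def clean_stretches(chrom_len, putative_gis):
    """Return the stretches of the chromosome lying outside every putative GI.

    Coordinates are 0-based, half-open (start, end) pairs (an assumption).
    """
    stretches, cursor = [], 0
    for start, end in sorted(putative_gis):
        if start > cursor:
            stretches.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < chrom_len:
        stretches.append((cursor, chrom_len))
    return stretches

def sample_clean_segments(chrom_len, putative_gis, size, n, rng=None):
    """Draw n random segments of a fixed size that avoid all putative GIs."""
    rng = rng or random.Random(0)
    # Only stretches long enough to hold a whole segment can be sampled.
    usable = [(s, e) for s, e in clean_stretches(chrom_len, putative_gis)
              if e - s >= size]
    # Weight each stretch by its number of valid start positions.
    weights = [e - s - size + 1 for s, e in usable]
    segments = []
    for s, e in rng.choices(usable, weights=weights, k=n):
        start = rng.randint(s, e - size)
        segments.append((start, start + size))
    return segments
```

Segments drawn this way could then serve as the comparison set when the fixed-size probing window is tested inside each putative GI.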
A Perl program was developed that uses the final results obtained from Design-Island to plot a circular map of the chromosome, indicating the putative GIs identified by Design-Island in the separate phases using different colors. The algorithm is described in Fig. #.
Design-Island was run on the chromosomes of four completely sequenced genomes of V. cholerae, namely V. cholerae classical O395, V. cholerae El Tor N16961, V. cholerae MJ1236 and V. cholerae M66-2, obtained from the NCBI database, in order to identify the putative genomic islands in their genomes. Co-ordinates of statistically significant genomic segments were detected by Design-Island. In V. cholerae classical O395, Design-Island detected 64 'putative GIs' in the first phase using P0 = 0.05. The value of P0 in the first phase was relaxed to 0.05, chosen in such a way that most of the horizontally acquired stretches of the genome could be captured by the 'putative GIs' detected in the first phase. The 'putative islands' obtained in the first phase enabled the generation of statistically non-contaminated stretches of the genome (i.e., genomic regions excluding those putative islands). Those stretches were used for random sampling of segments in the refinement phase. After refinement with P0 = 0.001, these islands were broken into 243 statistically significant genomic segments. Similar results were obtained for the other three V. cholerae strains and are presented in Table #. The percentages of the chromosome covered by the regions predicted by Design-Island in each phase are also shown in Table #. From these predicted regions, the coding regions were marked out using a customized Perl script, with the protein table available for each organism from the NCBI database (http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi) as the reference.
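The percentage of a chromosome covered by a set of predicted segments can be computed by merging overlapping segments first, so that no base is counted twice. The following Python sketch is only an illustration of that arithmetic; the function names and the 0-based, half-open coordinate convention are assumptions, and the numbers in the usage note are invented toy values, not results from this study.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals (half-open)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of starting a new one.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(interval) for interval in merged]

def percent_covered(intervals, chrom_len):
    """Percentage of the chromosome covered by the union of the intervals."""
    covered = sum(end - start for start, end in merge_intervals(intervals))
    return 100.0 * covered / chrom_len
```

For example, the toy segments (10, 20), (15, 30) and (50, 60) on a 200 bp sequence merge into (10, 30) and (50, 60), which together cover 30 bp, i.e. 15% of the sequence.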
The Perl script developed for the visualization of the putative genomic islands (GIs) used the coordinates obtained from the output of Design-Island to generate a circular map of each chromosome of each organism under study (Fig #). Each circular map represents an individual chromosome of an organism and shows the regions covered by the predictions. Each map has two circles representing the putative regions of the same chromosome in the two phases. The inner circle, with regions marked in blue, represents the regions predicted in the first phase of the Design-Island run. The outer circle, with regions marked in red, represents the regions predicted in the refinement (second) phase.
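The geometry behind such a two-ring map is simple: each linear interval is scaled to an arc on a circle. The Python sketch below shows only that conversion and is not the Perl script used in the study; the function names, the radii and the dictionary layout are hypothetical choices made for illustration, and the actual drawing step is left to whatever plotting tool is available.

```python
def to_arc(start, end, chrom_len):
    """Convert a linear interval to (start_angle, end_angle) in degrees."""
    return 360.0 * start / chrom_len, 360.0 * end / chrom_len

def build_map(first_phase, refined, chrom_len,
              inner_radius=0.8, outer_radius=1.0):
    """Describe the arcs of a two-ring circular map.

    First-phase regions go on the inner ring in blue and refined regions on
    the outer ring in red, following the colour scheme described in the text.
    """
    arcs = []
    for colour, radius, regions in (("blue", inner_radius, first_phase),
                                    ("red", outer_radius, refined)):
        for start, end in regions:
            start_deg, end_deg = to_arc(start, end, chrom_len)
            arcs.append({"colour": colour, "radius": radius,
                         "start_deg": start_deg, "end_deg": end_deg})
    return arcs
```

Each returned dictionary holds what a drawing routine would need to render one arc.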
The coding regions of the predicted GIs of each genome were sorted out. The proteins present in the predicted GIs of V. cholerae O395 were then subjected to organism-specific BLASTp searches against each of the other three strains of V. cholerae under study in order to understand the relatedness between these strains. This revealed that 58.47% of the proteins of V. cholerae O395 present in the predicted GIs were shared with V. cholerae El Tor, 57.69% with V. cholerae M66-2 and 60.27% with V. cholerae MJ1236. The interrelatedness between the four strains is shown in Fig #, where a Venn diagram shows the percentage of genes in the putative GIs of each strain that is shared with the other strains. The study also revealed that 6% of the proteins present in the predicted GIs of V. cholerae O395 were unique to this strain and not present in any of the other three strains, whereas 1% of the proteins were unique to V. cholerae El Tor and V. cholerae M66-2 and 4% of the proteins were unique to V. cholerae MJ1236. The percentage of unique genes present in the putative GIs of each strain is shown in Fig #. The GI genes unique to each V. cholerae strain are shown in Table #.
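Shared and unique percentages of this kind can be derived from BLASTp results with simple set operations. The Python sketch below assumes tab-separated BLAST+ output in the standard 12-column tabular layout, with the query identifier in the first column and the e-value in the eleventh. The e-value cut-off of 1e-5, the function names and the protein identifiers in the usage note are hypothetical, since the text does not state how hits were scored.

```python
def queries_with_hits(blast_lines, max_evalue=1e-5):
    """Collect query IDs that have at least one hit at or below max_evalue.

    Assumes 12-column tabular BLAST output; the cut-off is an assumption.
    """
    hits = set()
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip comments and malformed lines
        if float(fields[10]) <= max_evalue:
            hits.add(fields[0])
    return hits

def shared_percent(reference_ids, hit_ids):
    """Percentage of reference proteins that have a hit in another strain."""
    return 100.0 * len(reference_ids & hit_ids) / len(reference_ids)

def unique_percent(reference_ids, hit_sets):
    """Percentage of reference proteins with no hit in any other strain."""
    unique = reference_ids.difference(*hit_sets)
    return 100.0 * len(unique) / len(reference_ids)
```

With four hypothetical reference proteins p1 to p4, hits for p1 and p2 in one strain and for p2 and p3 in another give a shared percentage of 50% for each strain and leave p4 unique, i.e. 25%.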
Design-Island uses a Monte-Carlo type of simulation, so the comparison is based on randomly selected segments of the chromosome; this reduces the probability of contaminating the comparison data set and in turn increases the resolution of the method.
There is an extensive literature on the study of GIs in prokaryotic genomes [6, 7]. Genomes of non-pathogenic bacteria were shown to contain foreign gene blocks, which were not associated with virulence.
The occurrence of integrases, transposons, phage-mediated genes, etc. in these islands of the prokaryotic genomes is clear evidence of the presence of horizontally transferred genetic material in the GIs. As a result, the study of GIs is of great importance in understanding the evolutionary process of prokaryotic genomes, their pathogenicity and other special features, especially with reference to pathogenic organisms like Vibrios. Typical examples of virulence factors encoded by such islands are adherence factors, toxins, iron uptake systems, invasion factors and secretion systems.
For a specified value of P0 (0 < P0 < 1), one can determine all the segments of a chromosome that are associated with a P-value less than or equal to P0. The ranges of the 'putative GIs', in terms of their chromosomal locations, can be determined using the cut-off value P0 and considering a specified number r of overlapping windows of variable sizes having P-values smaller than or equal to P0. In the refinement phase, the analysis is repeated with a fixed-size overlapping probing window of size w over the regions detected as 'putative GIs' in the first phase.
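One possible reading of the cut-off rule is that a chromosomal position belongs to a 'putative GI' when at least r overlapping windows with P-values at or below P0 cover it. The Python sketch below implements that reading only; the interpretation, the function name, the default r = 2 and the 0-based, half-open coordinates are assumptions, and the actual rule used by Design-Island may differ.

```python
def putative_gis(windows, chrom_len, p0=0.05, r=2):
    """Return regions covered by at least r windows with P-value <= p0.

    windows is an iterable of (start, end, p_value) with half-open coordinates.
    """
    # Difference array: +1 where a significant window starts, -1 where it ends.
    support = [0] * (chrom_len + 1)
    for start, end, p_value in windows:
        if p_value <= p0:
            support[start] += 1
            support[end] -= 1
    regions, depth, region_start = [], 0, None
    for pos in range(chrom_len + 1):
        depth += support[pos]
        if depth >= r and region_start is None:
            region_start = pos          # enough support: a region opens here
        elif depth < r and region_start is not None:
            regions.append((region_start, pos))  # support dropped: close it
            region_start = None
    return regions
```

Raising r makes the call more conservative, because a region is reported only where several significant windows agree.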