Role Of Bioinformatics In Microbial Studies Biology Essay


This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.


Plant-associated microorganisms are critical to agricultural and food security and are key components in maintaining the balance of our ecosystems. Some of these diverse microbes, which include viruses, bacteria, oomycetes, fungi and nematodes, cause plant diseases, while others prevent diseases or enhance plant growth. Despite their importance, we know little about them at the genomic level. Genome analysis refers to the structural and functional analysis of an organism's DNA at the level of its genes, of the proteins encoded by those genes, and of the noncoding sequences involved in genome dynamics and function. Genomics is a recent conceptual approach to the biological study of microorganisms that relies on analyzing the complete genetic information they contain. Thanks to powerful automated sequencing technologies and advances in bioinformatics, entire genomes are increasingly being sequenced. The growing availability of plant host genome sequences, together with structural and functional genomic analysis of plant-associated microbes, will speed up the identification of genes involved in host-pathogen interactions and will allow genome-wide approaches to understanding the role of a gene or pathway in interactions with plants and animals. Exploiting the full potential of this approach requires novel algorithms, databases and software that are sophisticated enough to draw meaningful comparisons between complete genome sequences and that are widely accessible to the bulk of the scientific community. Evolutionary relationships can also be analyzed with online bioinformatics tools and databases. We describe microbial genome databases such as PAMGO, COGs, TIGR and EMBL. This article provides information on the computational tools and databases for organizing and extracting biological meaning from comparisons of large collections of genomes from the microbial world.


The microbial world occupies every nook and cranny of the globe, from the deepest depths of the ocean to the highest mountain peak, and consists of a vast and diverse array of organisms. Microbes live in the water, soil and air that surround us, on and in the food that we eat, and on and within our own bodies. Microbes (including viruses, bacteria, fungi, protozoa and microalgae) make up a large share of the earth's biomass and help maintain its environments. They hold the key to understanding the history of life on Earth. Microorganisms have been present for over 3.8 billion years, and we have known about their existence for over 300 years. Yet, incredibly, with some notable exceptions, we still know almost nothing about most of them. Now, with the advent of genomics (the study of an organism's entire DNA complement and its function), we are entering a new era of scientific discovery that holds great promise for understanding the complexities of the microbial world.

The DNA sequence of an organism, its genetic blueprint or genome, is the starting point for microbial genome analysis, and the genome data now available are yielding surprising discoveries. In the microbial genomes sequenced so far, 40 to 50% of the putative genes encode proteins of unknown function, and 20 to 30% encode proteins that appear to be unique to that species. Genomic analysis also indicates that fewer than 1% of the microbes on earth have been cultured and studied in the laboratory. Because of the unique properties of the microbes already known, and the almost incomprehensible number yet to be studied, these organisms represent an untapped and extremely valuable resource for the basic sciences, biotechnology, agriculture, human health, energy and the environment.

Bioinformatics is the application of information technology to store, organize and analyze the vast amount of biological data available as sequences and structures of nucleic acids, proteins and other biomolecules. Information about nucleic acids is available mainly as sequences, while protein data are available as both sequences and structures. A sequence is one-dimensional, an ordered string of residues, whereas a structure contains the three-dimensional coordinates of those residues. A biological database is a collection of data organized so that its contents can easily be accessed, managed and updated.
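To make the difference between one-dimensional sequence data and three-dimensional structure data concrete, here is a minimal Python sketch of a toy protein database. The accession `TOY0001`, the three-residue sequence and the coordinates are all invented for illustration; a real structure file would hold coordinates for many atoms per residue, not just one point.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ProteinRecord:
    """One entry in a toy protein database (all values are hypothetical)."""
    accession: str                                  # unique identifier used for lookup
    sequence: str                                   # 1D data: residues in order
    ca_coords: list = field(default_factory=list)   # 3D data: one (x, y, z) per residue, in angstroms


# The "database": records keyed by accession so they can be accessed, updated and replaced.
database = {}


def add_or_update(record):
    database[record.accession] = record


def get(accession):
    return database.get(accession)


add_or_update(ProteinRecord(
    accession="TOY0001",
    sequence="MKT",
    ca_coords=[(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)],
))

entry = get("TOY0001")
print(len(entry.sequence))                                          # sequence length: 3
print(round(math.dist(entry.ca_coords[0], entry.ca_coords[2]), 2))  # 3D distance: 5.37
```

Keying records by accession is the simplest way to get the "easily accessed, managed and updated" property; a production database would add indexing, persistence and concurrent access.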

Architecture of Biological Databases

Biological databases can be broadly classified into sequence and structure databases. Sequence databases hold both nucleic acid and protein sequences, whereas structure databases hold mainly protein structures. The first database was created shortly after the sequence of the protein insulin was made available in 1956. Three-dimensional structures of proteins were then studied, and the well-known Protein Data Bank (PDB) was developed as the first protein structure database, with only 10 entries in 1972. It has since grown into a large database with over 10,000 entries. While the early protein sequence databases were maintained by individual laboratories, development of a consolidated, formal protein sequence database, SWISS-PROT, began in 1986; it now holds about 70,000 protein sequences. Even this number is minuscule, because the sequences come from only a little more than 5,000 model organisms, a small fraction of all known organisms. This huge variety of divergent data resources is now available for study and research by both academic institutions and industry.
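Sequence databases are commonly distributed as flat text files; UniProtKB/Swiss-Prot, for example, offers a FASTA download in which each entry is a `>` header line followed by sequence lines. A minimal parser sketch, assuming plain FASTA input; the two entries below are invented in a Swiss-Prot-like header style and are not real records.

```python
def parse_fasta(text):
    """Map the first word of each '>' header to its concatenated sequence."""
    records = {}
    current_id = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue                          # ignore blank lines
        if line.startswith(">"):
            if current_id is not None:        # close the previous record
                records[current_id] = "".join(chunks)
            fields = line[1:].split()
            current_id = fields[0] if fields else ""
            chunks = []
        elif current_id is not None:
            chunks.append(line)               # a sequence may be wrapped over many lines
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


# Invented entries, not real Swiss-Prot records.
EXAMPLE = """>sp|X00001|TOY1_ECOLI Hypothetical example protein 1
MKTAYIAKQR
QISFVKSHFS
>sp|X00002|TOY2_ECOLI Hypothetical example protein 2
MSLNQ
"""

records = parse_fasta(EXAMPLE)
for accession, sequence in records.items():
    print(accession, len(sequence))   # sp|X00001|TOY1_ECOLI 20, then sp|X00002|TOY2_ECOLI 5
```

For real work, an established library parser is preferable; this sketch only shows how little structure a sequence record needs compared with a 3D structure file.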


Based on the research of Woese and others in the 1980s and 1990s, most biologists divide all living organisms into 3 domains:

Domain Archaea

Domain Bacteria

Domain Eucarya

Domain Archaea: This domain consists of two phyla:

Phylum Crenarchaeota: This phylum contains thermophilic and hyperthermophilic sulfur-metabolizing archaea. Some recently discovered members, however, are inhibited by sulfur and grow at lower temperatures.

Phylum Euryarchaeota: This phylum contains mainly methanogenic archaea, halophilic archaea, and thermophilic, sulfur-reducing archaea.

Domain Bacteria: According to the second edition of Bergey's Manual of Systematic Bacteriology, this domain is divided into 23 phyla. The features of some notable phyla are described below.

Phylum Aquificae: This phylum is the earliest, "deepest" branch of the Bacteria and contains the genera Aquifex and Hydrogenobacter, which can obtain energy from hydrogen via chemolithotrophic pathways.

Phylum Proteobacteria: This is the largest and most complex group of gram-negative bacteria, with over 400 genera and 1,300 named species. All major nutritional types are represented: phototrophy, heterotrophy and several types of chemolithotrophy. Its members are sometimes called "purple bacteria," although very few are purple; the term refers to a hypothetical purple photosynthetic bacterium from which the group is believed to have evolved.

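This phylum is divided into 5 classes:

Class I - Alphaproteobacteria

Class II - Betaproteobacteria

Class III - Gammaproteobacteria

Class IV - Deltaproteobacteria

Class V - Epsilonproteobacteria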
The phylum includes photosynthetic genera such as Rhodospirillum (a purple non-sulfur bacterium) and Chromatium (a purple sulfur bacterium); sulfur chemolithotrophs such as Thiobacillus and Beggiatoa; nitrogen chemolithotrophs (nitrifying bacteria) such as Nitrobacter and Nitrosomonas; and other chemolithotrophs such as Alcaligenes, Methylobacillus and Burkholderia.

The family Enterobacteriaceae: The "gram-negative enteric bacteria," which include Escherichia, Proteus, Enterobacter, Klebsiella, Salmonella, Shigella and Serratia.

The family Pseudomonadaceae: Includes Pseudomonas and related genera.

Medically important Proteobacteria include Haemophilus, Vibrio, Campylobacter, Helicobacter, Rickettsia and Brucella.

Phylum Firmicutes: "Low G + C content, gram-positive" bacteria

Divided into 3 classes:

Class I - Clostridia; includes genera such as Clostridium and Desulfotomaculum

Class II - Mollicutes; bacteria in this class cannot make peptidoglycan and lack cell walls; includes Mycoplasma and Ureaplasma

Class III - Bacilli; includes genera Bacillus, Lactobacillus, Streptococcus, Lactococcus, Geobacillus, Enterococcus, Listeria, Staphylococcus.

Phylum Actinobacteria: "High G + C content, gram-positive" bacteria. Includes genera such as Actinomyces, Streptomyces, Corynebacterium, Micrococcus, Mycobacterium and Propionibacterium.

Phylum Chlamydiae: A small phylum containing the genus Chlamydia.

Phylum Spirochaetes: The spirochaetes are characterized by flexible, helical cells with a modified outer membrane (the outer sheath) and modified flagella (axial filaments) located within the outer sheath. Important pathogenic genera include Treponema, Borrelia and Leptospira.

Phylum Bacteroidetes: Includes genera such as Bacteroides, Flavobacterium, Flexibacter and Cytophaga; Flexibacter and Cytophaga move by "gliding motility."
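The "low G + C" and "high G + C" labels used for the Firmicutes and Actinobacteria refer to the percentage of guanine plus cytosine among the bases of the genome. A minimal sketch of that calculation; ambiguous bases such as N are simply ignored (an assumption of this sketch), and the two fragments are made up rather than taken from real genomes.

```python
def gc_content(seq):
    """Percentage of G + C among the unambiguous bases A, C, G and T."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * gc / (gc + at)


# Made-up fragments, not real genome sequence.
at_rich = "ATGAAATTAAAAGATTTA"
gc_rich = "ATGGCCGCGGGCCGCACC"
print(f"{gc_content(at_rich):.1f}% G + C")   # 11.1% G + C
print(f"{gc_content(gc_rich):.1f}% G + C")   # 83.3% G + C
```

In practice the value is computed over a whole genome, or in sliding windows to spot horizontally acquired regions whose composition differs from the rest of the chromosome.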

Domain Eucarya

The domain Eucarya is divided into four kingdoms by most biologists:

Kingdom Protista - including the protozoa and algae

Kingdom Fungi - the fungi (molds, yeast, and fleshy fungi)

Kingdom Animalia - multicellular animals

Kingdom Plantae - multicellular plants
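A hierarchical classification like the one above maps naturally onto a tree. As a sketch, here is a small nested-dictionary version of part of it, with a depth-first search that returns the lineage of a named group. The tree is deliberately partial and skips intermediate ranks where the text above skips them; the function name `lineage` is hypothetical.

```python
# Partial, simplified tree; intermediate ranks are skipped where the text above skips them.
TREE = {
    "Archaea": {"Crenarchaeota": {}, "Euryarchaeota": {}},
    "Bacteria": {
        "Proteobacteria": {"Enterobacteriaceae": {"Escherichia": {}, "Salmonella": {}}},
        "Firmicutes": {
            "Bacilli": {"Bacillus": {}, "Listeria": {}},
            "Clostridia": {"Clostridium": {}},
        },
        "Actinobacteria": {"Streptomyces": {}, "Mycobacterium": {}},
    },
    "Eucarya": {"Protista": {}, "Fungi": {}, "Animalia": {}, "Plantae": {}},
}


def lineage(tree, name, path=()):
    """Depth-first search: return the names from a domain down to `name`, or None."""
    for node, children in tree.items():
        here = path + (node,)
        if node == name:
            return list(here)
        found = lineage(children, name, here)
        if found is not None:
            return found
    return None


print(lineage(TREE, "Listeria"))   # ['Bacteria', 'Firmicutes', 'Bacilli', 'Listeria']
print(lineage(TREE, "Fungi"))      # ['Eucarya', 'Fungi']
```

Taxonomy databases store the same idea as parent pointers (each node records its parent), which makes lineage lookup a simple walk upward instead of a search.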


The explosion of freely available biological information, combined with the comprehensive analytical tools of bioinformatics, has given students unique opportunities to explore the molecular structure and function of microbes. Microbes have profound impacts on soil and aquatic ecosystems, ranging from the cycling of carbon, nitrogen and other major nutrients to direct positive (mutualistic) and negative (pathogenic) effects on plants, animals and humans. If biologists and computer scientists learn to collaborate more effectively, their joint efforts will lead to a much better science of bioinformatics. Bioinformatics datasets and tools also let us explore new evolutionary trends. Microbial studies can draw on the following fields:

Evolution (phylogenetics and population genetics)

Biophysics (thermodynamics, quantum mechanics, statistical mechanics, kinetics)

Information science (information theory, computational linguistics, coding theory)
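Phylogenetic analysis usually starts from a measure of how far apart two aligned sequences are. A minimal sketch of the simplest such measure, the p-distance (fraction of differing sites), followed by the Jukes-Cantor (JC69) correction for multiple substitutions at the same site. Skipping gap columns is an assumption of this sketch, and real studies typically use richer substitution models.

```python
import math


def p_distance(seq_a, seq_b):
    """Fraction of aligned sites that differ; columns with a gap ('-') are skipped."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return differences / compared


def jukes_cantor(p):
    """JC69 distance: corrects p for multiple substitutions at the same site."""
    if p >= 0.75:
        return math.inf            # saturation: the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


p = p_distance("ACGTACGTAC", "ACGTTCGTAA")
print(p)                           # 0.2
print(round(jukes_cantor(p), 3))   # 0.233
```

The corrected distance is always at least as large as p, because some observed matches hide two or more substitutions; pairwise distances like these are the input to tree-building methods such as neighbor joining.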

The use of bioinformatics in building global databases in microbiology aims to pinpoint the key technologies and building blocks needed to:

Build a cumulative knowledge repository that captures the reams of experimental data and metadata about microorganisms.

Develop general data mining tools for knowledge discovery within this data-rich environment.

Establish dynamically updated, flexible portals on the observed bacterial diversity and related biotechnological innovations.

Turn newly discovered insights into new applications or end-products.

One of the primary goals of the bioinformatics sessions is to streamline some of these pioneering initiatives and mould the insights they have produced into a more integrative approach.

The following databases are the major resources of microbial studies in biological research.

ComBase: ComBase is a relational database of predictive microbiology information. It contains thousands of data sets that describe the growth, survival and inactivation of bacteria under diverse environments relevant to food processing operations. ComBase (1), a combined database of microbial responses to food environments, was preceded by two independent but similar initiatives on the two sides of the Atlantic. In 1988 the Ministry of Agriculture, Fisheries and Food in the United Kingdom initiated a coordinated programme to collect data on the growth and death of bacterial pathogens. Those data served as the basis for the first validated, commercialized predictive package, Food MicroModel. After its establishment, the UK Food Standards Agency (FSA) took over the task of supporting these developments.
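Inactivation data of the kind stored in ComBase are often summarized by fitting a primary model to log counts over time. A minimal sketch of the simplest one, the classical log-linear (first-order) model log10 N(t) = log10 N0 - t / D, where D is the time needed for a one-log (90%) reduction. The survivor curve below is hypothetical, and this is not ComBase's own fitting tool; growth curves in particular are usually fitted with more flexible models such as the Baranyi model.

```python
def log_linear_survivors(log10_n0, d_value, time):
    """log10 N(t) = log10 N0 - t / D (first-order, log-linear inactivation)."""
    return log10_n0 - time / d_value


def estimate_d_value(times, log10_counts):
    """Fit a least-squares line to log10 counts vs time and return D = -1 / slope."""
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(log10_counts) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, log10_counts))
    sxx = sum((t - mean_t) ** 2 for t in times)
    slope = sxy / sxx
    if slope >= 0:
        raise ValueError("counts are not decreasing; no inactivation to model")
    return -1.0 / slope


# Hypothetical survivor curve: time in minutes, log10 CFU/g.
times = [0, 2, 4, 6]
log_counts = [7.0, 6.0, 5.0, 4.0]
d = estimate_d_value(times, log_counts)
print(d)                                  # 2.0 (minutes per 1-log reduction)
print(log_linear_survivors(7.0, d, 10))   # 2.0 (predicted log10 count at 10 min)
```

Real survivor curves often show shoulders or tails that a straight line cannot capture, which is one reason predictive microbiology relies on larger curated collections of measured responses.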

MRV (Microbial Response Viewer): A database of microbial responses to food environments, derived from ComBase.

MBGD (Microbial Genome Database for Comparative Analysis): MBGD is a database for comparative analysis of completely sequenced microbial genomes, whose number is now growing rapidly. The aim of MBGD (2) is to facilitate comparative genomics from various points of view, such as ortholog identification, paralog clustering, motif analysis and gene order comparison.
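Ortholog identification is a core step in comparative genomics. A common, simple heuristic is the reciprocal best hit (RBH): gene a in genome A and gene b in genome B are treated as putative orthologs if each is the other's highest-scoring match. This is a minimal sketch of that heuristic, not MBGD's own clustering algorithm, and the similarity scores (imagine alignment bit scores) are hypothetical.

```python
def best_hits(scores):
    """Map each query gene to the subject gene with the highest score."""
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    a_best = best_hits(a_vs_b)
    b_best = best_hits(b_vs_a)
    return sorted((a, b) for a, b in a_best.items() if b_best.get(b) == a)


# Hypothetical similarity scores between genomes A and B.
a_vs_b = {("a1", "b1"): 250, ("a1", "b2"): 90, ("a2", "b2"): 180, ("a3", "b2"): 120}
b_vs_a = {("b1", "a1"): 245, ("b2", "a2"): 175, ("b2", "a3"): 118}

print(reciprocal_best_hits(a_vs_b, b_vs_a))   # [('a1', 'b1'), ('a2', 'b2')]
```

Gene a3 hits b2 best, but b2 prefers a2, so a3 is left out; such genes are often paralogs, which is why databases like MBGD combine ortholog detection with paralog clustering.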

BioloMICSNet: This software facilitates in-depth analysis of the available databases and offers several ways of searching. The first is a simple text query tool that searches for a given text in all text fields of the selected database and table. The second retrieves records by their unique identifier, such as the collection accession number of a given strain or specimen. A third allows searches by species and/or genus name. An advanced search tool is also available for complex queries.
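The four search modes described above can be illustrated on a small in-memory collection. This is a sketch only: the records, the accession format `STRAIN-00x` and the function names are hypothetical and do not reproduce BioloMICSNet's actual interface.

```python
RECORDS = [
    {"accession": "STRAIN-001", "genus": "Aspergillus", "species": "niger", "source": "soil, maize field"},
    {"accession": "STRAIN-002", "genus": "Penicillium", "species": "chrysogenum", "source": "spoiled bread"},
    {"accession": "STRAIN-003", "genus": "Aspergillus", "species": "flavus", "source": "stored peanuts"},
]


def text_search(records, text):
    """Simple text query: case-insensitive match against every field of every record."""
    needle = text.lower()
    return [r for r in records if any(needle in str(value).lower() for value in r.values())]


def by_accession(records, accession):
    """Identifier lookup: the record with this unique accession, or None."""
    return next((r for r in records if r["accession"] == accession), None)


def by_taxon(records, genus=None, species=None):
    """Taxon search: filter on genus and/or species name."""
    return [r for r in records
            if (genus is None or r["genus"] == genus)
            and (species is None or r["species"] == species)]


def advanced_search(records, *conditions):
    """Advanced search: keep records that satisfy every condition (logical AND)."""
    return [r for r in records if all(condition(r) for condition in conditions)]


print([r["accession"] for r in text_search(RECORDS, "peanut")])   # ['STRAIN-003']
print(by_accession(RECORDS, "STRAIN-002")["genus"])               # Penicillium
print(len(by_taxon(RECORDS, genus="Aspergillus")))                # 2
print([r["accession"] for r in advanced_search(
    RECORDS,
    lambda r: r["genus"] == "Aspergillus",
    lambda r: "soil" in r["source"],
)])                                                               # ['STRAIN-001']
```

Passing conditions as functions keeps the advanced search open-ended; a real database system would translate such queries into indexed lookups instead of scanning every record.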

VMD (VBI Microbial Database): It is an integrated resource that includes community annotation features, toolkits, and resources to perform complex queries of biological information (3). It contains genome sequence and annotation data for the plant pathogens Phytophthora sojae and Phytophthora ramorum. The purpose of the database is to make the recently completed genome sequences of these pathogens as well as powerful analytical tools widely available to researchers in one integrated resource.

NIAS (National Institute of Agrobiological Sciences): This institute aims to make a leap forward in both basic and pioneering research and technological development, mainly in the field of biotechnology.

ACBR Microbial Database (Austrian Center of Biological Resources and Applied Mycology)

Many Microbe Microarrays Database (M3D): A resource for analyzing and retrieving gene expression data for microbes. The database currently contains Affymetrix expression compendia for Escherichia coli, Saccharomyces cerevisiae and Shewanella oneidensis (4). This internet website is designed to bring together useful and interesting microbiology information resources.

Canary Database: The Canary Database is a compilation of curated, peer-reviewed research articles related to the use of animals as sentinels of human health hazards. In addition to bibliographic records from MEDLINE and other well-known databases, it contains information added by trained curators. The database includes studies of wildlife, companion and livestock animals in which either the exposure or the health effect could be considered potentially relevant to human health (5).

Uses of Canary Database:

Find out whether a cause and effect relationship between an environmental hazard and a health outcome has been studied in animal populations

Find out what is known about a particular disease reservoir for an infectious agent

Find out how investigators have used different study methodologies

Identify knowledge gaps related to animal sentinel health events

PAMGO (Plant-Associated Microbe Gene Ontology): The PAMGO interest group was formed to develop new Gene Ontology (GO) terms describing the processes, functions and cellular components involved in microbe-host interactions. Plant-associated microbes have evolved similar mechanisms to evade, neutralize or suppress the defense systems of their plant hosts and to obtain nutrients. Such similarities can only be discovered if a controlled vocabulary is in place to describe these processes across diverse microbe-host interactions. In a multi-institutional collaborative effort, the group is developing new GO terms and relationships for gene products implicated in plant interactions in the bacterial pathogens Erwinia chrysanthemi, Pseudomonas syringae pv. tomato and Agrobacterium tumefaciens, the fungus Magnaporthe grisea, the oomycetes Phytophthora sojae and Phytophthora ramorum, and the nematode Meloidogyne hapla.
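Why a controlled vocabulary with explicit relationships helps can be shown with a tiny ontology. In this sketch the term IDs (`T:000x`), term names and gene product names are all hypothetical, not real GO or PAMGO terms (real GO identifiers have the form GO:0000000). Following `is_a` links upward reveals that gene products from two unrelated microbes, annotated to different specific terms, share a more general process.

```python
# Hypothetical terms; real GO identifiers have the form GO:0000000.
NAMES = {
    "T:0001": "interaction with host",
    "T:0002": "modulation of host defense response",
    "T:0003": "acquisition of nutrients from host",
    "T:0004": "suppression of host defense response",
    "T:0005": "evasion of host defense response",
}

# is_a links: child term -> list of parent terms (a term may have several parents).
PARENTS = {
    "T:0001": [],
    "T:0002": ["T:0001"],
    "T:0003": ["T:0001"],
    "T:0004": ["T:0002"],
    "T:0005": ["T:0002"],
}

# Hypothetical gene products from two unrelated microbes.
ANNOTATIONS = {
    "bacterial_protein_X": "T:0004",
    "oomycete_protein_Y": "T:0005",
}


def ancestors(term):
    """The term plus every term reachable by following is_a links upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def shared_terms(gene_a, gene_b):
    """Terms that both gene products are implicitly annotated to."""
    return ancestors(ANNOTATIONS[gene_a]) & ancestors(ANNOTATIONS[gene_b])


for term in sorted(shared_terms("bacterial_protein_X", "oomycete_protein_Y")):
    print(term, NAMES[term])
```

Because the ontology is a directed acyclic graph rather than a strict tree, the traversal keeps a `seen` set so that terms reachable through several parents are visited only once.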

Clusters of Orthologous Groups of proteins (COGs) were delineated by comparing protein sequences encoded in complete genomes, representing major phylogenetic lineages. Each COG consists of individual proteins or groups of paralogs from at least 3 lineages and thus corresponds to an ancient conserved domain.
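The requirement that each COG span at least three lineages can be checked mechanically. This sketch assumes candidate clusters have already been built (the original COG procedure started from triangles of genome-specific best hits, which is not reproduced here) and that every protein is assigned to one lineage; the proteins, lineages and cluster names are hypothetical. Paralogs from the same lineage count only once.

```python
# Hypothetical assignment of proteins to major phylogenetic lineages.
LINEAGE = {
    "p1": "Proteobacteria",
    "p2": "Firmicutes",
    "p3": "Euryarchaeota",
    "p4": "Proteobacteria",   # paralog of p1 in the same lineage
    "p5": "Firmicutes",
}


def lineages_in(cluster):
    """Distinct lineages represented in a cluster; paralogs from one lineage count once."""
    return {LINEAGE[protein] for protein in cluster}


def meets_lineage_rule(cluster, minimum=3):
    return len(lineages_in(cluster)) >= minimum


candidates = {
    "C1": ["p1", "p2", "p3"],   # three lineages
    "C2": ["p4", "p5"],         # two lineages
    "C3": ["p1", "p4", "p2"],   # three proteins, but only two lineages
}

kept = sorted(name for name, members in candidates.items() if meets_lineage_rule(members))
print(kept)   # ['C1']
```

Counting lineages rather than proteins is what makes a surviving cluster evidence of an ancient conserved family instead of a lineage-specific expansion.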

The Comprehensive Microbial Resource (CMR) is a free tool that gives researchers access to all of the publicly available bacterial genome sequences completed to date (6). For each genome not sequenced at the J. Craig Venter Institute (JCVI), two kinds of annotation are displayed: the primary annotation from the genome sequencing center and the JCVI annotation generated by an automated annotation process at JCVI.

BiotechPro: A database of microbiologically synthesized products of biotechnological value (7).

TIGR (The Institute for Genomic Research): The TIGR Genome Projects are a collection of curated databases containing DNA and protein sequence, gene expression, cellular role, protein family and taxonomic data for microbes, plants and humans.

The EMBL Nucleotide Sequence Database (also known as EMBL-Bank) constitutes Europe's primary nucleotide sequence resource. The main sources of its DNA and RNA sequences are direct submissions from individual researchers, genome sequencing projects and patent applications. The database is produced in an international collaboration with GenBank (USA) and the DNA Data Bank of Japan (DDBJ). Each of the three groups collects a portion of the total sequence data reported worldwide, and all new and updated database entries are exchanged between the groups daily. The EMBL Nucleotide Sequence Database is part of the Protein and Nucleotide Database Group (PANDA).
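Because new and updated entries flow daily between the three partners, the same accession can arrive from more than one source, possibly in different versions. This is a minimal sketch of one way to reconcile such copies, assuming each record carries an accession.version identifier and that the highest version wins. The records, accessions and field names are invented, and this is not how the partners actually implement their exchange.

```python
def split_accession(accession_version):
    """Split 'XX000001.2' into ('XX000001', 2); a missing version is treated as 1."""
    base, _, version = accession_version.partition(".")
    return base, (int(version) if version else 1)


def merge_exchanges(*sources):
    """For each accession, keep the record with the highest version seen in any source."""
    merged = {}
    for source in sources:
        for record in source:
            base, version = split_accession(record["accession"])
            if base not in merged or version > merged[base][0]:
                merged[base] = (version, record)
    return {base: record for base, (_, record) in merged.items()}


# Invented records standing in for one day's exchange between the three partners.
embl = [{"accession": "XX000001.1", "received_by": "EMBL"}]
genbank = [{"accession": "XX000001.2", "received_by": "GenBank"},
           {"accession": "XX000002.1", "received_by": "GenBank"}]
ddbj = [{"accession": "XX000003", "received_by": "DDBJ"}]

for base, record in sorted(merge_exchanges(embl, genbank, ddbj).items()):
    print(base, record["accession"], record["received_by"])
```

Keeping the version number separate from the stable accession lets a record be updated without breaking citations of the accession itself.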



Monitoring completed and ongoing genome projects

Genomes Online Database (GOLD): Provides access to lists of complete and ongoing genome projects from prokaryotes and eukaryotes.

Primary international databases of complete genome sequences

DNA Data Bank of Japan (DDBJ): Genomes at DDBJ in the Genome Information Broker system.

European Bioinformatics Institute (EBI): Genomes at EBI.

National Center for Biotechnology Information (NCBI): Genomes at NCBI in the Entrez Genomes system.

Comparative genomic databases

KEGG (Kyoto Encyclopedia of Genes and Genomes): Enzyme and pathway information about complete genomes.

Comprehensive Microbial Resource (CMR): Provides access to a wide range of information and analyses about all complete bacterial genomes.

Integrated Microbial Genomes (IMG): Facilitates the visualization and exploration of genomes from a functional and evolutionary perspective.

Microbial Genome Database for Comparative Analysis (MBGD): Provides ortholog identification, paralog clustering, motif analysis and gene order data.

Virulogenome: Access to complete and incomplete genomes, including the Artemis applet and ACT comparisons.

Pathway and protein interaction databases

BioCyc: A collection of curated databases, each of which describes the genome and metabolic pathways of a single organism.

MetaCyc: A database of non-redundant, experimentally elucidated metabolic pathways.

STRING: A database of known and predicted protein-protein interactions.
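Interaction databases such as STRING are naturally handled as networks. A minimal sketch, assuming each interaction comes with a combined confidence score on a 0-1000 scale (the scale used in STRING's downloadable files, to the best of our knowledge) and that low-confidence edges are dropped before analysis. The proteins and scores are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical interactions: (protein_a, protein_b, combined score on a 0-1000 scale).
INTERACTIONS = [
    ("geneA", "geneB", 950),
    ("geneA", "geneC", 420),
    ("geneB", "geneD", 780),
    ("geneC", "geneD", 150),
]


def build_network(interactions, min_score=700):
    """Undirected adjacency sets, keeping only interactions at or above min_score."""
    graph = defaultdict(set)
    for a, b, score in interactions:
        if score >= min_score:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_to(graph, start):
    """Breadth-first search: every protein linked to `start` by a chain of kept interactions."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


strict = build_network(INTERACTIONS, min_score=700)
lenient = build_network(INTERACTIONS, min_score=400)
print(sorted(connected_to(strict, "geneA")))    # ['geneA', 'geneB', 'geneD']
print(sorted(connected_to(lenient, "geneA")))   # ['geneA', 'geneB', 'geneC', 'geneD']
```

The threshold is a trade-off: a stricter cut-off gives a smaller, more reliable network, while a lenient one pulls in more candidate partners at the cost of more false positives.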



The development of automated approaches to gathering molecular biology data has had a dramatic impact on the field of microbial ecology. The bioinformatics tools needed to analyze and interpret large datasets are having a similarly dramatic impact on the productivity of microbial ecology research. Automating queries alone saves countless hours of researcher time. These tools may also open new avenues of research as users mine microbial community datasets for patterns that suggest the role of specific microbial populations in ecosystem processes. Sharing microbial community datasets with researchers who study microbial populations in other environments may enhance inquiry into microbially mediated biological systems and processes, strengthening the case for developing metadata standards for such datasets.
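One simple kind of pattern mining is to correlate each population's relative abundance with a measured ecosystem process across samples. A minimal sketch with hypothetical read counts and a hypothetical process rate; a strong correlation only suggests a role for a population and does not prove it. Relative abundances are also compositional (they sum to 1 per sample), so with only two taxa, as here, one taxon's correlation is simply the mirror image of the other's.

```python
import math

# Hypothetical read counts for two taxa in four samples.
COUNTS = {
    "Nitrosomonas": [10, 30, 50, 70],
    "Pseudomonas": [60, 40, 40, 20],
}
# Hypothetical measured process rate (e.g. nitrification) in the same four samples.
PROCESS_RATE = [1.0, 2.1, 2.9, 4.2]


def relative_abundance(counts):
    """Convert raw counts to per-sample proportions."""
    totals = [sum(sample) for sample in zip(*counts.values())]
    return {taxon: [c / t for c, t in zip(values, totals)] for taxon, values in counts.items()}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


for taxon, proportions in relative_abundance(COUNTS).items():
    print(taxon, round(pearson(proportions, PROCESS_RATE), 2))
```

Automating this loop over thousands of taxa and many process measurements is exactly where query automation pays off, although testing that many correlations also calls for correcting for multiple comparisons.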


Information management and bioinformatics technologies have greatly aided ecological research. The growing synergy between the natural and computer sciences is changing the landscape of data collection, management and analysis. In this regard, microbial ecology lags behind the traditional ecological disciplines: while microbial ecologists are comfortable with the technology needed to gather and analyze their data, few informatics resources are currently available for managing and studying those data. Integrating molecular biology and environmental data can serve as a framework for improving the analysis of microbes and their interactions with other organisms, and may inspire future research that integrates environmental microbiology datasets with large-scale studies.