Bioinformatics Is A Multi Field Discipline Biology Essay



Bioinformatics is based largely on the use of computational methods to understand biological processes through the interpretation and manipulation of biological information. Computational methods were being combined with experimental data to interpret biological information as early as 1961 (Ingram, 1961), so bioinformatics is a relatively old discipline. One of its largest leaps came between 1980 and the 1990s, when advances in computational methods went hand in hand with advances in the main fields of bioinformatics, including sequence analysis, molecular databases, protein structure prediction, and molecular evolution (Ouzounis & Valencia, 2003). As sequencing has become less expensive, the amount of biological information available has increased rapidly, and bioinformatic tools have developed to interpret these data.

Programs such as FASTA (Pearson & Lipman, 1988) and BLAST (Altschul et al., 1990) became necessary for the analysis of newly sequenced biological information. The growing volume of sequence data produced by sequencing projects led to the development of databases that hold this information collectively for access and use, to the founding of two major nucleic acid databases, the EMBL Data Library and GenBank, and to the development of computational methods with which to search them. Alongside these two major databases, many others were founded, including protein sequence databases and protein structure databases. More recently, efforts have been made to standardise databases and to develop composite databases.

Once a bacterial gene or genome sequence has been obtained, programs such as Glimmer or GeneMark (Tettelin & Feldblyum, 2009) can be used to predict the location of genes within it. The function of each predicted gene can then be inferred by homology, by interrogating databases of characterised proteins such as the UniProt Knowledgebase (UniProtKB). FASTA and BLAST support this annotation step: they perform local sequence alignments between the query sequence and database entries and return the homologous sequences they find. Depending on the degree of similarity, the newly sequenced data can then be annotated with its predicted structure and function. Additional searches, for example against databases of hidden Markov models (HMMs), can be made to support the predicted function (Tettelin & Feldblyum, 2009). This has allowed greater insight into microbial activities, such as the elucidation of virulence mechanisms in pathogenic bacteria or the identification of microbial metabolic pathways, as in a successful reconstruction of amino acid biosynthesis pathways in Escherichia coli (Bono et al., 1998).
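The idea of a local alignment search can be illustrated with a minimal sketch. FASTA and BLAST are fast heuristics; the code below instead uses the exact Smith-Waterman dynamic programming recurrence, which is the kind of local alignment score those heuristics approximate. The scoring values (match +2, mismatch -1, gap -2), the function names, and the toy database are assumptions made for illustration and are not taken from either program.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Uses a linear gap penalty and keeps only two rows of the score matrix.
    The scoring parameters are illustrative assumptions.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def rank_hits(query, database):
    """Score the query against every database entry, best hit first."""
    scores = [(name, smith_waterman_score(query, seq))
              for name, seq in database.items()]
    return sorted(scores, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy database; real searches use databases such as UniProtKB.
    toy_db = {"entry_x": "ACGTACGT", "entry_y": "TTTT"}
    print(rank_hits("ACGT", toy_db))
```

In an annotation setting, the top-ranked entry would supply the candidate function for the query, subject to a similarity threshold chosen by the annotator.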

Since the first whole genome sequencing of Haemophilus influenzae in 1995 (Fleischmann et al., 1995), 1256 complete genome projects have been published, comprising 1048 bacterial, 127 eukaryal (including the human genome; Venter et al., 2001), and 81 archaeal genome sequences, with a further 5720 genome sequencing projects ongoing (Liolios et al., 2010). Bioinformatic tools have allowed not only the rapid development of genome sequencing projects but also the interpretation of the data these ventures make available. The availability of genomic information for a wide range of bacterial species and, increasingly, for multiple strains of a species has allowed the development of fields such as comparative genomics, which elucidates species- and strain-specific features. Comparative studies have unveiled novel virulence genes and have shown the molecular basis of pathogen-host interactions and host specificity. Because comparative methods are largely computational, they demonstrate the significant impact of bioinformatics on microbiology (Fraser & Rappuoli, 2004).

A clear example of the advantages of comparative analysis is the comparison of three Bordetella species, which uncovered the molecular basis of the host specificity of each species as well as that of virulence factors such as the pertussis toxin, which is produced exclusively by B. pertussis (Parkhill et al., 2003). Comparative genomic studies of the bacterial pathogen group A Streptococcus likewise revealed novel virulence factors (Musser & Shelburne III, 2009). Another comparative analysis identified the molecular basis of the ability of particular strains of Histophilus somni to incorporate sialic acid into their lipooligosaccharide, a case of host antigenic mimicry that helps the microorganism avoid the host immune system; in strains without this ability, the genes encoding the necessary transferase are either absent or truncated (Sandal & Inzana, 2009). These discoveries, enabled by comparative genomics through bioinformatic approaches, have increased understanding of microbial behaviours and relationships.

The whole genome analysis and comparison made possible by bioinformatics has also led to revelations about the evolutionary history of microorganisms. In molecular phylogenetics, the traditional method of inferring evolutionary relatedness from point mutations in the 16S rRNA gene was found to be unreliable, because the evolutionary tree it produces differs from trees built on other highly conserved genes (Pennisi, 1998); this may result in a total restructuring of the ‘tree of life’. It has also become clear that methods other than multiple sequence alignments of nucleotides or amino acids are necessary, because evolutionary mechanisms such as gene duplication and gene loss mean that genomes from the same species can differ by as much as 25%, as seen in the comparison of Escherichia coli K-12 and E. coli O157:H7 (Perna et al., 2001). Some researchers are therefore developing methods that compare whole genomes to elucidate evolutionary history; one such method is the use of genome trees to represent genome evolution and phylogenetic relationships (Snel et al., 2005).
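One way to picture a whole-genome comparison that does not depend on aligning a single gene is a gene-content distance. The sketch below assumes each genome has already been reduced to a set of gene family identifiers and defines the distance as one minus the number of shared families divided by the size of the smaller genome. This is one commonly described gene-content measure, used here as an assumption rather than as the exact method of the cited work; the genome names and gene identifiers are hypothetical.

```python
from itertools import combinations


def gene_content_distance(genes_a, genes_b):
    """Distance between two genomes represented as sets of gene families.

    Defined here (as an assumption) as 1 - shared / size of smaller genome,
    so identical gene content gives 0.0 and no shared genes gives 1.0.
    """
    smaller = min(len(genes_a), len(genes_b))
    if smaller == 0:
        raise ValueError("each genome must contain at least one gene")
    shared = len(genes_a & genes_b)
    return 1.0 - shared / smaller


def distance_matrix(genomes):
    """Pairwise distances for a dict mapping genome name -> set of genes."""
    return {
        (x, y): gene_content_distance(genomes[x], genomes[y])
        for x, y in combinations(sorted(genomes), 2)
    }


if __name__ == "__main__":
    # Hypothetical toy genomes; identifiers carry no biological meaning.
    toy_genomes = {
        "strain_A": {"g1", "g2", "g3", "g4"},
        "strain_B": {"g1", "g2", "g3", "g5"},
        "strain_C": {"g1", "g6", "g7", "g8"},
    }
    for pair, dist in distance_matrix(toy_genomes).items():
        print(pair, dist)
```

A distance matrix of this kind could then be passed to a standard clustering method such as neighbour joining to draw a genome tree; that step is omitted here.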

The use of bioinformatics to discover such pathogen-specific features has allowed researchers to identify targets for antimicrobial drug development and has led to the method of reverse vaccinology. In reverse vaccinology, novel targets for vaccine development can be identified entirely by computational analysis, because all proteins encoded by a pathogen can be predicted through sequence analysis; this is now a standard approach in any vaccine development project (Fraser & Rappuoli, 2004). A vaccine against serogroup B Neisseria meningitidis developed through reverse vaccinology (Rappuoli, 2001) is currently in clinical trials, with promising preliminary results (Fraser & Rappuoli, 2004), which shows the potential of bioinformatics in such applications. Structural bioinformatics can also prove useful to reverse vaccinology: knowledge of a protein's structure can in some cases allow its function to be inferred. This can then aid structure-based drug design, in which the molecular structures of both the target protein and the drug are analysed along with their likely interaction (Goodsell et al., 1998). Although structural bioinformatics has become a field in its own right, it complements many areas of microbiology, such as recombinant protein production, in which visualising the protein structure can aid optimisation of production (Yokoyama et al., 2000).

Bioinformatics is constantly at the leading edge of genomics and requires constant development to meet its needs. High-throughput sequencing methods based on the shotgun sequencing approach rapidly increased the speed of sequencing genes and genomes, but in 2005 new technologies known as short read sequencing (SRS) technologies (such as the 454-Roche Genome Sequencer FLX) became available, which do not rely on the same techniques or chemistry as the shotgun approach (Margulies et al., 2005). Some disadvantages of individual platforms, such as E. coli being an unsuitable cloning host for some sequences, can be partially overcome by combining platforms, but in many cases the tools with which to do this are still being developed. The main remaining disadvantage of the newer platforms is their decreased read length, with some reads as short as 35-40 nucleotides compared with the 800 nucleotide read length of Sanger sequencing (Tettelin & Feldblyum, 2009). Short reads present many problems in genome assembly because repeat regions are hard to differentiate, so bioinformatic tools must be developed to process this raw data. Programs are being developed for resequencing with SRS data, but these are inefficient, with long processing times, which shows that the key requirement is the development of novel algorithms for handling these data; this is the focus of computational bioinformatics (Pop & Salzberg, 2008).
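The effect of read length on repeat regions can be shown with a small sketch. It counts the fraction of all possible error-free reads of a given length whose sequence occurs more than once in a genome, and which therefore cannot be placed uniquely by sequence alone. The function name and the toy genome, which contains the seven-nucleotide repeat GATTACA twice, are assumptions made for illustration; real assemblers also have to deal with sequencing errors, both strands, and uneven coverage.

```python
from collections import Counter


def ambiguous_read_fraction(genome, read_len):
    """Fraction of reads of length read_len that occur more than once.

    Every substring of the genome of the given length is treated as one
    possible error-free read (a simplifying assumption).
    """
    if read_len < 1 or read_len > len(genome):
        raise ValueError("read_len must be between 1 and len(genome)")
    reads = [genome[i:i + read_len]
             for i in range(len(genome) - read_len + 1)]
    counts = Counter(reads)
    ambiguous = sum(1 for read in reads if counts[read] > 1)
    return ambiguous / len(reads)


if __name__ == "__main__":
    # Hypothetical toy genome: the repeat GATTACA appears twice.
    toy_genome = "GATTACA" + "CC" + "GATTACA"
    for length in (3, 7, 8):
        print(length, ambiguous_read_fraction(toy_genome, length))
```

In this toy case, reads of length 7 can still fall entirely inside the repeat, whereas reads of length 8 always include at least one flanking nucleotide and so become unique, which is the reason longer reads make repeats easier to resolve.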

Bioinformatics has developed into one of the most important tools with which to analyse, utilise, and manipulate biological information, and it has permeated many disciplines of microbiology. Its applications began with methods for storing sequence information and now extend to the computational design and development of vaccines, the modelling of protein and drug structures, the inference of evolutionary relationships among microorganisms, the enhancement of sequencing methods, and the elucidation of microbial metabolic and virulence mechanisms. Bioinformatics is currently developing to deal with a rapid onslaught of raw biological data and will continue to develop for the foreseeable future. In the past 20 years its range of applications has grown enormously, and its potential for further growth, with an untold range of future applications, is immense.