Programming Languages For Bioinformatics Computer Science Essay



Bioinformatics is a cross-disciplinary field that develops and optimizes methods for analyzing biological data. Developing software and tools that generate useful biological knowledge is one of its major activities [1]. Although bioinformatics draws on biology, physics, chemistry, mathematics, engineering and computer science to handle biological data, software tools and high-performance computing are the keys to solving bioinformatics problems. Bioinformatics is not simply a branch of biology or of information science; it is a dynamic integration of several disciplines. It applies state-of-the-art information technology and mathematics to the study of life, and it will help people gradually understand the origin of life, evolution and the nature of heredity, and decipher the information hidden in DNA sequences. It will also help reveal the molecular basis of human physiology and pathology, and so point to the most reasonable and effective ways to predict, diagnose, treat and prevent human disease. The development of bioinformatics is expected to revolutionize the life sciences. It will not only advance the related basic disciplines but also have a large impact on medicine, health, food, agriculture and other industries, and may even lead to a new industrial revolution. Governments and industry therefore take the field very seriously and have invested heavily in it, and the involvement of many commercial organizations has given the development of bioinformatics great vitality.

Perl was designed by Larry Wall, who integrated regular expressions along with a combination of features from C, sed, awk, shell scripting and many other programming languages. It is often described as being as powerful as C yet as easy to use as awk and sed.

Python is extremely useful for bioinformatics because of its numerical capabilities, provided through the Numerical Python project [3], and because its syntax is remarkably easy to learn, which makes it accessible to people who did not major in computer science.

2 How do they work

2.1 Perl and Bioperl

"The Bioperl project is an international open-source collaboration of biologists, bioinformaticians, and computer scientists that has evolved over the past 7 years into the most comprehensive library of Perl modules available for managing and manipulating life-science information. [4]" Bioperl provides a user-friendly, stable and consistent programming interface, with reusable modules that reduce otherwise complex tasks to only a few lines of code [4]. With Bioperl, users can write programs that have the full power of Perl yet remain concise and explicit, using Bioperl modules to solve subproblems effectively.

Bioperl was initially created because of Perl's strength in string handling. "Perl is remarkably good for slicing, dicing, twisting, wringing, smoothing, summarizing and otherwise mangling text. Although the biological sciences do involve a good deal of numeric analysis now, most of the primary data is still text: clone names, annotations, comments, bibliographic references. [5]" Data such as DNA sequences, RNA sequences and even protein sequences are represented as strings.
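The point that sequence data are strings applies to Python as well as to Perl. The following is a minimal sketch, using only built-in Python string methods, of three common sequence operations. The function names and the example sequence are made up for illustration and do not come from Bioperl or Biopython.

```python
# Plain Python string operations on a DNA sequence.
# All names and the example sequence are illustrative assumptions.

def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA string."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def transcribe(dna: str) -> str:
    """Return the RNA string for a coding-strand DNA string (T -> U)."""
    return dna.upper().replace("T", "U")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string."""
    table = str.maketrans("ACGT", "TGCA")
    return dna.upper().translate(table)[::-1]


dna = "ATGGCCATTGTAATGGGCCGC"  # made-up example sequence
print(gc_content(dna))
print(transcribe(dna))
print(reverse_complement(dna))
```

Each operation is a one-line string manipulation, which is why languages with strong text handling suit this kind of data.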

2.2 Python and Biopython

Like Perl, Python is often referred to as a scripting language: source code is compiled to an intermediate representation, without the user having to create an intermediate file, and that representation is then interpreted [8]. The open-source Biopython project is an international collaboration of volunteer developers that develops Python libraries to facilitate common tasks in bioinformatics [7].
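The compile-then-interpret model can be observed from inside Python itself. The sketch below assumes the standard CPython interpreter and uses a made-up one-line program. The built-in compile() produces a code object in memory, exec() runs it, and the standard dis module prints the intermediate instructions. Note that CPython may also cache the bytecode of imported modules on disk, but this happens automatically.

```python
import dis

# A made-up one-line program, given as source text.
source = "gc = (seq.count('G') + seq.count('C')) / len(seq)"

# compile() turns the source text into a code object holding bytecode.
# Nothing is written to disk at this step.
code = compile(source, "<example>", "exec")

# The interpreter then executes the bytecode in the given namespace.
namespace = {"seq": "GGCATA"}
exec(code, namespace)
print(namespace["gc"])

# Show the intermediate instructions (the listing varies by Python version).
dis.dis(code)
```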

3 Features

3.1 Bioperl

Bioperl was initially designed around an object-oriented methodology in order to create clean, generic and reusable modules that represent data structures and operations common to the life sciences [4]. Table 1 shows the major module groups in Bioperl and their uses.



Module group                    Description
Bio::Seq                        Sequences and their properties
Bio::SeqIO                      Sequence data input/output
Bio::Index                      Flat-file sequence database indexing and retrieval
Bio::DB                         Remote database access for sequences and references via HTTP
Bio::DB::GFF                    SQL GFF database for DAS and GBrowse backends
Bio::SeqFeature                 Annotations or features that have a sequence location
Bio::Annotation                 Generic annotations such as comments and references
Bio::AlignIO, Bio::SimpleAlign  Multiple sequence alignments and their input/output
Bio::LiveSeq, Bio::Variation    Sequence variations and mutations
Bio::Search, Bio::SearchIO      Sequence database searches and their input/output
Bio::Tools                      Miscellaneous analysis tools
Bio::Tools::Run                 Wrapper for executing local and remote analyses
Bio::Tree, Bio::TreeIO          Phylogenetic trees and their input/output
Bio::Structure                  Protein structure data
Bio::Map, Bio::MapIO            Biological maps and their input/output
Bio::Biblio, Bio::DB::Biblio    Bibliographic references and database retrieval
Bio::Graphics                   Graphical displays of sequences

Table 1. Major Bioperl Module Groups [4]

3.2 Biopython

Biopython's core sequence representation is the Seq object, which behaves very much like a Python string, along with support for explicitly declaring a sequence as a protein sequence and some key biologically relevant methods [7]. It is designed to avoid code duplication and to make programming more efficient. Biopython contains parsers for a large number of file formats and tool outputs, such as BLAST, FASTA, Swiss-Prot, PubMed, KEGG, GenBank, AlignACE, Prosite, LocusLink and PDB, and it lets users work with tools for clustering gene expression data [2].
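To make the idea of a string-like sequence object and a file-format parser concrete, here is a small sketch in plain Python. It does not use Biopython: MiniSeq and parse_fasta are hypothetical names invented for this illustration, and the FASTA text is made up. The sketch only assumes the basic FASTA convention that a header line starts with ">" and is followed by one or more sequence lines.

```python
from typing import Iterable, Iterator, Tuple


class MiniSeq(str):
    """Hypothetical string subclass with one biology-specific method.

    This is NOT Biopython's Seq class. It only illustrates a sequence
    object that still behaves like an ordinary Python string.
    """

    _COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

    def reverse_complement(self) -> "MiniSeq":
        return MiniSeq(self.translate(self._COMPLEMENT)[::-1])


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, MiniSeq]]:
    """Yield (header, sequence) pairs from the lines of FASTA text."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, MiniSeq("".join(chunks))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, MiniSeq("".join(chunks))


# Made-up FASTA text with two records.
fasta_text = """>seq1 made-up example
ATGGCC
ATTG
>seq2 made-up example
GGGTTTAAA
"""

for name, seq in parse_fasta(fasta_text.splitlines()):
    # Slicing and len() work because MiniSeq is still a str.
    print(name, len(seq), seq[:3], seq.reverse_complement())
```

Because the sequence type inherits from str, ordinary string code keeps working while biology-specific methods live in one place, which is one way to avoid duplicating code.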

4 Performances

The sequence alignment problem is a good way to test the performance of Perl and Python on a routine bioinformatics task. The test is described as follows: "Speed comparison of the global alignment algorithm using a gap penalty of 10 implemented in Perl and Python. The programs were run on Linux and Windows platforms. Two DNA sequences of 3216 bp and 3217 bp were used. [9]" From Figure 1 and the previous sections, we can see that Perl emphasizes support for common application-oriented tasks through its built-in regular expressions, file scanning and report generation features, whereas Python emphasizes support for common programming methodologies such as object-oriented programming [9].

Figure 1 Speed comparison of the global alignment program [8]
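For readers unfamiliar with the benchmark task, the following is a minimal sketch of global alignment by dynamic programming (the Needleman-Wunsch approach) in Python. It is not the program that was timed. The gap penalty of 10 comes from the description above, but treating it as a linear penalty per gap position, and the match and mismatch scores of 5 and -4, are assumptions made only for this illustration.

```python
def global_align(a, b, match=5, mismatch=-4, gap=10):
    """Globally align strings a and b.

    Returns (score, aligned_a, aligned_b). The match and mismatch values
    and the linear gap model are illustrative assumptions.
    """
    n, m = len(a), len(b)

    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = -gap * i
    for j in range(1, m + 1):
        score[0][j] = -gap * j

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] - gap,      # gap in b
                score[i][j - 1] - gap,      # gap in a
            )

    # Trace back from the bottom-right cell to recover one best alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] - gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


best, top, bottom = global_align("ACGT", "ACT")
print(best)
print(top)
print(bottom)
```

The table has (n + 1) by (m + 1) cells, so for two sequences of about 3,200 bases the inner loop runs roughly ten million times, which is why this task exposes differences in interpreter speed.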

5 Conclusion

It is not hard to conclude that both Perl and Python are well suited to bioinformatics. They are designed with different features for different tasks: Python is better for object-oriented programming, and Perl is good at application-oriented tasks. We cannot simply judge which language is better overall, but we can identify the better choice for a specific problem.