Exploring Diseases Through Metabolomics Biology Essay


Biology is a rapidly expanding field of science in the midst of experimental change. The discipline of metabolomics is moving from being a data-poor science to a data-rich science. Alongside transcriptomics and proteomics, metabolomics has emerged as the third major path of functional genomics. Just as genomics is the omics approach to DNA sequence analysis and proteomics is the omics approach to understanding the structure and function of proteins in the cell, metabolomics is the omics approach to understanding biology at the cell and systems level. Combined with information on the transcriptome and proteome, it would lead to a nearly complete molecular picture of the cell, its environmental state, and the effect of external conditions on the cell's products at a given time. [1]

Proteins are highly dynamic and change rapidly in response to the external environment; this is why, after the completion of gene sequencing projects, scientific interest is shifting to the investigation of the proteome and metabolome. Broadly speaking, the metabolome can be considered the dynamic complement of metabolites formed by or found within a cell type, tissue, body fluid or organism. [1]

The OMIC world:



Under a given set of conditions, metabolomics studies the global metabolite profile of a system (cell, tissue, or organism). Metabolites are the small organic molecules present in the cell. Because metabolites are chemically diverse, analysis of the metabolome is challenging. Metabolites result from the interaction of the genome with its environment; they are not merely the end products of gene expression but also form part of the regulatory system in an integrated manner. Metabolite profiling studies are the basis of metabolomics, which is now a popular and rapidly expanding field of study. [2]

Metabolomics is the newest 'omics', joining genomics, transcriptomics and proteomics in the effort to understand systems biology. It is the large-scale study of all metabolites present in a cell, tissue or organ, usually by high-throughput screening. [2,3,4] Metabolomics identifies and quantifies the complete set of metabolites present in a cell or tissue at a particular time and under particular conditions. Metabolism is a key aspect of phenotype; hence, describing the distribution of metabolites is the next logical step in the elaboration of functional genomics and may be the best and most direct measure of cellular morphology. [5,6]

The term metabolomics combines two words: metabolome and omics.

The metabolome, or small molecule inventory (SMI), is defined as the entire complement of low-molecular-weight, non-peptide metabolites within a cell, tissue or organism in a particular physiological state. It defines the metabolic phenotype and is thus an important biochemical manifestation and a useful tool for functional genomics. Another definition states that the metabolome consists only of those native small molecules (definable non-polymeric compounds) that participate in general metabolic reactions and are required for the maintenance, growth and normal function of a cell.

"Omics" technologies are high-throughput screening approaches based on comprehensive biochemical and molecular characterization of an organism, organ, tissue or cell type [7,8]. Metabolomics represents the logical progression from large-scale analysis of RNA and proteins at the systems level [9].

Metabolomics deals with the identification and quantification of all, or a substantial fraction of all, metabolites within a biological sample, in parallel with the omics approaches that do the same for other classes of biomolecules such as mRNAs and proteins. While the genome is representative of what might be and the proteome of what is expressed, it is the metabolome that represents the current status of the cell or tissue. To understand the basic metabolism and chemistry of metabolites, biochemical pathways should first be understood [10]. Measurement of metabolites provides basic information about the biological response to physiological or environmental changes and thus improves the understanding of cellular biochemistry, as networks of metabolite feedback regulate gene and protein expression and mediate signals between organisms. Metabolomics allows a shift from hypothesis-driven research to the analysis of system-wide responses, especially when it is integrated with other profiling technologies.

At the analytical level, this work relies on the comprehensive profiling of large numbers of gene expression products through transcriptomics, proteomics and metabolomics.

Metabolomics is a direct approach to revealing the function of genes involved in metabolic processes and gene-to-metabolite networks. It offers a quick way to elucidate the function of novel genes and will play an important role in future work on plants, nutrition and health, drug toxicity and related areas. Metabolism is a key aspect of phenotype; hence, describing the distribution of metabolites is the next logical step in the elaboration of functional genomics. It is useful wherever an assessment of change in metabolite concentration is needed. To elucidate an unknown gene function, a genetic alteration is introduced into the system and the phenotypic effect of the mutation is analyzed; that is, by analyzing the metabolome, functions may be assigned to the respective gene [10].

Metabolites result from the interaction of a system's genome with its environment; they are not merely the end products of gene expression but also form part of the regulatory system in an integrated manner, and thus can define the biochemical phenotype of a cell or tissue. Their quantitative and qualitative measurement can therefore provide a broad view of the biochemical status of an organism, which can be used to monitor and assess gene function.


Exhaustive work has been done on genomics, proteomics and transcriptomics, which has allowed global, quantitative mRNA expression profiles to be established for cells and tissues in species for which the sequence of all genes is known. The question that now arises is: why metabolomics, when the transcriptome, genome and proteome are so popular?

One probable reason is that a change in the transcriptome or proteome does not always correspond to an alteration in biochemical phenotype, and increased mRNA does not always correlate with an increased protein level. Moreover, a translated protein may or may not be enzymatically active. Identification of mRNA and protein is therefore indirect and yields only limited information.

Another reason might be that, if the quantities of metabolites are known, lengthy procedures such as DNA and protein sequencing, microarrays and 2-D gel electrophoresis need not be carried out. Thus, it is inferred that the metabolome provides the most functional information of the omics technologies.

Unlike transcripts and proteins, metabolites share no direct link with the genetic code and are instead the products of the concerted action of many networks of enzymatic reactions in cells and tissues. As such, metabolites do not readily lend themselves to universal methods for analysis and characterization.

Metabolome data have a twin advantage in the systematic analysis of gene function: metabolites are functional cellular entities that vary with physiological context, and the number of metabolites is far smaller than the number of genes or gene products. For this reason, metabolomics requires the exploitation of knowledge of experimentally characterized genes in elucidating the function of unstudied genes. This may be achieved by comparing the change in a cell's metabolite profile produced by deleting a gene of unknown function with a library of such profiles generated by individually deleting genes of known function. Strategies for identifying the function of unknown genes on the basis of metabolomic data have been proposed. Silent phenotypes can be revealed by significant changes in the concentration of intracellular metabolites. The FANCY approach is capable of revealing the function of a gene that does not participate directly in metabolism or its control. An advantage of the FANCY approach is that it assigns cellular rather than molecular function.
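
The library comparison described above can be illustrated with a minimal sketch. It assumes that each deletion strain is summarized as a vector of metabolite concentration changes and that reference strains are ranked by Pearson correlation. The similarity measure, the gene names and the numbers are hypothetical choices made for illustration; this is not the published FANCY procedure.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def closest_reference(unknown_profile, library):
    """Return (gene, correlation) for the library profile most similar to the unknown one."""
    scored = [(gene, pearson(unknown_profile, profile))
              for gene, profile in library.items()]
    return max(scored, key=lambda item: item[1])

# Hypothetical log fold changes of four metabolites after deleting each known gene.
library = {
    "geneA": [1.0, -0.5, 0.2, 0.0],
    "geneB": [-0.8, 0.9, 0.1, -0.3],
}
# Hypothetical profile produced by deleting a gene of unknown function.
unknown = [0.9, -0.4, 0.3, 0.1]

best_gene, best_r = closest_reference(unknown, library)
```

In this toy data the unknown deletion most closely resembles the `geneA` deletion, which would suggest (but not prove) a related cellular function.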

Metabolite phenotypes are used as the basis for discriminating between plants of different genotypes or between treated plants. The metabolic composition of a cell or tissue influences the phenotype and is the most appropriate choice for functional genomics. Whether the fluxes between metabolites should be used as the basis for defining a metabolic phenotype is a matter for debate, but there is increasing evidence, for example from investigations of transgenic plants, that metabolomic analysis is a useful phenotyping tool. Moreover, the value of a metabolic phenotype, however defined, is greatly increased by the possibility of correlating the data with the system-wide analysis of gene expression and protein content.

The major challenge faced by metabolomics is the inability to profile all metabolites comprehensively. Plants have enormous biochemical diversity, estimated to exceed 200,000 different metabolites, and large-scale comprehensive metabolite profiling therefore meets its greatest challenge in plants. Metabolites are not linear polymers composed of a defined set of monomeric units but rather constitute a structurally diverse collection of molecules with widely varied chemical and physical properties.

The chemical nature of metabolites ranges from ionic, inorganic species to hydrophilic carbohydrates, hydrophobic lipids and complex natural products. The chemical diversity and complexity of the metabolome make it extremely challenging to profile the whole metabolome simultaneously. The bottleneck of metabolomics is finding the changes in the metabolic network that are functionally correlated with the physiological and developmental phenotype of the cell, tissue or organism.

If one general extraction and analytical system is used, it is likely that many metabolites will remain in the plant matrix and will not be profiled. Three factors represent the major limits on the resolution of a metabolomics approach: analytical variance (the coefficient of variation, or relative standard deviation, that is directly related to the experimental approach); biological variance (which arises from quantitative variation in metabolite levels between plants of the same species grown under identical, or as near as possible identical, conditions); and dynamic range (the concentration boundaries of an analytical determination over which the instrumental response is a linear function of analyte concentration).
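
As a small worked example of the first two limits, the relative standard deviation can be computed for technical replicates (analytical variance) and for separate plants (biological variance). The peak areas below are hypothetical and the helper name is an assumption of this sketch.

```python
from statistics import mean, stdev

def relative_standard_deviation(values):
    """Return the RSD (coefficient of variation) as a percentage of the mean."""
    return 100.0 * stdev(values) / mean(values)

# Hypothetical peak areas for a single metabolite.
technical_replicates = [98.0, 100.0, 102.0]    # one extract, injected three times
biological_replicates = [80.0, 100.0, 120.0]   # three plants grown under the same conditions

analytical_rsd = relative_standard_deviation(technical_replicates)
biological_rsd = relative_standard_deviation(biological_replicates)
```

In this made-up case the spread between plants is ten times the spread between injections, so a difference between treatments would have to exceed the biological spread, not just the instrumental one, to be meaningful.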

Metabolome analysis can be roughly grouped into four categories, which require different methodologies for validation of results. For the study of the primary effects of an alteration, analysis can be restricted to a particular metabolite or enzyme that would be directly affected by an abiotic or biotic perturbation. This technique is called metabolite target analysis and is mainly used for screening purposes. Sophisticated methods for extraction, sample preparation, sample clean-up and internal references may be used, making it much more precise than other methods [11]. Metabolic fingerprinting classifies samples according to their biological relevance and origin and is used for functional genomics, plant breeding and various diagnostic purposes. To study a number of compounds belonging to a selected biochemical pathway, metabolite profiling is employed. The term metabolite profiling was coined by Horning and Horning in 1970 and defined as 'quantitative and qualitative analysis of complex mixtures of physiological origin'. It has been employed for the analysis of lipids [12], isoprenoids [13], saponins [14], carotenoids [15], steroids and acids [16]. Only crude sample fractionation and clean-up steps are carried out [11]. The next step in metabolome analysis is to determine metabolic snapshots in a broad and comprehensive way, widely known as metabolomics. Here, both sample preparation and data acquisition aim to include all classes of compounds, with high recovery, experimental robustness and reproducibility.

Metabolomics has been developing as an important functional genomics tool. For it to continue maturing, the following objectives need to be achieved:

- Improved comprehensive coverage of the plant metabolome.

- Facilitation of comparison of results between laboratories and experiments.

- Enhancement of integration of metabolomics data with other functional genomic strategies.

Application of Metabolomics

Since the metabolome is closely tied to the genotype of an organism, its physiology and its environment (what the organism eats or breathes), metabolomics offers a unique opportunity to look at genotype-phenotype as well as genotype-envirotype relationships. Metabolomics is increasingly being used in a variety of health applications including pharmacology, pre-clinical drug trials, toxicology, transplant monitoring, newborn screening and clinical chemistry. However, a key limitation to metabolomics is the fact that the human metabolome is not at all well characterized.[1]

The Human Metabolome Project (HMP)

Unlike the situation in genomics, where the human genome is now fully sequenced and freely accessible, metabolomics is not nearly as developed. There are approximately 2900 endogenous or common metabolites that are detectable in the human body. Not all of these metabolites can be found in any given tissue or bio-fluid. This is because different tissues/bio-fluids serve different functions or have different metabolic roles. To date, the HMP has identified and quantified (i.e. determined the normal concentration ranges for) 309 metabolites in CSF, 1122 metabolites in serum, 458 metabolites in urine and approximately 300 metabolites in other tissues and bio-fluids. Clearly more concentration data would be desirable and this is one of the long term goals of the HMP and other affiliated metabolomic projects around the world.

The Human Metabolome Project is a $7.5 million Genome Canada funded project launched in January 2005. The purpose of the project is to facilitate metabolomics research through several objectives: to improve disease identification, prognosis and monitoring; to provide insight into drug metabolism and toxicology; to provide a linkage between the human metabolome and the human genome; and to develop software tools for metabolomics.

The project mandate is to identify, quantify, catalogue and store all metabolites that can potentially be found in human tissues and bio-fluids at concentrations greater than one micromolar. These data will be freely accessible in an electronic format to all researchers through the Human Metabolome Database. In addition, all compounds will be publicly available through the Human Metabolome Library.

More than 800 compounds have already been identified, and by the end of this year it is expected that more than 1400 metabolites will have been identified, quantified and archived in web-accessible databases and stored in -80°C freezers. However, the Human Metabolome Project is only mandated to provide chemical data and chemical compounds to the scientific community. It does not have the funding or the resources to use these "raw materials" for disease identification and characterization. Indeed, the intent of the Human Metabolome Project is to be an enabler of future metabolomic research, just as the Human Genome Project has been an enabler of current genomic research.[17]

Exploring disease through metabolomics.

Metabolomics approaches provide an analysis of changing metabolite levels in biological samples. In the past decade, technical advances have spurred the application of metabolomics in a variety of diverse research areas spanning basic, biomedical, and clinical sciences. In particular, improvements in instrumentation, data analysis software, and the development of metabolite databases have accelerated the measurement and identification of metabolites. Metabolomics approaches have been applied to a number of important problems, which include the discovery of biomarkers as well as mechanistic studies aimed at discovering metabolites or metabolic pathways that regulate cellular and physiological processes. By providing access to a portion of biomolecular space not covered by other profiling approaches (e.g., proteomics and genomics), metabolomics offers unique insights into small molecule regulation and signaling in biology. In the following review, we look at the integration of metabolomics approaches in different areas of basic and biomedical research, and try to point out the areas in which these approaches have enriched our understanding of cellular and physiological biology, especially within the context of pathways linked to disease.[18]

Applications of metabolomics in drug discovery and development

Metabolomics is a relatively new field of 'omics' technology that is primarily concerned with the global or system-wide characterization of small molecule metabolites using technologies such as nuclear magnetic resonance, liquid chromatography and/or mass spectrometry. Its unique focus on small molecules and the physiological effects of small molecules aligns the field of metabolomics very closely with the aims and interests of many researchers in the pharmaceutical industry. Because of its conceptual and technical overlap with many aspects of pharmaceutical research, metabolomics is now finding applications that span almost the full length of the drug discovery and development pipeline, from lead compound discovery to post-approval drug surveillance. This review explores some of the most interesting or significant applications of metabolomics as they relate to pharmaceutical research and development. Specific examples are given that show how metabolomics can be used to facilitate lead compound discovery, to improve biomarker identification (for monitoring disease status and drug efficacy) and to monitor drug metabolism and toxicity. Other applications are also discussed, including the use of metabolomics to facilitate clinical trial testing and to improve post-approval drug monitoring. These examples show that metabolomics potentially offers drug researchers and drug regulators an effective, inexpensive route to addressing many of the riskier or more expensive issues associated with the discovery, development and monitoring of drug products.[19]

Metabolomics and Global Systems Biology

''Systems biology'' is a term that has a relatively recent origin and currently means many different things to different investigators. The ideas encompassing the term systems biology have arisen as a result of the development of the ''omics'' technologies such as genomics, proteomics or metabonomics/ metabolomics.

In these fields of study large amounts of quantitative (or semi-quantitative) data are being derived, at a variety of levels of bio-molecular organization, from genes through proteins down to metabolites. One of the expectations of systems biologists is that, in some way, such data can be integrated to give a holistic picture of the state of the ''system'' that provides insights that are not available by other, more directed, methods, ultimately enabling a more fundamental understanding of biology to be obtained via networks of interactions at the molecular level.

This may, or may not, be a realistic ambition but, successful or not, such work may greatly aid in efforts to deliver the ''Personalized Healthcare Solutions'' so desired by the practitioners of 21st century medicine. Such therapeutic approaches, tailored to the exact biology (or biological state) of an individual, clearly require methods of patient evaluation that enable the clinician to select the most appropriate combinations of drugs, dosages and treatment regimens before commencing therapy. In an ideal world this process would maximize therapeutic benefit and minimize adverse drug events. Attempts at this type of sub-classification of individuals (patient stratification) are beginning to be performed and are currently most often attempted using some particular genetic feature.

Moving away from disease, such concepts could easily be extended to more general lifestyle paradigms aimed at minimizing the propensity of an individual, found to have gene-level risk factors, to acquire a disease later in life by optimizing lifestyle (nutrition and exercise, etc.). Given the current cost of providing such detailed information on an individual it is difficult to believe that personalized medicine will be delivered via a systems biology approach in the near future (at least to large populations). However, this does not mean that systems approaches may not be valuable in identifying better diagnostics and, paradoxically, many of the insights that will illuminate ''personalized medicine'' may well come from omics-based epidemiological studies of populations. If it is taken as a given that the state of any biological system, be it cell, organ or whole organism, is a function of a combination of factors such as genotype, physiological state (e.g. age), disease state, nutritional state, environment (both current and historical), etc., then the complexity faced by such investigations is clearly enormous. It is arguable that metabonomics, because it measures the outputs of the system rather than potential outcomes, offers the most practical approach to measuring global system activity via accessing the metabolic profiles that are determined by these combinations of genetic and environmental factors. This set of assumptions provides the basis for the discussion of the use of global metabolic profiling in systems approaches. [1, 2]

Metabolomics in pharmaceutical research and development: metabolites, mechanisms and pathways.

In recent years, quantitative metabolomics has played increasingly important roles in pharmaceutical research and development. Metabolic profiling of biofluids and tissues can provide a panoramic view of abundance changes in endogenous metabolites to complement transcriptomics and proteomics in monitoring cellular responses to perturbations such as diseases and drug treatments. Precise identification and accurate quantification of metabolites facilitate downstream pathway and network analysis using software tools for the discovery of clinically accessible and minimally invasive biomarkers of drug efficacy and toxicity. Metabolite abundance profiles are also indicative of biochemical phenotypes, which can be used to identify novel quantitative trait loci in genome-wide association studies. This review summarizes recent experimental and computational efforts to improve the metabolomics technology as well as progress towards in-depth integration of metabolomics with other disparate 'omics datasets to build mechanistic models in the form of detailed and testable hypotheses.[20]

Cancer Metabolomics

It is well known that significant metabolic changes take place as cells are transformed from normal to malignant. A significant role in cancer initiation and progression is attributed to changes in RNA and protein expression levels and regulation. However, changes in small molecules also provide important mechanistic insights into cancer development.

There is a strong body of evidence supporting the important role of metabolic regulation in cancer. Malignant cells undergo significant changes in metabolism including a redistribution of metabolic networks. These metabolic changes result in different metabolic landscapes in cancer cells versus normal cells.

Metabolomics, as a global approach, is especially useful in identifying overall metabolic changes associated with a particular biological process and finding the most affected metabolic networks. Moreover, metabolomics provides an additional layer of information that can be linked with transcriptomics and proteomics data to obtain a comprehensive view of a biological system. Metabolomics is a relatively new field in genomics research but it is gaining broader recognition in the cancer community.

Most cancer metabolomics studies to date have been done using metabolic fingerprinting or profiling with NMR spectroscopy of tissue extracts or in vivo magnetic resonance spectroscopy. Using NMR spectroscopy techniques it is possible to differentiate several tumor types in humans and in animal models. But while techniques based on magnetic resonance have the advantage of being non-invasive, they have low sensitivity and cannot detect molecules at low concentrations. Mass spectrometry methods provide the advantage of higher sensitivity and are more appropriate for in vitro studies. Similar to transcriptomics and proteomics, metabolomics experiments generate a large volume of specialized data that are complex and multi-dimensional. Storing, organizing and retrieving the data and associated metadata requires properly designed databases. The analysis of these data sets is equally challenging and new analysis algorithms are still being developed.

Multivariate statistical analysis of the metabolomics data in many cases utilizes the same approaches as the analysis of other genomic data. However, metabolomics has unique bioinformatics needs in addition to others common in microarray or proteomics data due to the fact that it is generated by multiple analytical platforms and requires extensive data pre-processing. Major areas where developments in data analysis techniques are crucial for further progress of metabolomics include: data and information management, raw analytical data processing, metabolomics standards and ontology, statistical analysis and data mining, data integration, and mathematical modeling of metabolic networks within the framework of systems biology. [21]

Plant Metabolomics

Plants are of pivotal importance in sustaining life on Earth because they supply oxygen, food, energy, medicines, industrial materials and many valuable metabolites. Plant metabolomics is a huge analytical challenge: although typical plant genomes contain 20,000-50,000 genes, there are currently an estimated 50,000 identified metabolites, with this number set to rise to 200,000 [22]. These plant metabolites are synthesized and accumulated by the networks of proteins encoded in the genome of each plant. Because of the possibility of making economically worthwhile discoveries, plants have been the subject of many metabolomics research programs. Metabolomics has been applied in plant biology to analyze differences between plant species, genotypes or ecotypes [23]. It helps us to gain insight into the cellular regulation of plant biosynthetic networks and to link changes in metabolite levels to differences in gene expression and protein production.

One of the first applications of the approach was to genotype Arabidopsis thaliana leaf extracts. However, even after the completion of the genome sequencing of Arabidopsis [24] and rice [25], the functions of these genes and the gene-to-metabolite networks are largely unknown. Targeted metabolite analysis has been shown to be an innovative way of revealing the function of genes involved in metabolic processes and of identifying gene function for specific product accumulation in plants [26], [27]. Metabolomics can provide researchers with a new tool to identify the functions of unknown genes in Arabidopsis and other plants. Understanding plant metabolism could lead to the engineering of plants that produce higher quality food or materials.

Metabolomic Software And Servers:

Seven Golden Rules Software - This software can be used to calculate molecular formulas from high resolution mass spectrometry data. It is derived from seven heuristic rules: (1) restrictions for the number of elements, (2) LEWIS and SENIOR chemical rules, (3) isotopic patterns, (4) hydrogen/carbon ratios, (5) element ratio of nitrogen, oxygen, phosphorus, and sulphur versus carbon, (6) element ratio probabilities and (7) presence of trimethylsilylated compounds.
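
To give a feel for how ratio-based rules such as (4) and (5) prune candidate formulas, here is a minimal sketch. It is not the Seven Golden Rules software itself: the function name is hypothetical, and the numeric limits are assumed for illustration and should be checked against the published rules before any real use.

```python
# Illustrative limits only (assumptions of this sketch, not authoritative values).
HC_RANGE = (0.2, 3.1)                                  # allowed hydrogen/carbon ratio
MAX_HETERO_TO_C = {"N": 1.3, "O": 1.2, "P": 0.3, "S": 0.8}

def passes_ratio_checks(counts):
    """Apply rule-4 and rule-5 style ratio checks to a dict of element counts."""
    carbon = counts.get("C", 0)
    if carbon == 0:
        return False  # ratios relative to carbon are undefined without carbon
    hc = counts.get("H", 0) / carbon
    if not (HC_RANGE[0] <= hc <= HC_RANGE[1]):
        return False
    return all(counts.get(element, 0) / carbon <= limit
               for element, limit in MAX_HETERO_TO_C.items())

glucose = {"C": 6, "H": 12, "O": 6}        # C6H12O6
implausible = {"C": 1, "H": 10, "O": 5}    # hypothetical candidate formula

glucose_ok = passes_ratio_checks(glucose)
implausible_ok = passes_ratio_checks(implausible)
```

Glucose passes both checks, while the second candidate is rejected because its hydrogen/carbon ratio of 10 lies far outside the assumed range.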

SetupX - SetupX, developed by the Fiehn laboratory at UC Davis, is a web-based metabolomics LIMS. It is XML compatible and built around a relational database management core. It is particularly oriented towards the capture and display of GC-MS metabolomic data through its metabolic annotation database called BinBase.

XCMS(2) - XCMS2 is an open source software package which has been developed to automatically search tandem mass spectrometry (MS/MS) data against high quality experimental MS/MS data from known metabolites contained in a reference library (METLIN). Scoring of hits is based on a "shared peak count" method that identifies masses of fragment ions shared between the analytical and reference MS/MS spectra. Another functional component of XCMS(2) is the capability of providing structural information for unknown metabolites, which are not in the METLIN database. This "similarity search" algorithm has been developed to detect possible structural motifs in the unknown metabolite which may produce characteristic fragment ions and neutral losses to related reference compounds contained in METLIN, even if the precursor masses are not the same.
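
The idea behind a "shared peak count" can be sketched in a few lines. This is only an illustration of the scoring concept, not the XCMS2 implementation; the function name, the mass tolerance and the m/z values are all hypothetical.

```python
def shared_peak_count(query_mz, reference_mz, tol=0.01):
    """Count query fragment masses that match a reference fragment within tol (Da)."""
    unused = sorted(reference_mz)
    shared = 0
    for mz in sorted(query_mz):
        for i, ref in enumerate(unused):
            if abs(mz - ref) <= tol:
                shared += 1
                del unused[i]  # each reference peak may be matched only once
                break
    return shared

# Hypothetical fragment ion masses from an analytical and a reference MS/MS spectrum.
query = [85.03, 127.04, 145.05, 163.06]
reference = [85.028, 127.039, 163.061, 180.09]

score = shared_peak_count(query, reference)
```

Here three of the four query fragments find a partner in the reference spectrum, so the score is 3. A real search would compute this score against every library spectrum and rank the hits.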

Peak Alignment Software - This page contains links and brief synopses of more than 30 different spectral alignment tools that can be used to align, bin or compare multiple GC-MS, LC-MS, LC and NMR data sets.

MS-based Structure Elucidation Software - This page provides links to both commercial and non-commercial software suppliers that produce software for small molecule structure elucidation or MS data manipulation.

MetaboMiner - MetaboMiner is a Java based software package that can be used to automatically or semi-automatically identify metabolites in complex biofluids from 2D NMR spectra. MetaboMiner is able to handle both 1H-1H total correlation spectroscopy (TOCSY) and 1H-13C heteronuclear single quantum correlation (HSQC) data. It identifies compounds by comparing 2D spectral patterns in the NMR spectrum of the biofluid mixture with specially constructed libraries containing reference spectra of ~500 pure compounds.

FiD - FiD (Fragment iDentificator) is a software tool for the structural identification of product ions produced with tandem mass spectrometric measurement of low molecular weight organic compounds. FiD conducts a combinatorial search over all possible fragmentation paths and outputs a ranked list of alternative structures. This gives the user an advantage in situations where the MS/MS data of compounds with less well-known fragmentation mechanisms are processed. The software has an easy-to-use graphical interface with built-in visualization capabilities for structures of product ions and fragmentation pathways.

PolySearch - PolySearch is a text mining software tool that supports >50 different classes of queries against nearly a dozen different types of text, scientific abstract or bioinformatic databases. The typical query supported by PolySearch is 'Given X, find all Y's' where X or Y can be diseases, tissues, cell compartments, gene/protein names, SNPs, mutations, drugs and metabolites. PolySearch also exploits a variety of techniques in text mining and information retrieval to identify, highlight and rank informative abstracts, paragraphs or sentences.

OpenMS - OpenMS is a software framework for rapid application development in mass spectrometry. OpenMS has been designed to be portable, easy-to-use and robust while offering a rich functionality ranging from basic data structures to sophisticated algorithms for data analysis.

BioSpider - BioSpider is essentially an automated report generator designed specifically to tabulate and summarize data on biomolecules - both large and small. Specifically, BioSpider allows users to type in almost any kind of biological or chemical identifier (protein/gene name, sequence, accession number, chemical name, brand name, SMILES string, InChI string, CAS number, etc.) and it returns an in-depth synoptic report (approximately 3-30 pages in length) about that biomolecule and any other biomolecule it may target. This summary includes physico-chemical parameters, images, models, data files, descriptions and predictions concerning the query molecule.

COLMAR - COLMAR query is a webserver for identifying metabolites by NMR from complex metabolite mixtures. The COLMAR web-suite screens NMR chemical shift lists or raw 1D NMR cross sections taken from covariance total correlation spectroscopy (TOCSY) spectra or other multidimensional NMR spectra against an NMR spectral database. Cross peaks are selected using local clustering to avoid ambiguities between chemical shifts and scalar J-coupling effects. Using three different algorithms, the corresponding chemical shift list is then screened against chemical shift lists extracted from 1D spectra of an NMR database. The resulting query scores produced by forward assignment, reverse assignment, and bipartite weighted-matching algorithms are combined into a consensus score, which provides a robust means for identifying the correct compound.

HORA - The HORA suite (Human blOod Range vAlidator) consists of a Java application used to validate the metabolomic analysis of human blood against a database that stores the normal plasma and serum range concentrations of metabolites. The goal of HORA is to find the metabolites that are outside the normal range and to show those not present in the list provided by the user, for different thresholds of concentration. Moreover it supplies a graphical interface to manage the data. The software can also be used to compare different metabolomic techniques.

MeltDB - MeltDB is a web-based software platform for the analysis and annotation of datasets from metabolomics experiments. MeltDB supports open file formats (netCDF, mzXML, mzDATA) and facilitates the integration and evaluation of existing preprocessing methods. The system provides researchers with means to consistently describe and store their experimental datasets. Comprehensive analysis and visualization features of metabolomics datasets are offered to the community through a web-based user interface.

MetaboAnalyst - MetaboAnalyst is a web-based metabolomic data processing tool that accepts a variety of input data (NMR peak lists, binned spectra, MS peak lists, compound/concentration data) in a wide variety of formats. It offers a number of options for metabolomic data processing, data normalization, statistical analysis (such as fold change analysis, t-tests, PCA, PLS-DA, hierarchical clustering along with a number of more sophisticated statistical or machine learning methods), graphing, metabolite identification and pathway mapping. Upon completion, the server generates a detailed report describing each method used, embedded with graphical and tabular outputs. MetaboAnalyst is capable of handling most kinds of metabolomic data and was designed to perform most of the common kinds of metabolomic data analyses.[1,2]
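As an illustration of the simplest univariate analyses mentioned above (fold change analysis and t-tests), the following minimal sketch compares one metabolite between two groups. It is not MetaboAnalyst code: the function names and the concentration values are hypothetical, and Welch's unequal-variance form of the t statistic is an assumption chosen for the example.

```python
import math
from statistics import mean, variance


def log2_fold_change(case, control):
    """Log2 ratio of the group means (case over control)."""
    return math.log2(mean(case) / mean(control))


def welch_t(case, control):
    """Welch's t statistic and its approximate degrees of freedom."""
    va = variance(case) / len(case)        # squared standard error, case group
    vb = variance(control) / len(control)  # squared standard error, control group
    t = (mean(case) - mean(control)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (
        va ** 2 / (len(case) - 1) + vb ** 2 / (len(control) - 1)
    )
    return t, df


# Hypothetical concentrations (arbitrary units) of a single metabolite.
control = [10.0, 12.0, 11.0, 9.0]
case = [20.0, 24.0, 22.0, 18.0]

lfc = log2_fold_change(case, control)
t_stat, dof = welch_t(case, control)
print(f"log2 fold change = {lfc:.2f}, t = {t_stat:.2f}, df = {dof:.2f}")
```

In a real study the same calculation would be repeated for every metabolite, and the resulting p-values would need correction for multiple testing.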


Comprehensive Metabolomic Databases

HMDB - The Human Metabolome Database (HMDB) is a freely available electronic database containing detailed information about small molecule metabolites found (and experimentally verified) in the human body. The database contains three kinds of data: 1) chemical data, 2) clinical data, and 3) molecular biology/biochemistry data. HMDB contains information on more than 6500 metabolites. Additionally, approximately 1500 protein (and DNA) sequences are linked to these metabolite entries. Each MetaboCard entry contains more than 100 data fields with 2/3 of the information being devoted to chemical/clinical data and the other 1/3 devoted to enzymatic or biochemical data. Many data fields are hyperlinked to other databases (KEGG, PubChem, MetaCyc, ChEBI, PDB, Swiss-Prot, and GenBank) and a variety of structure and pathway viewing applets.

BiGG - The BiGG database is a metabolic reconstruction of human metabolism designed for systems biology simulation and metabolic flux balance modeling. It is a comprehensive literature-based genome-scale metabolic reconstruction that accounts for the functions of 1,496 ORFs, 2,004 proteins, 2,766 metabolites, and 3,311 metabolic and transport reactions. It was assembled from build 35 of the human genome.

SetupX - SetupX, developed by the Fiehn laboratory at UC Davis, is a web-based metabolomics LIMS. It is XML compatible and built around a relational database management core. It is particularly oriented towards the capture and display of GC-MS metabolomic data through its metabolic annotation database called BinBase.

BinBase - BinBase is a GC-TOF metabolomic database.

SYSTOMONAS - SYSTOMONAS (SYSTems biology of pseudOMONAS) is a database for systems biology studies of Pseudomonas species. It contains extensive transcriptomic, proteomic and metabolomic data as well as metabolic reconstructions of this pathogen. Reconstruction of metabolic networks in SYSTOMONAS was achieved via comparative genomics. Broad data integration with well established databases BRENDA, KEGG and PRODORIC is also maintained. Several tools for the analysis of stored data and for the visualization of the corresponding results are provided, enabling a quick understanding of metabolic pathways, genomic arrangements or promoter structures of interest.

Metabolic Pathway Databases

KEGG - KEGG (Kyoto Encyclopedia of Genes and Genomes) is one of the most complete and widely used databases containing metabolic pathways (372 reference pathways) from a wide variety of organisms (>700). These pathways are hyperlinked to metabolite and protein/enzyme information. Currently KEGG has >15,000 compounds (from animals, plants and bacteria), 7742 drugs (including different salt forms and drug carriers) and nearly 11,000 glycan structures.

MetaCyc - MetaCyc is a database of nonredundant, experimentally elucidated metabolic pathways. MetaCyc contains more than 1,100 pathways from more than 1,500 different organisms. MetaCyc is curated from the scientific experimental literature and contains pathways involved in both primary and secondary metabolism, as well as associated compounds, enzymes, and genes.

HumanCyc - HumanCyc is a bioinformatics database that describes the human metabolic pathways and the human genome. The current version of HumanCyc was constructed using Build 31 of the human genome. The resulting pathway/genome database (PGDB) includes information on 28,783 genes, their products and the metabolic reactions and pathways they catalyze.

BioCyc - BioCyc is a collection of 371 Pathway/Genome Databases. Each database in the BioCyc collection describes the genome and metabolic pathways of a single organism. The databases within the BioCyc collection are organized into tiers according to the amount of manual review and updating they have received. Tier 1 DBs have been created through intensive manual efforts and include EcoCyc, MetaCyc and the BioCyc Open Compounds Database (BOCD). BOCD includes metabolites, enzyme activators, inhibitors, and cofactors derived from hundreds of organisms. Tier 2 and Tier 3 databases contain computationally predicted metabolic pathways, as well as predictions as to which genes code for missing enzymes in metabolic pathways, and predicted operons.

Reactome - Reactome is a curated, peer-reviewed knowledgebase of biological pathways, including metabolic pathways as well as protein trafficking and signaling pathways. Reactome includes several types of reactions in its pathway diagram collection including experimentally confirmed, manually inferred and electronically inferred reactions. Reactome has pathway data on more than 20 different organisms but the primary organism of interest is Homo sapiens. Reactome has data and pathway diagrams for >2700 proteins, 2800 reactions and 860 pathways for humans.

Compound or Compound-Specific Databases

PubChem - PubChem is a freely available database of chemical structures of small organic molecules and information on their biological activities. It contains structure, nomenclature and calculated physico-chemical data and is linked with NIH PubMed/Entrez information. PubChem is organized as three linked databases within the NCBI's Entrez information retrieval system. These are PubChem Substance, PubChem Compound, and PubChem BioAssay. PubChem also provides a fast chemical structure similarity search tool. PubChem has >19 million unique chemical structures.

ChEBI - Chemical Entities of Biological Interest (ChEBI) is a freely available dictionary of molecular entities focused on 'small' chemical compounds. The chemical entities in ChEBI are either products of nature (metabolites) or synthetic products used to intervene in the processes of living organisms (drugs or toxins). ChEBI contains structure and nomenclature information along with hyperlinks to many well-regarded databases. ChEBI uses a carefully developed ontological classification, whereby the relationships between molecular entities or classes of entities and their parents and/or children are precisely specified. ChEBI has >15,500 chemical entities in its database.

ChemSpider - ChemSpider is an aggregated database of organic molecules containing more than 20 million compounds from many different providers. At present the database contains information from such diverse sources as a marine natural products database, ACD-Labs chemical databases, the EPA's DSSTox databases and from a series of chemical vendors. It has extensive search utilities and most compounds have a large number of calculated physico-chemical property values.

KEGG Glycan - The KEGG GLYCAN database is a collection of experimentally determined glycan structures. It contains all unique structures taken from CarbBank, structures entered from recent publications, and structures present in KEGG pathways. KEGG Glycan has >11,000 glycan structures from a large number of eukaryotic and prokaryotic sources.

Drug Databases

DrugBank - The DrugBank database is a blended bioinformatics and cheminformatics resource that combines detailed drug (i.e. chemical, pharmacological and pharmaceutical) data with comprehensive drug target (i.e. sequence, structure, and pathway) information. The database contains nearly 4800 drug entries including >1,350 FDA-approved small molecule drugs, 123 FDA-approved biotech (protein/peptide) drugs, 71 nutraceuticals and >3,243 experimental drugs. DrugBank also contains extensive SNP-drug data that is useful for pharmacogenomics studies.

Therapeutic Target DB - The Therapeutic Target Database (TTD) is a drug database designed to provide information about the known therapeutic protein and nucleic acid targets described in the literature, the targeted disease conditions, the pathway information and the corresponding drugs/ligands directed at each of these targets. The database currently contains 1535 targets and 2107 drugs/ligands.

PharmGKB - The PharmGKB database is a central repository for genetic, genomic, molecular and cellular phenotype data and clinical information about people who have participated in pharmacogenomics research studies. The data includes, but is not limited to, clinical and basic pharmacokinetic and pharmacogenomic research in the cardiovascular, pulmonary, cancer, pathways, metabolic and transporter domains. Its aim is to aid researchers in understanding how genetic variation among individuals contributes to differences in reactions to drugs. PharmGKB contains searchable data on genes (>20,000), diseases (>3000), drugs (>2500) and pathways (53). It also has detailed information on 470 genetic variants (SNP data) affecting drug metabolism.

STITCH - STITCH ('search tool for interactions of chemicals') is a searchable database that integrates information about interactions from metabolic pathways, crystal structures, binding experiments and drug-target relationships. Text mining and chemical structure similarity are used to predict relations between chemicals. Each proposed interaction can be traced back to the original data sources. The database contains interaction information for over 68 000 different chemicals, including 2200 drugs, and connects them to 1.5 million genes across 373 genomes.

SuperTarget - SuperTarget is a database that contains a core dataset of about 7300 drug-target relations of which 4900 interactions have been subjected to a more extensive manual annotation effort. SuperTarget provides tools for 2D drug screening and sequence comparison of the targets. The database contains more than 2500 target proteins, which are annotated with about 7300 relations to 1500 drugs; the vast majority of entries have pointers to the respective literature source. A subset of 775 more extensively annotated drugs is provided separately through the Matador database (Manually Annotated Targets And Drugs Online Resource - http://matador.embl.de).

Spectral Databases

HMDB - The Human Metabolome Database (HMDB) is a freely available electronic database containing detailed information about small molecule metabolites found in the human body. It contains experimental MS/MS data for 800 compounds, experimental 1H and 13C NMR data (and assignments) for 790 compounds and GC/MS spectral and retention index data for 260 compounds. Additionally, predicted 1H and 13C NMR spectra have been generated for 3100 compounds. All spectral databases are downloadable and searchable.

BMRB - The BioMagResBank (BMRB) is the central repository for experimental NMR spectral data, primarily for macromolecules. The BMRB also contains a recently established subsection for metabolite data. The current metabolomics database contains structures, structure viewing applets, nomenclature data, extensive 1D and 2D spectral peak lists (from 1D, TOCSY, DEPT, HSQC experiments), raw spectra and FIDs for nearly 500 molecules. The data is both searchable and downloadable.

MMCD - The Madison Metabolomics Consortium Database (MMCD) is a database on small molecules of biological interest gathered from electronic databases and the scientific literature. It contains approximately 10,000 metabolite entries and experimental spectral data on about 500 compounds. Each metabolite entry in the MMCD is supported by information in an average of 50 separate data fields, which provide the chemical formula, names and synonyms, structure, physical and chemical properties, NMR and MS data on pure compounds under defined conditions where available, NMR chemical shifts determined by empirical and/or theoretical approaches, information on the presence of the metabolite in different biological species, and extensive links to images, references, and other public databases.

MassBank - MassBank is a mass spectral database of experimentally acquired high resolution MS spectra of metabolites. Maintained and supported by the JST-BIRD project, it offers various query methods for standard spectra obtained from Keio University, RIKEN PSC, and other Japanese research institutions. It is officially sanctioned by the Mass Spectrometry Society of Japan. The database has very detailed MS data and excellent spectral/structure searching utilities. More than 13,000 spectra from 1900 different compounds are available.

Golm Metabolome Database - The Golm Metabolome Database provides public access to custom GC/MS libraries which are stored as Mass Spectral (MS) and Retention Time Index (RI) Libraries (MSRI). These libraries of mass spectral and retention time indices can be used with the NIST/AMDIS software to identify metabolites according to their spectral tags and RIs. The libraries are both searchable and downloadable and have been carefully collected under defined conditions on several types of GC/MS instruments (quadrupole and TOF).

Metlin - The METLIN Metabolite Database is a repository for mass spectral metabolite data. All metabolites are neutral or free acids. It is a collaborative effort between the Siuzdak and Abagyan groups and the Center for Mass Spectrometry at The Scripps Research Institute. METLIN is searchable by compound name, mass, formula or structure. It contains 15,000 structures, including more than 8000 di- and tripeptides. METLIN contains MS/MS, LC/MS and FTMS data that can be searched by peak lists, mass range, biological source or disease.

Fiehn GC-MS Database - This library contains data on 713 compounds (name, structure, CAS ID, other links) for which GC/MS data (spectra and retention indices) have been collected by the Fiehn laboratory. A locally maintained program called BinBase/Bellerophon filters input GC/MS spectra and uses the spectral library to identify compounds. The actual GC/MS library is available from several different GC/MS vendors.
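Several of the GC/MS libraries above identify compounds by combining a retention index with a mass spectral match. The sketch below shows one common way this idea can be expressed: candidates are first filtered by a retention index window and then ranked by cosine similarity of their spectra. It does not reproduce the scoring used by AMDIS, BinBase or any of the databases listed; the library entries, thresholds and function names are all hypothetical.

```python
import math


def cosine_similarity(spec_a, spec_b):
    """Cosine similarity of two spectra given as {nominal m/z: intensity}."""
    shared = set(spec_a) & set(spec_b)
    dot = sum(spec_a[mz] * spec_b[mz] for mz in shared)
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify(query_spec, query_ri, library, ri_window=5.0, min_score=0.8):
    """Return (name, score) hits inside the RI window, best match first."""
    hits = []
    for entry in library:
        if abs(entry["ri"] - query_ri) > ri_window:
            continue  # retention index too far from the query
        score = cosine_similarity(query_spec, entry["spectrum"])
        if score >= min_score:
            hits.append((entry["name"], score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical library entries; the values are illustrative only.
library = [
    {"name": "compound A", "ri": 1500.0,
     "spectrum": {73: 100.0, 147: 40.0, 217: 60.0}},
    {"name": "compound B", "ri": 1900.0,
     "spectrum": {73: 100.0, 147: 45.0, 217: 55.0}},
    {"name": "compound C", "ri": 1502.0,
     "spectrum": {73: 100.0, 129: 80.0, 204: 90.0}},
]

query_spec = {73: 95.0, 147: 42.0, 217: 58.0}
hits = identify(query_spec, query_ri=1501.5, library=library)
print(hits)
```

Here compound B has a very similar spectrum but is rejected on retention index, which is the reason the two criteria are used together.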

Disease & Physiology Databases

OMIM - Online Mendelian Inheritance in Man (OMIM) is a comprehensive compendium of human genes and genetic phenotypes. The full-text, referenced overviews in OMIM contain information on all known Mendelian disorders and over 12,000 genes. OMIM focuses on the relationship between phenotype and genotype. It is updated daily, and the entries contain many links to other genetics resources. OMIM contains 379 diseases with associated gene sequence data as well as 2385 conditions with a disease phenotype and a known genetic cause.

METAGENE - METAGENE is a knowledgebase for inborn errors of metabolism providing information about the disease, genetic cause, treatment and the characteristic metabolite concentrations or clinical tests that may be used to diagnose or monitor the condition. It has data on 431 genetic diseases.

OMMBID - OMMBID or the On-Line Metabolic and Molecular Basis to Inherited Disease is a web-accessible book/encyclopedia describing the genetics, metabolism, diagnosis and treatment of hundreds of metabolic disorders, contributed by hundreds of experts. It also contains extensive reviews, detailed pathways, chemical structures, physiological data and tables that are particularly useful for clinical biochemists. Most university libraries have subscriptions to this resource. OMMBID was originally developed by Charles Scriver at McGill.[1,2] http://www.metabolomicssociety.org/database.html

Techniques coupled with metabolomics

Metabolites are chemical entities and can be analyzed by standard tools of chemical analysis such as molecular spectroscopy and MS. For better resolution, sensitivity and selectivity, these technologies can be hyphenated. The type of sample determines which technologies and strategies are used [28]. Measuring the complete metabolome is not yet technically possible and will probably require a platform of complementary technologies, because no single technique is comprehensive, selective, and sensitive enough to measure all metabolites [29].

The primary drive in Metabolomics is to improve analytical techniques to provide ever-increasing coverage of the complete metabolome of an organism. The most common and mature technique is GC-MS analysis. It is a hyphenated system in which GC first separates volatile and thermally stable compounds, and the eluting compounds are then detected, traditionally by EI-MS. In metabolomics, GC-MS has been described as the gold standard [29], in spite of its bias against non-volatile, high-MW metabolites. Thermolabile and large metabolites such as organic bis-triphosphates, sugar nucleotides or intact membrane lipids cannot be detected by GC-MS.

Non-volatile polar metabolites often need to be derivatised by converting carbonyl groups to oximes with an O-alkyl hydroxylamine solution, followed by formation of TMS esters with silylating reagents (typically N-methyl-N-(trimethylsilyl)trifluoroacetamide) to replace exchangeable protons with TMS groups. Oxime formation is required to eliminate undesirable slow and reversible silylation reactions with carbonyl groups, whose products can be thermally labile. The presence of water can result in breakdown of TMS esters, although extensive sample drying and the presence of excess silylating reagent can limit this process. Small aliquots of derivatised samples are analyzed by split or splitless injection on GC columns of differing polarity, which provides both high chromatographic resolution of compounds and high sensitivity. Deconvolution is then needed to quantify metabolites that are unresolved by GC. It can detect co-eluting peaks with peak apexes separated by less than 1 s and can also detect low-abundance peaks co-eluting with metabolites present at much higher concentration.

Comprehensive metabolite profiling of potato (Solanum tuberosum) tubers by gas chromatography-mass spectrometry (GC-MS) [30] detected 150 compounds, of which 77 could be chemically identified as amino acids, organic acids or sugars, and 27 saponins were identified in Medicago truncatula [2]. In A. thaliana leaf extracts, 326 distinct compounds were identified, and the chemical structure of half of these compounds was further elucidated. Different compound classes have been investigated using fractionation techniques, and about 100 compounds were identified in rice grains by combining fractionation with GC-MS [30].

Recent advances in GC-MS with respect to fast acquisition and accurate mass determination have been achieved by applying time-of-flight (TOF) technology [31]. Improved deconvolution algorithms and faster spectral acquisition by TOF measurement [32] have resulted in detection of over 1000 components from plant leaf extracts at a throughput of over 1000 samples per month. A recent advance is MSFACTs (Metabolomics Spectral Formatting and Conversion Tools) [33], which comprises two tools, one for aligning integrated chromatographic peak lists and another for extracting information from raw chromatographic ASCII-formatted data files.

Another recent advance is MET-IDEA (Metabolomics Ion-based Data Extraction Algorithm), which is capable of rapidly extracting semi-quantitative data from raw data files and so allows more rapid biological insight [33]. Over 300 metabolites were covered in a proof-of-concept study on functional genomics in Arabidopsis using GC-MS technology. Although it has been shown that the number of detected peaks in a typical GC-MS plant chromatogram can be multiplied by deconvolution algorithms, the de novo identification of GC-MS peaks remains cumbersome.

Therefore, there is a clear need for complementary techniques that allow plant samples to be analyzed without chemical modification and that provide enhanced qualitative characterization of the components.
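The chemical modification required for GC-MS also changes the mass that is actually observed. As a worked illustration of the derivatisation described earlier, the sketch below estimates the monoisotopic mass of a metabolite after oxime formation and trimethylsilylation. It assumes methoxyamine as the O-alkyl hydroxylamine (so each carbonyl oxygen becomes =N-OCH3) and assumes that every exchangeable proton counted is replaced by one Si(CH3)3 group; the function names and the choice of glucose with one methoxime and five TMS groups are illustrative assumptions, not values taken from the text.

```python
# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONO = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
    "Si": 27.9769265,
}


def mono_mass(formula):
    """Monoisotopic mass of a formula given as {element: count}."""
    return sum(MONO[element] * count for element, count in formula.items())


# Methoximation (assumed reagent: methoxyamine): C=O becomes C=N-OCH3,
# a net gain of one C, three H and one N.
MEOX_SHIFT = mono_mass({"C": 1, "H": 3, "N": 1})

# Silylation: one exchangeable H is replaced by Si(CH3)3,
# a net gain of C3H8Si.
TMS_SHIFT = mono_mass({"C": 3, "H": 8, "Si": 1})


def derivatised_mass(formula, n_meox, n_tms):
    """Mass after adding n_meox methoxime and n_tms TMS groups."""
    return mono_mass(formula) + n_meox * MEOX_SHIFT + n_tms * TMS_SHIFT


# Glucose (C6H12O6): one carbonyl and five hydroxyl protons in open-chain form.
glucose = {"C": 6, "H": 12, "O": 6}
mass = derivatised_mass(glucose, n_meox=1, n_tms=5)
print(f"TMS shift per group: {TMS_SHIFT:.4f} Da")
print(f"Derivatised glucose: {mass:.4f} Da")
```

The large mass increase per TMS group is one reason why library matching of derivatised compounds relies on spectra recorded from derivatised standards rather than on the masses of the native metabolites.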

In addition to MS-based approaches, nuclear magnetic resonance (NMR) is also used in metabolomic analysis [35], [36]. NMR has lower sensitivity than MS and suffers from overlapping signals, leading to fewer absolute identifications, but it is still used in metabolomics studies because it is non-destructive, and spectra can be recorded from cell suspensions, tissues, and even whole plants, as well as from extracts and purified metabolites [37], [38]. It offers an array of detection schemes that can be tailored to the nature of the sample and the metabolic problem being addressed [38]. Thus analyzing the metabolite composition of a tissue extract, determining the structure of a novel metabolite, demonstrating the existence of a particular metabolic pathway in vivo, and localizing the distribution of a metabolite in a tissue are all possible by NMR. However, the NMR measurements required for these tasks differ greatly, particularly in their hardware requirements, detection scheme, and sensitivity [38]. In addition, the natural abundance of some of the biologically relevant magnetic isotopes is low, which allows these isotopes, particularly 2H, 13C, and 15N, to be introduced into a metabolic system as labels prior to the NMR analysis [39], [40].

Hyphenating NMR with liquid chromatography can increase its efficiency by reducing the number of co-resonant peaks and improving dynamic range. A combination of HPLC-NMR spectroscopy with rudimentary data analysis has been employed to evaluate metabolic changes in transgenic food crops [38]. Using LC-NMR, nearly 2700 analytes were detected in plant extracts [41]. Directly coupled HPLC-NMR and HPLC-NMR-MS have also been used, allowing rapid identification of metabolites with little sample preparation [42], [43].


Metabolomics is an emerging technology with considerable scope, and substantial effort is still needed to improve the sensitivity of metabolomic experiments. Targeted approaches are needed that focus on specific classes of small molecules so that high sensitivity can be achieved. Fractionation and enrichment methods for specific classes of aqueous metabolites should prove particularly valuable. Compared with genomics and proteomics, the major problem faced by metabolomics is the determination of metabolite structures because, unlike genes and proteins, metabolites constitute a family of biomolecules of near-limitless structural diversity.

Increased sensitivity and high-resolution tools, combined with exhaustive searchable databases containing the biochemical information on all known metabolites, should facilitate the future characterization of metabolites. Simply increasing the number of instruments such as NMR, MS or IR will not solve this problem. Instead, new technologies and a real jump in innovation are needed, and, even more importantly, better software and curated, unified open-access databases.

Metabolomics is emerging as a powerful high-throughput platform complementing other genomics platforms such as transcriptomics and proteomics. Combining these high-throughput data generation techniques with mathematical modeling of biochemical and signaling networks is essential for systems biology and will help us understand more deeply how biological systems work as a whole.[1,2]