Phenotype And Metabolism Of Human Genome Biology Essay


In 1990, the Human Genome Project was launched to identify all the genes in the human genome and to determine the sequence of the 3 billion base pairs that make up human DNA [1]. The genomic data were then to be stored in databases and analyzed to improve disease diagnosis and determine genetic susceptibility [2]. After the first draft sequence of the human genome was published in 2001 [3], many different approaches emerged to extract useful information from the genomic sequence. These so-called post-genomic approaches include high-throughput technologies in genomics, transcriptomics, proteomics and metabolomics that measure and analyze thousands of DNA sequences, RNA transcripts, proteins and metabolic fluxes in a single experiment. Thanks to these studies, it is now possible to understand specific aspects of the disease process and to develop clinical applications. Diseases that have already benefited from these types of data include cardiovascular disease [4,5], obesity [6-8] and diabetes [9-11], amongst others.

After the appearance of these approaches and the high-throughput fever, the focus switched to knowledge-based studies that aim to decipher functional associations by combining several types of biological evidence, because integrating information from multiple data types is seen as a more robust and accurate way to unravel functional associations. Attention has now shifted from genes and proteins to biological systems, which means that enormous amounts of high-throughput experimental data from diverse sources are becoming available and there is an urgent need to integrate them. The development of tools for large-scale functional annotation of genes, multi-member gene families and networks is therefore crucial; [12] is an example of such a tool.


The distinction between phenotype and genotype is fundamental to understanding the heredity and development of organisms. A genotype can be defined as an individual's collection of genes. When the information encoded in the genes is used to make proteins and RNA molecules, we say that the genotype is expressed; the expression of the genotype contributes to the individual's observable traits, called the phenotype. The phenotype of an organism is thus the collection of its observable traits, such as height, eye color and blood type. Some traits are largely determined by the genotype, while others are largely determined by environmental factors [13].

Organisms vary greatly from one to another. On average, there are about 3 million nucleotide differences between any two people taken at random, and even very closely related individuals have many genetic differences. Only identical twins start out with identical genomes, but many mutations still occur during the growth and development of the cells that make up the body, which means that even the cells of the same individual do not contain identical genomes. Moreover, identical twins differ from each other because of environmental variation. This is why every human is different.

After this brief introduction to genotypes and genomes, the following sections analyze some of the most interesting attempts of the last decade to extract useful information from the enormous amount of data provided by the Human Genome Project.

Single nucleotide polymorphisms (SNPs)

SNPs are variations in the DNA sequence that occur when only one nucleotide changes. An illustrative example of a SNP is the change of a sequence from TAGGCTCA to TTGGCTCA. The concept is old: before the human genome was sequenced, it was already used by yeast, worm and fly geneticists under the name of nucleotide substitution. SNPs, which are the most common form of human genetic variation, occur about once every 1200 base pairs (bp) in the human genome [14] and are present in at least 1% of the population [15]; nucleotide substitutions at lower frequencies are not considered SNPs but mutations. This frequency threshold reflects the fact that most SNPs have no negative effect on the organism. When a variation has a negative effect on the organism that carries it, the environment tends to eliminate it from the population with high intensity, which is why mutations that produce disease are present at lower frequencies in populations. The harmless effect of SNPs lowers the selective pressure and thus raises their frequency in random populations. However, it is widely accepted that some SNPs could predispose people to disease or influence their response to a drug. For this reason, scientists have sought statistically significant associations between one or a few SNPs and a certain phenotype, such as the response to certain drugs or complex diseases like hypertension, Alzheimer's disease or schizophrenia. At the same time, clinical pharmacologists sought similar associations between SNPs and the effects of a new drug on asthma, diabetes, heart disease or a type of cancer. Such publications were often followed by several reports refuting the original conclusion, as stated in [16].
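The two ideas in this section, a single-base substitution and the 1% frequency threshold, can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions: the sequences are already aligned and of equal length, each site has at most two common alleles, and the function names (substitution_sites, minor_allele_frequency, is_snp) are hypothetical and do not come from any of the cited tools.

```python
from collections import Counter


def substitution_sites(ref, alt):
    """Return (position, ref_base, alt_base) for every single-base difference.

    Assumes the two sequences are already aligned and of equal length.
    Positions are 0-based.
    """
    if len(ref) != len(alt):
        raise ValueError("sequences must be aligned and of equal length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(ref, alt)) if a != b]


def minor_allele_frequency(alleles):
    """Frequency of the second most common allele observed at one site."""
    counts = Counter(alleles)
    if len(counts) < 2:
        return 0.0  # only one allele observed: no variation at this site
    # most_common() sorts by count, so index 1 is the minor allele
    return counts.most_common()[1][1] / len(alleles)


def is_snp(alleles, threshold=0.01):
    """Apply the 1% population-frequency convention described in the text."""
    return minor_allele_frequency(alleles) >= threshold


# The example from the text: TAGGCTCA versus TTGGCTCA differ at one position.
print(substitution_sites("TAGGCTCA", "TTGGCTCA"))

# A variant carried by 5 of 100 sampled chromosomes passes the 1% threshold.
print(is_snp(["A"] * 95 + ["T"] * 5))
```

In practice the allele list would come from genotyping many individuals; here it is a hand-made list used only to show the threshold logic.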

The HapMap

The International HapMap Project [17], completed between 2003 and 2006, is a joint effort to identify and catalog genetic similarities and differences in human beings. The information obtained from the HapMap is used by researchers all around the world in experiments that aim to find genes affecting health, disease, or individual responses to medications and environmental factors. To make this information freely available to all scientists, all the data generated by the Project can be downloaded with minimal constraints through the HapMap webpage [18]. The DNA samples studied in the first phase came from four populations: Yoruba in Ibadan, Nigeria (YRI), Japanese in Tokyo, Japan (JPT), Han Chinese in Beijing, China (CHB), and Centre d'Etude du Polymorphisme Humain (CEPH) samples from Utah residents with ancestry from northern and western Europe (CEU). When the International HapMap Project was completed, the researchers demonstrated that the 10 million SNP variants described cluster into local neighborhoods called haplotypes [19] and can be accurately sampled by as few as 300,000 carefully chosen SNPs. New technological systems allow these SNPs to be systematically studied in high-throughput facilities that dramatically lower the cost [20].
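The idea that a small set of carefully chosen SNPs can stand in for many correlated neighbours can be sketched in a few lines of Python. This is an illustrative greedy procedure, not the method actually used by the HapMap consortium. The 0/1 allele coding, the r-squared cutoff of 0.8 and the function names are all assumptions made for the example.

```python
def r_squared(x, y):
    """Squared correlation between two biallelic SNPs coded 0/1 per haplotype."""
    n = len(x)
    px = sum(x) / n
    py = sum(y) / n
    pxy = sum(1 for a, b in zip(x, y) if a == 1 and b == 1) / n
    denom = px * (1 - px) * py * (1 - py)
    if denom == 0:
        return 0.0  # a monomorphic site carries no correlation information
    d = pxy - px * py  # linkage disequilibrium coefficient D
    return d * d / denom


def greedy_tag_snps(haplotypes, min_r2=0.8):
    """Pick SNPs until every SNP is correlated with a pick at r^2 >= min_r2.

    haplotypes: list of equal-length 0/1 lists, one list per chromosome.
    Returns the indices of the chosen tag SNPs in the order they were picked.
    """
    n_snps = len(haplotypes[0])
    columns = [[h[j] for h in haplotypes] for j in range(n_snps)]
    # captures[i] = SNPs that SNP i can stand in for (always including itself)
    captures = {
        i: {j for j in range(n_snps)
            if i == j or r_squared(columns[i], columns[j]) >= min_r2}
        for i in range(n_snps)
    }
    untagged = set(range(n_snps))
    tags = []
    while untagged:
        # choose the untagged SNP that covers the most still-untagged SNPs
        best = max(sorted(untagged), key=lambda i: len(captures[i] & untagged))
        tags.append(best)
        untagged -= captures[best]
    return tags


# Toy data: SNPs 0, 1 and 2 travel together on a haplotype; SNP 3 is independent.
haps = [
    [0, 0, 1, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
]
print(greedy_tag_snps(haps))
```

On this toy data two tag SNPs are enough to represent all four sites, which mirrors on a tiny scale the reduction from millions of SNPs to a few hundred thousand described above.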


Another example of research using the human genome data is the Encyclopedia of DNA Elements (ENCODE) Pilot Project, which ran from 2004 to 2007. In this project, about 1% of the human genome (30 Mb) was carefully selected and studied in great detail by a worldwide consortium made up of several research groups with diverse backgrounds and expertise [21]. The idea was to map a large variety of sequences: genes (protein-coding and non-coding exons), promoters, enhancers and repressor/silencer sequences, amongst others. The consortium produced more than 200 data sets, representing more than 400 million data points, 200 Mb of comparative sequences (e.g., human genome versus chimpanzee), and guidelines for the rapid release of all data [22]. Some highlights of their discoveries are extensive overlap of gene transcripts and many non-protein-coding regions, complex networks of transcripts, and many new transcription start sites, with a far more complex arrangement of regulatory sequences and transcription factor binding than expected, as explained in detail in [23]. The highly elaborate findings of the ENCODE Project caused considerable confusion in the field at the time, since they questioned previous concepts of "what constitutes a gene" [24]. Previously, a gene was defined as "A segment of DNA, including all regulatory sequences for that DNA, which encodes a functional product, whether it is a protein, small peptide, or one of the many classes of regulatory RNA." The definition proposed after ENCODE, which aims to avoid the complexities of regulation and transcription, is: "A union of genomic sequences encoding a coherent set of potentially overlapping functional products". The success of the pilot project was enough to secure new funding from NHGRI in September 2007 to scale the ENCODE Project to a production phase on the entire genome. In this phase of the project, the consortium continues with additional pilot-scale studies but also includes a Data Coordination Center and a Data Analysis Center to track, store and display ENCODE data and to assist in integrated analyses of it.

Genome-Wide Association (GWA) Studies

A genome-wide association study (GWAS) is an approach used in genetics research to associate specific genetic variations with particular diseases. The method involves scanning the genomes of many different people in search of genetic markers that can be used to predict the presence of a disease. Once genetic markers are identified, they can be used to understand how genes contribute to the disease and to develop better prevention and treatment strategies. In genome-wide association studies, researchers compare the genomes of people with an illness (cases) to those of unaffected people (controls). Through this comparison, it becomes possible to identify the genetic differences between sick and healthy people, even when those differences are subtle. For common diseases, each genetic difference is likely to have only a modest impact on a person's risk of developing the disease. However, the combination of many slightly altered genes together with a risky environment may add up to a major risk for an individual. By identifying these genetic risks, researchers should be able to find clues to new targets for the development of therapies that treat or even prevent illness [25].
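The case-control comparison described above can be made concrete for a single SNP with a standard chi-square test on allele counts. The Python sketch below is a minimal illustration and not the pipeline of any particular study. The counts are invented, the function names are hypothetical, and the Bonferroni correction shown at the end is one common way to account for testing many SNPs that the text itself does not prescribe.

```python
import math


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Returns (chi2, p_value). Rows are cases/controls, columns are the
    alternative/reference allele.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row)
    if 0 in row or 0 in col:
        raise ValueError("every row and column must contain at least one allele")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom the upper tail is erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Odds of carrying the alternative allele in cases relative to controls."""
    return (case_alt * ctrl_ref) / (case_ref * ctrl_alt)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level when n_tests SNPs are tested at once."""
    return alpha / n_tests


# Invented counts: the alternative allele is seen on 60 of 100 case
# chromosomes and on 40 of 100 control chromosomes.
chi2, p = allelic_chi_square(60, 40, 40, 60)
print(chi2, p, odds_ratio(60, 40, 40, 60))

# Testing 300,000 SNPs at an overall level of 0.05 needs a much stricter cutoff.
print(p < bonferroni_threshold(0.05, 300_000))
```

The last line shows why subtle effects need large samples: a difference that looks convincing for one SNP does not survive correction for hundreds of thousands of tests.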

GWA scans of whole genomes became financially feasible only after the HapMap Project had catalogued millions of common SNPs and high-throughput genotyping platforms had been developed. An early example of success was the discovery of a variant in the complement factor H gene that represents a major risk factor for age-related macular degeneration, a common cause of blindness in the elderly. This finding, which was made possible by a genome-wide association study, raised the possibility of a whole new approach to preventing this devastating disease [26].

Phenotype and metabolism

Although the genetic information of an individual is an important component of its uniqueness, it accounts for only a portion of this variation. An individual's phenotype is achieved and maintained by the many different metabolic activities of the cell and by the complex interactions among genotype, metabolic phenotype and environment. High-throughput technologies that produce millions of data points from a single experiment have transformed studies from a reductionist concept into a holistic practice in which many metabolic phenotypes, and the genes involved in that metabolism, can be measured through functional genomics and metabolic profiling.

Metabolites are small-molecule intermediates and products of metabolism. It is widely accepted that small changes in the activities of individual enzymes lead to small changes in metabolic fluxes but can lead to large changes in metabolite concentrations [27]. Metabolomics is the discipline that studies metabolite composition and dynamics, as well as the interactions among metabolites and their responses to changes in the environment; it is widely used in medical and nutritional systems biology [28, 29], where the metabolome is useful for linking the genotype and the environment. Changes in metabolic composition are likely to be subtle in the early stages of any disease. Many key metabolites from different pathways play a role in disease development, and the ability to detect and measure all these metabolites simultaneously allows a more global analysis of the state of the disease. The discipline is more than 40 years old, but in the beginning the knowledge and technologies available were very limited, and too little information existed to link metabolite measurements to the human genome or to physiology. The key milestone in this context was again the publication of the human genome sequence [3] and the subsequent appearance of different omics approaches to extract useful information from it [30]. In addition, the invention of electrospray ionization (ESI) [31] finally allowed the study of intact molecules and facilitated the coupling of liquid chromatography to mass spectrometry, which was a real revolution in the field.

Since GWAS became affordable, the most costly step in discovering the genetic bases of disease has shifted from genotyping to phenotyping. The discipline of phenomics, described by [32] as the systematic study of phenotypes on a genome-wide scale, is still in an early phase of development. The data obtained from the analysis of human genomes reflect only one level of biological knowledge, which may impose new constraints on the modeling of higher-level phenotypes. The redefinition of phenotypes should be guided not only by gene expression findings but also by data produced using models of cellular systems and signalling pathways. As suggested in [33], a human phenome project would keep biomedical scientists busy for the next century. Understanding the true dimensionality of the human genome and reducing its complexity are the main focuses at the moment, but this problem is minor compared with that of defining the dimensions of the human phenome.

The scientific problem behind the mapping of the human phenome is large and the solution is still unclear. Given the amount of data that must be taken into account, only strategies based on computational methods are likely to succeed, and this is what is already happening. Bioinformaticians all over the world are now working on solutions that cluster genes on the basis of metabolic and signalling pathway data. If the same collaborative approach is used to describe phenomes, the pathways that connect genomic variation to phenotypes will be revealed. An extra modelling effort should be made to develop high-quality models that link the knowledge derived from human genome analysis with the knowledge obtained from phenomic data.

Privacy aware and personal health records

The beginning of the 21st century has been characterized by rapid progress in the biomedical field, thanks to the publication of the human genome sequence and the subsequent development of omics approaches coupled with the emerging discipline of bioinformatics. A real revolution in medicine is predicted once all the data from genomics, metabolomics and phenomics can be combined, not only to diagnose a disease that is already present but also to predict those that may come. This generation will witness the debut of personalized medicine, a concept that has captured the imagination of physicians, politicians and patients for decades.

Numerous relevant publications and projects have appeared in recent years: the first whole-genome sequence associated with a disease, the acute myeloid leukemia genome [34]; the launch of the 1000 Genomes Project [35], which aims to obtain a detailed catalogue of human genetic variation; and the International Human Microbiome Consortium, which aims to study and understand the role of the human microbiome in the maintenance of health and the causation of disease and to use that knowledge to improve the ability to prevent and treat disease [36][37]. Other collaborations are still in progress, such as the Copy Number Variation Project [38] and the Cancer Genome Atlas [39]. Furthermore, many genome-wide association studies have associated specific loci with a variety of diseases.

The personalized medicine of the future will develop new treatments by combining data on the variations in the patient with the molecular bases of the disease itself. It will also help to identify sub-groups of patients for whom particular treatments work best, or groups of patients with a higher chance of developing certain diseases, and, ideally, help them to change their lifestyle or give them treatments that delay the onset of a disease or reduce its impact. This healthcare revolution is expected to take place over the coming decades. At the biomedical level, new diagnostic and prognostic tools will increase our ability to predict the outcomes of drug therapy, and the use of biomarkers (biological molecules that indicate a particular disease state) could result in more focused and targeted drug development. Personalized medicine also offers attractive possibilities for politicians and decision-makers, since it has the potential to make healthcare more cost-effective.

Personalized medicine, however, has yet to arrive, and in the meantime there is much hope in the field. Many optimistic reviews continue to appear [40][41][42], but others are quite critical, pointing out that it is currently impossible to assign a patient to an unequivocal phenotype, let alone relate it to an unequivocal genotype [43], mostly because of the number of new findings and studies that appear almost monthly and that will only increase in the future.

As has been reviewed extensively in this part of the chapter, biomolecular research has progressed enormously over the last decade, from the completion of the Human Genome Project to functional genomics. The application of this knowledge has greatly improved our understanding of health and disease. It is now clear that disease states cannot be explained by genomic information alone, since they involve the interaction between our genome and the environment. This interaction, reflected in the phenotype, is starting to be understood thanks to the different views of the same problem captured by the different post-genomic approaches. The logical step forward is to integrate all these views into a high-level model that is both informative and predictive.