Human genetic variation and mutations


The Human Genome

To understand the biological importance of genetic information in health and disease, we first needed to know the entire nucleotide sequence of the human genome (Vogel & Motulsky). The human genome comprises two parts: a complex nuclear genome and a very simple mitochondrial genome. The DNA sequence of the human mitochondrial genome was published in 1981, and a detailed understanding of how mitochondrial DNA (mtDNA) works has been built up since then. The mitochondrial genome consists of a single type of circular double-stranded DNA molecule that is 16.6 kilobases in length and contains 37 genes. Cells typically contain thousands of copies of the mtDNA molecule, but the number can vary considerably between cell types (Strachan).

The more complex nuclear genome has been a much more formidable challenge (Strachan). An international collaborative project, the Human Genome Project (HGP), was therefore undertaken to determine the nucleotide sequence of the human genome. The project was initiated on 1 October 1990 and was essentially completed in 2004. The potential medical benefits of knowing the human genome sequence were the major rationale behind the funding of this international project. The goals of the HGP were to: determine the linkage map of the human genome; construct a physical map of the genome by cloning all fragments and arranging them in the correct order; determine the nucleotide sequence of the genome; and provide an initial exploration of the variation among human genomes (Vogel & Motulsky). The publicly funded HGP met with aggressive competition from privately funded genome sequencing. In 1999 the Celera company announced that it intended to produce a draft human genome sequence in 2 years (Strachan). The involvement and contributions of the biotechnology company Celera may have provided the necessary competition for the timely completion of the project (Vogel & Motulsky).

Once the initial draft genome sequence had been obtained, another major international effort focused on functionally important sequences, with the priority of identifying and characterizing all human genes and regulatory elements. By 2004 more than 21,000 human genes were reported to have been validated by determining full-length cDNA sequences. However, there is still considerable uncertainty about exactly how many genes we have (Strachan). Today, with a total length of 3,384,269,757 base pairs (Genome Reference Consortium GRCh38.p2), the complexity of the human genome makes determining the genetic basis of health and disease an immense challenge for scientists.

Genetic Disorders

A disease is the result of the combined action of genes and environment, but the relative role of the genetic component may be large or small. Among disorders caused wholly or partly by genetic factors, three main types are recognized: chromosome disorders, single-gene disorders, and multifactorial disorders (Thompson & Thompson).

The estimated worldwide birth prevalence of infants with serious congenital disorders is 53 per 1000 live births. Of these, 3.9 per 1000 are chromosomal disorders, 16.8 per 1000 are single-gene disorders, and the rest are multifactorial disorders or of unknown origin (Vogel & Motulsky).
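The share left for multifactorial disorders and disorders of unknown origin is not stated explicitly, but it follows from the figures quoted above by subtraction. A minimal sketch of that arithmetic (the function and variable names are illustrative, not taken from the cited source):

```python
# Birth prevalence per 1000 live births, as quoted in the text.
TOTAL_PER_1000 = 53.0
CHROMOSOMAL_PER_1000 = 3.9
SINGLE_GENE_PER_1000 = 16.8


def remaining_per_1000(total, *known_categories):
    """Prevalence left over once the known categories are subtracted."""
    return round(total - sum(known_categories), 1)


# Multifactorial disorders or disorders of unknown origin.
other_per_1000 = remaining_per_1000(
    TOTAL_PER_1000, CHROMOSOMAL_PER_1000, SINGLE_GENE_PER_1000
)
print(other_per_1000)  # 32.3
```

On these figures, roughly 32 of the 53 affected infants per 1000 live births fall into the multifactorial or unknown category.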


Table 2.1: Overall global figures for the frequency of genetic disease and congenital malformation, by WHO.

Hence, one of the principal aims of modern medical genetics is to characterize mutations that lead to genetic disease, to understand how these mutations affect health, and to use that information to improve diagnosis and management (Thompson & Thompson).

Over the past 100 years, medical genetics has grown from a small subspecialty concerned with a few rare hereditary disorders to a recognized medical specialty whose concepts and approaches are important components of the diagnosis and management of many disorders, both common and rare (Thompson & Thompson). DNA techniques, as well as association and linkage studies, have greatly contributed to the identification of causal genetic factors in human disorders. Although these approaches have made major contributions, many questions remain unanswered (Motulsky, 2010b).

Advances in our understanding of molecular genetics have been driven by the development of technologies that permit the detailed analysis of both normal and abnormal genes and of the expression of thousands of genes in normal and disease states. The application of these techniques has increased the understanding of molecular processes at all levels, from the gene to the whole organism (Thompson & Thompson). In particular, recent developments in DNA sequencing technologies have revolutionized human genetics and genomics in health and disease (Motulsky, 2010b). No doubt, over the coming decades, genetic disease (and indeed healthy human function) will be much better understood as a result of genetic research (Griffiths).

Human genetic variation and mutations

The sequence of nuclear DNA is nearly 99.9% identical between any two humans. Yet it is precisely the small fraction of DNA sequence that differs among individuals that is responsible for the genetically determined variability among humans (Thompson & Thompson). This genetic variability is the molecular substrate of the evolutionary process (Vogel & Motulsky). The genetic variation of the human genome can be found in the form of single nucleotide variants (SNVs), segmental duplications, low-copy repeats, indels (insertions, deletions), inversions and copy number variants (CNVs) (Vogel & Motulsky; Antonarakis, 2010). When a variant is so common that it is found in more than 1% of chromosomes in the general population, the variant constitutes what is known as a genetic polymorphism. In contrast, alleles with frequencies of less than 1% are, by convention, called rare variants. Although many deleterious mutations that lead to genetic disease are rare variants, there is no simple correlation between allele frequency and the effect of the allele on health. Many rare variants appear to have no deleterious effect, whereas some variants common enough to be polymorphisms are known to predispose to serious illness (Thompson & Thompson).

A major advance in human population genetics over the past decade was the creation of a genome-wide haplotype map, or HapMap. A consortium of scientists around the world genotyped thousands of people representing the diversity of our species for hundreds of thousands of SNPs and microsatellites. The result is a highly detailed picture of variation in our species. The data are available to the public at several Web sites, including those of the International HapMap Project and the Human Genome Diversity Project (Griffiths).

Mutations are the ultimate source of all genetic variation (Griffiths). A mutation is defined as any change in the nucleotide sequence or arrangement of DNA.
Mutations can be classified into three categories (Table 9-1): mutations that affect the number of chromosomes in the cell (genome mutations), mutations that alter the structure of individual chromosomes (chromosome mutations), and mutations that alter individual genes (gene mutations). Genome mutations are alterations in the number of intact chromosomes (called aneuploidy) arising from errors in chromosome segregation during meiosis or mitosis. Chromosome mutations are changes involving only a part of a chromosome, such as partial duplications or triplications, deletions, inversions, and translocations, which can occur spontaneously or may result from abnormal segregation of translocated chromosomes during meiosis. Gene mutations are changes in the DNA sequence of the nuclear or mitochondrial genomes, ranging from a change in as little as a single nucleotide to changes that may affect many millions of base pairs. They can originate by either of two basic mechanisms: errors introduced during the normal process of DNA replication, or a failure to repair DNA after damage and return its sequence to what it was before the damage. Human gene mutation is a highly sequence-specific process, irrespective of the type of lesion involved (Cooper).

All three types of mutation occur at appreciable frequencies in many different cells. If a mutation occurs in the DNA of cells that will populate the germline, the mutation may be passed on to future generations. In contrast, somatic mutations occur by chance in only a subset of cells in certain tissues and result in somatic mosaicism, as seen, for example, in many instances of cancer. Somatic mutations cannot be transmitted to the next generation. Some mutations are spontaneous, whereas others are induced by physical or chemical agents called mutagens, so named because they greatly enhance the frequency of mutations (Thompson & Thompson).


The description of different mutations not only increases awareness of human genetic diversity and of the fragility of human genetic heritage but also, more significantly, contributes information needed for the detection and screening of genetic disease in particular families at risk as well as for some diseases in the population at large (Thompson & Thompson).
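The 1% allele-frequency convention described earlier in this section, which separates polymorphisms from rare variants, can be sketched in a few lines. This is an illustrative sketch only: the function names and the example counts are hypothetical, and because the convention as stated leaves a frequency of exactly 1% undefined, treating it as a rare variant is an assumption made here.

```python
POLYMORPHISM_THRESHOLD = 0.01  # the 1% convention described in the text


def allele_frequency(variant_allele_count, chromosomes_sampled):
    """Fraction of sampled chromosomes that carry the variant allele."""
    if chromosomes_sampled <= 0:
        raise ValueError("chromosomes_sampled must be positive")
    return variant_allele_count / chromosomes_sampled


def classify_by_frequency(frequency):
    """Label a variant by frequency alone.

    Assumption: a frequency of exactly 1% is treated as a rare variant.
    The label says nothing about whether the variant is harmful.
    """
    if frequency > POLYMORPHISM_THRESHOLD:
        return "polymorphism"
    return "rare variant"


# Hypothetical counts: 1000 diploid individuals give 2000 chromosomes.
print(classify_by_frequency(allele_frequency(30, 2000)))  # polymorphism
print(classify_by_frequency(allele_frequency(5, 2000)))   # rare variant
```

The classification is deliberately based on frequency only, since allele frequency does not map simply onto the effect of an allele on health.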

Gene mutations and Pathogenic Variants

As mentioned above, mutations, and especially gene mutations, are responsible for genetic variation and are the source of evolutionary change (Olson-Manning). The great majority of these are completely harmless and have no known effect on the phenotype. Even most of those that do affect the phenotype are part of the normal variation that makes us all individual. Special interest, however, naturally attaches to those variants that are pathogenic: they either make us ill or make us susceptible to an illness. However, some variants may be pathogenic only at times of environmental stress, and others may have subtle effects that manifest as susceptibility to a disease, perhaps only in combination with certain other genetic variants (Strachan).

Pathogenic changes are often caused by small-scale DNA sequence changes in either the coding sequence or the regulatory region of a gene (Strachan). The complexity of a gene and its possible transcripts means that distinguishing pathogenic from nonpathogenic mutations is often very difficult (Vogel & Motulsky; Speicher). Additionally, the proportion of possible mutations within inherited human disease genes that are likely to be of pathological significance is very difficult to determine, because it depends not only on the type and location of the mutation but also on the functionality of the nucleotides involved (itself dependent in part on the amino acid residues that they encode), which is often hard to assess (Cooper; Arbiza et al., 2006).

When a mutation affects the coding part of a gene, an altered mRNA may be transcribed, resulting in an abnormal protein (Cooper et al., 2010). Exonic mutations, classified by their effect on the amino acid sequence, can be:

i) Point mutations, which can be either silent (synonymous), when the substituted base does not change the amino acid, or missense (non-synonymous), when the single substitution changes the amino acid (Nussbaum et al., 2007).

ii) Nonsense mutations, which are point mutations that change an amino acid codon into a stop codon, truncating the protein (Cooper et al., 2010).

iii) Insertions and deletions of a number of nucleotides not divisible by three change the reading frame of the sequence, resulting in a completely different translation (frameshift mutations); when the number is divisible by three, the corresponding amino acid(s) are inserted or deleted (non-frameshift mutations) (Antonarakis and Cooper, 2010).

iv) Mutations in splice sites can affect normal RNA splicing, altering the mature RNA, which could then contain parts of introns or lack parts of exons (Antonarakis and Cooper, 2010).
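The divisible-by-three rule for insertions and deletions in item iii) lends itself to a short illustration. The sketch below is a simplification under stated assumptions: the variant is given as hypothetical reference and alternate allele strings, it lies entirely within coding sequence, and effects on splice sites or stop codons are ignored.

```python
def classify_indel(ref_allele, alt_allele):
    """Classify a coding insertion or deletion by its effect on the reading frame.

    Assumption: both alleles lie entirely within coding sequence.
    """
    length_change = len(alt_allele) - len(ref_allele)
    if length_change == 0:
        raise ValueError("alleles have equal length; this is not an indel")
    kind = "insertion" if length_change > 0 else "deletion"
    # Only a net change that is a multiple of three preserves the reading frame.
    frame = "non-frameshift" if abs(length_change) % 3 == 0 else "frameshift"
    return f"{frame} {kind}"


print(classify_indel("A", "ACT"))   # frameshift insertion (2 bases added)
print(classify_indel("A", "ACTG"))  # non-frameshift insertion (3 bases added)
print(classify_indel("ACTG", "A"))  # non-frameshift deletion (3 bases removed)
print(classify_indel("AC", "A"))    # frameshift deletion (1 base removed)
```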

How can we decide whether a sequence change we have discovered in a patient is the cause of their disease or a harmless variant?

First, we can ask whether the variant affects a sequence that is known to be functional. Such sequences include the coding sequences of genes, sequences flanking exon-intron junctions (splice sites), the promoter sequence immediately upstream of a gene, and any other known regulatory sequences. The great majority of all known pathogenic variants affect sequences that were already known to be functional, and these comprise only a small percentage of our total DNA. However, it is always possible that a variant located outside any known functional sequence might lie in a currently unidentified functional element. The ENCODE project is revealing many previously unsuspected functional elements in the human genome. Such elements are suspected to be locations for variants that merely alter susceptibility to a disease, rather than directly causing any disease (Strachan).

Then, if a variant does affect a known functional sequence, we must try to predict its effect. A table of the genetic code (Figure) can be used to identify the effect of a coding sequence variant on the protein product of a gene.
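As a rough illustration of how a genetic code table is used in this way, the sketch below builds the standard genetic code and labels a single-base substitution within one codon as silent, missense or nonsense. It is a minimal sketch under stated assumptions: codons are written in DNA letters on the coding strand, the function name is hypothetical, and the "stop-loss" label for a change that removes a stop codon is an addition not discussed in the text.

```python
from itertools import product

# Standard genetic code; codons are enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # codons starting with T
    "LLLLPPPPHHQQRRRR"  # codons starting with C
    "IIIMTTTTNNKKSSRR"  # codons starting with A
    "VVVVAAAADDEEGGGG"  # codons starting with G
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon, alt_codon):
    """Label the effect of a single-base change within one codon."""
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    if len(ref_codon) != 3 or len(alt_codon) != 3:
        raise ValueError("codons must be three bases long")
    if sum(a != b for a, b in zip(ref_codon, alt_codon)) != 1:
        raise ValueError("codons must differ at exactly one position")
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "silent"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop-loss"  # assumption: category not covered in the text
    return "missense"


print(classify_substitution("GAA", "GAG"))  # silent   (Glu -> Glu)
print(classify_substitution("GAG", "GTG"))  # missense (Glu -> Val)
print(classify_substitution("CAG", "TAG"))  # nonsense (Gln -> stop)
```

A prediction of this kind only states what happens to the encoded amino acid; whether a missense change is actually pathogenic still has to be assessed separately.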

De novo mutations

Germline mutations arise anew during meiosis in every generation. Such spontaneously occurring genetic alterations are termed de novo mutations and serve to describe those heritable mutations that neither parent possessed or transmitted. Thus, de novo mutations denote mutations that arose in the gametes of the parents as distinct from post-zygotic somatic mutations that arise during embryonic development. As with inherited mutations, de novo mutations range in size from point mutations and small indels of multiple bases to much larger CNVs and structural rearrangements (Ku

Studies of de novo mutations in the human genome have been very challenging owing to past technological limitations. Although genome-wide de novo CNVs were previously studied using microarrays, the study of de novo point mutations or SNVs required large-scale sequencing, which was not feasible with Sanger sequencing. The advent of high-throughput next-generation sequencing (NGS) technologies has ushered in a new era in the study of de novo mutations. The availability of NGS has allowed the first estimates of the human genome mutation rate, as well as a glimpse of the spatial distribution of de novo point mutations in the human genome and their association with heritable diseases. Recent studies have estimated the per-generation mutation rate in humans as approximately 10⁻⁸ per nucleotide (Ku et al.). De novo mutations represent the most extreme form of rare genetic variation: they are more deleterious, on average, than inherited variation because they have been subjected to less stringent evolutionary selection. This makes these mutations prime candidates for causing genetic diseases that occur sporadically (Veltman).
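To give a feel for what a per-generation rate of this kind implies, the sketch below multiplies an assumed rate by an assumed genome size. Both inputs are illustrative assumptions rather than values taken from the cited studies: a rate on the order of 1e-8 per nucleotide per generation, and a rounded haploid genome of 3e9 base pairs, counted twice because each child inherits two copies.

```python
def expected_de_novo_snvs(rate_per_nucleotide, haploid_genome_bp, ploidy=2):
    """Expected number of new point mutations in one individual's genome."""
    return rate_per_nucleotide * haploid_genome_bp * ploidy


# Assumed order-of-magnitude inputs, for illustration only.
ASSUMED_RATE = 1e-8          # per nucleotide per generation
ASSUMED_HAPLOID_BP = 3.0e9   # rounded haploid genome size in base pairs

estimate = expected_de_novo_snvs(ASSUMED_RATE, ASSUMED_HAPLOID_BP)
print(f"about {estimate:.0f} de novo point mutations per individual")
```

Under these assumptions the estimate comes to several tens of new point mutations per individual per generation, of which only a small fraction would be expected to fall in coding sequence.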

Although numerous de novo point mutations responsible for rare disorders have now been identified, it remains unclear what constitutes evidence for causality. Establishing the causative role of newly identified mutations is challenging, even for de novo events (Ku et al.).

DNA sequencing evolution

Exome sequencing and strategies to detect pathogenic variants

Recent advances