
Genome Sequencing Approaches


Disclaimer: This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of UK Essays.

1. Introduction

During the last 100 years, medical genetics has grown from a curiosity about a few rare hereditary disorders into a recognized medical specialty whose concepts and approaches are key to the diagnosis and management of many disorders, both common and rare (Thompson & Thompson). Numerous DNA techniques, together with association and linkage studies, have contributed greatly to the identification of the genetic factors that cause human disorders. Nevertheless, many questions remain unanswered (Vogel & Motulsky, 2010b).

The development of technologies that permit comprehensive analysis of normal and abnormal genes, and of gene expression in both normal and disease states, has improved our understanding of molecular genetics at every level, from the gene to the whole organism (Thompson & Thompson). In particular, recent developments in DNA sequencing technology have revolutionized human genetics and genomics in health and disease (Motulsky, 2010b). Over the coming decades, genetic research will undoubtedly lead to a much better understanding of genetic disease (Griffiths).

1.1 The Human Genome

To understand the biological significance of genetic information in health and disease, we must first know the complete nucleotide sequence of the human genome (Vogel & Motulsky, Antonarakis). We now know that the human genome comprises two distinct parts: a complex nuclear genome and a very simple mitochondrial genome.

The publication of the human mitochondrial DNA sequence in 1981 led to a detailed understanding of its importance and function. Mitochondrial DNA (mtDNA) is a circular, double-stranded molecule, 16.6 kilobases in length, that contains 37 genes. Cells typically contain thousands of copies of the mtDNA molecule, and the copy number can vary remarkably between cell types (Strachan).

In contrast with the mtDNA, decoding the far more complex nuclear genome was a much more challenging task (Strachan). An international collaborative project, the Human Genome Project (HGP), was therefore initiated on 1 October 1990 with the goal of determining the nucleotide sequence of the human genome. The HGP was essentially completed in 2004 (Vogel & Motulsky). Despite its high cost, approximately US$3 billion (www.genome.gov), the potential medical benefits of knowing the human genome sequence were the main motive for funding this international project. The four main objectives of the HGP were to determine the nucleotide sequence of the genome, construct a physical map of the human genome, construct a linkage map of the human genome, and provide an initial survey of the variation among human genomes (Vogel & Motulsky).

The great interest in the publicly funded HGP was met by fierce competition from privately funded genome sequencing projects. In 1999, the company Celera announced that it intended to produce a draft sequence of the human genome within two years (Strachan), and this challenge may in fact have provided the competition needed for the timely completion of the HGP (Vogel & Motulsky). The two projects followed different strategies: the HGP used hierarchical shotgun sequencing (Figure 1B), whereas Celera used whole-genome shotgun sequencing (Figure 1A). In hierarchical shotgun sequencing, large-insert clones from DNA libraries are first positioned on a physical map of the chromosomes, and individual mapped clones are then selected for shotgun sequencing. Assembling these sequences yields the human genome sequence (International Human Genome Sequencing Consortium, 2004). Whole-genome shotgun sequencing also relies on plasmid libraries, but the fragments are sequenced directly, without prior mapping, and assembled into contiguous sequences (contigs) (Venter et al., 2001).


Figure 1: Genome sequencing approaches. A: In whole-genome shotgun sequencing, the entire genome is sheared randomly into small fragments of a size suitable for sequencing, which are then reassembled. B: In hierarchical shotgun sequencing, the genome is first broken into larger segments, which are mapped and then individually shotgun-sequenced.
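The reassembly step of whole-genome shotgun sequencing can be illustrated with a toy greedy assembler that repeatedly merges the two fragments with the longest suffix-prefix overlap. This is a minimal sketch, assuming error-free reads, exact overlaps and no repeated sequence; the function names, the `min_len` threshold and the example reads are hypothetical. It is not the assembler used by Celera or the HGP, which had to cope with sequencing errors and repeats.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (at least min_len), else 0."""
    start = 0
    while True:
        # Look for the first min_len bases of b somewhere in a
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # The earliest hit whose tail matches the start of b gives the longest overlap
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Greedily merge the pair of fragments with the longest overlap until none overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop fragments wholly contained in another fragment
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: remaining fragments are separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads  # list of contigs


genome = "ATTAGACCTGCCGGAATAC"  # hypothetical sequence
reads = ["ATTAGACCTG", "AGACCTGCCG", "CCTGCCGGAA", "GCCGGAATAC"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

Greedy merging is easy to follow but breaks down on genomes with long repeats, which is one reason the hierarchical approach first anchors clones to a physical map.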

The completion of the Human Genome Project in 2004 yielded about 99% of the euchromatic sequence of the human genome, with an accuracy of about 1 error per 100,000 nucleotides (Vogel & Motulsky, Antonarakis, 2010).
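To put this accuracy in perspective, an error rate p can be converted to the Phred quality scale widely used for DNA sequence data, Q = -10 log10(p), and multiplied by a sequence length to give the expected number of errors. A minimal sketch; the function names are illustrative and the 1 Mb region is an arbitrary example length.

```python
import math


def phred_quality(error_rate: float) -> float:
    """Phred-scaled quality: Q = -10 * log10(probability of an error)."""
    return -10 * math.log10(error_rate)


def expected_errors(length_bp: int, error_rate: float) -> float:
    """Expected number of incorrect bases in a sequence of the given length."""
    return length_bp * error_rate


rate = 1 / 100_000  # 1 error per 100,000 nucleotides, as quoted above
print(round(phred_quality(rate)))  # 50
print(round(expected_errors(1_000_000, rate)))  # 10, for an illustrative 1 Mb region
```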

Once the initial draft of the human genome sequence had been obtained, international efforts focused on identifying and characterizing functionally important sequences, such as genes and regulatory elements. By 2004, approximately 21,000 genes had been fully characterized by determining their coding DNA sequences (CDS). Nevertheless, considerable uncertainty remains about the exact number of human genes (Strachan).

Today, the size and complexity of the human genome, with a total length of 3,384,269,757 base pairs (Genome Reference Consortium GRCh38.p2), make determining the genetic basis of health and disease a massive challenge for scientists.

1.2 Genetic Disorders

A disease can be viewed as the result of interaction between genes and environment, in which the relative contribution of genetic factors may be small or large. A genetic disorder is therefore one that is caused wholly or partly by genetic factors. Traditionally, genetic disorders are classified into three categories: chromosome disorders, single-gene disorders and multifactorial disorders (Thompson & Thompson).

The worldwide birth prevalence of serious congenital disorders is estimated at 53 per 1,000 live births. Of these, 3.9 per 1,000 are chromosomal disorders, 16.8 per 1,000 are single-gene disorders, and the remainder are multifactorial or of unknown origin (Vogel & Motulsky).
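Subtracting the chromosomal and single-gene figures from the total gives the rate for multifactorial disorders and disorders of unknown origin, and shows that this remainder is the largest group. A short sketch of the arithmetic, using only the figures quoted above:

```python
# Worldwide birth prevalence figures quoted above, per 1,000 live births
total = 53.0
chromosomal = 3.9
single_gene = 16.8

# Whatever is not chromosomal or single-gene is multifactorial or of unknown origin
remainder = total - chromosomal - single_gene
print(f"{remainder:.1f} per 1,000")  # 32.3 per 1,000
print(f"{remainder / total:.0%} of the total")  # 61% of the total
```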

WHO region | Population (millions) | Chromosomal disorders per 1,000 live births | Single-gene disorders per 1,000 live births | Total congenital disorders per 1,000 live births | Annual affected live births
Eastern Mediterranean | | | | |
SE Asia | | | | |
Western Pacific | | | | |
Table 1: Overall global figures for the frequency of genetic disorders, by WHO region (Vogel & Motulsky).

Hence, the principal aims of modern medical genetics are to characterize the mutations that lead to genetic disease, to understand how these mutations affect health, and to use this information to improve the diagnosis and management of disease (Thompson & Thompson).

Mendelian disorders

A disorder is characterized as Mendelian when the disease phenotype is determined by a single locus and transmitted according to the laws of segregation and independent assortment (Vogel & Motulsky, Speicher, 2010). Because humans are diploid, every allele is transmitted according to Mendel's laws. However, most human characters are not Mendelian, because they are governed by more than one locus. The genetic determination may involve many loci (polygenic), a small number of loci (oligogenic), or a single locus acting against a polygenic background. Generally, the more complex the pathway between a DNA sequence and a phenotypic trait, the less likely it is that the trait will show a Mendelian pedigree pattern (Strachan).

There are five basic patterns of Mendelian inheritance: autosomal dominant, autosomal recessive, X-linked dominant, X-linked recessive and Y-linked. Because males have only one X and one Y chromosome, they are hemizygous for every locus on these chromosomes. They are therefore never heterozygous for an X-linked or Y-linked character, and a male's phenotype can be predicted from his genotype without knowing whether the character is dominant or recessive (Strachan).
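Segregation at a single locus can be made concrete by enumerating every combination of parental alleles. A minimal sketch, assuming one locus and no linkage, new mutation or reduced penetrance; the allele labels `A`/`a` and `XA`/`Xa`/`Y` are hypothetical notation. Treating the X and Y chromosomes as a male's two "alleles" shows why a son's phenotype follows directly from the single X he inherits.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_distribution(mother: tuple[str, str], father: tuple[str, str]) -> dict[tuple[str, str], Fraction]:
    """Segregation at one locus: each parent transmits either of its two alleles with probability 1/2."""
    # Every maternal/paternal combination is equally likely; sort so ("a", "A") == ("A", "a")
    counts = Counter(tuple(sorted(pair)) for pair in product(mother, father))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


# Autosomal recessive: two heterozygous carrier parents
ar = offspring_distribution(("A", "a"), ("A", "a"))
print(ar[("a", "a")])  # 1/4 of children are affected homozygotes

# X-linked recessive: carrier mother, unaffected father; "Y" stands for the father's Y chromosome
xl = offspring_distribution(("XA", "Xa"), ("XA", "Y"))
print(xl[("Xa", "Y")])  # 1/4 of children: affected (hemizygous) sons
```

In the X-linked cross, half of the sons (1/4 of all children) are affected and half of the daughters are carriers, whereas in the autosomal cross the risk is the same for both sexes.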

Analyzing a pattern of inheritance that follows Mendel's laws may seem straightforward. In practice, however, a basic Mendelian pattern can be obscured by various complications, described below (Strachan). For instance, a disorder may not be expressed at all in an individual who carries the same genotype that causes it in other members of the family. Moreover, the phenotypic expression of an abnormal genotype may be modified by aging, by other genetic loci or by random environmental effects (Thompson & Thompson). More specifically, the following phenomena often complicate the diagnosis of disease and the interpretation of family pedigrees (Thompson & Thompson): penetrance, the proportion of individuals with the mutant genotype who show at least some degree of its expression (Cummings); expressivity, the severity of expression of the phenotype among individuals with the same disease-causing genotype (Thompson & Thompson); anticipation, the tendency of some conditions to become more severe or to have an earlier onset in successive generations; imprinting, in which autosomal dominant characters are expressed only when inherited from a parent of one particular sex (Strachan); genetic heterogeneity, in which different mutations at the same locus (allelic heterogeneity) or mutations at different loci (locus heterogeneity) produce the same phenotype; and phenotypic heterogeneity, in which different mutations in the same gene give rise to strikingly different phenotypes.
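Penetrance enters risk estimates as a simple multiplier: the chance that a child is affected is the chance of inheriting the mutant genotype times the penetrance. A minimal sketch, assuming a fully penetrant-or-not model with no variable age of onset; the function names and the family counts are hypothetical.

```python
def penetrance(affected_carriers: int, carriers: int) -> float:
    """Proportion of individuals with the mutant genotype who show the phenotype."""
    if carriers == 0:
        raise ValueError("penetrance is undefined without carriers")
    return affected_carriers / carriers


def risk_of_affected_child(p_inherit: float, pen: float) -> float:
    """P(child affected) = P(child inherits the mutant genotype) x penetrance."""
    return p_inherit * pen


# Hypothetical family data: 12 of 15 carriers show the phenotype
pen = penetrance(12, 15)
print(pen)  # 0.8
# Autosomal dominant, one heterozygous parent: the child inherits the allele with probability 1/2
print(risk_of_affected_child(0.5, pen))  # 0.4
```

With reduced penetrance, an unaffected carrier can still transmit the allele, which is how a dominant disorder can appear to skip a generation in a pedigree.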

Identifying the causative mutation of a Mendelian disease enables molecular diagnosis and carrier testing in patients and their families, a step of great importance for patient management and family counseling (Gilissen, 2011). Beyond this, identifying the genes behind monogenic disorders contributes greatly to our understanding of gene function and of the biological pathways involved in health and disease in general (Oti, 2007). The importance of studying monogenic disorders is also reflected in the fact that much of our understanding of the genetic basis of complex diseases builds on our knowledge of Mendelian disorders (Vogel & Motulsky, Clark). Finally, genetic phenomena such as uniparental disomy, parental imprinting and epistatic interactions have been described through studies that relied on the analysis of Mendelian traits (Antonarakis, 2006).

Originally, candidate-gene resequencing and traditional gene-mapping approaches (Table 2), such as karyotyping, linkage analysis, homozygosity mapping and copy number variant (CNV) analysis, helped elucidate the genetic basis of many common and rare monogenic disorders (Gilissen, 2011). Over the past decade, new technologies that sequence DNA at much higher throughput and much lower cost than older techniques have given a boost to research on monogenic disorders (Shendure, 2008) and established a rich framework for discovering the genes underlying previously unsolved Mendelian disorders (Bamshad, 2011).


Approach | Applies to | Advantages | Disadvantages
Candidate gene | Any disease | Easy to perform for one or two genes; requires no mapping; can directly identify the causative variant/mutation | Relies heavily on current biological knowledge; very low success rate
Genetic mapping by karyotyping | Any disease | Easy to perform; no familial cases required; can detect (large) balanced events | Low resolution, detects only large chromosomal aberrations; mutation detection requires a second step
Genetic mapping by linkage analysis | Inherited disease | Easy to perform | Requires large families; often identifies large loci; mutation detection requires a second step
Genetic mapping by homozygosity mapping | Recessive monogenic diseases | Small families can be used | Most useful for consanguineous families; often identifies large loci; mutation detection requires a second step
Genetic mapping by CNV analysis | Monogenic/monolocus disease | High-resolution CNV screening; no familial cases required; can potentially identify small loci | Investigates only CNVs; cannot detect balanced events; no base-pair resolution; mutation detection requires a second step
Whole exome sequencing (WES) | Any disease | Base-pair resolution exome-wide; detects most types of genomic variation; can directly identify the causative variant/mutation | Unable to detect non-coding variants; limited resolution for CNVs and other structural variation; coverage variability due to the enrichment process; relatively expensive
Whole genome sequencing (WGS) | Any disease | Base-pair resolution genome-wide; detects all types of genomic variation; can directly identify the causative variant/mutation | Complex data analysis; even more expensive than exome sequencing

Table 2: Mendelian disease gene identification approaches (Gilissen, 2011)
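Homozygosity mapping, listed in Table 2, relies on the expectation that in a consanguineous family the affected individuals are homozygous for the same ancestral haplotype around the disease locus. A minimal sketch, assuming SNP genotypes coded as 0, 1 or 2 copies of one allele and listed in chromosomal order; the function name, the coding and the `min_snps` cut-off are illustrative, and real analyses use dense SNP arrays and physical distances and must allow for genotyping errors.

```python
def shared_homozygous_runs(genotypes: list[list[int]], min_snps: int) -> list[tuple[int, int]]:
    """Return (start, end) SNP index ranges (end exclusive) in which every affected
    individual is homozygous for the same allele over at least min_snps consecutive SNPs.
    Genotypes are coded 0/1/2 = copies of one allele, so 0 and 2 are homozygous."""
    n_snps = len(genotypes[0])
    runs, start = [], None
    for i in range(n_snps):
        calls = {g[i] for g in genotypes}
        # Shared homozygosity: all individuals have the same call, and it is 0 or 2
        shared_hom = len(calls) == 1 and calls.pop() in (0, 2)
        if shared_hom and start is None:
            start = i  # a new run begins
        elif not shared_hom and start is not None:
            if i - start >= min_snps:
                runs.append((start, i))
            start = None  # run broken by a heterozygous or discordant call
    if start is not None and n_snps - start >= min_snps:
        runs.append((start, n_snps))  # run extends to the last SNP
    return runs


# Hypothetical genotypes of two affected siblings at nine ordered SNPs
affected = [
    [1, 0, 2, 2, 0, 0, 2, 1, 0],
    [0, 0, 2, 2, 0, 0, 2, 1, 2],
]
print(shared_homozygous_runs(affected, min_snps=5))  # [(1, 7)]
```

The interval returned is only a candidate region; as Table 2 notes, the mutation itself must then be found in a second step.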

1.3 Human genetic variation and mutations

1.3.1 Human genome variation

The nuclear DNA sequences of any two humans are about 99.9% identical. Only a small fraction of the DNA sequence differs among individuals, and this fraction is responsible for the genetic variability of human populations (Thompson & Thompson). This genetic variability is the molecular substrate of the evolutionary process (Vogel & Motulsky). Single nucleotide variants (SNVs), segmental duplications, low-copy repeats, indels (insertions and deletions), inversions and copy number variants (CNVs) all contribute to variation in the human genome (Vogel & Motulsky, Antonarakis, 2010). A variant common enough to be found on more than 1% of chromosomes in the general population is characterized as a genetic polymorphism, whereas rare variants have frequencies below 1%. Many deleterious mutations that lead to genetic disease are rare variants. However, there is no simple rule relating an allele's effect on health to its frequency: many rare variants appear to have no deleterious effect, whereas some variants common enough to be polymorphisms are known to predispose to serious illness (Thompson & Thompson). For example, p.F508del, the commonest mutation causing cystic fibrosis, has a frequency of 1–2% in northern European populations (Strachan).
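The 1% convention can be applied directly to genotype counts from a population sample. A minimal sketch, assuming a diploid sample and hypothetical counts; the function names are illustrative, and treating a frequency of exactly 1% as rare is an arbitrary choice, since the definition above leaves that boundary open.

```python
POLYMORPHISM_THRESHOLD = 0.01  # 1% of chromosomes


def allele_frequency(hom_ref: int, het: int, hom_alt: int) -> float:
    """Frequency of the variant allele among all chromosomes in a diploid sample."""
    chromosomes = 2 * (hom_ref + het + hom_alt)
    # Heterozygotes carry one copy of the variant, homozygotes two
    return (het + 2 * hom_alt) / chromosomes


def classify(freq: float) -> str:
    """Polymorphism if found on more than 1% of chromosomes, otherwise rare variant."""
    return "polymorphism" if freq > POLYMORPHISM_THRESHOLD else "rare variant"


# Hypothetical samples of 1,000 people each
print(classify(allele_frequency(970, 29, 1)))  # polymorphism (31 of 2,000 chromosomes)
print(classify(allele_frequency(995, 5, 0)))  # rare variant (5 of 2,000 chromosomes)
```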

A major step in human population genetics over the past decade was the HapMap Project, an international effort to create a genome-wide haplotype map. Hundreds of thousands of SNPs and microsatellites have been identified by genotyping samples from populations all over the world, producing a highly detailed picture of variation in our species. The data are publicly available at several websites, including those of the International HapMap Project (www.hapmap.org) and the Human Genome Diversity Project (hgdp.uchicago.edu) (Griffiths).

1.3.2 Mutations

Mutations are the ultimate source of all genetic variation (Griffiths). A mutation is defined as any change in the nucleotide sequence or arrangement of DNA. Mutations are traditionally classified into three main categories (Table 3): genome mutations, which affect the number of chromosomes in the cell; chromosome mutations, which alter the structure of individual chromosomes; and gene mutations, which alter individual genes.

Class of mutation | Mechanism | Frequency (approximate)
Genome mutation | Chromosome missegregation | 2–4 × 10⁻² per cell division
Chromosome mutation | Chromosome rearrangement | 6 × 10⁻⁴ per cell division
Gene mutation | Base-pair mutation, small indels (point mutations) | 10⁻¹⁰ per base pair per cell division

Table 3: Types of Mutation and Their Estimated Frequencies (Thompson & Thompson)

Genome mutations are alterations in the number of intact chromosomes (aneuploidy); they originate from errors in chromosome segregation during meiosis or mitosis. Chromosome mutations are changes that involve only part of a chromosome, such as partial duplications or triplications, deletions, inversions and translocations; they occur spontaneously or may result from abnormal segregation of translocated chromosomes during meiosis. Gene mutations are changes in the DNA sequence of the mitochondrial or nuclear genome, ranging from a change in a single nucleotide to changes that affect millions of base pairs; they can arise from errors during DNA replication or from a failure to repair DNA damage. All three types of mutation occur at considerable frequencies in many different cells. A mutation that arises in the germline can be passed on to subsequent generations. In contrast, somatic mutations arise in only a subset of cells in certain tissues, resulting in somatic mosaicism (as in cancer), and so cannot be transmitted to the next generation. Mutations can arise spontaneously or be induced by mutagens, chemical agents that increase the mutation rate (Thompson & Thompson).

1.3.3 Gene mutations and pathogenic variants

As mentioned above, mutations, especially gene mutations, are responsible for the genetic variation among human populations and are the ultimate source of evolutionary change (Olson-Manning, 2012). The vast majority of these mutations are harmless and have no effect on fitness, and even those that do affect the phenotype are considered part of the natural variation that makes each of us an individual. Medical genetics therefore focuses on variants that are pathogenic and associated with disorders. Nonetheless, some of these variants may be pathogenic only under specific conditions, such as environmental stress, and others may have modest effects that manifest as susceptibility to disease only in combination with other genetic variants (Strachan).
