Most medical practitioners would object to the inclusion of this chapter in a molecular medicine book. "What is the point of learning about the structure of DNA and RNA?" would be their usual complaint; or they would say, "Why learn about basic biology when what we need is only the applied aspect?" An average medical student would probably ask, "What is the need to go back to school-level science?" I can assure all my readers that this chapter will prove invaluable in providing a better understanding of the subject.
<H1>DISCOVERY OF DNA STRUCTURE
Crick and Watson deduced the structure of the DNA molecule based on the X-ray crystallography pictures that were the result of joint research by Maurice Wilkins and Rosalind Franklin. It is believed that Maurice Wilkins had a strained relationship with Rosalind Franklin. He, therefore, leaked Franklin's results to Watson and Crick who then proposed the double-helix structure for DNA (Fig. 2.1).
Crick and Watson celebrated their eureka moment in March 1953 by running from the now legendary Cavendish Laboratory in Cambridge to the nearby Eagle pub, where they announced over pints of bitter that they had discovered the secret of life.
They presented their findings in April 1953. Their paper included the sentence "This structure has novel features which are of considerable biological interest." This sentence may be one of science's most famous understatements.
Nine years later, in 1962, they shared the Nobel Prize in Physiology or Medicine with Maurice Wilkins.
Fig 2.1 - Watson and Crick with a model of the DNA molecule
<H1>DEOXYRIBONUCLEIC ACID STRUCTURE
DNA is a double-stranded, antiparallel helix composed of deoxyribonucleotides linked together in a linear polymer by phosphodiester bonds between neighboring sugar residues (Fig. 2.2). The two strands are held together by hydrogen bonds between opposed base pairs of purines and pyrimidines. Base pairing occurs in such a way that adenine binds with thymine (two hydrogen bonds) and guanine binds with cytosine (three hydrogen bonds). Therefore, the two nucleotide sequences are complementary to each other.
Each nucleotide consists of an organic nitrogen-containing base linked to a 5-carbon sugar that has a phosphate group attached to the carbon at the 5' position. In the case of DNA, the sugar is deoxyribose and in the case of RNA, the sugar is ribose.
The sugar-phosphate backbone is on the outside of the helix with the purine and pyrimidine bases extending as side groups.
Fig 2.2 - The phosphodiester bonds in the DNA molecule
Fig. 2.3: The double helical structure of DNA
In a similar way, the strands of ribonucleic acids also have an antiparallel organization. By convention, DNA and RNA sequences are always indicated in the 5' to the 3' direction.
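Complementarity and the 5' to 3' writing convention can be illustrated with a short sketch. The function name and the example sequence below are hypothetical and chosen only for illustration; the pairing rules are those given in the text.

```python
# Watson-Crick pairing for DNA: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the partner strand of `strand`, written 5' to 3'."""
    # The partner strand runs antiparallel, so we complement each base
    # and read the result in the opposite direction.
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


if __name__ == "__main__":
    top = "ATGCGT"  # hypothetical sequence, read 5' to 3'
    print(top, "pairs with", reverse_complement(top))
```

Applying the function twice returns the original strand, which is another way of saying that the two strands carry the same information.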
The hydrogen bonds between the complementary base pairs not only maintain the structure of the DNA but also underlie important processes such as DNA replication and RNA transcription, processing and translation. Intramolecular base pairing is also responsible for the secondary structure that is integral to the function of mRNA, tRNA and rRNA. Because the hydrogen bonds can be easily disrupted and re-formed, this reversible denaturation and reannealing is exploited in genetic tests.
The linear sequence of nucleotides linked by phosphodiester bonds constitutes the primary structure of nucleic acids. Like polypeptides, polynucleotides can twist and fold into three-dimensional conformations stabilized by non-covalent bonds. Although the primary structures of DNA and RNA are quite similar, their three-dimensional conformation is quite different.
There are two purines and three pyrimidines. Adenine and guanine are the purines and thymine, cytosine and uracil are the pyrimidines. Adenine can pair with either thymine (in DNA) or uracil (in RNA). Guanine pairs with cytosine. The number of hydrogen bonds in this pairing has already been mentioned. These associations between a larger purine and a smaller pyrimidine are called Watson Crick base pairs.
Most DNA in cells is present in the form of a right-handed helix. The X-ray diffraction patterns indicate that the stacked bases are regularly spaced 0.34 nm apart along the helix axis. The helix makes a complete turn every 3.4 nm; therefore, there are about 10 base pairs per turn. This is referred to as the B-form of DNA and is the normal form seen in cells. When most of the water has been removed from DNA, B-DNA changes to the A-form, which is wider and shorter than B-DNA. This is also a right-handed helix, but there are 11 base pairs per turn.
Short DNA molecules in which purine and pyrimidine nucleotides alternate adopt an alternative left-handed helical configuration. This structure is called Z-DNA because the bases appear to form a zig-zag pattern when viewed from the side. Its function is not established, but it is believed to increase the efficiency of gene transcription.
B DNA - Around 10 base pairs per turn. Right handed helix
A DNA - 11 base pairs per turn. Right handed helix
Z DNA - 12 base pairs per turn. Left handed helix
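The B-DNA figures quoted above are linked by simple arithmetic: dividing the length of one helical turn by the spacing between stacked bases gives the number of base pairs per turn. The sketch below works this through; the function names are hypothetical, and the two constants are the values stated in the text.

```python
RISE_NM = 0.34   # spacing between stacked bases along the axis (from the text)
PITCH_NM = 3.4   # length of one complete helical turn (from the text)


def base_pairs_per_turn(pitch_nm: float, rise_nm: float) -> float:
    """Number of base pairs that fit into one full turn of the helix."""
    return pitch_nm / rise_nm


def twist_per_base_pair_deg(pitch_nm: float, rise_nm: float) -> float:
    """Average rotation, in degrees, between neighbouring base pairs."""
    return 360.0 / base_pairs_per_turn(pitch_nm, rise_nm)


if __name__ == "__main__":
    print(round(base_pairs_per_turn(PITCH_NM, RISE_NM), 1), "bp per turn")
    print(round(twist_per_base_pair_deg(PITCH_NM, RISE_NM), 1), "degrees per bp")
```

With these inputs the result is roughly 10 base pairs per turn, or about 36 degrees of twist per base pair.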
DNA is capable of bending because the DNA helix is flexible along its long axis. This is because there are no hydrogen bonds parallel to the axis of the DNA helix. This property allows the DNA to bend when complexed with a DNA binding protein. This bending is critical to the dense packing of DNA in chromatin, the protein DNA complex in which nuclear DNA occurs in eukaryotic cells.
DNA is a double-stranded, antiparallel helix composed of deoxyribonucleotides linked together in a linear polymer by phosphodiester bonds between neighboring sugar residues.
The strands are held together by hydrogen bonds between laterally opposed base pairs of purines and pyrimidines, in which adenine (purine) binds with thymine (pyrimidine) (two hydrogen bonds) and guanine (purine) binds with cytosine (pyrimidine) (three hydrogen bonds). Uracil is also a pyrimidine.
The two strands are complementary.
By convention, DNA and RNA sequences are always indicated in the 5' to the 3' direction.
Hydrogen bonds between the complementary bases can be easily disrupted and re-formed; this reversible process is utilized in genetic tests.
Most DNA in cells is present in the form of a right handed helix.
DNA is capable of bending because the DNA helix is flexible along its long axis.
<H1>HISTONES AND NUCLEOSOMES
Before we proceed further and study the organization of DNA into chromosomes within the cell, it is necessary to look at molecules called histones. Histones are protein molecules which pack the DNA efficiently within the cell. Histones pack the large eukaryotic genome into the nucleus while still allowing the DNA to be accessed when required. They are also responsible for preventing the long DNA molecules from getting knotted or entangled with each other during cell division.
Five types of histones have been identified: H1, H2A, H2B, H3 and H4. A sixth form called H5 is the isoform of H1 in avians.
It is necessary to fold the DNA within the nucleus. The complex of DNA, histone and non histone proteins constitutes chromatin which exists in various degrees of folding or compaction. Chromatin consists of an equal mass of protein and DNA. Chromatin is dispersed through the nucleus in interphase cells. During metaphase, there is a further folding and compaction of chromatin which can be seen as visible metaphase chromosomes.
The histones and the DNA form the large picture. If you were to dissect the histone-DNA complex, it would be seen that the fundamental packing unit is the nucleosome. Each nucleosome is about 11 nm in diameter and consists of a segment of DNA wound around a central histone protein core, which consists of eight histone proteins. The core is made up of two copies each of H2A, H2B, H3 and H4, which together form a histone octamer. The DNA wraps around this core to form a single nucleosome. Another histone (usually H1) fastens the DNA to the histone core (Fig 2.4). The total mass of this complex is about 100,000 daltons. The core particles are connected to each other like beads on a string. In the process of packaging, about 2 meters of DNA is packed into a nucleus with a diameter of about 10 µm.
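The figure of about 2 meters can be checked with a back-of-the-envelope calculation. The sketch below is only an estimate: the 0.34 nm spacing per base pair and the 10 µm nuclear diameter come from the text, while the diploid genome size of roughly 6.4 billion base pairs is an outside assumption, and the function names are hypothetical.

```python
RISE_NM = 0.34              # length added per base pair (from the text)
NUCLEUS_DIAMETER_UM = 10.0  # approximate nuclear diameter (from the text)
DIPLOID_GENOME_BP = 6.4e9   # ASSUMPTION: approximate diploid human genome size


def contour_length_m(n_base_pairs: float, rise_nm: float = RISE_NM) -> float:
    """Length in meters of fully stretched B-DNA with `n_base_pairs` pairs."""
    return n_base_pairs * rise_nm * 1e-9


def linear_compaction(length_m: float, diameter_um: float) -> float:
    """How many times longer the DNA is than the nucleus is wide."""
    return length_m / (diameter_um * 1e-6)


if __name__ == "__main__":
    length = contour_length_m(DIPLOID_GENOME_BP)
    print(f"stretched length: {length:.2f} m")
    print(f"length / nuclear diameter: {linear_compaction(length, NUCLEUS_DIAMETER_UM):,.0f}")
```

Under these assumptions the stretched DNA is a little over 2 meters long, roughly two hundred thousand times the diameter of the nucleus that contains it.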
Nucleosomes form the repeating units of chromatin. The nucleosomes are in turn coiled into a chromatin fibre 30 nm in diameter that constitutes the next level of chromatin organization.
Why should we bother about nucleosomes? It is because nucleosomes are not static structures that simply compact the huge amount of DNA within the cell. The attachment between the histones and the DNA is critical for transcription. When a gene is "turned on", DNA is transcribed into RNA, but for this to occur, the histones must be removed and/or just simply pushed out of the way (either up or down the DNA molecule). By regulating how tight nucleosome binding is (or exactly where nucleosomes bind) the cell can control how active a gene is.
Fig. 2.4: Schematic diagram showing the chromosome with 8 histone proteins, spacer DNA segments and central protein scaffold
Histone acetylation is associated with turning genes on (by loosening the DNA/nucleosome interactions) and methylation is associated with turning genes off (by tightening DNA/nucleosome interaction). This will be alluded to later in the book.
It is the precise combination of modified amino acids in histone tails that helps in controlling the condensation or compaction of chromatin and its ability to be transcribed, replicated or repaired. Highly condensed chromatin is called heterochromatin and less condensed chromatin is called euchromatin. Heterochromatin remains in a compact state during interphase and is usually found in the centromeres and telomeres of chromosomes. Another classical example of heterochromatin is the Barr body which is an inactivated X chromosome in females. Heterochromatin, because of its condensed structure, remains transcriptionally inactive.
Histones pack the large eukaryotic genome into the nucleus efficiently while still allowing the DNA to be accessed when required.
Five types of histones have been identified: H1, H2A, H2B, H3 and H4.
The complex of DNA, histone and non histone proteins constitutes chromatin.
The fundamental packing unit is the nucleosome.
Each nucleosome consists of a segment of DNA wound around the histone protein core.
The attachment between the histones and the DNA is critical for transcription.
Histone acetylation is associated with turning genes on (by loosening the DNA/nucleosome interactions) and methylation is associated with turning genes off (by tightening DNA/nucleosome interaction).
<H1>ORGANISATION OF GENES AND NON CODING DNA
Coding DNA is that DNA which is transcribed to RNA to be later translated to proteins. Non-coding DNA, on the other hand, does not translate into proteins. Detailed examination of the structure of DNA in humans has shown that the genome contains a large amount of non-coding DNA. Also, the density of the genes varies in humans; there are areas of 'gene-rich' regions and interspersed areas of 'gene-poor' regions, where most of the DNA is of the non-coding type. Ultimately, about 98.5% of human DNA is non-coding.
Why should the cell carry this extra baggage? It appears that different selective pressures during the process of evolution may account for this non coding DNA. For example, a section of DNA may have been useful to a primitive organism which was low down on the evolutionary scale many million years ago. As the organism evolved, that particular portion of DNA became redundant to the organism since the organism began manufacturing entirely different sets of proteins. The DNA however, persisted in the organism, since metabolically, it didn't matter much to the organism to carry a little extra DNA. This cycle repeated itself over millions of years and ultimately, we have the entire extra non coding DNA in our genomes today.
When classified according to the basis of nucleotide repeats in a sequence of nucleotides, DNA can be categorized into: single copy sequences, moderately repetitive sequences and highly repetitive sequences.
Many of the regions of non coding repetitive sequences have important functions as in centromeres, telomeres, origins of replication and regulatory elements. However, the largest percentage constitutes junk DNA. Functions of repetitive DNA have been hypothesized, as for example, they are believed to mediate attachment of chromatin loops to the nuclear matrix and this regulates transcription activation.
The two types of repetitive DNA sequences are: Tandemly repeated DNA and interspersed repetitive DNA.
<H2>TANDEMLY REPEATED DNA
This can be classified into 4 categories:
Mega satellite - In this type of DNA, the nucleotide sequences are repeated 50 to 400 times producing blocks that are several kilobases long. Some megasatellites are composed of coding repeats like rRNA genes and the deubiquitinating enzyme gene
Satellite - It consists of very large arrays of tandemly repeated DNA sequences in which the repeat element ranges from 5 to over 170 base pairs (bp) in length. Individual blocks of satellite DNA can be 100 kb to several megabases in length. Satellite DNA is not transcribed, is seen around the region of the centromere and forms about 15 % of the total DNA (heterochromatic regions of DNA). These regions have a high frequency of the nucleotides, adenine and thymine. They have a lower density and form a second 'satellite' band when the genomic DNA is separated along a density gradient (Fig 2.5). They probably have a functional role as protein binding sites.
Satellite sequences on chromosome 3 labeled with fluorescent dye (green). It is clear that the dye is seen near the centromeres.
Fig 2.5 - A schematic diagram of the density gradient and a picture of satellite sequences in a chromosome
Fig 2.6 - A schematic diagram of a VNTR
Minisatellite - This is a class of simple sequence repeats made up of tandemly repeated DNA sequences. The size of the repeated unit is approximately 14 to 500 bp, and the repeats form blocks that are usually 0.1 to 20 kb long. Minisatellites are sometimes referred to as VNTRs (Variable Number of Tandem Repeats) (Fig 2.6) and are dispersed through most regions of the genome. VNTR repeat units are characteristically 14 to 100 nucleotides long and occur in clusters of tandem repeats, with 4 to 40 repeats per occurrence.
Microsatellite - This is a class of tandemly repeated sequences in which the repeat unit is 1 to 13 bp long. Microsatellite DNA forms blocks that are often less than 150 bp long and is sometimes referred to as Short Tandem Repeats or STRs. Microsatellites are interspersed throughout the genome and are generally intragenic although they can occur within intronic or non coding sequences as well. They are commonly CA repeats. Microsatellites are thought to arise because of template slipping during DNA replication. They are extremely polymorphic and therefore well suited to DNA fingerprinting (pioneered by Alec Jeffreys); examining several STR loci together yields a genetic profile that is effectively unique to an individual, which gives them great utility in identity determination. Microsatellites often occur within transcription units. Some individuals are born with a larger number of repeats in specific genes than the general population, again because of template slipping during DNA replication. Several neuromuscular diseases are associated with an increased number of repeats, the disease depending on the gene in which the expansion occurs. In some diseases, the expanded microsatellites behave like a recessive mutation because they interfere with the expression of the encoded genes. Others behave like dominant mutations.
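To make the idea of a variable repeat number concrete, the following sketch counts how many times a repeat unit occurs back to back in a sequence. It is a minimal illustration, not a genotyping method: the function name and the two example alleles are hypothetical, and real STR typing measures the length of PCR products rather than scanning raw sequence.

```python
import re


def longest_tandem_run(sequence: str, motif: str) -> int:
    """Largest number of back-to-back copies of `motif` found in `sequence`."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    # Each match is one uninterrupted block of repeats; convert its length
    # in bases into a number of repeat units.
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(sequence)]
    return max(runs, default=0)


if __name__ == "__main__":
    # Hypothetical alleles at one CA-repeat locus, with identical flanks.
    allele_1 = "GGTT" + "CA" * 12 + "TTGG"
    allele_2 = "GGTT" + "CA" * 15 + "TTGG"
    for name, allele in (("allele 1", allele_1), ("allele 2", allele_2)):
        print(name, "repeats:", longest_tandem_run(allele, "CA"), "length:", len(allele))
```

Because the flanking sequence is the same in both alleles, the difference in repeat number shows up directly as a difference in fragment length, which is what a gel resolves.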
Within a species, the nucleotide sequences of the repeat units composing simple sequence DNA tandem arrays are highly conserved among individuals. In contrast, the number of repeats varies from individual to individual. This is believed to occur because of unequal cross over during meiosis. As a consequence of these unequal cross over, the length of these tandem arrays is unique to each individual.
Most of these repeats occur as minisatellites. Even slight differences between individuals can be detected by using the PCR technique using a mix of several primers that hybridize to unique sequences flanking multiple minisatellites. These polymorphic loci form the basis of DNA fingerprinting. An example of how minisatellites help in DNA fingerprinting is provided in the case scenario 2.1.
<SCREEN>Case scenario 2.1
Poonam Sharma was a TV serial actress who was engaged to Ritwik Pandey, an engineer working with DRDO. One day, Ritwik found Poonam talking to her director, Deepak Kanwar. Ritwik suspected that all was not well and he began plotting Deepak's murder. One night, he entered Deepak's flat and killed him using a khukri. The khukri is a flat bladed instrument which is like a short sword. Ritwik killed Deepak by slashing him repeatedly and finally, partially decapitating him. Unfortunately, Ritwik also cut his hand and a few drops of his blood were inadvertently mixed with Deepak's. When the police finally entered the house, they found the partially decapitated body of Deepak lying in a pool of blood. They suspected that Ritwik had probably murdered Deepak but he had a solid alibi; he claimed that he had been working in his lab late at night and he was nowhere near the scene of the crime. He had punched himself in and out of the lab at 11.00 PM and 0300 hrs the next morning, a fact that the police immediately verified. The police collected some of the blood at the scene of the crime. They also collected samples of Poonam's blood and Ritwik's blood. DNA fingerprinting was done. Based on the DNA fingerprinting report, Ritwik was taken into custody. The police then found that Ritwik had found a method of corrupting the lab computers so that the time frame would be distorted. They later found that he had actually punched himself in at 6.00 PM and punched himself out at 10.00 PM. He had then gone to Deepak's house and murdered him. The police confronted Ritwik with the evidence. Ritwik then broke down and confessed to the entire crime.
What exactly had the forensic lab done to establish Ritwik's guilt? They had collected the blood from the scene of the crime in addition to Deepak's blood and Ritwik's blood. They extracted the DNA from all three samples and amplified a series of microsatellites using the Polymerase Chain Reaction (the details of PCR will be elaborated on later). For simplicity, only two STRs are shown in the figure. When you look at Ritwik's DNA, there are two STRs which appear fairly low down on the gel, indicating that they are of low molecular weight. Deepak's DNA also shows two STRs, which are of relatively higher molecular weight. When we look at the DNA from the scene of the crime, we see bands which correspond to both Ritwik's and Deepak's DNA. The presence of Deepak's DNA at the scene of the crime is not surprising since he was the victim. However, Ritwik's DNA had no reason to be present, since he had claimed that he was nowhere near the scene of the crime. There are therefore two possibilities. One is the one-in-a-few-billion chance that there is another individual in this world who has an identical STR pattern. The second possibility is, of course, that Ritwik is lying.
If you were the investigating policeman, which choice would you accept?</Screen>
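The reasoning in the case scenario amounts to asking whether every band in a person's profile also appears in the crime scene sample. The sketch below expresses that comparison. All band sizes are invented for illustration and are not taken from the figure; the function name is likewise hypothetical.

```python
def bands_accounted_for(person_bands: set, sample_bands: set) -> bool:
    """True if every band in the person's profile also appears in the sample."""
    return person_bands <= sample_bands


# HYPOTHETICAL PCR fragment sizes in base pairs, invented for illustration only.
RITWIK = {110, 130}                  # two low molecular weight bands
DEEPAK = {240, 280}                  # two higher molecular weight bands
CRIME_SCENE = {110, 130, 240, 280}   # mixed blood from the scene
UNRELATED = {150, 240}               # a person who was not present

if __name__ == "__main__":
    for name, bands in (("Ritwik", RITWIK), ("Deepak", DEEPAK), ("Unrelated", UNRELATED)):
        print(name, "consistent with scene:", bands_accounted_for(bands, CRIME_SCENE))
```

A match of this kind shows only that a person cannot be excluded as a contributor; the strength of the conclusion comes from testing many loci, so that a coincidental match becomes extremely unlikely.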
<H2>INTERSPERSED REPETITIVE DNA
These stretches of DNA account for 45% of the human genome. These are composed of a large number of copies of relatively few sequence families. These are also known as moderately repeated DNA or intermediate repeat DNA.
This DNA has the unique capacity to move in the genome and is therefore said to carry transposable DNA elements. When transposition occurs in germ cells, the transposed sequences at the new site are passed on to successive generations. Thus, these transposable genetic elements have accumulated in the genome.
The question is, how does this DNA move? It can transpose directly as DNA or via an RNA intermediate. DNA transposons transpose directly as DNA, using an enzyme called transposase, which is required for the transposition of the insertion sequence (IS) to another site. They excise themselves from one place in the genome and move to another. Elements that use an RNA intermediate are called retrotransposons.
Most transpositions in eukaryotes occur through retrotransposons. These use reverse transcriptase. They are divided into two major categories, those that contain a Long Terminal Repeat (LTR) and those that do not. The LTR retrotransposons are less abundant in mammals. They are characterized by the presence of LTRs flanking the central protein coding region. The LTRs are composed of 250 - 600 base pairs. The most common LTR retrotransposons in humans are called ERVs or Endogenous Retroviruses. Since they comprise only about 8% of the genomic DNA and have little significance, we shall not elaborate on them further.
Non-LTR retrotransposons are of two types, LINES (Long Interspersed Nuclear Elements) and SINES (Short Interspersed Nuclear Elements). LINES are about 6 kb long and SINES are about 300 bp long.
Of the LINES and SINES, the more abundant member in this group is the LINE family which accounts for about 21% of the human genome. The SINES account for about 11% of the human genome.
LINES and SINES are capable of integrating anywhere in the genome. This may cause problems when they insert into a protein coding region. About 1 in 600 mutations that cause significant disease in humans are caused by LINE or SINE transpositions. LINES and SINES have also contributed significantly to the evolution of higher organisms.
The human mitochondrial genome is only 16.6 kb long but its organization is extremely compact. There are 13 protein-encoding genes, all of which encode components of oxidative phosphorylation. Mitochondrial genes lack introns and are closely apposed, and their sequences often overlap. Because each cell contains hundreds to thousands of mitochondria, each carrying 2 - 10 copies of the genome, mitochondrial DNA comprises about 0.5% of the DNA in a somatic cell.
There are several diseases associated with defects in mitochondrial DNA, such as Leber's Hereditary Optic Neuropathy, GRACILE (Growth Retardation, Aminoaciduria, Cholestasis, Iron overload, Lactic acidosis and Early death) and MERRF (Myoclonic Epilepsy with Ragged Red Fibers). Analysis of these mutations is beyond the purview of this book.
The severity of a disease caused by mutations in mtDNA depends on the type of mutation and the proportion of mutant and wild type mtDNA present in a particular cell type. Generally, the cells contain a mixture of normal and mutant mtDNA, a condition known as heteroplasmy. Each time a germ cell divides, the normal and mutant DNA segregate randomly into daughter cells. Thus, the mtDNA genotype fluctuates from one generation to the next. Mutant DNA can also accumulate as a result of aging.
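Random segregation of normal and mutant mtDNA can be illustrated with a toy simulation. This is a deliberately simplified sketch and not a model taken from the text: it assumes a fixed number of mtDNA copies per cell, assumes every copy is duplicated before division, and follows only one daughter cell at each division. All names and parameter values are hypothetical.

```python
import random


def divide(mutant: int, total: int, rng: random.Random) -> int:
    """Duplicate every mtDNA copy, then pass a random half to one daughter."""
    pool = [True] * (2 * mutant) + [False] * (2 * (total - mutant))
    daughter = rng.sample(pool, total)  # sampling without replacement
    return sum(daughter)                # number of mutant copies inherited


def simulate_lineage(mutant: int, total: int, divisions: int, seed: int = 0) -> list:
    """Track the mutant fraction along one cell lineage."""
    rng = random.Random(seed)
    history = [mutant / total]
    for _ in range(divisions):
        mutant = divide(mutant, total, rng)
        history.append(mutant / total)
    return history


if __name__ == "__main__":
    # Hypothetical cell: 100 mtDNA copies, 30 of them mutant, 20 divisions.
    for generation, fraction in enumerate(simulate_lineage(30, 100, 20)):
        print(generation, f"{fraction:.2f}")
```

Running the simulation with different seeds shows the mutant fraction drifting up in some lineages and down in others, which is the fluctuation the paragraph describes. A cell that starts with no mutant copies, or with only mutant copies, stays that way in this model.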
Coding DNA is that DNA which is transcribed to RNA which is later translated to proteins.
About 98.5% of human DNA is non coding. It appears to be because of the different selective pressures during evolution.
Repetitive DNA can be classified as single copy sequences, moderately repetitive sequences and highly repetitive sequences. They can also be classified as Tandemly repeated DNA and interspersed repetitive DNA.
Tandemly repeated DNA comprises megasatellites, satellite DNA, minisatellites and microsatellites.
Minisatellites are called VNTRs and microsatellites are called STRs.
STRs are very useful in DNA fingerprinting.
Interspersed repetitive elements account for 45% of the human genome. They have the unique capacity to move in the genome and are therefore called transposable DNA elements.
They can transpose directly as DNA or can transpose via an RNA intermediate.
The human mitochondrial genome is only 16.6 kb long with a compact organisation.
There are 13 protein encoding genes all of which encode components of oxidative phosphorylation.
Mitochondrial genes lack introns.
Leber's Hereditary Optic Neuropathy, GRACILE (Growth Retardation, Aminoaciduria, Cholestasis, Iron overload, Lactic acidosis and Early death) and MERRF (Myoclonic Epilepsy with Ragged Red Fibers) are some of the diseases associated with mitochondrial genetic abnormalities.
<H2>SINGLE NUCLEOTIDE POLYMORPHISMS
A single nucleotide polymorphism or SNP (pronounced snip) is a DNA sequence variation that occurs when a single nucleotide - A, T, C, or G - in the genome (or other shared sequence) differs between members of a species (or between paired chromosomes in an individual). For example, the DNA fragments AAGCCTA and AAGCTTA from two different individuals differ in a single nucleotide. In this case, we say that there are two alleles: C and T. SNPs constitute the most common genetic difference between two individuals. Almost all common SNPs have only two alleles. About two of every three SNPs involve the replacement of cytosine (C) with thymine (T). By convention, if the frequency of the minor allele (called minor allele frequency or MAF*) is 1% or more, the variant is called a polymorphism or SNP. If the frequency is < 1%, it is known as a mutation.
<FN>* MAF refers to the frequency at which the less common allele occurs in a given population. </FN>
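The example fragments and the 1% convention can be turned into a short sketch. The two sequences are the ones given in the text; the function names and the allele counts in the demonstration are hypothetical.

```python
from collections import Counter


def snp_positions(seq_a: str, seq_b: str) -> list:
    """Zero-based positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def minor_allele_frequency(alleles: list) -> float:
    """Frequency of the second most common allele among the observations."""
    counts = Counter(alleles).most_common()
    if len(counts) < 2:
        return 0.0  # only one allele observed, so no variation
    return counts[1][1] / len(alleles)


def classify_variant(maf: float) -> str:
    """Apply the convention from the text: 1% or more is a polymorphism."""
    return "polymorphism (SNP)" if maf >= 0.01 else "mutation"


if __name__ == "__main__":
    print(snp_positions("AAGCCTA", "AAGCTTA"))
    # Hypothetical sample: 100 chromosomes, 97 carrying C and 3 carrying T.
    maf = minor_allele_frequency(["C"] * 97 + ["T"] * 3)
    print(maf, classify_variant(maf))
```

The single position reported for the two fragments is the fifth base, where one individual carries C and the other T.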
Single nucleotide polymorphisms may fall within coding sequences of genes, noncoding regions of genes, or in the intergenic regions between genes. SNPs within a coding sequence will not necessarily change the amino acid sequence of the protein that is produced, due to degeneracy of the genetic code. A SNP in which both forms lead to the same polypeptide sequence is termed synonymous (sometimes called a silent mutation); if a different polypeptide sequence is produced, the SNP is non-synonymous. SNPs that are not in protein coding regions may still have consequences for gene splicing, transcription factor binding, or the sequence of non-coding RNA.
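The distinction between synonymous and non-synonymous SNPs follows directly from the genetic code, as the sketch below shows. It builds the standard codon table and compares the amino acid encoded before and after a single-base change. The function name and the example codons are chosen for illustration.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(codon: str, position: int, new_base: str) -> str:
    """Classify a single-base change at `position` (0, 1 or 2) in a DNA codon."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    return "synonymous" if before == after else "non-synonymous"


if __name__ == "__main__":
    # GAA and GAG both encode glutamate, so this change is silent.
    print("GAA -> GAG:", classify_coding_snp("GAA", 2, "G"))
    # GAG encodes glutamate and GTG encodes valine.
    print("GAG -> GTG:", classify_coding_snp("GAG", 1, "T"))
```

Changes at the third position of a codon are the most likely to be synonymous, because that is where most of the redundancy in the code lies.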
Variations in the DNA sequences of humans can affect how humans develop diseases, respond to pathogens, chemicals, drugs, etc. However, their greatest importance in biomedical research is for comparing regions of the genome between cohorts (such as with matched cohorts with and without a disease).
<H3>SINGLE NUCLEOTIDE POLYMORPHISMS IN COMPLEX GENETIC DISORDERS
Most researchers believe that complex disorders are either oligogenic, that is, the cumulative result of variants in several genes, or polygenic, resulting from a large number of genetic variants, each contributing a small effect. Still others propose that these disorders result from a complex interaction between one or more genetic variants and environmental risk factors.
<H3>SINGLE NUCLEOTIDE POLYMORPHISMS IN INFECTIOUS DISEASES
Several genetic disorders are associated with protection from disease. A classical example is sickle cell anemia, which coexists with malaria in several stretches of Africa. It is now known that the presence of sickle cell trait confers a survival advantage in malaria. Less dramatically than the sickle cell mutation, polymorphic loci also govern the way individuals respond to infectious agents. The personal experience of most of us would testify to the role that SNPs play in infectious diseases. For instance, even when several people in a family are infected with the same microorganism, each person responds differently. Examples of gene polymorphisms and their role in disease include the association of HLA types with HIV progression. Some specific class I HLA types, such as B27 and B57, have been associated with a better prognosis and others, including allelic variants of B35, with poor prognoses. Two independent TNF-promoter polymorphisms have been associated with the clinical profile of tuberculosis. IL10 polymorphisms seem to have a role in hepatitis, as well as in HIV progression. The list of gene polymorphisms influencing infectious disease is endless and forms a fascinating facet of the study of infections.
<H3>LIMITATIONS OF SNP ANALYSIS
Genetic association studies have become one of the most common forms of experimental design in the medical literature and remain perhaps some of the hardest to interpret. Association is sought between a specific SNP and the clinical outcome by direct comparison of an individual genotype and the clinical features of the disease. The problems come up when one questions 'what are the criteria for the diagnosis of the disease'? The next problem that arises in a case control study is 'what guarantee is there that the control will not develop the same disease later'? For example, if one wants to study gallstones, the ideal way of taking up a control would be to do an ultrasound and confirm the absence of gallstones. However, there is no guarantee that the same individual will not develop gallstones at a later date and that nullifies the entire study because the control then becomes the case.
As mentioned, choosing controls is of immense importance. Controls have to be drawn from the same ethnic group and they also have to be age- and sex-matched.
SNP analysis is heavily driven by statistics and the statistical errors can make all the difference between a positive and a negative result. Special attention has to be paid to issues such as lack of power and small sample size, disease classification or status, problems derived from chance, bias, and confounding factors.
Linkage disequilibrium is the best known confounding factor affecting case-control studies. It can be defined as non random association of alleles at different loci. If linkage disequilibrium is present, the possibility exists that the original marker tested is not the causal allele, and further studies of the region are warranted. SNPs are typically analyzed in isolation, whereas, it may be the precise combination of SNPs on a given chromosome (the haplotype) that determines its significance.
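Non-random association of alleles can be quantified. A common approach, offered here as a sketch rather than as the method the text has in mind, compares the observed frequency of a haplotype carrying allele A at one locus and allele B at another with the frequency expected if the two alleles were independent. The coefficient D and the squared correlation r² are standard population genetics measures; the function name and the example frequencies are hypothetical.

```python
def linkage_disequilibrium(p_ab: float, p_a: float, p_b: float) -> tuple:
    """Return (D, r_squared) for two biallelic loci.

    p_ab: frequency of haplotypes carrying both allele A and allele B
    p_a:  frequency of allele A at the first locus
    p_b:  frequency of allele B at the second locus
    """
    # D is the excess of the observed haplotype frequency over the
    # frequency expected under independence.
    d = p_ab - p_a * p_b
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = (d * d) / denominator if denominator > 0 else 0.0
    return d, r_squared


if __name__ == "__main__":
    # Hypothetical frequencies: independent loci, then perfectly linked loci.
    print(linkage_disequilibrium(p_ab=0.20, p_a=0.5, p_b=0.4))
    print(linkage_disequilibrium(p_ab=0.30, p_a=0.3, p_b=0.3))
```

When r² is close to 1, a marker SNP and a nearby causal variant are inherited together so consistently that an association study cannot tell them apart, which is why a positive result calls for further study of the region.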
Finally, publication bias should be avoided so that both positive and negative results are accessible to the public, as long as they fulfil minimal methodological criteria.
A Single Nucleotide Polymorphism or SNP (pronounced snip) is a DNA sequence variation occurring when a single nucleotide in the genome differs between members of a species or between paired chromosomes in an individual.
About two of every three SNPs involve the replacement of cytosine (C) with thymine (T).
Single nucleotide polymorphisms have a minor allele frequency of at least 1%.
Variations in the DNA sequences of humans can affect how humans develop diseases, respond to pathogens, chemicals, drugs, etc.
SNPs are used extensively in research to compare the genome between cohorts (as in people with or without a disease).
SNP analysis can be very difficult to do because of the difficulty in choosing controls. It is also very difficult to do statistical analysis in SNP analysis.
Methylation of cytosine nucleotides plays an important part in regulating gene activity. DNA methylation is a complex process whereby one of the three DNA methyltransferases (DNMTs) catalyzes the addition of a methyl group from the universal methyl donor S-adenosyl-L-methionine to the 5-carbon position of cytosine (see Figure 2.7). This modification, occurring predominantly within the CpG dinucleotide, is the most prevalent epigenetic modification of DNA in mammalian genomes. Essentially, methylation inactivates transcription. CpG methylation profoundly influences many processes, including transcriptional regulation, genomic stability, chromatin structure modulation, X chromosome inactivation, and the silencing of parasitic DNA elements. These diverse processes nevertheless appear to share a common characteristic: they all exert a stabilizing effect that promotes genomic integrity and ensures proper temporal and spatial gene expression during human development.
Figure 2.7: Addition of a methyl group from the donor S-adenosyl-L-methionine to the 5-carbon position of cytosine
Only about 3% of the cytosine in human DNA is methylated. The methylated CpG sequence is chemically unstable: methylated cytosine is prone to deamination, which yields thymine, and the resulting change is repaired ineffectively. Thus, there is a loss of CpG sites and a concomitant under-representation of CpG sequences in human DNA. Usually, methylation sites are located near highly expressed genes. They are probably responsible for structural changes in chromatin which are necessary for transcription to proceed.
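The under-representation of CpG can be expressed as a ratio of observed to expected CpG counts. The sketch below, in Python, uses a commonly quoted form of this ratio, (number of CpG x sequence length) / (number of C x number of G). Neither the formula nor the function name appears in this chapter, and the sequences are toy examples. A ratio near 1 means that CpG occurs about as often as the base composition predicts, while a ratio well below 1 indicates depletion.

```python
def cpg_observed_expected(sequence):
    """Return the observed/expected CpG ratio for a DNA sequence.

    Uses (CpG count * length) / (C count * G count), a commonly
    quoted form of the ratio (an assumption of this sketch).
    """
    sequence = sequence.upper()
    c_count = sequence.count("C")
    g_count = sequence.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0  # no CpG is possible without both C and G
    cpg_count = sequence.count("CG")  # "CG" read 5' to 3' on one strand
    return cpg_count * len(sequence) / (c_count * g_count)


# Toy sequences, invented for illustration only.
print(cpg_observed_expected("ACGTCGCG"))  # CpG-rich
print(cpg_observed_expected("CATGCATG"))  # contains C and G but no CpG
```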
DNA methylation patterns change dramatically during embryonic development. Genome-wide demethylation after fertilization is followed by waves of de novo methylation upon embryo implantation. Not all sequences in the genome, however, are demethylated upon fertilization, and not all sequences become de novo methylated after implantation. This ensures that development follows a precise pattern. These exceptions further emphasize the regional specificity of the distribution of genomic DNA methylation patterns.
Genomic DNA methylation patterns are not randomly distributed. Rather, discrete regions, including most repetitive and parasitic DNA, are hypermethylated, while other regions, such as the CpG-rich regions often associated with the regulatory regions of genes (CpG islands), are hypomethylated (Fig. 2.8). This ensures that repetitive and parasitic DNA is not transcribed, whereas the regulatory regions are transcribed as and when required. This pattern is reversed during the development of cancer.
In neoplastic cells, the pattern of DNA methylation is significantly altered, with a general decrease in DNA methylation (Fig. 2.9). This leads to the unregulated expression of several genes. However, some regions may show increased methylation in tumours, and the genes silenced in this way include those responsible for DNA repair.
Fig. 2.8: In a normal cell, CpG islands are protected from being methylated, whereas CpG sites away from the transcription start sites and in repetitive elements are methylated. Genes can therefore be transcribed, but exons away from the start site and repetitive elements are not. This provides control over the transcription process. Green stars show unmethylated CpG sites; red stars show methylated CpG sites.
Fig 2.9 - In a cancer cell, there is focal hypermethylation and global hypomethylation. CpG islands flanking the start sites of some genes may become methylated, so these genes are not transcribed. Intragenic CpG sites and repeats become unmethylated. This allows unrestricted transcription of exons that are not supposed to be transcribed. Green stars show unmethylated CpG sites; red stars show methylated CpG sites.
The importance of these methylation patterns becomes evident on examining the effects of disrupting them in vivo. Disruption of normal DNA methylation patterns is one of the most common features of transformed cells, and a number of studies have revealed that methylation changes are early events in the tumorigenesis process and contribute directly to transformation.
Genetic linkage is the tendency of certain loci or alleles to be inherited together. Genetic loci that are physically close to one another on the same chromosome tend to stay together during meiosis, and are thus genetically linked.
At the beginning of normal meiosis, the two chromosomes of a homologous pair (called a bivalent, made up of a chromosome from the mother and a chromosome from the father) intertwine and exchange sections or fragments of chromosome. The pair then breaks apart to form two chromosomes. However, because of this intertwining and exchange of genetic material, the two chromosomes have a genetic profile that is distinctly different from that of the maternal and paternal chromosomes. Through this process of recombining genes, organisms can produce offspring with new combinations of maternal and paternal traits that may contribute to or enhance survival (Fig. 2.10).
This recombination of genes, called crossing over, can cause alleles previously on the same chromosome to be separated and end up in different daughter cells. The farther apart the two alleles are, the greater the chance that a cross-over event will occur between them, and the greater the chance that the alleles are separated.
Figure 2.10 - Illustration of unequal crossing over during meiosis. Note that genes that are close together are usually inherited together because they are not separated during meiosis. Genes that are farther apart may not be inherited together.
The relative distance between two genes can be estimated by examining the offspring of an organism showing two linked genetic traits and finding the percentage of offspring in which the two traits are not inherited together. The higher the percentage of descendants that do not show both traits, the farther apart the two genes are on the chromosome. Conversely, if a large percentage of the descendants show coinheritance of the traits, the genes are close together on the chromosome. Genes for which the percentage of separation is lower than 50% are typically thought to be linked.
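The calculation described above amounts to a simple proportion. The following minimal sketch in Python shows it with invented offspring counts. The function names are hypothetical, and a real study would also ask whether the observed percentage differs from 50% by more than chance alone would explain.

```python
def recombination_frequency(together, separated):
    """Fraction of offspring in which the two traits were NOT inherited together."""
    total = together + separated
    if total == 0:
        raise ValueError("no offspring counted")
    return separated / total


def appear_linked(together, separated):
    """Apply the rule from the text: below 50% separation suggests linkage."""
    return recombination_frequency(together, separated) < 0.5


# Invented counts: 82 offspring inherit both traits together, 18 do not.
frequency = recombination_frequency(82, 18)
print(f"{frequency:.0%}", appear_linked(82, 18))
```

With these invented counts, 18% of the offspring show the traits separated, so the two genes would be regarded as linked and relatively close together.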
Genetic linkage can also be understood by looking at the relationships among different phenotypes. Some phenotypes or traits appear randomly, while others occur in some relation to one another. Random appearance is known as independent assortment, while the latter pattern is known as genetic linkage.
Genetic linkage arises when genes lie near one another on the same chromosome. As a result, the genes are usually inherited as a single unit. Genes inherited in this way are said to be linked, and a set of such genes is referred to as a "linkage group".
Linkage analysis refers to the study of how a disease segregates together with polymorphic markers for each chromosome in large families. The mathematical analysis is complex and involves the use of likelihood ratios, the logarithms of which are known as LOD scores (Logarithm of the Odds).
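A much simplified LOD score can be worked out by hand. The sketch below, in Python, assumes the easiest possible case, in which every meiosis in the family can be scored unambiguously as recombinant or non-recombinant. Real pedigrees rarely allow this and are analyzed with dedicated software. The function name and the counts are invented for illustration. The score compares the likelihood of the data at a recombination fraction theta with the likelihood under no linkage (theta = 0.5). A score of 3 or more (odds of 1000 to 1) is the conventional threshold for accepting linkage, a convention that is not stated in this chapter.

```python
import math


def lod_score(recombinants, non_recombinants, theta):
    """LOD score at recombination fraction `theta` for fully scored meioses."""
    if not 0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    if theta == 0 and recombinants > 0:
        return float("-inf")  # any recombinant rules out theta = 0

    def log_term(count, probability):
        # count * log10(probability), skipping empty classes to avoid log10(0)
        return count * math.log10(probability) if count else 0.0

    log_likelihood_theta = (log_term(recombinants, theta)
                            + log_term(non_recombinants, 1 - theta))
    log_likelihood_unlinked = (recombinants + non_recombinants) * math.log10(0.5)
    return log_likelihood_theta - log_likelihood_unlinked


# Invented family data: 1 recombinant among 10 scored meioses.
thetas = [i / 100 for i in range(1, 51)]
best_theta = max(thetas, key=lambda t: lod_score(1, 9, t))
print(best_theta, round(lod_score(1, 9, best_theta), 2))
```

In this invented example the score peaks at a recombination fraction of 0.1 with a value of about 1.6, which is suggestive but below the conventional threshold. More meioses would be needed to reach a conclusion.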
Linkage disequilibrium is defined as the association of two alleles at linked loci more frequently than would be expected by chance. It is also referred to as allelic association. The concept will be elaborated upon in the discussion of hemophilia.
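The phrase "more frequently than would be expected by chance" can be made concrete. If two alleles were associated only by chance, the frequency of chromosomes carrying both would equal the product of their individual frequencies. The difference between the observed and the expected frequency is commonly written D, and a standardized form is r squared. The sketch below, in Python, computes both. The symbols, the function name and the haplotype counts are assumptions made for illustration and are not taken from this chapter.

```python
def linkage_disequilibrium(haplotypes, allele_1="A", allele_2="B"):
    """Return (D, r_squared) for two biallelic loci.

    `haplotypes` is a list of (allele at locus 1, allele at locus 2)
    pairs, one per chromosome (a hypothetical input format).
    """
    n = len(haplotypes)
    p_1 = sum(1 for first, _ in haplotypes if first == allele_1) / n
    p_2 = sum(1 for _, second in haplotypes if second == allele_2) / n
    p_both = sum(1 for pair in haplotypes if pair == (allele_1, allele_2)) / n
    d = p_both - p_1 * p_2  # observed minus chance expectation
    denominator = p_1 * (1 - p_1) * p_2 * (1 - p_2)
    r_squared = d * d / denominator if denominator > 0 else 0.0
    return d, r_squared


# Invented sample of 100 chromosomes in which A and B tend to travel together.
sample = ([("A", "B")] * 40 + [("A", "b")] * 10
          + [("a", "B")] * 10 + [("a", "b")] * 40)
d, r_squared = linkage_disequilibrium(sample)
print(round(d, 3), round(r_squared, 3))
```

In this invented sample A and B each have a frequency of 0.5, so 25% of chromosomes would be expected to carry both by chance, whereas 40% do. A value of D equal to zero would indicate no allelic association.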