Genetic Basis Of Variation In Complex Traits Biology Essay



Quantitative trait locus (QTL) analysis is a statistical method that links phenotypic and genotypic data in an attempt to explain the genetic basis of variation in complex traits (Lynch & Walsh, 1998). I have designed a study to identify QTL conferring resistance/susceptibility to Newcastle disease (ND) (defined as survival time following disease challenge) in commercial layer chickens.

Choice of my study

Newcastle disease is one of the diseases responsible for major losses in the poultry industry. As the ND virus has evolved to the point that commercial vaccines are no longer sufficiently protective, genetic improvement of innate resistance to the disease in chickens is the alternative. This is possible through marker-assisted selection on genomic regions conferring ND resistance, and an extensive QTL study is needed to identify such regions.

The basic requirements of my QTL study are:

An informative mapping population with a large expected genetic difference in susceptibility to Newcastle disease virus (e.g. divergent selection lines).

An efficient population design that generates high linkage disequilibrium (LD) to improve power in QTL mapping.

Polymorphic markers that are abundant, uniformly distributed throughout the genome (e.g. SNPs) and informative about identity-by-descent (IBD) sharing between relatives. SNPs are the markers of choice for this study because the high-density SNP panels now available should substantially increase the power of QTL mapping.

Collection of as many different phenotypes as possible on the individuals in the study.

A novel statistical genetic model that accurately estimates QTL effects on the trait.

Study design and Methodology

a. Mapping population

Two parental lines of commercial layers, divergently selected for either high (HS) or low (LS) susceptibility to experimental challenge with ND virus, will be chosen. In the F0 generation, 4 HS males will be mated with 16 LS females (HL) and 5 LS males with 10 HS females (LH). This reciprocal cross is done to cancel out the effect of cross direction (mating type) in each line. In the F1 generation, 4 HL males will be mated with 40 HL females and 3 LH males with 30 LH females. Approximately 630 F2 chickens will be selected for the QTL study. This controlled mapping cross is designed so that variation fixed between the two divergent parental lines segregates in the F2, producing all three possible single-locus genotypes.
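A minimal sketch of why the F1 x F1 intercross produces all three single-locus genotypes, assuming a hypothetical QTL whose alleles Q and q (illustrative labels, not named in the design) are fixed in the HS and LS lines respectively. Every F1 bird is then Qq regardless of cross direction, and Mendelian segregation gives a 1:2:1 ratio in the F2.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent1: str, parent2: str) -> dict[str, Fraction]:
    """Expected offspring genotype frequencies for two diploid parents at one locus.

    Each parent is a two-letter genotype such as "Qq"; each of its alleles is
    transmitted with probability 1/2 (Mendelian segregation).
    """
    # Sorting puts "Q" before "q", so "qQ" and "Qq" count as the same genotype.
    counts = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


# F0: hypothetical allele Q fixed in the HS line, q fixed in the LS line.
f1 = cross("QQ", "qq")  # HS x LS (or LS x HS): every F1 bird is Qq
f2 = cross("Qq", "Qq")  # F1 x F1 intercross
print(f1)  # {'Qq': Fraction(1, 1)}
print(f2)  # {'QQ': Fraction(1, 4), 'Qq': Fraction(1, 2), 'qq': Fraction(1, 4)}
```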

b. Phenotype collection

F2 chicks will be inoculated with a virulent strain of ND virus. Age at death, following typical symptoms of the disease, will be recorded from 30 days of age onward. Only deaths after 30 days are recorded, to rule out non-specific early chick mortality.

c. Selecting markers and genotyping

Around 60,000 genome-wide SNPs will be screened in the parental lines to identify informative markers for genotyping the F2 population. Highly informative markers selected from the 60K SNPs are used to construct a linkage map with the CRI-MAP software package. From the F2 population, the 30% of chicks with the longest and shortest survival times past 30 days are genotyped. This selective genotyping is done because individuals with extreme phenotypes carry most of the information needed to identify markers linked to the trait (McElroy et al, 2005).
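The two selection steps above can be sketched as follows, under assumptions the essay does not state: a SNP is treated as informative when the HS and LS lines are (nearly) fixed for opposite alleles, using a hypothetical frequency cutoff of 0.9, and the 30% selected for genotyping is split equally between the short- and long-survival tails. Function names and the example data are illustrative only.

```python
def informative_snps(freq_hs: dict[str, float], freq_ls: dict[str, float],
                     cutoff: float = 0.9) -> list[str]:
    """SNP ids where the HS and LS lines are (near-)fixed for opposite alleles.

    freq_hs / freq_ls map a SNP id to the frequency of one chosen allele in
    each parental line. The 0.9 cutoff is an illustrative assumption.
    """
    selected = []
    for snp, p_hs in freq_hs.items():
        p_ls = freq_ls.get(snp)
        if p_ls is None:
            continue  # SNP not typed in both lines
        hs_high_ls_low = p_hs >= cutoff and p_ls <= 1 - cutoff
        hs_low_ls_high = p_hs <= 1 - cutoff and p_ls >= cutoff
        if hs_high_ls_low or hs_low_ls_high:
            selected.append(snp)
    return selected


def select_extremes(survival: dict[str, float], fraction: float = 0.30) -> list[str]:
    """Bird ids in the shortest- and longest-survival tails (selective genotyping).

    `fraction` of all birds is selected, half from each tail (assumed split).
    """
    ranked = sorted(survival, key=survival.get)  # shortest survival first
    n_tail = round(len(ranked) * fraction / 2)
    if n_tail == 0:
        return []
    return ranked[:n_tail] + ranked[-n_tail:]


# Hypothetical allele frequencies in the two parental lines.
print(informative_snps({"snp1": 0.98, "snp2": 0.50, "snp3": 0.05},
                       {"snp1": 0.02, "snp2": 0.45, "snp3": 0.97}))
```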

d. Genetic model for QTL

We will perform a standard genome-wide one-dimensional QTL analysis, fitting a single-locus model to detect QTL with main effects on Newcastle disease resistance/susceptibility.

e. Statistical analysis and QTL mapping

QTL analysis is performed by regression interval mapping (Haley & Knott 1992). The additive and dominance coefficients of a putative QTL are calculated, a least-squares regression model is fitted at 1-Mb intervals along each chromosome, and the F-value for the QTL effect is calculated at each point. The significance threshold is derived by genome-wide permutation testing.

f. Interpretation of the results

Statistical analysis will evaluate the probability that the interval between two markers is associated with a QTL affecting disease resistance/susceptibility. The best estimate of a QTL position is the chromosomal position with the highest significant test statistic (likelihood ratio or F-value). Further fine mapping of these QTL regions should identify candidate gene(s) and causal mutations responsible for resistance/susceptibility to Newcastle disease (ND).

9. a. The logarithm of the odds (LOD) score, Z, is a statistical test that compares the likelihood of obtaining the test data if the two loci are indeed linked (with recombination fraction θ) with the likelihood of obtaining the same data if the loci are unlinked (θ = 0.5), i.e. purely by chance. LOD scores are a function of the recombination fraction and are therefore calculated for a range of θ values. The value of θ at which Z is maximal gives the most likely recombination fraction between the two loci tested. LOD scores are used to analyze pedigrees to determine linkage between Mendelian traits, between a trait and a marker, or between two markers.

LOD score is calculated as

$$Z(\theta) = \log_{10}\left[\frac{(1-\theta)^{NR}\,\theta^{R}}{0.5^{\,NR+R}}\right]$$

(θ = recombination fraction, NR = number of non-recombinant offspring, R = number of recombinant offspring)

The denominator is the likelihood of the data if the two loci are completely unlinked (θ = 0.5, i.e. a 50% chance of recombination due to independent assortment). The numerator is the likelihood of the data if the two loci are linked with recombination fraction θ. Z is calculated at various values of θ within the range of allowable values (0.00-0.49).
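The LOD formula and the scan over θ can be sketched directly; the grid 0.00-0.49 in steps of 0.01 follows the allowable range given above, while the offspring counts in the example are hypothetical. The maximizing θ is the estimated recombination fraction (for this simple phase-known case it equals R / (NR + R)).

```python
import math


def lod_score(theta: float, nr: int, r: int) -> float:
    """Z(theta) = log10[(1 - theta)^NR * theta^R / 0.5^(NR + R)]."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must lie in [0, 0.5)")
    if theta == 0.0:
        if r > 0:
            return float("-inf")  # a recombinant is impossible when theta = 0
        log_linked = 0.0          # (1 - 0)^NR * 0^0 = 1
    else:
        log_linked = nr * math.log10(1.0 - theta) + r * math.log10(theta)
    log_unlinked = (nr + r) * math.log10(0.5)  # independent assortment
    return log_linked - log_unlinked


def max_lod(nr: int, r: int) -> tuple[float, float]:
    """Scan theta over 0.00-0.49 in steps of 0.01; return (theta_hat, Z_max)."""
    grid = [i / 100 for i in range(50)]
    theta_hat = max(grid, key=lambda t: lod_score(t, nr, r))
    return theta_hat, lod_score(theta_hat, nr, r)


# Hypothetical example: 18 non-recombinant and 2 recombinant offspring.
theta_hat, z_max = max_lod(18, 2)
print(theta_hat, round(z_max, 2))  # 0.1 3.2
```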

Decision rules for evaluating LOD score

Z > 3.0: significant evidence for linkage at the given recombination fraction

Z < -2.0: significant evidence for non-linkage

-2.0 < Z < 3.0: linkage data inconclusive
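The three decision rules translate into a small helper; the function name and the returned labels are illustrative. Boundary values (Z exactly 3.0 or -2.0) fall into the inconclusive class, following the strict inequalities in the rules.

```python
def interpret_lod(z: float) -> str:
    """Classify a LOD score using the conventional thresholds +3 and -2."""
    if z > 3.0:
        return "linkage"        # odds of at least 1000:1 in favour of linkage
    if z < -2.0:
        return "non-linkage"    # odds of at least 100:1 against linkage
    return "inconclusive"       # more data (families/offspring) needed


for z in (3.2, 1.0, -2.5):
    print(z, interpret_lod(z))
```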

9. b.

For loci A and B, the maximum LOD score (Z) occurs at a recombination fraction of 0.1 and is greater than +3. Therefore, as explained in answer 9.a., there is statistical evidence for genetic linkage between loci A and B. Because the recombination fraction at which Z is maximal is the most likely recombination fraction between the two loci, the estimated recombination fraction between loci A and B is 0.1.

For loci A and C, Z is negative for every value of the recombination fraction (θ) used, and the best estimate of θ is 0.5. This shows significant evidence of non-linkage between loci A and C.

6. Two loci, A and B, segregate two alleles each (A1/A2 and B1/B2). Let the allele frequencies at A be p1 and p2 and at B be q1 and q2. If the two loci are in linkage disequilibrium, the haplotype frequencies are

P11 = p1q1 + D

P12 = p1q2 - D

P21 = p2q1 - D

P22 = p2q2 + D

where Pij is the frequency of haplotype AiBj and D is the linkage disequilibrium coefficient.

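Given dataset (haplotype counts, n = 100)

A1B1: 34

A1B2: 13

A2B1: 15

A2B2: 38

Haplotype frequencies: P11 = 34/100 = 0.34, P12 = 13/100 = 0.13, P21 = 15/100 = 0.15, P22 = 38/100 = 0.38

Allele frequencies: p1 = P11 + P12 = 47/100 = 0.47, p2 = 53/100 = 0.53, q1 = P11 + P21 = 49/100 = 0.49, q2 = 51/100 = 0.51

So, D = P11 - p1q1 = 0.34 - 0.47 × 0.49 = 0.1097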
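The calculation of D from haplotype counts can be sketched as below (the function name is hypothetical). It also returns the equivalent form D = P11·P22 - P12·P21, which follows algebraically from the four haplotype equations because the haplotype frequencies sum to 1; both give 0.1097 for this dataset.

```python
def ld_coefficient(n11: int, n12: int, n21: int, n22: int) -> dict[str, float]:
    """Allele frequencies and D from counts of haplotypes A1B1, A1B2, A2B1, A2B2."""
    n = n11 + n12 + n21 + n22
    P11, P12, P21, P22 = (c / n for c in (n11, n12, n21, n22))
    p1 = P11 + P12  # frequency of allele A1
    q1 = P11 + P21  # frequency of allele B1
    return {
        "p1": p1,
        "q1": q1,
        "D": P11 - p1 * q1,              # definition used in the text
        "D_alt": P11 * P22 - P12 * P21,  # algebraically equivalent form
    }


print(ld_coefficient(34, 13, 15, 38))
```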

Chi-square test to check whether the deviation from linkage equilibrium is significant

Null hypothesis, H0: loci A and B are in linkage equilibrium (D = 0)

Alternative hypothesis, H1: loci A and B are in linkage disequilibrium (D ≠ 0)

Expected haplotype counts if A and B are in linkage equilibrium (n × pi × qj):

A1B1: 100 × 0.47 × 0.49 = 23.03

A1B2: 100 × 0.47 × 0.51 = 23.97

A2B1: 100 × 0.53 × 0.49 = 25.97

A2B2: 100 × 0.53 × 0.51 = 27.03

$$\chi^2_{(1\,\mathrm{df})} = \sum \frac{(\mathrm{obs}-\mathrm{exp})^2}{\mathrm{exp}} = \frac{(34-23.03)^2}{23.03} + \frac{(13-23.97)^2}{23.97} + \frac{(15-25.97)^2}{25.97} + \frac{(38-27.03)^2}{27.03} \approx 19.33$$

Since χ² (1 df) > 10.83 corresponds to p < 0.001, H0 is rejected and H1 is accepted. Therefore, the deviation from linkage equilibrium is significant, and loci A and B are in linkage disequilibrium.
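A sketch of the same chi-square test (function name hypothetical). Expected counts are n·pi·qj as above, and for 1 df the upper-tail probability can be computed with the standard library as erfc(√(χ²/2)), since a 1-df chi-square variable is the square of a standard normal.

```python
import math


def ld_chi_square(n11: int, n12: int, n21: int, n22: int) -> tuple[float, float]:
    """Chi-square test (1 df) of linkage equilibrium from 2x2 haplotype counts."""
    observed = [[n11, n12], [n21, n22]]
    n = n11 + n12 + n21 + n22
    p = [(n11 + n12) / n, (n21 + n22) / n]  # p1, p2 (locus A)
    q = [(n11 + n21) / n, (n12 + n22) / n]  # q1, q2 (locus B)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = n * p[i] * q[j]  # count expected under linkage equilibrium
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For 1 df the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2, p_value = ld_chi_square(34, 13, 15, 38)
print(f"chi2 = {chi2:.2f}, p = {p_value:.1e}")
```

Equivalently, for a 2x2 table this statistic equals n·r², where r² = D² / (p1p2q1q2).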

5. Gene duplication is one of the key factors driving genetic innovation, i.e. the production of novel genetic variants. The evolutionary forces acting on duplicated genes are diverse. A number of interdependent variables determine the fate of a duplicated gene, including its functional category, degree of conservation, sensitivity to dosage effects, and regulatory and architectural complexity (Conrad & Antonarakis, 2007). The fates of duplicated genes identified to date are the following:

1. Gene loss: One copy may become silenced or mutated and lose its function through degenerative mutations (nonfunctionalization). DNA methylation, as well as histone deacetylation and methylation, is most likely involved in silencing duplicated genes.

Selective advantage: Duplicated genes show increased mRNA levels, leading to overexpression of the gene, which in most cases is deleterious. So the usual fate of one copy of a duplicate gene pair is nonfunctionalization under strong purifying selection.

2. Functional divergence: Three outcomes are possible:

a. Neofunctionalization: One copy may acquire a novel, beneficial mutation as a result of alterations in coding or regulatory sequences and be preserved by natural selection, while the other copy retains the original function.

Selective advantage: This mechanism leads to retention of both copies and is an important mechanism of gene retention in large populations. One copy acquires a novel, evolutionarily advantageous (adaptive) function through rare beneficial mutations.

b. Subfunctionalization: Through accumulation of degenerative mutations, the two copies may each retain a different subset of the ancestral functions, to the point at which their combined capacity is reduced to that of the single-copy ancestral gene (the duplication-degeneration-complementation model).

Selective advantage: It is an alternative mechanism driving gene retention in organisms with a small effective population size. Even if the duplication event did not confer a selective advantage, subfunctionalization allows both duplicated genes to be retained. It can also preserve duplicate copies for eventual neofunctionalization, acting as a transition state.

c. Duplication by retrotransposition: A recently recognized, primate-specific subgroup of duplications is generated by retrotransposition; these copies acquire somatic and male germline functions.

Selective advantage: To enhance male germline function

3. No functional divergence: Both redundant gene copies are retained in the genome without significant functional divergence.

Selective advantage: The organism may acquire increased genetic robustness against harmful mutations.

4. Duplication in multigene families: In multigene families descended from a common ancestor, individual genes in the group perform similar functions. Two models of their evolution have been proposed:

a. Concerted evolution: All genes in a given group evolve coordinately, and their homogenization is the result of gene conversion.

b. Birth and Death evolution: In this model of evolution, duplicate genes are produced and some of the duplicate genes diverge functionally but others become pseudogenes owing to deleterious mutations or are deleted from the genome. The end result of this mode of evolution is a multigene family with a mixture of divergent groups of genes and highly homologous genes within groups plus a substantial number of pseudogenes.

4. The transcriptome is the complete set of transcripts and their relative levels of expression in a particular cell or tissue type under defined conditions (Gibson & Muse, 2009). Several technologies have been developed for transcriptome analysis; microarrays and RNA-sequencing are the two best known.

a. Microarrays: A microarray is a way to study which genes are expressed in a specific tissue, situation or individual. The basic principle of microarray analysis is to deposit a small amount of DNA corresponding to each of thousands of known genes (probes) of the organism onto individual spots on the array surface. The mRNA molecules extracted from the tissue under study are labeled with a fluorescent dye and hybridized to the microarray. The abundance of a particular transcript is detected as the intensity of the fluorescent signal, which is quantified by computer analysis.


Advantages: A microarray compares gene expression profiles between two mRNA samples. It allows massive, high-throughput, parallel measurement of gene expression, so that multiple measurements on samples from different experiments can be performed simultaneously. Microarrays are powerful tools for detecting candidate genes, by revealing whether a gene is transcribed under one set of conditions but not another. Other advantages include definition of genetic pathways, dissection of regulatory mechanisms and quantification of transcriptional variance. The method requires a small amount of material and a modest investment of money and labor, saves much time and is readily automated.


Disadvantages: Transcript profiling using microarrays is limited to the genes that are represented on the chip. Genes that are not yet known, wrongly annotated genes, genes that produce no transcript, etc. will not be represented on a microarray. Microarrays are still expensive for large genomes (e.g. mammals). Because they produce extremely large datasets, effective database resources are required. It is also difficult to distinguish between transcripts from genes belonging to the same gene family because of cross-hybridization. Probes produced from cDNA libraries miss rarely expressed genes, so regulatory genes may be overlooked. Uncertain quality control may be a limitation, as there are many artifacts associated with image analysis and data analysis.

b. RNA-sequencing: This is the direct sequencing of cDNA fragments to characterize transcriptomes. The basic principle of RNA-sequencing is to isolate the poly(A) fraction of cellular RNA and fragment it into pieces of about 200 bases. These fragments are used for random-primed cDNA synthesis to obtain Short Quantitative Random RNA Libraries (SQRLs). These libraries are then used as templates for next-generation sequencing. All the short sequence reads generated are mapped back to the reference genome and aligned with exon sequences to produce a genome-scale transcription map that shows the transcriptional structure and/or level of expression of each gene.

Advantages: Unlike the microarray approach, it is not limited to detecting transcripts that correspond to existing genomic sequence, which makes it attractive for organisms whose genome sequences have yet to be determined. It is a cheaper approach for the study of large genomes. It is useful for studying complex transcriptomes, as it precisely reveals connections between multiple exons and alternative splicing. Small RNAs that are too short for stable hybridization can also be studied using RNA-seq. It also detects sequence variation in transcribed regions. Compared with microarrays, it has a low background signal and a larger range of expression levels over which transcripts can be detected.

Disadvantages: At the moment, it is more expensive and time-consuming than standard expression arrays. It also faces data analysis challenges: lack of strand orientation information, mapping of reads to splice junctions and poly(A) ends, possible dilution of the transcript population by high-abundance RNA (e.g. ribosomal RNA), and the need for efficient methods to store, retrieve and process large amounts of data.

Fig 2: Principle of RNA-sequencing (Source: Wang et al, 2009)

Example Project: Gene expression studies in Canine Hepatic disease using microarray analysis

Aim of the study

The aim of the study is to identify potential candidate genes involved in canine hepatic diseases by measuring mRNA expression in diseased and healthy canine liver using gene expression microarrays.


Liver tissue samples will be collected from dogs clinically diagnosed with hepatic disease. Control samples will be taken from healthy dogs euthanized for reasons unrelated to this study. Total RNA will be extracted from the liver tissue, labeled and hybridized to Affymetrix GeneChip® Canine Genome Arrays, which interrogate 18,000 C. familiaris mRNA/EST-based transcripts and over 20,000 non-redundant predicted genes. Statistical analysis of the microarray data will identify which of the 18,000 genes are significantly differentially expressed in diseased canine liver compared with healthy liver. Among these differentially expressed genes, those showing the largest increase or decrease in expression in the disease condition can be selected as potential candidate genes for further study.
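The ranking step can be sketched as follows, assuming the array intensities have already been background-corrected and normalized (not described in the essay). Genes are ranked by the absolute log2 fold change between diseased and healthy liver, with Welch's t statistic reported alongside; a real analysis would also need p-values and correction for multiple testing across thousands of probe sets. Gene ids and intensities in the example are hypothetical.

```python
import math
from statistics import mean, variance


def rank_genes(diseased: dict[str, list[float]],
               healthy: dict[str, list[float]]) -> list[tuple[str, float, float]]:
    """Rank genes by |log2 fold change| (diseased vs healthy).

    Each dict maps a gene/probe-set id to normalized intensities, one per dog
    (at least two dogs per group). Returns (gene, log2FC, Welch t) tuples.
    """
    results = []
    for gene, d in diseased.items():
        ld = [math.log2(x) for x in d]             # log2 scale, diseased
        lh = [math.log2(x) for x in healthy[gene]]  # log2 scale, healthy
        log2fc = mean(ld) - mean(lh)
        # Welch's t statistic: unequal variances allowed between groups.
        se = math.sqrt(variance(ld) / len(ld) + variance(lh) / len(lh))
        results.append((gene, log2fc, log2fc / se))
    return sorted(results, key=lambda r: abs(r[1]), reverse=True)


# Hypothetical normalized intensities for three dogs per group.
diseased = {"GENE_A": [800, 900, 1000], "GENE_B": [100, 110, 95], "GENE_C": [50, 45, 55]}
healthy = {"GENE_A": [100, 120, 110], "GENE_B": [105, 100, 98], "GENE_C": [200, 210, 190]}
ranked = rank_genes(diseased, healthy)
for gene, fc, t in ranked:
    print(f"{gene}: log2FC = {fc:+.2f}, t = {t:+.1f}")
```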

Arguments for the choice of the methodology

Linkage analysis and association studies are common methods to identify candidate disease genes. However, they have limitations: linkage analysis requires a large number of families, and it detects large genomic regions containing a very high number of genes that require additional research. Association studies that use information from the literature focus on genes previously linked with the disease, which may bias the selection of genes. For an unbiased approach to candidate gene selection, microarray analysis can be used: it measures the expression of transcripts from a very large number of genes, and only a small number of case and control samples is needed. As a canine-specific microarray chip is commercially available, this method would be cost-effective and less laborious. RNA-sequencing would be a higher-throughput method in this case, but because of its higher cost and time requirements, microarray analysis is the best method of choice for this project.