Blood Group Based SNPs for Forensic Utilities



DNA is the genetic material found in every cell of the body and is often described as the blueprint of life. Its individual-specific nature can be captured as a genetic fingerprint by means of molecular markers such as RFLPs, STRs (microsatellites) and SNPs. These markers help to clarify how genetic traits are related among organisms of the same species, such as humans. Forensic DNA research began in the early 1980s, when the DNA revolution started (Jobling and Gill, 2004). Forensic science now depends on DNA profiling, or DNA typing, which identifies patterns in DNA and was first reported in 1984 by Sir Alec Jeffreys at the University of Leicester in England (Jeffreys et al., 1984). DNA typing covers various methods, including restriction fragment length polymorphism (RFLP) typing, short tandem repeat (STR) typing, single nucleotide polymorphism (SNP) typing, mitochondrial DNA (mtDNA) analysis, human leukocyte antigen (HLA) typing, gender typing and Y-chromosome typing.

Single Nucleotide Polymorphisms (SNPs):

A single-nucleotide polymorphism (SNP) is a variation in a DNA sequence in which a single nucleotide (A, T, C or G) differs between individuals of the same species, contributing to the genetic uniqueness of each individual. Single-nucleotide changes arise through substitution, deletion or insertion of a nucleotide in a polynucleotide sequence. SNPs occur in coding sequences, in non-coding regions of genes, and in intergenic regions between genes. A SNP in a coding sequence that leaves the polypeptide sequence unchanged is termed synonymous, whereas one that changes the polypeptide sequence is termed nonsynonymous. SNPs outside protein-coding regions can still have consequences, for example for gene splicing or transcription factor binding.
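The distinction between synonymous and nonsynonymous SNPs can be illustrated with a short Python sketch. It assumes the standard genetic code and a substitution within a single codon; the function name `classify_snp` and the example codons are chosen here for illustration only and are not part of any tool discussed in this essay.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_snp(codon, pos, alt):
    """Classify a single-base substitution at index pos (0-2) of a codon."""
    if len(codon) != 3 or pos not in (0, 1, 2) or alt not in BASES:
        raise ValueError("expected a 3-base codon, pos in 0..2 and a DNA base")
    mutated = codon[:pos] + alt + codon[pos + 1:]
    same_amino_acid = CODON_TABLE[codon] == CODON_TABLE[mutated]
    return "synonymous" if same_amino_acid else "nonsynonymous"


# GAA and GAG both encode glutamate (E); GTG encodes valine (V).
silent = classify_snp("GAA", 2, "G")
changing = classify_snp("GAG", 1, "T")
```

Here `silent` is "synonymous" because the third-position change keeps the amino acid, while `changing` is "nonsynonymous" because the second-position change replaces glutamate with valine.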

SNPs can be subdivided into four groups on the basis of their site of occurrence:

rSNPs (random SNPs):

Only about ten percent of our genome is made up of genes. The great majority of SNPs are therefore located in what we currently view as 'silent' regions of our genome. These SNPs are extremely unlikely to have any perceivable effect on our phenotype, or constitution. In forensic science, many of them are used as markers in the mapping of genes within the genome.

gSNPs (gene-associated SNPs):

Many SNPs are situated alongside genes or in introns, the regions of a gene that do not code for a gene product, i.e. do not form part of the template for a protein. The fact that they are inherited with these genes makes gSNPs useful for the study of associations between the gene (and its variants) and certain phenotypes. gSNPs may also be functionally relevant if they influence important control elements of the gene and thereby decrease or increase its transcription.

cSNPs (coding SNPs):

Exons are the coding regions of a gene, i.e. the sequences of a gene that are translated into the gene product, the protein. SNPs that are present in exons can have a major influence on the function of the protein concerned if they result in incorporation of an alternative amino acid.

pSNPs (phenotype-relevant SNPs):

Both gSNPs and cSNPs can influence a person's phenotype: the former primarily via the amount, and the latter usually via the form, of the protein for which the gene codes. pSNPs are the most important type of SNP from the point of view of medicine. They form one of the foundations of pharmacogenetics, the branch of science concerned with the influence of gene variation on the effectiveness and tolerability of drugs.

Figure 1: Regions in which SNPs occur: rSNPs (blue arrows), gSNPs (yellow arrows), cSNPs (dark blue arrow) and pSNPs (red arrow).

In this work we focus on SNP profiling. Around 1.8 million SNPs in the human genome can be used as SNP markers, and they can be typed by 11 types of methods, such as Genetic Bit Analysis (Nikiforov et al., 1994), which shows their importance in forensic investigations (Kashyap et al., 2004). Apart from this analysis, bioinformatics and forensic DNA analysis are interdisciplinary fields that solve different tasks by computational and statistical approaches (Lucia and Pietro, 2007). With the rapid development of high-throughput SNP genotyping platforms such as TaqMan and SNPlex, SNP-related tools and databases have become increasingly important for researchers. Tools include SNPHunter, SNPper and SNPicker; databases include NCBI dbSNP, SNP500Cancer, SeattleSNPs, HGVbase, GeneSNPs and IIPGA.

Forensic DNA and Bioinformatics:

Forensic science utilises the properties of DNA in several ways. The adage "every contact leaves a trace" indicates the importance of a technique able to type trace amounts of genetic material left during the commission of a crime. Hairs or saliva left on a balaclava worn during a robbery, semen located at a rape scene, blood collected from an assault, perspiration on clothing, and traces of an assailant's skin under a victim's fingernails can often be DNA profiled. This genetic information can then be used to include or exclude suspects as being the source of the genetic material. It is not yet possible to test the whole of an individual's DNA, so forensic analysis involves the testing of selected regions of an individual's DNA.

The term bioinformatics denotes a growing area of science that uses computational approaches along with molecular biology techniques to analyse biological data sets. It encompasses the generation, collection and digital storage of data, and the exploitation of those data through technologies such as transcriptomics and genomics. Statistical data are taken from experimental trials and the scientific literature. A major use of bioinformatics is to use sequence information to annotate genes and to identify gene products (Baxevanis and Ouellette, 2004). The main principle of annotation is to describe and explore all the intermediate levels, such as molecular and cellular processes.

Bioinformatics databases and tools have been developed that list the abundance of a particular fragment of DNA in the population. From this information, an estimate of the abundance of combinations of DNA at several regions can be made and compared to the DNA of victims or suspects. Statistical interpretation of the information can then estimate the likelihood of the material coming from a particular individual relative to coming from a random member of the population. For example, in New Zealand, forensic DNA testing is carried out at the Institute of Environmental Science and Research Ltd. (ESR), where a Bayesian approach to statistical interpretation is used.
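The comparison of "a particular individual" against "a random member of the population" can be sketched as a simple likelihood ratio. The sketch below is a minimal illustration, not the procedure used by ESR or any other laboratory: it assumes Hardy-Weinberg proportions, independence between loci, and a single-source sample that matches the suspect, and the allele frequencies are hypothetical.

```python
from math import prod


def genotype_frequency(p, q=None):
    """Hardy-Weinberg genotype frequency: p*p for a homozygote, 2*p*q for a heterozygote."""
    return p * p if q is None else 2 * p * q


def random_match_probability(genotype_freqs):
    """Product rule across loci that are assumed to be independent."""
    return prod(genotype_freqs)


def likelihood_ratio(rmp):
    """LR for a matching single-source profile: P(match | suspect) = 1 over P(match | random person)."""
    return 1.0 / rmp


# Hypothetical allele frequencies at two SNP loci (illustrative values only).
locus_1 = genotype_frequency(0.5, 0.5)  # heterozygote
locus_2 = genotype_frequency(0.2)       # homozygote
rmp = random_match_probability([locus_1, locus_2])
lr = likelihood_ratio(rmp)
```

With these made-up frequencies the profile is expected in about 1 in 50 random individuals, so the evidence is about 50 times more likely if the suspect is the source. Each additional independent locus multiplies into the product, which is why panels of many SNPs are needed to reach useful discrimination.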

Literature Review:

Interest in SNPs in the forensic field is continuously increasing, because SNPs have a number of characteristics that make them very appropriate for forensic studies. First, they have very low mutation rates, which is valuable for paternity testing. Second, they are well suited to analysis with high-throughput technologies, which is valuable for databasing and automation. Third, and most importantly, they can be analysed in short amplicons; short sizes are desirable because the size of the amplified product is critical for the successful analysis of degraded samples. For example, preliminary data from the World Trade Center (WTC) identification effort show the potential of this type of marker (Brenner and Weir, 2003; Prinz et al., 2003).

According to Kashyap et al. (2004), SNPs have several advantages: the markers are numerous in mammalian genomes; multiple methods of SNP detection are available; the amplification of alleles is not prone to preferential amplification; and robust multiplex amplification is relatively easy to achieve.

Experimental Methods:


ClustalW is a general-purpose multiple sequence alignment program for DNA or proteins. It produces biologically meaningful multiple sequence alignments of divergent sequences. It calculates the best match for the selected sequences and lines them up so that the identities, similarities and differences can be seen. Evolutionary relationships can be viewed as cladograms or phylograms.

Figure 2: Multiple sequence alignment by ClustalW
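Once sequences have been aligned, the columns where they differ are the candidate SNP positions and the remaining columns are conserved. The following Python sketch shows one simple way to scan an alignment for such columns. It does not call ClustalW; it assumes the alignment has already been produced and is available as equal-length strings with "-" for gaps, and the toy sequences are hypothetical.

```python
def variable_columns(aligned):
    """Return (column index, bases seen) for columns with more than one distinct base.

    Gap characters ("-") are ignored when comparing bases.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    sites = []
    for i in range(length):
        bases = {seq[i] for seq in aligned if seq[i] != "-"}
        if len(bases) > 1:
            sites.append((i, "".join(sorted(bases))))
    return sites


# Hypothetical toy alignment of four sequences.
alignment = [
    "ATGCCGTA",
    "ATGCTGTA",
    "ATGCCGTA",
    "ATG-CGTA",
]
candidate_sites = variable_columns(alignment)
```

In this toy alignment only column 4 (counting from 0) varies, with bases C and T, so it is the single candidate site; the gap in column 3 is not counted as a difference.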


SNPHunter is a software program that allows for both ad hoc-mode and batch-mode SNP search, automatic SNP filtering, and retrieval of SNP data, including physical position, function class, flanking sequences at user-defined lengths, and heterozygosity from NCBI dbSNP (Wang et al. 2005). The SNP data extracted from dbSNP via SNPHunter can be exported and saved in plain text format for further down-stream analyses.


Figure 3: Snapshot of SNPHunter main window
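The kind of filtering and plain-text export described above can be sketched in a few lines of Python. This is not SNPHunter's interface or file format: the record fields, the identifiers and the thresholds below are hypothetical and only illustrate filtering SNP records by function class and heterozygosity before saving them as tab-delimited text.

```python
import csv
import io


def filter_snps(records, min_het=0.0, function_classes=None):
    """Keep records with heterozygosity >= min_het and, if given, an allowed function class."""
    kept = []
    for rec in records:
        if rec["heterozygosity"] < min_het:
            continue
        if function_classes is not None and rec["function_class"] not in function_classes:
            continue
        kept.append(rec)
    return kept


def to_tab_delimited(records, fields):
    """Serialise the chosen fields of each record as tab-delimited text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, delimiter="\t",
                            lineterminator="\n", extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


# Hypothetical records; the identifiers are placeholders, not real dbSNP entries.
records = [
    {"snp_id": "snp_demo_1", "position": 1200, "function_class": "coding", "heterozygosity": 0.42},
    {"snp_id": "snp_demo_2", "position": 3400, "function_class": "intron", "heterozygosity": 0.35},
    {"snp_id": "snp_demo_3", "position": 5600, "function_class": "coding", "heterozygosity": 0.05},
]
selected = filter_snps(records, min_het=0.3, function_classes={"coding"})
text = to_tab_delimited(selected, ["snp_id", "position", "heterozygosity"])
```

The resulting `text` can be written to a file and loaded into a spreadsheet or a later analysis step.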


VizStruct is an interactive visualization technique that reduces multi-dimensional data to two dimensions using the complex-valued harmonics of the discrete Fourier transform (DFT). In the 3D VizStruct extension, the multi-dimensional SNP data vectors are reduced to three dimensions using a combination of the DFT and the Kullback-Leibler divergence. The performance of 3D VizStruct was challenged with several biologically relevant published datasets that included human Chromosome 21, the human lipoprotein lipase (LPL) gene locus and the multi-locus genotypes of coral populations. In every case, the 3D VizStruct mapping provided an intuitive visual description of the key characteristics of the underlying multi-dimensional genotype.

Url: Excel and MATLAB code are available at
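The idea of reducing a multi-dimensional vector to a point in the plane with a DFT harmonic can be sketched as follows. This is a minimal illustration that assumes the first harmonic is used as the projection, with its real and imaginary parts taken as the two plot coordinates; it is not the published Excel or MATLAB code, and the genotype encoding (0, 1 or 2 copies of one allele per SNP) is a hypothetical choice made for the example.

```python
import cmath


def first_harmonic(vector):
    """Project a numeric vector onto the first DFT harmonic and return (real, imaginary)."""
    n = len(vector)
    total = sum(x * cmath.exp(-2j * cmath.pi * k / n) for k, x in enumerate(vector))
    return total.real, total.imag


# Hypothetical genotype vectors: one entry per SNP, coded as 0, 1 or 2 allele copies.
sample_a = [0, 1, 2, 1, 0, 0, 2, 1]
sample_b = [0, 1, 2, 1, 0, 0, 2, 2]
point_a = first_harmonic(sample_a)
point_b = first_harmonic(sample_b)
```

Each sample becomes one point, so samples with similar multi-locus genotypes land close together in the plot. A vector with the same value at every position maps to the origin, which is a quick sanity check on the projection.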

Week Planner:

Week 1: Identify the genes for blood groups A, B and O from GenBank and annotate them with the corresponding blood group enzyme and gene locus details.

Week 2: Prepare a dataset of the blood group genes and run a multiple sequence alignment with the Clustal algorithm to identify the consensus parts of the genes.

Week 3: Identify the most conserved parts of the dataset and build a test set of sequences from the selected regions for use in SNPHunter.

Week 4: Identify suitable SNPs in the genes with SNPHunter (Wang et al., 2005) and compare them against the Entrez SNP database (Sherry et al., 2001) to validate them, classifying them by allele type (e.g. C/T or G/C).

Week 5: Build a repository of SNPs, with locus, gene and annotation details, that is specific to blood groups and their enzymes, and make it available for querying.

Week 6: Validate the database, find errors and resolve issues through discussion.

Week 7: Analyse the identified dataset by plotting the genes with low allele frequencies using VizStruct (Bhasi et al., 2006).

Week 8: Interpret the results and the dataset.

Week 9: Discussion
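The queryable repository planned for Week 5 could take many forms. As a rough sketch under stated assumptions, the example below uses Python's built-in sqlite3 module with an in-memory database; the table layout, column names and the rows inserted are hypothetical placeholders, not results of this study.

```python
import sqlite3


def build_repository(rows):
    """Create an in-memory SNP table and load (snp_id, gene, locus, alleles, annotation) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE snp ("
        "snp_id TEXT PRIMARY KEY, gene TEXT NOT NULL, locus TEXT, "
        "alleles TEXT, annotation TEXT)"
    )
    conn.executemany("INSERT INTO snp VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def snps_for_gene(conn, gene):
    """Return (snp_id, alleles) pairs for one gene, ordered by identifier."""
    cur = conn.execute(
        "SELECT snp_id, alleles FROM snp WHERE gene = ? ORDER BY snp_id", (gene,)
    )
    return cur.fetchall()


# Placeholder rows for illustration; identifiers and annotations are made up.
demo_rows = [
    ("snp_demo_1", "ABO", "9q34", "C/T", "placeholder entry"),
    ("snp_demo_2", "ABO", "9q34", "G/C", "placeholder entry"),
    ("snp_demo_3", "OTHER_GENE", "unknown", "C/T", "placeholder entry"),
]
repo = build_repository(demo_rows)
abo_snps = snps_for_gene(repo, "ABO")
```

Keeping the allele type in its own column makes the Week 4 classification (C/T, G/C and so on) directly searchable, and the same schema could later be moved to a file-backed database behind a web front end.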

Purpose of study:

To identify appropriate SNPs, and to demonstrate their low allele frequency variation sufficiently well for forensic purposes, by screening the loci of genes traditionally used for serotyping, with the aim of identifying SNPs suitable for determining ancestral origins.


The Blood SNP database is aimed at the needs of forensic laboratories and provides locus, gene and map information through a browsable front end. A searchable interface makes it available for the needs of forensic experts. It also identifies regions of low allele frequency in each gene, with OMIM and literature links, to make it more useful for researchers performing literature reviews.

References:

M.A. Jobling and P. Gill, Encoded evidence: DNA in forensic analysis, Nat. Rev. Genet. 5 (2004), pp. 739-751.

B. Budowle, T.R. Moretti, S.J. Niezgoda and B.L. Brown, CODIS and PCR-based short tandem repeat loci: law enforcement tools, Second European symposium on human identification, Promega Corporation, Madison, Wisconsin (1998) pp. 73-88.

Jeffreys A.J., Wilson V., Thein S.W. (1984). "Hypervariable 'minisatellite' regions in human DNA". Nature 314: 67-73. doi:10.1038/314067a0

V. K. Kashyap, Sitalaximi T., P. Chattopadhyay and R. Trivedi (2004) DNA Profiling Technologies in Forensic Analysis, Int J Hum Genet, 4(1): 11-30

Nikiforov TT, Rendle RB, Goelet P, Rogers YH, Kotewicz (1994). Genetic Bit Analysis: a solid phase method for typing single nucleotide polymorphisms. Nucleic Acids Res, 22: 4167-4175.

Baxevanis, Andreas D. and Ouellette, B. F. Francis (eds.), Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, 3rd edition, November 2004.

C.H. Brenner and B.S. Weir, Issues and strategies in the DNA identification of World Trade Center victims, Theor. Popul. Biol. 63 (2003), pp. 173-178.

M. Prinz, T. Caragine and R.C. Shaler, DNA testing as the primary tool for the victim identification effort after the World Trade Center terrorist attack, Proceedings of the 20th Congress of the International Society of Forensic Genetics (2003).

Wang L, Liu S, Niu T, Xu X. SNPHunter: a bioinformatic software for single nucleotide polymorphism data acquisition and management. BMC Bioinformatics. 2005 Mar 18;6(1

Riva A, Kohane IS. A SNP-centric database for the investigation of the human genome. BMC Bioinformatics. 2004 Mar 26;5(1):33.

Riva A, Kohane IS. SNPper: retrieval and analysis of human SNPs. Bioinformatics. 2002 Dec;18(12):1681-5.

Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11

Kavitha Bhasi, Li Zhang, Daniel Brazeau, Aidong Zhang and Murali Ramanathan. VizStruct for visualization of genome-wide SNP analyses. Bioinformatics, Vol. 22 no. 13 2006, pages 1569-1576.