Homology cloning and multiple sequence alignment


Homology modelling is an essential tool in structural proteomics that is used to predict the three-dimensional structure of a protein from an alignment to one or more sequences of known structure. It can also be used to understand the function, activity and specificity of a protein. Multiple sequence alignment helps in homology cloning by establishing relationships between sets of different sequences and revealing regions of well-conserved amino acid residues. These conserved regions then serve as templates for primer design, and the resulting primers can be used to screen a large genomic library for the gene of interest.

A UniProt query for "flavodoxin desulfovibrio" returned 181 results. Only the reviewed entries were considered appropriate for selection. The search was then narrowed by protein name and polypeptide length, which resulted in the selection of seven polypeptides. Details of these polypeptides are given in the table below.

Table Polypeptides chosen for multiple sequence alignment

Organism | Accession number | Length (aa)
Desulfovibrio vulgaris (strain Hildenborough / ATCC 29579 / NCIMB 8303) | P00323 |
Desulfovibrio vulgaris (strain Miyazaki F / DSM 19637) | P71165 |
Desulfovibrio salexigens (strain ATCC 14822 / DSM 2638 / NCIB 8403 / VKM B-1763) | P18086 |
Desulfovibrio desulfuricans (strain ATCC 27774 / DSM 6949) | P80312 |
Desulfovibrio desulfuricans | P26492 |
Desulfovibrio gigas | Q01095 |
Desulfovibrio gigas | Q01096 |

Primary amino acid sequence of the polypeptides in FASTA format

>sp|P00323|FLAV_DESVH Flavodoxin OS=Desulfovibrio vulgaris (strain Hildenborough / ATCC 29579 / NCIMB 8303) GN=DVU_2680 PE=1 SV=2




>sp|P71165|FLAV_DESVM Flavodoxin OS=Desulfovibrio vulgaris (strain Miyazaki F / DSM 19637) GN=DvMF_1143 PE=1 SV=2




>sp|P18086|FLAV_DESAD Flavodoxin OS=Desulfovibrio salexigens (strain ATCC 14822 / DSM 2638 / NCIB 8403 / VKM B-1763) GN=Desal_0805 PE=3 SV=1




>sp|P80312|FLAW_DESDA Flavodoxin OS=Desulfovibrio desulfuricans (strain ATCC 27774 / DSM 6949) GN=Ddes_1951 PE=1 SV=2




>sp|P26492|FLAV_DESDE Flavodoxin OS=Desulfovibrio desulfuricans PE=1 SV=1




>sp|Q01095|FLAV_DESGI Flavodoxin OS=Desulfovibrio gigas PE=3 SV=1




>sp|Q01096|FLAW_DESGI Flavodoxin OS=Desulfovibrio gigas PE=3 SV=1




Answer 2

Multiple sequence alignment was carried out using CLUSTAL W/X. The seven polypeptides obtained above were aligned with each other to give output in both text and graphical formats. Before that, the sequences were also compared with each other by pairwise sequence alignment.

Table Pairwise sequence alignment of seven Desulfovibrio polypeptides

Figure Results of multiple sequence alignment carried out for seven polypeptides using CLUSTAL, in text format. "*" means that the residues or nucleotides in that column are identical in all sequences in the alignment; ":" means that conserved substitutions have been observed; "." means that semi-conserved substitutions have been observed.

Figure Graphical results obtained by the multiple sequence alignment of seven polypeptides using Clustal. Conservation of amino acids at each alignment position is represented by the histogram below the sequences.
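The "*" symbol described in the text-format caption can be illustrated with a short sketch. The function name `identity_line` and the toy alignment are hypothetical and are not taken from the flavodoxin data; the ":" and "." symbols are left out because they depend on Clustal's substitution groups.

```python
def identity_line(aligned):
    """Return a line with '*' under every column that is identical in all sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    marks = []
    for column in zip(*aligned):
        residues = set(column)
        # A column is marked only if it holds a single residue and no gap.
        marks.append("*" if len(residues) == 1 and "-" not in residues else " ")
    return "".join(marks)


# Hypothetical toy alignment, used only to show the output format.
toy_alignment = ["MKALIVYGS", "MKILIVYGS", "M-ALIFYGT"]
for seq in toy_alignment:
    print(seq)
print(identity_line(toy_alignment))
```

Runs of "*" in such a line are what make conserved regions easy to spot by eye.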

Answer 3

First conserved region: Amino acid residues 5 to 19

Second conserved region: Amino acid residues 46 to 74

Third conserved region: Amino acid residues 91 to 128

Figure Three conserved regions of the seven polypeptides (modified from the graphical results obtained via Clustal). The two aromatic residues, tryptophan-60 and tyrosine-98, are indicated by the arrows.

Three regions in the seven flavodoxin polypeptide sequences were found to have a higher number of amino acids totally conserved at identical positions (see Figure 3). Most of the other residues in these regions are either totally conserved or replaced by amino acids with similar chemical properties. Compared with the rest of the peptide sequence, the degree of conservation is greater and more consistent in these three regions. The histogram underneath the sequences further indicates the level of conservation among these polypeptides. Tryptophan-60 and tyrosine-98 hold the flavin ring of the FMN prosthetic group. These two aromatic amino acid residues are well conserved in all seven polypeptides (see Figure 3).

Figure X-ray structure of flavodoxin in Desulfovibrio indicating the location of different amino acids. The two aromatic residues tryptophan-60 and tyrosine-98 can be seen around the flavin ring of the FMN prosthetic group.

Answer 4

The two oligonucleotide primers should be designed to match two highly conserved regions of the Desulfovibrio flavodoxin gene. This is necessary to allow maximum hybridisation of the primers to the template DNA strands. The primers must also flank a region long enough to be amplified by PCR. Another important factor is to choose amino acids that show the least codon degeneracy, which minimises the number of possible primer sequences.

Considering all these factors, conserved regions 9-15 (GSTTGNT) and 66-71 (EMQDDF) were chosen for the forward and reverse primers respectively. These have been highlighted in the figure below.

Figure Sites on the aligned polypeptide sequences that correspond to the regions chosen for the forward and reverse primers used in PCR.

Answer 5

Oligonucleotide primers can be designed from the chosen amino acid sequences. GSTTGNT (residues 9-15) and EMQDDF (residues 66-71) were chosen for the forward and reverse primers respectively. Using codon usage tables and reverse translation of these peptides, the degenerate sites were identified and the following nucleotide sequences were obtained.

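Forward primer: GSTTGNT

Reverse translation: Gly (GGN), Ser (TCN or AGY), Thr (ACN), Thr (ACN), Gly (GGN), Asn (AAY), Thr (ACN). The degenerate sites are the positions written with an ambiguity code.

Possible number of primers for the forward primer = 4*6*4*4*4*2*4 = 12288.

Therefore, the general sequence of the forward primer is GGNWSNACNACNGGNAAYACN

N - A, C, G, T

W - A, T

S - C, G

Y - C, T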
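The primer count can be checked by multiplying the number of codons available for each residue. The sketch below assumes the standard genetic code; the helper names `codon_counts` and `degeneracy` are illustrative only. The same function applies to the reverse-primer peptide EMQDDF.

```python
from itertools import product
from math import prod

BASES = "TCAG"
# Standard genetic code listed in TCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_counts(peptide):
    """Number of synonymous codons for each residue of the peptide."""
    return [
        sum(1 for aa in CODON_TABLE.values() if aa == residue)
        for residue in peptide
    ]


def degeneracy(peptide):
    """Number of distinct nucleotide sequences that encode the peptide."""
    return prod(codon_counts(peptide))


print(codon_counts("GSTTGNT"), degeneracy("GSTTGNT"))  # [4, 6, 4, 4, 4, 2, 4] 12288
print(codon_counts("EMQDDF"), degeneracy("EMQDDF"))    # [2, 1, 2, 2, 2, 2] 32
```

This counts exact codon combinations. A primer written with ambiguity codes can cover more sequences than this, because a serine motif such as WSN also matches some non-serine codons.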

Reverse primer: EMQDDF




Methionine has a single codon (ATG), and each of the other five residues shows two-fold degeneracy. Therefore the possible number of reverse primers is 2*2*2*2*2 = 32.

General sequence for the reverse primer - GARATGCARGAYGAYTTY


R - A, G and Y - T, C
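The reverse-primer sequence above is written in the coding-strand direction. A reverse primer anneals to the coding strand, so in practice it is normally synthesised as the reverse complement of that sequence. A minimal sketch of that conversion, assuming the standard IUPAC ambiguity codes (the function name `reverse_complement` is illustrative):

```python
# Complements of the IUPAC nucleotide codes.
COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "R": "Y", "Y": "R", "S": "S", "W": "W",
    "K": "M", "M": "K", "B": "V", "V": "B",
    "D": "H", "H": "D", "N": "N",
}


def reverse_complement(sequence):
    """Reverse complement of a sequence that may contain ambiguity codes."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence.upper()))


print(reverse_complement("GARATGCARGAYGAYTTY"))  # RAARTCRTCYTGCATYTC
```

The G+C content, and therefore a Tm estimate based only on base composition, is the same for a sequence and its reverse complement.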

Calculation of melting temperature (Tm) of primers

Minimum Tm for the forward primer is calculated using the forward primer sequence that has the lowest (C+G) percentage.

This is the primer sequence GGTTCTACTACTGGTAATACT, where (C+G) % = 38.1 (8 of 21 nucleotides).

Tm is calculated using the following formula:

Tm = 64.9°C + 41°C x (number of G's + C's in the primer - 16.4)/N

Where N denotes the number of nucleotides in the sequence.

Minimum Tm for the forward primer = 64.9 + 41 x (8 - 16.4)/21 = 48.5°C

Maximum Tm for the forward primer was calculated using the sequence GGCTCCACCACCGGCAACACC, where (C+G) % is highest (71.4%, 15 of 21 nucleotides): 64.9 + 41 x (15 - 16.4)/21 = 62.2°C

For the reverse primer, minimum Tm is calculated using the sequence where C+G content is lowest (GAAATGCAAGATGATTTT, 27.8%): 64.9 + 41 x (5 - 16.4)/18 = 38.9°C

Maximum Tm is calculated using the sequence for the reverse primer where C+G content is highest (GAGATGCAGGACGACTTC, 55.6%).

Maximum Tm for the reverse primer = 64.9 + 41 x (10 - 16.4)/18 = 50.3°C
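The Tm formula quoted above can be applied programmatically. The sketch below resolves the degenerate reverse primer to its lowest and highest G+C variants and evaluates the formula for each. The helper names are illustrative, and only the two-fold codes R and Y are handled. Values from this simple formula can differ from those of online calculators, which often use salt-adjusted or nearest-neighbour methods.

```python
def gc_count(primer):
    """Number of G and C bases in an unambiguous primer sequence."""
    primer = primer.upper()
    return primer.count("G") + primer.count("C")


def melting_temperature(primer):
    """Tm in degrees C: 64.9 + 41 * (G+C - 16.4) / N, as quoted in the text."""
    return 64.9 + 41.0 * (gc_count(primer) - 16.4) / len(primer)


# Only the two-fold codes that occur in the reverse primer are handled here.
LOW_GC = {"R": "A", "Y": "T"}   # choices that minimise G+C
HIGH_GC = {"R": "G", "Y": "C"}  # choices that maximise G+C


def resolve(degenerate, choice):
    """Replace each ambiguity code with the base selected in `choice`."""
    return "".join(choice.get(base, base) for base in degenerate.upper())


reverse_degenerate = "GARATGCARGAYGAYTTY"
for label, table in (("lowest G+C", LOW_GC), ("highest G+C", HIGH_GC)):
    primer = resolve(reverse_degenerate, table)
    percent = 100.0 * gc_count(primer) / len(primer)
    tm = melting_temperature(primer)
    print(f"{label}: {primer}  G+C = {percent:.1f}%  Tm = {tm:.1f} C")
```

A primer pair is usually easier to work with when the Tm ranges of the two primers overlap, which is one reason to compare the minimum and maximum values for each.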

Answer 6

A total of 63 amino acid residues are spanned by the two primers, from position 9 to position 71 inclusive. As each amino acid residue is encoded by 3 bp (i.e. a codon), the size of the DNA fragment amplified by the PCR reaction will be 63 x 3 = 189 bp.

The two primers will therefore amplify a 189 bp fragment of the flavodoxin gene.


Answer 7

Homology refers to similarity that is due to a common evolutionary origin. A bat's wing and a human arm are said to be homologous because they share a common structure derived from a common ancestor. As in many other cases, the similarity between homologous organs or sequences is not necessarily obvious.

Similar (analogous) organs or sequences, on the other hand, are not derived from a common ancestor (for example, a bat's wing and a butterfly's wing). Similarity therefore does not by itself mean that two organs or sequences are homologous.

Answer 8

Codon usage tables list all the codons that encode each amino acid, together with how frequently each codon is used in a given organism. They are used in designing primers for homology cloning because they account for the degeneracy of the genetic code and give all possible oligonucleotide sequences for a peptide. Codon usage tables are also species specific and are therefore essential for the reverse translation of specific peptides when designing degenerate primers.

Answer 9

DNA sequences are composed of both coding and non-coding regions, and non-coding regions are much less conserved among species. Non-coding regions are not represented in the translated protein. As a result, a polypeptide sequence contains far better conserved regions than the corresponding DNA. Hence, it is more useful to carry out multiple sequence alignment with amino acid sequences than with DNA sequences when identifying homologous sequences.

Answer 10

Degenerate primers are a mixture of similar (not identical) primers that are used to amplify the same gene from different organisms. They are also used when the primer design is based on a desired (target) protein sequence.

Answer 11

The following amino acid signature sequence of flavodoxin was identified using the PROSITE database.

[LIV] - [LIVFY] - [FY] - x - [ST] - {V} - x - [AGC] - x - T - {P} - x(2) - A - {L} - x - [LIV]

Percentage of false negatives = 100-85.71 = 14.29%

Percentage of false positives = 100-93.1 = 6.9%
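The signature above can be applied to a sequence by translating it into a regular expression. The sketch below covers only the syntax elements that occur in this pattern (residue classes in square brackets, exclusions in braces, x for any residue, and repeat counts). It does not handle other PROSITE features such as the "<" and ">" anchors. The function name `prosite_to_regex` and the test peptide are hypothetical; the peptide is made up and is not a real flavodoxin sequence.

```python
import re

PATTERN = "[LIV] - [LIVFY] - [FY] - x - [ST] - {V} - x - [AGC] - x - T - {P} - x(2) - A - {L} - x - [LIV]"


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into Python regular-expression syntax."""
    parts = []
    for element in pattern.replace(" ", "").split("-"):
        # Separate an optional repeat count such as (2) or (2,4) from the element.
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = match.group(1), match.group(2)
        if core == "x":
            regex = "."                      # any residue
        elif core.startswith("{") and core.endswith("}"):
            regex = "[^" + core[1:-1] + "]"  # any residue except those listed
        else:
            regex = core                     # single residue or [..] class
        if repeat:
            regex += "{" + repeat + "}"
        parts.append(regex)
    return "".join(parts)


signature = re.compile(prosite_to_regex(PATTERN))
print(signature.pattern)

# Made-up peptide built to satisfy the pattern, for illustration only.
hit = signature.search("GGLIFGSAGAGTAGGAKGVGG")
print(hit.start(), hit.group())
```

Scanning each of the seven sequences with the compiled pattern would show which of them carry the signature.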

Answer 12

Flavodoxins function as low-potential electron transport agents in the reduction of sulphite to sulphide in Desulfovibrio species. They can biologically substitute for ferredoxins.