Multiple Sequence Alignment And Primer Design Of Flavodoxin Biology Essay


The aim of this report was to design oligonucleotide primers for the gene encoding a flavodoxin protein from an unknown Desulfovibrio species. Flavodoxin protein sequences were retrieved from the UniProt database and a multiple sequence alignment was performed using ClustalW2. Oligonucleotide primers were designed to incorporate two highly conserved regions around amino acids 57 and 98.

Flavodoxin is important in carrying out redox reactions within the Desulfovibrio bacteria.

Multiple Sequence Alignment and Primer Design


To design a set of primers capable of cloning a specific gene, in this case a Desulfovibrio gene that codes for a protein called flavodoxin, it is first essential to select available protein sequences from different Desulfovibrio species and then perform a multiple sequence alignment.

An appropriate database, in this case UniProtKB, was used to locate available Desulfovibrio flavodoxin protein sequences. A search for Desulfovibrio flavodoxin produced a list of flavodoxin sequences. The first entry was submitted to BLASTp, a programme available on the same database. The BLAST search found other Desulfovibrio flavodoxin protein sequences similar to the one of interest. Sequences with positive alignments and low E-values (<0.001) were selected as shown (UniprotKB, 2010).


The following FASTA sequences were downloaded from UniProt and saved as a plain-text file using Windows Notepad:




>sp|P71165| Desulfovibrio vulgaris (strain Miyazaki F / DSM 19637)



>sp|P80312| Desulfovibrio desulfuricans (strain ATCC 27774 / DSM 6949)




>sp|P18086| Desulfovibrio salexigens




>sp|Q01095| Desulfovibrio gigas




>sp|P26492| Desulfovibrio desulfuricans




>sp|Q01096| Desulfovibrio gigas





The seven FASTA sequences were used to perform the multiple sequence alignment. The sequences were copied into a multiple sequence alignment programme called ClustalW2.

Figure 1. Results of the ClustalW2 multiple sequence alignment (ClustalW2, 2010).


The ClustalW2 alignment marks each column with a symbol: "*" for identical residues, ":" for conserved substitutions and "." for semi-conserved substitutions. Columns without a symbol are not conserved. Figure 1 shows that the aligned flavodoxin sequences contain regions that are identical or conserved, alongside other regions that are semi-conserved or non-identical (ClustalW2, 2010).
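The way these symbols are assigned can be sketched in Python. This is a minimal illustration and not ClustalW2's own code: the "strong" and "weak" residue groups below are quoted from memory of the ClustalW documentation and should be treated as an assumption, and the function names are hypothetical.

```python
# Residue groups assumed from the ClustalW documentation (Gonnet PAM250 based).
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
               "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def column_symbol(column):
    """Return the conservation symbol for one alignment column."""
    residues = set(column)
    if "-" in residues:
        return " "  # a gap in the column means no symbol
    if len(residues) == 1:
        return "*"  # identical in every sequence
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"  # conserved substitution
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."  # semi-conserved substitution
    return " "


def consensus_line(aligned_sequences):
    """Build the symbol line for equal-length aligned sequences."""
    return "".join(column_symbol(column) for column in zip(*aligned_sequences))


# Toy alignment for illustration only, not the real flavodoxin alignment.
print(consensus_line(["CSTWG", "CATWG", "CSTWG"]))
```

A highly conserved region then appears as a long run of "*" and ":" symbols in the consensus line.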

Three highly conserved regions were chosen (see Figure 3). Highly conserved regions consist of identical and well conserved positions, with few or no semi-conserved or non-identical positions. The highly conserved regions represent sequence motifs of the protein, which contribute to specific biochemical functions. The sequence motifs are an example of homology, meaning that they derive from a common ancestor.

Figure 3. Flavodoxin protein multiple sequence alignment and the three highly conserved regions.

Flavodoxin is a low molecular weight (16 kDa) electron transfer protein. Flavodoxin contains a non-covalently bound flavin mononucleotide (FMN) prosthetic group. FMN contains an isoalloxazine ring, which is involved in binding to the flavodoxin apoprotein and in the subsequent redox reactions (UniprotKB, 2010).

Helms et al. identified that two aromatic residues, Tryptophan and Tyrosine, flank the flavin isoalloxazine ring of the flavodoxin protein. The two aromatic residues are located at amino acids 60 and 98 respectively.

As mentioned earlier, the multiple sequence alignment identified three regions that were highly conserved. The aromatic amino acids Tryptophan 60 and Tyrosine 98, which flank the isoalloxazine ring, are located within conserved regions 2 and 3 respectively. The isoalloxazine ring is crucial to the biochemical functions of the protein, so the residues that flank it should be included in the cloning process. Therefore the conserved regions containing amino acids 60 and 98 must be incorporated into the design of the oligonucleotide primers.

Figure 4. Structure of Flavodoxin protein showing the Flavin isoalloxazine ring flanked by Tryptophan 60 and Tyrosine 98. Modified from RCSB Protein Data Bank, available at


Degenerate primers were chosen to clone the flavodoxin gene rather than primers with a single defined nucleotide sequence, as only the protein sequence is known in this case.

Primers were selected from conserved regions 2 and 3 and included amino acids 60 and 98.


First, the amino acid sequence of the chosen region is converted into a nucleotide sequence. A number of web-based programmes can reverse translate the amino acid sequence of interest. A programme called Molecular Toolkit was chosen.

The amino acid sequence for the forward primer was selected from conserved region 2: the sequence "CSTWG", spanning amino acids 57 to 61, which incorporates Tryptophan 60.

Amino acid sequence: 57 C S T W G 61

Nucleotide sequence: 5' TGY WSN ACN TGG GGN 3'

Forward primer: 5' TGY WSN ACN TGG GGN 3'

The forward primer has the same sequence as the coding strand, so it anneals to the complementary strand and is extended towards the reverse primer.

The degenerate positions use the standard IUPAC codes:

N = A, C, G or T

Y = C or T

R = A or G

W = A or T

S = C or G
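The reverse translation step can be sketched in Python. This is a minimal illustration and not the method used by Molecular Toolkit: it assumes the standard genetic code and collapses the codons of each amino acid into one symbol per position using the standard IUPAC ambiguity codes (N = any base, Y = C or T, R = A or G, W = A or T, S = C or G). All names in the code are hypothetical.

```python
from itertools import product

# Standard genetic code, listed in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODONS = {}
for triplet, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS):
    CODONS.setdefault(amino_acid, []).append("".join(triplet))

# IUPAC ambiguity codes, keyed by the alphabetically sorted set of bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}


def degenerate_codon(amino_acid):
    """Collapse all codons of one amino acid into a three-letter IUPAC codon."""
    codons = CODONS[amino_acid]
    return "".join(
        IUPAC["".join(sorted({codon[i] for codon in codons}))] for i in range(3)
    )


def reverse_translate(peptide):
    """Return the degenerate coding sequence (5' to 3') for a peptide."""
    return "".join(degenerate_codon(aa) for aa in peptide.upper())


print(reverse_translate("CSTWG"))  # TGYWSNACNTGGGGN
```

Note that per-position codes can over-represent an amino acid: WSN stands for 16 triplets, of which only six encode serine.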

A protein sequence can correspond to a number of different nucleotide sequences (shown above). The reason for this is that most amino acids are coded by more than one codon. Serine (S), for example, is coded by six different codons, whereas Tryptophan (W) is coded by only one.

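The degeneracy of a primer is the theoretical number of nucleotide sequences that could encode the chosen amino acids. It is calculated by multiplying together the number of codons for each amino acid:

1 codon: W, M

2 codons: F, Y, H, Q, N, K, D, E, C

3 codons: I

4 codons: V, P, T, A, G

6 codons: L, S, R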

The degeneracy of the protein sequence CSTWG: 2 × 6 × 4 × 1 × 4 = 192

There are 192 sequence possibilities that the primer can anneal to.
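The same calculation can be written as a short Python sketch. It assumes the standard genetic code, and the function name is hypothetical.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code.
CODON_COUNTS = {}
for count, amino_acids in {1: "WM", 2: "FYHQNKDEC", 3: "I", 4: "VPTAG", 6: "LSR"}.items():
    for amino_acid in amino_acids:
        CODON_COUNTS[amino_acid] = count


def degeneracy(peptide):
    """Multiply the codon counts of every amino acid in the peptide."""
    return prod(CODON_COUNTS[aa] for aa in peptide.upper())


print(degeneracy("CSTWG"))  # 192
```

Lower degeneracy is generally preferable, which is why stretches rich in W, M and the two-codon amino acids make better primer sites than stretches rich in L, S or R.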

The complexity of the forward primer can be measured by calculating the maximum and minimum melting temperatures (Tm).

Tm = 2(A + T) + 4(G + C)

Minimum Tm (7 G/C and 8 A/T bases) = 44 °C

Maximum Tm (11 G/C and 4 A/T bases) = 52 °C
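The Tm range of a degenerate primer can be checked by enumerating every possible coding sequence and applying Tm = 2(A + T) + 4(G + C) to each one. The following Python sketch assumes the standard genetic code. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, listed in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODONS = {}
for triplet, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS):
    CODONS.setdefault(amino_acid, []).append("".join(triplet))


def melting_temperature(sequence):
    """Tm = 2(A + T) + 4(G + C), in degrees Celsius."""
    at = sequence.count("A") + sequence.count("T")
    gc = sequence.count("G") + sequence.count("C")
    return 2 * at + 4 * gc


def tm_range(peptide):
    """Return (minimum, maximum) Tm over every sequence encoding the peptide."""
    temperatures = [
        melting_temperature("".join(codons))
        for codons in product(*(CODONS[aa] for aa in peptide.upper()))
    ]
    return min(temperatures), max(temperatures)


print(tm_range("CSTWG"))  # (44, 52)
```

Because a sequence and its complement contain the same number of G/C and A/T base pairs, the range is the same whichever strand the primer is written from.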

The amino acid sequence for the reverse primer was selected from conserved region 3: the sequence "YTYFCG", spanning amino acids 98 to 103, which incorporates Tyrosine 98.

Amino acid sequence: 98 Y T Y F C G 103

Nucleotide sequence: 5' TAY ACN TAY TTY TGY GGN 3'

Reverse primer: 5' NCC RCA RAA RTA NGT RTA 3'

The reverse primer is the reverse complement of the nucleotide sequence, written in the 5' to 3' direction.

The degenerate positions use the standard IUPAC codes:

N = A, C, G or T

Y = C or T

R = A or G

The degeneracy of the protein sequence YTYFCG: 2 × 4 × 2 × 2 × 2 × 4 = 256

There are 256 sequence possibilities that the primer can anneal to.
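Deriving a reverse primer from a coding sequence is a reverse complement operation, which can be sketched in Python as follows. The sketch uses the standard IUPAC ambiguity codes (for example N = any base, Y = C or T, R = A or G), under which Y and R are complements of each other. The function name is hypothetical.

```python
# Complement of each IUPAC nucleotide code.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(sequence):
    """Return the reverse complement of a (possibly degenerate) DNA sequence."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


# Coding sequence for YTYFCG, written 5' to 3' in standard IUPAC codes.
coding = "TAYACNTAYTTYTGYGGN"
print(reverse_complement(coding))  # NCCRCARAARTANGTRTA
```

Applying the function twice returns the original sequence, which is a quick check that the complement table is consistent.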


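The size of the DNA fragment amplified by the primers is calculated as follows:

Forward primer begins at amino acid 57

Reverse primer begins (at its 5' end) at amino acid 103

103 - 57 + 1 = 47 amino acids

Three bases (triplet codon) to one amino acid therefore: 47 × 3 = 141 bases.

The size of the DNA fragment, which includes both primers, will be 141 base pairs. Of these, the 36 codons (108 bases) between the primers, amino acids 62 to 97, are the only part of the product whose sequence is not fixed by the primers.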

The DNA fragment can be analysed by running the product of the polymerase chain reaction (PCR) on an electrophoresis gel. The product is loaded alongside an appropriate DNA ladder, which allows the size of the product to be determined after staining. Alternatively, real-time PCR could be undertaken, which is more sensitive.


Sequence homology means that genes or proteins derive from a common ancestor. Sequence similarity refers to the degree of match between the nucleotides or amino acids of two sequences, and may have occurred by chance. Sequence similarity therefore does not necessarily mean that the sequences derive from a common ancestor. Conversely, two homologous genes or proteins may share very little similarity.

Regions that remain conserved between homologous genes and proteins are often crucial to the biochemical functions of the organism.


Codon usage tables list the triplet codons that code for each amino acid and how often each codon is used. There are 64 possible codons, of which 61 code for the 20 amino acids and three are stop codons. This means that most amino acids are coded by more than one triplet codon, e.g. Serine (S) is coded by six different codons. Codon usage tables for a specific organism can be obtained from databases. The codon usage table in this case provides the frequency with which a particular triplet codon appears in the genes of that organism. Codon usage tables were required in converting the protein sequence into a nucleotide sequence.
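How a codon usage table is built can be sketched in Python by counting the codons of a coding sequence. This is only an illustration: the function name is hypothetical and the short input sequence is made up, whereas real tables are compiled from many genes of one organism.

```python
from collections import Counter


def codon_usage(coding_sequence):
    """Return the relative frequency of each codon in an in-frame sequence."""
    sequence = coding_sequence.upper()
    usable_length = len(sequence) - len(sequence) % 3  # ignore a trailing partial codon
    codons = [sequence[i:i + 3] for i in range(0, usable_length, 3)]
    counts = Counter(codons)
    total = sum(counts.values())
    return {codon: count / total for codon, count in counts.items()}


# Made-up sequence: two TCT codons, one AGC codon (all serine) and one TGG (tryptophan).
usage = codon_usage("TCTTCTAGCTGG")
for codon, frequency in sorted(usage.items()):
    print(codon, frequency)
```

If an organism strongly prefers some codons over others, the less frequent alternatives can be dropped from a degenerate primer to reduce its degeneracy.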



It is more useful to carry out a multiple sequence alignment of protein sequences than of nucleotide sequences for a number of reasons. Protein sequences are more conserved than nucleotide sequences, because silent mutations change the DNA without changing the amino acid, and they do not contain introns. Primers designed from a small conserved part of a protein can be used to amplify the coding DNA, which may be difficult to clone via non-degenerate nucleotide primers. Conserved regions can also be used to identify gene families or homologous genes.


Degenerate primers are designed from a protein sequence consisting of amino acids rather than from a DNA sequence. The primer sequence is degenerate because some positions can be occupied by more than one nucleotide.


Description: Flavodoxin

Signature pattern: [LIV]-[LIVFY]-[FY]-x-[ST]-{V}-x-[AGC]-x-T-{P}-x(2)-A-{L}-x-[LIV].

Total number of hits in UniProtKB/Swiss-Prot: 58 hits in 58 different sequences.

False positives % = 4/58 × 100 = 6.9%

False negatives % = 9/58 × 100 = 15.5%
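A signature pattern in this notation can be scanned against a protein sequence by converting it into a regular expression. The Python sketch below is a simplified converter with a hypothetical function name: it handles only the elements that occur in this pattern (residues, [allowed] sets, {excluded} sets, x and repeat counts) and not the full pattern syntax. The test peptide is made up for illustration.

```python
import re


def prosite_to_regex(pattern):
    """Convert a simple PROSITE-style pattern into a Python regular expression."""
    tokens = []
    for element in pattern.strip().rstrip(".").split("-"):
        core, _, repeat = element.partition("(")
        if core == "x":
            token = "."  # any residue
        elif core.startswith("{"):
            token = "[^" + core[1:-1] + "]"  # any residue except those listed
        else:
            token = core  # a single residue or an [allowed] set
        if repeat:
            token += "{" + repeat.rstrip(")") + "}"  # e.g. x(2) becomes .{2}
        tokens.append(token)
    return "".join(tokens)


signature = "[LIV]-[LIVFY]-[FY]-x-[ST]-{V}-x-[AGC]-x-T-{P}-x(2)-A-{L}-x-[LIV]."
regex = prosite_to_regex(signature)
print(regex)

# Made-up peptide used only to exercise the pattern.
match = re.search(regex, "MKIVYGSTTGNTEYTAETIAR")
print(match.group() if match else "no match")
```

Sequences that match the expression without being flavodoxins correspond to false positives, and flavodoxins that fail to match correspond to false negatives.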


Flavodoxin within Desulfovibrio species is involved in electron transfer and redox reactions. In Desulfovibrio, flavodoxin can replace the protein ferredoxin in sulfur and sulfite metabolism (Hrovat et al.).