DNA Barcoding Uses Sequence Diversity Biology Essay



Mitochondrial DNA (mtDNA) has a fast mutation rate, which results in significant variation in mtDNA sequences between species and comparatively small variation within species. A 648-bp region of the cytochrome c oxidase I (COI) gene forms the primary barcode sequence for members of the animal kingdom (Hebert et al. 2003a; Savolainen et al. 2005). The COI gene has two important advantages. First, the universal primers for this gene are very robust. Second, it appears to possess a greater range of phylogenetic signal than any other mitochondrial gene. The differences in COI amino-acid sequences are sufficient to enable the reliable assignment of organisms to higher taxonomic categories (Hebert et al. 2003). Moreover, a COI database could be developed within 20 years for the 5-10 million animal species on the planet (Hammond 1992; Novotny et al. 2002).
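The idea that barcodes differ more between species than within them can be illustrated with a small sketch. It computes the proportion of differing sites (p-distance) between pre-aligned sequences and compares the largest within-species distance with the smallest between-species distance. The sequences, the species labels and the function names are hypothetical and chosen only for illustration; they are not real COI data and not the method used in this practical.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two equal-length, pre-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


# Hypothetical toy fragments (NOT real COI data): two individuals per species.
toy_barcodes = {
    "species_A": ["ACGTACGTACGTACGTACGT", "ACGTACGTACGTACGTACGA"],
    "species_B": ["ACGAACGTTCGTACCTACGG", "ACGAACGTTCGTACCTACGG"],
}


def within_and_between(barcodes):
    """Return (largest within-species distance, smallest between-species distance)."""
    within, between = [], []
    labelled = [(species, seq) for species, seqs in barcodes.items() for seq in seqs]
    for (sp1, s1), (sp2, s2) in combinations(labelled, 2):
        target = within if sp1 == sp2 else between
        target.append(p_distance(s1, s2))
    return max(within), min(between)


max_within, min_between = within_and_between(toy_barcodes)
print(f"largest within-species distance:   {max_within:.2f}")
print(f"smallest between-species distance: {min_between:.2f}")
```

When the smallest between-species distance is clearly larger than the largest within-species distance, as in this toy set, an unknown sequence can be assigned to the species of its nearest match with some confidence.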

In fact, since few taxonomists can critically identify more than 0.01% of the estimated 10-15 million species (Hammond 1992; Hawksworth & Kalin-Arroyo 1995), a community of 15 000 taxonomists will be required, in perpetuity, to identify life if our reliance on morphological diagnosis is to be sustained (Hebert et al. 2003).

In this study, we were supplied with DNA from eight anonymous species of fish and a positive control, Onion Trevally (Carangoides caeruleopinnatus). We performed PCR reactions on the fish samples to generate the raw data for DNA barcoding, and then tried to assign the identity of each species via GenBank and the Barcoding of Life Database (BOLD).

Figure 1. Query result using Barcoding of Life Database (BOLD) systems, on the first sample.

Methods and Materials

We performed PCR reactions on DNA extracted from a range of fish species to generate the raw material for DNA barcoding. We used eight DNA samples from unknown species of fish and one from a positive control, Carangoides caeruleopinnatus. The 5' region of the COI gene was amplified from the mitochondrial DNA using the following primers.



The reactions used Thermo Scientific/ABgene ReddyMix PCR master mix, which contains water, magnesium chloride (1.5 mM), PCR buffer, dNTPs and Taq polymerase. Table 1 shows the components and volumes for the master mix preparation, and Table 2 shows the PCR programme we used.

Table 2. We ran the following programme in the PCR machine

The PCR products were run on a 1% agarose gel with 5% vol. ethidium bromide at ca. 70 V for 30 minutes, then visualized and photographed on the UV transilluminator.

Table 1. Components and volumes for the PCR master mix.

Sequences of the PCR amplicons were checked and edited using Chromas Lite software. We assigned an identity to each species via the National Center for Biotechnology Information's GenBank and the Barcoding of Life Database.


We expected a clear band in all the lanes from two to ten, and no band at all in the eleventh lane, which was the negative control. The result was one big band in each of lanes 4, 6 and 10, and a less defined band in lanes 7 and 9. The rest of the lanes did not show a band. The results are shown in figure 2.

We performed an NCBI database search (BLAST) and, in parallel, used the Barcoding of Life Database (BOLD) search engine for each of our edited and checked sequences. The results are shown in table 3.

Figure 2. Gel image analyzed and enhanced with Image Lab software from Bio-Rad. Lanes 1 and 12 are the marker ladder, lane 10 is the positive control and lane 11 is the negative control.


Our PCR did not perform as expected: it gave three big and clear bands, two weak bands and three lanes without a band, as figure 2 shows. There are several possible reasons for this variability in the results, such as problems with the DNA extraction, or DNA that was poorly quantified or degraded. Some bands were clean and big whereas others were smaller and of lower intensity; these differences could be related to the amount or concentration of DNA in each sample.

The positive control in lane 10 worked perfectly, showing a big and intense band, and the negative control in lane 11 was good as well, with no band at all, as expected. We did not see any heterogeneity in the results, only homogeneity.

Table 3. Results of NCBI database and BOLD search for fish sequences.

Table 4. Results of fish sequences quality and amplification quality.

We used the COI barcode with the NCBI database and the Barcode of Life Data Systems (BOLD) for animal identification. Most of the sequences matched the cytochrome c oxidase I gene, with the exception of 7-FISHF2, which matched chromosome I of a bacterial genome.

The result for sample 1-FISHF2 was the same in both search engines, and the species identification was a 100% match. With 2-FISHF2 we obtained a poor-quality sequence, possibly because the sample did not amplify well, so BOLD returned no taxonomic identification, but the NCBI database returned a 94% match to the fish species P. platessa. How reliable is this result? A good sequence with a 100% match in both the NCBI database and BOLD, such as 1-FISHF2, looks like figure 3, whereas a poor sequence such as 2-FISHF2 looks like figure 4. The identification was nevertheless correct, despite the poor quality of the sample.

In the third sample, 6-FISHF2, the NCBI database returned two species, Myoxocephalus brandtii and Myoxocephalus stelleri, which according to BOLD were less up to date in terms of sequence accuracy and taxonomic identification, so we used instead the species Myoxocephalus scorpioides, identified by BOLD with a 99.8% match. Figure 5 shows the BOLD TaxonID Tree, with the query sequence marked in red and the BOLD sequences in black. Notice how close the unknown specimen is to Myoxocephalus scorpioides. Myoxocephalus brandtii and Myoxocephalus stelleri are marked in blue.

In the fourth sample, 7-FISHF2, we got a completely unexpected result. The NCBI database identified a bacterium, Burkholderia pseudomallei, instead of a fish, and BOLD reported that the sample did not match any records in the selected database. This is consistent, because we were using the animal identification database, which is built on the COI gene rather than on bacterial chromosome I.

The fifth sample, 8-FISHF2, was identified with a 100% match by both search engines as Limanda limanda. The sixth sample, 15-FISHF2, shown in figure 6, has dual peaks of almost the same size in the sequence chromatogram. One possible explanation for the dual peaks is contamination, that is, two target DNAs in the same reaction.
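The "dual peaks of almost the same size" criterion can be written down as a simple ratio test. The sketch below assumes that, for each base position, the heights of the strongest and second-strongest trace peaks are already available as plain numbers. The function name, the 0.8 threshold and the peak heights are hypothetical illustrations and were not taken from the 15-FISHF2 trace.

```python
def mixed_positions(primary, secondary, ratio_threshold=0.8):
    """Return indices where the secondary peak is at least `ratio_threshold`
    times as high as the primary peak (a possible sign of mixed template)."""
    if len(primary) != len(secondary):
        raise ValueError("peak lists must have the same length")
    flagged = []
    for index, (main_peak, second_peak) in enumerate(zip(primary, secondary)):
        # Skip positions without a usable primary peak to avoid dividing by zero.
        if main_peak > 0 and second_peak / main_peak >= ratio_threshold:
            flagged.append(index)
    return flagged


# Hypothetical peak heights (arbitrary units), one value per base position.
primary_heights = [900, 850, 920, 880, 910]
secondary_heights = [40, 800, 30, 790, 50]

flagged = mixed_positions(primary_heights, secondary_heights)
print("positions with two peaks of similar height:", flagged)
```

Many flagged positions spread along a read would be consistent with two templates being sequenced together, whereas one or two isolated positions could simply be noise.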

The seventh sample, 16-FISHF2, was recognized by the NCBI database as Spicara maena, with 91% identity. BOLD, on the other hand, identified Spondyliosoma cantharus, from the family Sparidae, with a 100% match. According to BOLD, Spicara maena, from the family Centracanthidae, was not the correct match, and the tree in figure 7 shows how close the branch of the unknown species is to Spondyliosoma cantharus and how far it is from Spicara maena.

Figure 4. Sequence chromatogram of 2-FISHF2.

Figure 7. Tree result with 16-FISHF2 in red and Spondyliosoma cantharus in green; notice how close the two branches are. Spicara maena, in blue, is on a distant branch (Barcoding of Life Database).

The eighth sample, 18-FISHF2, was correctly identified as Merluccius merluccius, with a 99% match in the NCBI database and a 100% match in BOLD.

The last sample, 21-FISHF2, was our positive control, a known fish of the species Carangoides caeruleopinnatus. Our sequence was wrongly identified as Carangoides malabaricus, with a 99% match by BLAST and a 99.5% match by BOLD. One possible reason is that the query sequence was too short to be compared accurately with the database. However, sample 18-FISHF2, with a query length of 622, was shorter than sample 21-FISHF2, with a query length of 650, and was correctly identified, so length was not the problem.

Regions of low-complexity sequence also have an unusual composition that can create problems in sequence similarity searches, but this is not the case here either.

e.g. low complexity: AAATAAAAAAAATAAAAA
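One simple way to spot such regions is to slide a window along the sequence and measure how evenly the four bases are used. The sketch below uses Shannon entropy for this. The window size, the entropy cut-off and the mixed comparison sequence are assumptions made for illustration, and this is a simplified stand-in, not the filter that BLAST or BOLD actually applies.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (in bits) of the base composition of `window`."""
    counts = Counter(window)
    total = len(window)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def low_complexity_windows(sequence: str, window_size: int = 12,
                           max_entropy: float = 1.0) -> list:
    """Return start positions of windows whose entropy is at or below `max_entropy`."""
    hits = []
    for start in range(len(sequence) - window_size + 1):
        window = sequence[start:start + window_size]
        if shannon_entropy(window) <= max_entropy:
            hits.append(start)
    return hits


low_complexity_example = "AAATAAAAAAAATAAAAA"   # the example given in the text
mixed_example = "ACGTTGCAAGCTTCGATCGA"          # hypothetical sequence using all four bases

print("low-complexity windows in the text example:",
      low_complexity_windows(low_complexity_example))
print("low-complexity windows in the mixed example:",
      low_complexity_windows(mixed_example))
```

A window made up almost entirely of one base has an entropy close to 0 bits, while a window that uses all four bases evenly approaches 2 bits, so a cut-off between the two separates the cases shown here.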

It is also possible that no sequences from our species of interest have yet been submitted to BOLD, which can cause a misidentification. The ID engine uses all sequences uploaded to BOLD, from public as well as private projects, to locate the closest match.

Figure 3. Sequence chromatogram of 1-FISHF2.

The names of the unknown species were given at the end of the practical, so that we could check how accurate our matches were. The results are summarized in table 5.

Figure 6. Sequence chromatogram of 15-FISHF2; the arrows show two peaks that are almost the same size.

In conclusion, if we avoid contamination and provide samples with a good concentration of DNA, we can expect good amplifications and sequences to work with for animal identification. Misidentifications at the species level were undoubtedly a consequence of the limited size and diversity of our species profile. Animal species and their sequences are divergent enough to enable recognition of all but the youngest species.

In our study BLAST showed a higher sensitivity while BOLD showed a better specificity; both are good search platforms. BOLD goes a little further, however, providing detailed information about the species (fig 8 and fig 9): specimen records with sequences or barcodes, public records, collection sites with a map and ranking, the source of the samples, and photos for visual taxonomy.