Next-generation sequencing technologies have become powerful tools for generating genome-wide sequence data in non-model species at relatively low cost [Bertozzi 2012; Thomson et al. 2010], with broad applications ranging from population studies to phylogenetic comparisons [Bertozzi 2012; Rasmussen and Noor 2009]. In particular, 454 FLX (Roche) pyrosequencing can be used to generate large sequence datasets of expressed sequence tags (ESTs) and randomly fragmented genomic DNA (genome survey sequences, or GSSs), which has contributed to the development of a wealth of new genetic markers, including SSRs [Rasmussen and Noor 2009]. Partial GSSs have provided useful information on gene content and on functional and repetitive elements [Bertozzi 2012; Rasmussen and Noor 2009]. Bouck et al. suggested that putative sequences in a genome can be identified by sequence analysis at low genomic coverage, as has been reported for the dog [Kirkness et al. 2003] and pig genomes [Wernersson et al. 2005].
Identifying putative genes with a GSS approach is now considered to be the optimum strategy; it is more efficient and productive than EST profiling because it covers all developmental life-history stages and avoids the redundancy that results from multiple copies of mRNA from very highly expressed genes [Strong and Nelson, 2000]. Local alignment algorithms such as BLASTx are widely used to identify putative gene regions but do not identify exact mRNA splice boundaries (complete gene structure), while similarity-based approaches have been used to identify coding regions in mouse, human and bovine sequences [Chureau et al. 2002]. Therefore, while this approach may be problematic for short GSS sequences, it may be applicable to long GSS sequences with significant similarity, allowing prediction of putative gene activity and, potentially, recognition of exon-intron boundaries.
In the middle and lower regions of the Mekong River Basin (MRB), the mudcarp (Henicorhynchus lobatus) is one of the most abundant fish species, comprising approximately 21% of the total inland fisheries catch in Cambodia [Baird, 2011]. As a consequence, this species is considered to be one of the most valuable fish resources in the region [Baran, 2005]. While small in size, it is important because it contributes significantly to food security and nutrition for more than 60 million local people [Baird et al. 2003]. Currently, however, little is known about the biology of H. lobatus or, for that matter, of most other valuable freshwater fishes in this region, although observations of fish movement patterns suggest that this species migrates extensively across the Mekong basin during the wet season [Baran 2006]. Effective management of wild fish stocks in the MRB needs to be based on fundamental scientific data about the scale of interaction or independence of natural populations, and molecular genetic approaches offer a strategy for addressing this knowledge gap. The use of molecular data to infer the natural scale of individual dispersal, wild population connectivity and discrete management units is a relatively new field of research in Asia, but it can potentially address many problems associated with fish stock identification, as evidenced by comparable studies in Europe and the United States [Gum et al. 2009; Tiano et al. 2007; Youngson et al. 2003]. A recent molecular analysis of wild population diversity in this species using an mtDNA marker revealed that a single stock was present in the Mekong River mainstream, while for a close relative (H. siamensis), a congener that is very difficult to distinguish morphologically from H. lobatus, three distinct stocks were evident across the same geographical area [Adamson et al. 2009]. Furthermore, a highly divergent stock of H. lobatus was detected in a tributary of the Mekong (the Mun River), while H. siamensis constituted only a single stock in this area [Hurwood et al. 2008]. Patterns of mtDNA variation, however, only inform about maternal gene flow and essentially constitute a single locus that may not reflect the general impacts of selection and drift across the genome; these data are therefore probably insufficient by themselves to thoroughly describe fine-scale wild population structure in either mudcarp species [Zink and Barrowclough, 2008; Forister et al. 2008]. Here we screened anonymous putative genes in a partial genomic DNA library from H. lobatus using a 454-FLX pyrosequencing approach to develop SSR markers for this species, and tested the resulting markers via cross-species amplification in the closely related congener, H. siamensis. While we targeted only part of the total genome of H. lobatus, the data generated provide basic knowledge about genome architecture and diversity in H. lobatus that has applications from a management perspective.
Samples and DNA Extraction
Morphological identification of the target species was undertaken in the field, and fresh fin tissue was taken and stored in 95% ethanol in labelled vials. DNA was extracted using a modified salt extraction method [Miller et al. 1998]. The identity of Henicorhynchus species was verified via mtDNA sequencing [Hurwood et al. 2008].
Library Construction and 454 Pyrosequencing
454 pyrosequencing analysis was performed by the Australian Genome Research Facility (AGRF), Brisbane, Australia. Sample gDNA for this analysis was pooled from two individuals. After library construction, randomly fragmented DNA was quantified with a Quant-iT RiboGreen fluorometer (Invitrogen, Mulgrave, Australia), and 1 µl sample aliquots were analysed in a Bioanalyzer (Agilent, Mulgrave, Australia) to determine average fragment size. Sequencing of gDNA was run on one eighth of a pico-titer plate on a 454 GS-FLX with pyrosequencing chemistry (Roche, Branford, CT, USA) following the manufacturer's protocol.
Sequence cleaning and assembly
Initially, all sff sequence files from 454 GS-FLX sequencing were processed using Roche quality-filtering tools to remove adapter A and B sequences, poor-quality sequence and barcodes. Subsequently, sequences containing >60% homopolymers (a single nucleotide) in which the homopolymer length was >100 nucleotides were excluded. The trimmed sequences were then assembled de novo in Roche Newbler 2.5.3 with default parameters to produce singleton and contig datasets for subsequent analysis. SNP identification was not performed here because too few individuals were included in the 454 pyrosequencing run. All H. lobatus GSS sequences produced here were submitted to the NCBI Sequence Read Archive under accession number SRA 053106.
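The homopolymer exclusion rule described above can be sketched in a few lines of Python. This is only an illustration under a stated assumption: the wording of the rule is ambiguous, and here it is read as "discard reads longer than 100 nt in which a single nucleotide makes up more than 60% of the read". The function names and example reads are hypothetical, and the filtering in this study was done with the Roche tools, not with this code.

```python
from collections import Counter

def dominant_base_fraction(seq: str) -> float:
    """Fraction of the read made up of its single most common nucleotide."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return Counter(seq).most_common(1)[0][1] / len(seq)

def keep_read(seq: str, max_fraction: float = 0.60, min_length: int = 100) -> bool:
    """Return False for reads flagged as homopolymer-dominated.

    Assumed reading of the rule: a read is excluded when it is longer than
    `min_length` nt AND one nucleotide exceeds `max_fraction` of its bases.
    """
    return not (len(seq) > min_length and dominant_base_fraction(seq) > max_fraction)

# Hypothetical example reads
reads = {
    "read_ok": "ACGT" * 30,               # 120 nt, balanced composition
    "read_polyA": "A" * 90 + "CGT" * 10,  # 120 nt, 75% A
    "read_short": "A" * 40,               # 40 nt, not covered by the assumed rule
}
kept = [name for name, seq in reads.items() if keep_read(seq)]
print(kept)
```

Under a different reading of the rule (for example, one based on the longest uninterrupted run of a single base), only `dominant_base_fraction` would need to change.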
To identify putative gene functions, BLASTx searches [Altschul et al. 1997] of the GenBank non-redundant (nr) database at the National Center for Biotechnology Information (NCBI) were conducted on all GSSs (singletons and contigs), applying an E-value threshold of <1e-5. The function of individual GSS sequences was predicted using the Blast2GO software suite [Götz et al. 2008]. Gene Ontology terms (The Gene Ontology Consortium 2008) were assigned, and metabolic pathways were assigned using the Kyoto Encyclopaedia of Genes and Genomes (KEGG) [Kanehisa et al. 2006]. The InterProScan tool was used to identify protein domains in the H. lobatus GSSs against the InterPro databases [Hunter et al. 2009]. WEGO software [Ye et al. 2006] was used to summarize (visualize, compare and plot) GO annotation results from contigs.
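As an illustration of the E-value threshold, the sketch below keeps the best hit per query with E-value <1e-5. It assumes the hits are available as BLAST tabular text with the standard 12 columns (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score). The output format actually used in this study is not stated, and all identifiers in the example are hypothetical.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-5  # threshold stated in the text

def best_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Return {query: (subject, evalue)} for the lowest-E-value hit below the cutoff."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue >= cutoff:
            continue  # not significant under the chosen threshold
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best

def make_row(query, subject, evalue):
    # Hypothetical placeholder values fill the columns that are not used here.
    return "\t".join([query, subject, "80.0", "90", "18", "0",
                      "1", "270", "1", "90", evalue, "100"])

example = "\n".join([
    make_row("contig1", "protA", "1e-30"),
    make_row("contig1", "protB", "1e-8"),
    make_row("contig2", "protC", "0.002"),     # fails the cutoff
    make_row("singleton7", "protD", "3e-6"),
])
hits = best_hits(example)
print(sorted(hits))
```

Counting the keys of `hits` separately for contigs and singletons gives the kind of "sequences with a significant match" totals reported in the results.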
Identification of GSS-SSR motifs
Msatcommander was used to identify SSR motifs within H. lobatus GSS sequences [Faircloth 2008]. Perfect SSR motifs (di-, tri-, tetra-, penta- and hexanucleotide, and compound) were detected with the default settings: a minimum of eight repeats for dinucleotide motifs and a minimum of six repeats for all other repeat types. For two neighbouring SSRs to be scored as a compound SSR motif, the maximum interruption allowed between them was 100 nucleotides. PCR primers in the flanking regions of unique SSRs were designed using Perl script modules linked to Primer3 software [Rozen and Skaletsky 2000].
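The repeat thresholds above (at least eight repeats for dinucleotides, at least six for tri- to hexanucleotides) can be illustrated with a simple regular-expression scan. This is a minimal sketch and not Msatcommander's own algorithm: it finds perfect repeats only, does not join neighbouring repeats into compound SSRs, and its function names and test sequence are hypothetical.

```python
import re

# Minimum repeat counts per motif length, as stated in the text
MIN_REPEATS = {2: 8, 3: 6, 4: 6, 5: 6, 6: 6}

def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit (e.g. 'ACAC' or 'AA')."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))

def find_perfect_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        # One motif copy followed by at least (min_rep - 1) further exact copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // size))
    return sorted(hits)

# Hypothetical sequence: (AC)9, a spacer, (AGAT)6, then (AC)5, which is too short to count
test_seq = "TTG" + "AC" * 9 + "GGCC" + "AGAT" * 6 + "CC" + "AC" * 5
ssrs = find_perfect_ssrs(test_seq)
print(ssrs)
```

The `is_primitive` check prevents a long dinucleotide tract from being reported a second time as a tetra- or hexanucleotide repeat.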
Microsatellite screening, amplification, testing and cross species amplification
Nonspecific SSR amplification products can be avoided by designing primers in flanking regions that are not grouped with those of other sequences. Therefore, MicroFamily software [Meglécz 2007] was used to examine the likelihood of groupings among all GSSs containing SSR motifs. All primers tested here (25 primers) originated only from unique sequences. Ten individuals from each of five discrete wild H. lobatus populations collected from the MRB were used for preliminary PCR amplification tests, and from these, eight loci were selected for further analysis. PCR amplifications were performed in a total volume of 12.5 µl; reactions contained 1.5 µl 5x MyTaq Red Buffer (Bioline (Aust) Pty. Ltd, Australia), 0.05 µl MyTaq DNA Polymerase (Bioline (Aust) Pty. Ltd, Australia), 0.4 µl (10 pmol) of primer (forward and reverse), 1 µl of DNA template and ddH2O up to 12.5 µl. PCR conditions were 5 minutes of initial denaturation at 94 °C, followed by 30 cycles of 30 seconds at 94 °C, 15 seconds at 54-57 °C and 15 seconds at 72 °C, then 5 minutes of final extension at 72 °C and a 15-minute hold at 15 °C. PCR products were multiplexed (1 µl of each locus and 1 µl of GeneScan™ 600 LIZ® Size Standard v2.0) and then analysed on a sequencer (ABI-3500) for genotyping.
GeneMapper software (Version 4.1; Applied Biosystems, Mulgrave, Australia, 2011) was used for allele scoring. Polymorphic loci were examined in 50 H. lobatus individuals taken from two distantly located sites in the MRB: Nong Khai (NK) in Thailand (17°30'N; 102°18'E) and the Bassac River (BR) in Vietnam (11°28'N; 104°57'E). In addition, H. lobatus samples were tested for cross-species amplification with SSR markers developed in a previous study for the closely related congener, H. siamensis [Iranawati et al. 2012]. Cross-species amplification trials of SSR markers designed for H. lobatus were undertaken on 46 H. siamensis individuals from Battambang (BB) and Ubon Ratchathani (UB) that had been genotyped successfully with H. siamensis-specific loci in the previous study [Iranawati et al. 2012]. Genotyping errors were examined at a 95% confidence level with MicroChecker software [Van Oosterhout et al. 2004]. The number of alleles (Na) and the percentage of missing data were calculated using GenAlEx 6 [Peakall and Smouse, 2006]; the Hardy-Weinberg probability (PHWE) and observed (Ho) and expected (He) heterozygosity were calculated using Arlequin v3.0 [Excoffier et al. 2005]; and polymorphism information content (PIC) was obtained using Excel-microsatellite-toolkit v3.1 [Park 2001]. General Linear Model (GLM) analysis in SPSS v.19 was performed to compare the number of alleles (Na) between loci and between species.
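For readers unfamiliar with the per-locus statistics listed above, the following sketch shows how Na, Ho, He and PIC can be computed from diploid genotypes at one locus. It uses the textbook definitions: He = 1 - Σp², and PIC = 1 - Σp² - Σ 2p_i²p_j² over allele pairs i<j. No sample-size correction is applied, so values may differ slightly from those reported by Arlequin or the Excel toolkit, which were the programs used in this study. The genotype data and function name are hypothetical.

```python
from collections import Counter
from itertools import combinations

def locus_stats(genotypes):
    """Compute Na, Ho, He and PIC for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per individual,
    with missing genotypes already removed.
    """
    n = len(genotypes)
    counts = Counter(allele for g in genotypes for allele in g)
    freqs = [c / (2 * n) for c in counts.values()]       # allele frequencies
    ho = sum(a != b for a, b in genotypes) / n           # observed heterozygosity
    he = 1.0 - sum(p * p for p in freqs)                 # expected heterozygosity (uncorrected)
    pic = he - sum(2 * (p * q) ** 2 for p, q in combinations(freqs, 2))
    return {"Na": len(counts), "Ho": ho, "He": he, "PIC": pic}

# Hypothetical allele sizes (bp) for four individuals at one tetranucleotide locus
example_genotypes = [(180, 184), (180, 180), (184, 188), (180, 184)]
stats = locus_stats(example_genotypes)
print(stats)
```

In this example the allele frequencies are 0.5, 0.375 and 0.125, giving Ho = 0.75, He = 0.59375 and a PIC of about 0.511.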
Results and Discussion
All sequence reads that met the basic quality standards were pooled and then assembled de novo. A total of 81,292 GSS sequences were generated from fin tissue gDNA by 454-FLX pyrosequencing. High-quality assembly of the GSSs generated 1,172 contigs with an average length of 344 nt (Table 1), but unassembled GSSs (singletons) made up the majority of the GSS sequence (total = 14.36 Mb). Similar studies in cichlid fishes [Elmer et al. 2010] and European hake [Milano et al. 2011] reported shorter average read lengths (202 and 206 nt, respectively) than in H. lobatus (260 nt), while longer average read lengths were reported in channel catfish [Jiang et al. 2011] and bream [Wang et al. 2012] (292 and 367 nt, respectively). The average GSS length in the current study was, however, similar to that in a previous 454 gDNA study of H. siamensis (264 nt) [Iranawati et al. 2012]. As Figure 1 shows, H. lobatus contig lengths varied from 100 nt to 3,286 nt, with 16.8% (197) longer than 500 nt, while singleton lengths varied from 50 nt to 673 nt (average 268 nt). The differences in average read length among studies likely result from differences in the total number of raw reads and in the material sequenced (mRNA vs. gDNA). To the best of our knowledge, this GSS study is the first genomic study of H. lobatus.
Comparative analysis of GSSs
BLASTx searches showed that 136 of the 1,172 contigs (11.6%) and 4,358 of the 55,219 singletons (7.9%) from the H. lobatus GSSs were significantly similar to entries in the GenBank non-redundant (nr) protein database (Table S1). Figure 2a,b shows that most matching H. lobatus GSSs (64% of contigs and 86% of singletons) matched fish sequences in the database (E-value <1e-5), a result consistent with previous studies [Coppe et al. 2010; Elmer et al. 2010; Salem et al. 2010; Panhuis et al. 2011; Iranawati et al. 2012]. The species with the most top matches was zebrafish (Danio rerio), followed by Nile tilapia (Oreochromis niloticus) and spotted green puffer (Tetraodon nigroviridis), suggesting that, in general, H. lobatus is phylogenetically close to other fishes. More sequence data are available for zebrafish than for other fish species such as common carp (Cyprinus carpio), so the predominance of zebrafish matches most likely reflects database representation, as was noted earlier in the H. siamensis study [Iranawati et al. 2012]. As for many non-model species, few H. lobatus sequences are available in NCBI databases, so the GSS sequences generated here will substantially assist gene identification in this carp species. As expected for studies of uncharacterized sequences, a high number of H. lobatus GSSs did not match any coding sequence in the GenBank database [Elmer et al. 2010; Jiang et al. 2011; Jung et al. 2011; Panhuis et al. 2011]. As suggested by previous studies [Wang et al. 2004; Mittapalli et al. 2010; Jung et al. 2011], while the majority of anonymous EST/GSS sequences probably span non-coding regions of the genome or arise from homopolymer-related assembly errors in 454 sequencing runs, some may represent novel genes unique to the target species and await further study.
Gene Ontology assignments
BLAST searches were run on 56,391 H. lobatus GSSs (1,172 contigs and 55,219 singletons) to identify sequences matching proteins of known function (Table S1). As Figure 3 shows, among coding sequences in the H. lobatus GSSs, 2,759 sequences were assigned to biological process, while 1,230 and 1,513 sequences were assigned to cellular component and molecular function, respectively. As for H. siamensis [Iranawati et al. 2012], many of the GSSs assigned to biological process were related to metabolic or cellular processes. Cellular component assignments suggested that many GSS sequences were associated with the cell and its parts, while many GSSs assigned to molecular function were related to catalytic or binding functions, essentially as either molecular transducers or enzyme regulators. Previous sequence analyses in fish species have also revealed similar transcripts with inherent metabolic functions [Coppe et al. 2010; Salem et al. 2010; Panhuis et al. 2011; Iranawati et al. 2012].
KEGG pathway analysis identified many coding sequences in the H. lobatus GSS dataset (singletons and contigs) (Table S1): purine metabolism (n=69), drug metabolism (n=36), oxidative phosphorylation (n=18), phosphatidylinositol signalling system (n=18), starch and sucrose metabolism (n=17), steroid hormone biosynthesis (n=17), inositol phosphate metabolism (n=15), and retinol metabolism (n=14). The large number of transcripts identified for purine metabolism, oxidative phosphorylation and the phosphatidylinositol signalling system in H. lobatus is similar to the result recorded in the previous GSS study of H. siamensis, where their physiological effects were described in detail [Iranawati et al. 2012].
Of interest, we identified a large number of GSSs matching drug metabolism genes, which are potentially involved in determining both the efficacy and residence time of drugs in the body and in modulating the body's response to toxic chemicals [Ardag Akdogan & Sen 2010]. Understanding more about the expression and activity of drug-metabolizing enzymes, which are potentially influenced by both internal and external factors [Gibson & Skett 1994], can inform about the acute phase response in fish (i.e. disease resistance) to physical, microbiological or parasitological agents [Monshouwer & Witkamp 2000]. We also identified a large number of GSS sequences apparently related to starch and sucrose metabolism (SSM). Genes with these functions play an important role in using digestible carbohydrate to support protein or lipid deposition [Capilla et al. 2003; Kumar et al. 2009].
For growth-associated studies in fish, identifying and understanding genes involved in the efficient absorption of dietary glucose across the gut, or that affect the metabolic use of glucose by fish fed carbohydrates [Panserat et al. 2009; Polakof et al. 2012], could provide important information for stock improvement and nutrition studies. In addition, a total of twelve H. lobatus transcripts were presumed to be involved in the steroid hormone biosynthesis pathway. Steroid hormones, which are derived from cholesterol and share similar tetracyclic structures, play important roles in controlling reproduction, individual development and/or organism homeostasis in both vertebrate and invertebrate species [Hsu et al. 2006; Lafont & Mathieu 2007; Lyche et al. 2010]. A few transcripts in H. lobatus had putative roles in retinol metabolism. Retinoids, with retinol obtained mainly from dietary sources, play an essential role in many physiological processes, including embryonic development, reproduction, postnatal growth, differentiation and maintenance of various epithelia, immune responses, and vision [Lidén & Eriksson 2006; Levi et al. 2008; Gesto et al. 2012]. Overall, the putative genes identified in H. lobatus offer insights into metabolic molecular processes (responses and actions) in this species, although few of the essential genes in the putative KEGG pathways were recognized among the H. lobatus GSSs.
A total of 4,614 protein domains were identified in the H. lobatus GSSs (Table S1) using InterProScan searches. As reported in similar studies of other teleost species [Salem et al. 2010; Coppe et al. 2010; Iranawati et al. 2012], the predominant domains (Table 2) were reverse transcriptase, zinc finger domains, protein kinase catalytic domains and integrase catalytic core. Their physiological roles were addressed in our previous study [Iranawati et al. 2012].
Epidermal growth factor (EGF), which belongs to the same family of growth factors as transforming growth factor α (TGF-α), is regarded as one of the most extensively studied growth factors [Derynck 1990]. While EGFs and their receptors play important roles as regulators of cell proliferation, differentiation, survival, growth, motility and apoptosis [Olayioye et al. 2000; Yarden & Sliwkowski 2001], they have recently received increasing attention in teleost studies because of their identified roles in mediating luteinizing hormone signalling within the follicle, leading to final oocyte maturation and ovulation [Park et al. 2004; Conti et al. 2006; Tse & Ge 2010; Hsieh et al. 2011]. Among the H. lobatus sequences, EGF-type aspartate/asparagine (54), EGF-receptor, L domain (22), and EGF-like, type 3 (15) domains were identified.
Thirty-three domains containing Ras-association (RA) genes were also recognized in the H. lobatus GSS sequences. Eight members of the RA domain family, as defined by sequence homologies between Ras effectors, have been identified to date [van der Weyden and Adams 2007]. Ras proteins are known to play a direct causal role in human cancer, and fish Ras genes encode proteins with a high degree of nucleotide sequence and deduced amino acid similarity to their mammalian counterparts. They are thought to play a central role in cell growth signalling cascades [Rotchell et al. 2001]. In addition, studies of Ras mutations have provided new opportunities for researchers in molecular ecotoxicology [Rotchell et al. 2001; Liu et al. 2003].
Analysis of genes
The most abundant sequences among the H. lobatus GSSs were homologous to retrotransposable element tf2, transposable element tc1, novel protein, orf2-encoded protein, transposable element tcb1, transposase, and pol-like protein, a result consistent with data from the closely related species H. siamensis, for which their physiological roles have been described [Iranawati et al. 2012]. Long interspersed nuclear elements (LINEs) are retrotransposons that proliferate via an RNA intermediate [Malik & Eickbush 1999]. LINEs, which have been identified in a wide variety of eukaryotic genomes [Arkhipova & Meselson 2000], encode two proteins: ORF 1p is a gag-like protein with RNA-binding and nucleic acid chaperone activities [Hohjoh & Singer 1996; Kolosha & Martin 1997; Martin & Bushman 2001; Martin et al. 2005], while ORF 2p is a pol-like protein with endodeoxyribonuclease (EN) and reverse transcriptase activities [Mathias et al. 1991; Feng et al. 1996]. Although the mechanisms of these processes are still unclear, it has been proposed that studies of de novo insertion events of genetically marked LINEs could contribute significantly to a better understanding of retrotransposition mechanisms [Moran et al. 1996; Moran & Gilbert 2002]. In principle, the abundance of transposable elements in many eukaryotic genomes allows relatively large-scale, genome-wide analysis of genomic LINE copies using this method [Ichiyanagi & Okada 2006], so further exploration would be informative, given that the majority of genes recognized in this study are related to a reverse transcriptase domain. While the present study focused mainly on identifying putative GSS-SSRs in H. lobatus, the putative gene transcripts identified here provide fundamental information for more advanced analyses. In species for which only limited genetic data are available, such as H. lobatus and H. siamensis, GSSs can provide invaluable baseline information for understanding the potential roles of genes in each tissue type.
Putative Microsatellite Markers
A total of 2,047 sequences containing microsatellite motifs were found in the H. lobatus partial GSS sequences (Figure 4), comprising 74.65% dinucleotide repeats, 7.87% trinucleotide repeats and 17.49% tetra-/penta-/hexanucleotide repeats. The substantial number of dinucleotide repeats recognized in H. lobatus is in accordance with previous sequence analyses of teleost species and other aquatic animals [Jung et al. 2011; Wang et al. 2012; Iranawati et al. 2012]. From these 2,047 sequences, a total of 495 SSR primers (Table S2) were developed successfully, consisting of 77.98% dinucleotide repeat primers, 7.88% trinucleotide repeat primers and 8.68% tetra-/penta-/hexanucleotide repeat primers. The primers were developed predominantly from singleton sequences, and only a small number were developed successfully from contig sequences. This suggests that the H. lobatus gDNA analysed here for GSSs comprised highly polymorphic non-coding regions, or possibly reflects the homopolymer issue with 454 pyrosequencing, a result consistent with a previous GSS study of H. siamensis [Iranawati et al. 2012].
While a significant number of primers were designed (Table S2) from GSSs with identified microsatellite motifs, these primers need further verification as genetic markers for ecological and evolutionary studies in H. lobatus, as in other non-model organisms [e.g. Panhuis et al. 2011; Wang et al. 2012; Iranawati et al. 2012]. Furthermore, the SSR markers identified here may be transferable to other closely related species [Ellis and Burke 2007; Zheng et al. 2010; Ma et al. 2011; Wang et al. 2012].
SSR Markers Test and Cross-Species Amplification
From the 454 pyrosequencing run, 495 sequences qualified for marker design out of 2,047 GSSs containing microsatellite repeat motifs in H. lobatus. The distribution of GSS-SSR motifs indicated that repeat frequency declined exponentially with repeat length (Table S2). This may result from the higher mutation rate of longer repeats compared with short repeats [Katti et al. 2001] and the likelihood that longer repeats will mutate to shorter repeats [Ellegren 2000]. While CAG and GATA repeat motifs are the dominant SSR repeat types reported in vertebrates, dinucleotide repeats (CA) are the most prevalent SSR marker type used for studying genetic diversity in fish species [Zheng et al. 2010]. The most frequently observed repeat motifs in H. lobatus were the same as those found in H. siamensis [Iranawati et al. 2012] and Etheostoma okaloosae [Saarinen and Austin 2010] and included AGAT, ATCT, ATT, AAT, AC and GT repeat types, while for silver crucian carp (Carassius auratus) and Japanese flounder (Paralichthys olivaceus), the most frequently observed repeat motifs were CAG and GATA [Zheng et al. 2010; Castaño-Sanchez et al. 2007], respectively.
Given their relative mutability [Chambers and MacAvoy 2000] and ease of scoring [Ellegren 2000], 25 tetranucleotide repeat loci were screened in H. lobatus. Subsequently, eight loci were selected to test the extent of polymorphism using two wild population samples from Nong Khai (NK), Thailand and the Bassac River (BR) in Vietnam, and also for cross-species amplification with two wild population samples of H. siamensis (Ubon Ratchathani (UB), Thailand and Battambang (BB), Cambodia). Three of the eight loci developed for H. lobatus amplified in H. siamensis but could not be genotyped because of stutter bands, and a single locus failed to amplify in H. siamensis. In contrast, seven of the eight loci developed for H. siamensis [Iranawati et al. 2012] amplified successfully and could be used for genotyping in H. lobatus. Thus, eleven of the 16 loci developed and screened in either species (approximately 70%) were available for further statistical analysis of genetic diversity in both species (Table 4).
Expected heterozygosity estimates (HE; mean ± standard deviation) at the 11 loci in the populations sampled in this study (Table 4) ranged from 0.301 to 0.967 (mean 0.803 ± 0.154), while observed heterozygosity (HO) ranged from 0.280 to 0.960 (mean 0.678 ± 0.193). Some loci showed significant departure from Hardy-Weinberg equilibrium. Missing data and differences between HO and HE estimates may result from PCR amplification failures at particular loci and/or the presence of null alleles. GLM results indicated that the number of alleles (Na) differed significantly among the loci tested (p <0.05), while no significant difference in Na was observed between species or between populations. Na per locus ranged from 2 to 23 (mean 11.64 ± 4.7), with higher Na per locus observed at the H. lobatus loci L16, L21 and L22 in the H. lobatus samples (BR/NK) than in the H. siamensis samples (BB/UB), while among the loci specific to H. siamensis, three out of seven (S2, S12, S14) showed higher Na per locus in H. siamensis (BB/UB) than in H. lobatus (BR/NK). Post hoc tests revealed that differences in Na in both species were evident only between loci L21 and S12 and between loci S12 and S24. Micro-Checker results showed no evidence of null alleles at loci L1, L21, S14, S23 and S24 in either species. While in general Na did not differ significantly between the two species, the apparent differences in Na at some loci and the evidence for null alleles, as at L16, need to be considered further. In addition, polymorphism information content (PIC) per locus ranged from 0.252 to 0.940 (mean 0.764 ± 0.164), suggesting that the loci characterised here range from relatively low to high levels of allelic variation [Cheng et al. 2007].
Sequence similarity between identified sequences and genes of known function, as confirmed in BLAST searches, theoretically allows novel genes to be discovered and putative gene functions to be assigned in new species. While identification of functional proteins using BLASTx searches of the non-redundant (nr) database revealed that two out of four SSR sequences in H. lobatus occurred in putative genes (L21, L22), their similarity values were very low (>9.00 × 10^-08). In non-model species, it has been proposed that massive numbers of fragmented gDNA sequences generated on the 454 Roche platform can offer great numbers of anonymous nuclear loci [Bertozzi et al. 2012]. Hence, it is likely that the unknown SSR contig sequences reported here are mostly located in non-coding regions, or may even be novel genes in the target species that cannot yet be assigned a putative function because of the limited genetic information currently available for this species. Considering the intricacy of gene structure and the possibility of pseudogenes, however, this assumption must be treated with caution.
Here we developed microsatellite markers for H. lobatus, a regionally important carp species in the MRB, using low-coverage GSS sequences generated by 454 Roche pyrosequencing. A significant number of potentially valuable SSR markers in H. lobatus still require validation, and some may be applicable to ecological and evolutionary studies in other closely related carp taxa. To date, four of the eight polymorphic SSR loci developed for H. lobatus and seven polymorphic SSR loci developed for H. siamensis have been applied successfully in cross-species amplification in both species, a transferability rate of approximately 70%. In addition, putative genes similar to those in H. siamensis were identified in H. lobatus, potentially as a result of their close phylogenetic relationship, but this will need further study because of the complicated structure of nuclear genes. 454 pyrosequencing allowed us to generate a significant number of SSRs rapidly, and the markers developed can be applied to wild stock management of H. lobatus populations at different geographic scales across the MRB.