Identification of putative segmentation genes and associated regulation


This article addresses key molecular methods and procedures for studying non-model organisms at the genomic level, employing current technological advances in histology and molecular biology and exploiting the in-depth understanding of model organisms such as Drosophila melanogaster. Suppose a pool of worms is found: the worms vary in length, display external skin segments (like an earthworm) with repeated bristle patterns, and some are bigger at the front than at the back. Assuming access to all histology and molecular biology techniques, this article covers the techniques that could be used to conduct this hypothetical investigation. Because these worms are non-model organisms, there is no prior knowledge of their generation time, mode or speed of reproduction, or even their life cycle. This article tackles the following matters: (a) identifying the segmentation genes of the worms, (b) investigating the expression of those genes and (c) assessing their function. The first step is to generate an expressed sequence tag (EST) library from complementary DNAs (cDNAs) reverse-transcribed from the worms' mRNA. The ESTs are then compared against online databases to find sequences similar to segmentation genes known from model organisms such as Drosophila melanogaster. Once a putative gene is identified and isolated, it undergoes functional analysis, by gain-of-function analysis and RNA interference-mediated analysis, to assess its function.


Constructing an Expressed Sequence Tag (EST) Library

The different lengths and sizes of the worms found suggest that individual organisms are at different developmental stages of their life cycle. A first possible way to deal with this is to group worms with similar phenotypes on the basis of 'guilt-by-association': worms of similar length should, at the very least, be at the same stage of their life cycle. To enable further experimental analysis, genomic DNA from each group will then be cloned into a cosmid/plasmid library and stored as glycerol stocks. In other words, the genomes of the worms are cloned to generate more copies of their genetic material.

A plausible way to identify a specific gene is to extract the putative gene from the organism in question and search for homologues in model organisms. However, this is not as easy as it sounds, because the lack of prior knowledge about the non-model organism must be taken into account.

Constructing an expressed sequence tag (EST) library can provide more insight into, and evidence on, the genetic regulation of development in the worms. Figure 1 illustrates the sequential stages in generating EST libraries. Messenger RNA (mRNA) from the worms must first be isolated and purified. As mRNA is rather unstable, complementary DNA (cDNA) is then made from the mRNA by reverse transcription using the enzyme reverse transcriptase (Luo et al, 2010). The resultant double-stranded cDNA is ligated into an appropriate vector using available restriction sites and then introduced into a bacterial strain, for example Escherichia coli, which is allowed to multiply. This gives rise to cDNA clones, which make up cDNA libraries corresponding to the mRNAs isolated and purified from the pool of worms. The cDNA clones are then randomly sequenced from both the 5' and 3' ends to generate ESTs. ESTs can be characterized as short (100-800 nucleotides in length), unedited, single-pass sequence reads generated from cDNA libraries (Nagaraj et al, 2007; Luo et al, 2010).

Figure 1. (Nagaraj et al, 2007)

Constructing an EST library is cost-effective and less labour-intensive than sequencing the whole genome of an organism. However, the ESTs produced may be redundant, as they represent the genes expressed in the particular tissues or cells sampled. Figure 2 illustrates the limitations and errors that may be present in ESTs. To minimize errors such as repeat sequences or vector-contaminated bases, the ESTs are pre-processed by programs such as RepeatMasker or MaskerAid (Nagaraj et al, 2007; Luo et al, 2010).

Fig. 2. (Nagaraj et al, 2007)

Errors such as low-complexity regions, vector-contaminated bases and over-representation of host transcripts are edited out. Poly-adenine (poly-A) and poly-thymine (poly-T) tails at the 3' ends are also filtered out for quality assessment purposes (step 1, Fig. 1). The processed, high-quality ESTs are then clustered and assembled using programs such as Phrap and CAP3 (Nagaraj et al, 2007) to generate consensus EST sequences (step 2, Fig. 1). ESTs that fall into the same cluster are inferred to derive from the same putative gene. By comparing the consensus sequences against available EST databases such as NCBI's Expressed Sequence Tags database (dbEST) with the Basic Local Alignment Search Tool (BLAST), ESTs can be assigned to a specific gene that has been previously sequenced (step 3, Fig. 1). Alternatively, EST consensus sequences can be conceptually translated into peptide sequences (step 4, Fig. 2) and compared against protein databases using BLASTX.
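The tail-trimming and conceptual translation steps described above can be sketched in a few lines of Python. This is only a minimal illustration, not the procedure used by any of the cited programs: the function names, the minimum run length of 10 bases and the choice to trim a leading poly-T run as well as a trailing poly-A run are all assumptions made for this example, and only the three forward reading frames are translated.

```python
import re

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def trim_tails(seq, min_run=10):
    """Strip a leading poly-T run and a trailing poly-A run.

    Only runs of at least `min_run` bases are removed (an assumed threshold).
    """
    seq = seq.upper()
    seq = re.sub(r"^T{%d,}" % min_run, "", seq)
    seq = re.sub(r"A{%d,}$" % min_run, "", seq)
    return seq


def translate(seq, frame=0):
    """Translate one forward reading frame; unrecognised codons become 'X'."""
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_frame_translation(seq):
    """Return the peptide strings for forward frames 0, 1 and 2."""
    return [translate(seq, frame) for frame in range(3)]


if __name__ == "__main__":
    raw_est = "ATGGCCAAGTAG" + "A" * 15  # hypothetical EST with a poly-A tail
    clean = trim_tails(raw_est)
    print(clean)
    print(three_frame_translation(clean))
```

In practice the translated frames would be passed to a protein similarity search; here they are simply printed.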


Comparing the consensus EST contigs (putative genes) from the worm against general databases using BLAST can return thousands of similarity hits across numerous other species. For a more focused search, the consensus EST sequences can be compared against the genome of the fruit fly Drosophila melanogaster using its specialized and comprehensive database, FlyBase (Grumbling et al, 2006). Drosophila has been a model organism for more than 90 years, and its developmental regulation and processes have been extensively studied. The segmentation genes of Drosophila can be easily retrieved from FlyBase and compared against the EST contigs produced earlier (Banfi et al, 1997). Putative genes involved in segmentation in the worms can then be identified on the basis of sequence similarity. This approach appears feasible: a significant degree of similarity has been observed between Drosophila gene products and human ESTs, which were consequently named Drosophila-related expressed sequences (DRES) (Banfi et al, 1997).
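To make the idea of ranking reference genes by sequence similarity concrete, the sketch below scores a contig against a small set of reference sequences using a plain Smith-Waterman local alignment. This is a stand-in chosen for brevity, not the heuristic that BLAST itself uses, and the scoring values, function names and toy sequences are all hypothetical.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score between sequences a and b.

    Uses a linear gap penalty and keeps only two rows of the score matrix.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def best_hit(contig, references):
    """Return (name, score) for the reference with the highest local score."""
    scores = {
        name: local_alignment_score(contig, seq)
        for name, seq in references.items()
    }
    name = max(scores, key=scores.get)
    return name, scores[name]


if __name__ == "__main__":
    # Toy sequences only; real references would be retrieved from a database.
    references = {"ref_gene_1": "GGACGTGG", "ref_gene_2": "CCCCCCCC"}
    print(best_hit("TTACGTTT", references))
```

A raw score like this is not a significance measure; a real search would also need a statistic such as an E-value before a hit is treated as a putative homologue.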

Which genes are involved?

Segmentation observed in the worms can be broadly defined as serial, repetitive patterning along the body axis. This indicates that spatial regulation must be involved in turning the segmentation genes on and off. To date, about 40 genes have been identified as responsible for segmentation in Drosophila (Peel et al, 2005). These genes turned out to operate in a hierarchical order, as shown in Figure 3.


Figure 3. A schematic diagram of segmentation gene networks in Drosophila melanogaster. A) Different classes of genes regulate one another in a hierarchical order. Each class of genes is expressed in a distinct spatial pattern within the embryo. B) Examples of genes in each class (Schroeder et al, 2004).

A signalling cascade of transcription factors has been shown to direct segmentation along the anterior-posterior axis in D. melanogaster (Peel et al, 2005; Schroeder et al, 2004). As illustrated in Figure 3, maternal transcription factors initiate the cascade by interacting with gap transcription factors, which then act upon the pair-rule genes and homeotic genes. Segment polarity genes are in turn regulated by the pair-rule genes (Schroeder et al, 2004). It is difficult to decide which gene is the key factor controlling segmentation along the anterior-posterior axis, especially in the non-model worm. Functional analyses of gap genes have been carried out in several other segmented insects in which orthologues of Drosophila melanogaster gap genes have been identified (Peel et al, 2005; Liu & Kaufman, 2004). Although Drosophila gap genes such as Krüppel and hunchback may not elucidate the entire segmentation network, their roles in the signalling cascade and ultimately their regulation of homeotic genes (Peel et al, 2005) may provide some interesting findings.

Antennapedia (Antp), Ultrabithorax (Ubx) and fushi tarazu (ftz) are homeobox-containing genes that are essential for the development of segments during the embryonic stages of D. melanogaster (McGinnis et al, 1984); Antp and Ubx are homeotic selector genes, whereas ftz acts as a pair-rule gene in Drosophila. Mutations in these genes result in considerable morphological defects along the A-P axis (Pearson et al, 2005) and even embryonic lethality (McGinnis et al, 1984). Thus, the sequences of these genes are the best candidates to retrieve from FlyBase for comparison with the EST consensus sequences. For the sake of clarity, the ftz gene will be the principal target of this investigation.

Once a sequence similarity search against FlyBase has identified a candidate ftz sequence, the corresponding cDNA can be traced back to the cDNA library constructed earlier. The genomic sequence of the ftz gene can then be determined using a hybridization technique. Both the worm's genomic DNA and the cDNA clones containing the ftz sequence are denatured. The single-stranded cDNAs are then tagged with different fluorescent markers (for example red and green). Subsequently, the fluorescently tagged cDNAs are hybridized to a microarray containing the worm's denatured genomic DNA. The cDNAs bind only to the complementary genomic sequences containing the ftz gene. After washing to remove excess unbound DNA, the cDNA-DNA hybrids fluoresce when observed under a fluorescence microscope, revealing where complementary binding occurs. The ftz gene is then isolated following identification of its location within the worm's genomic DNA.
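The base-pairing logic behind hybridization can be illustrated with a short Python sketch that locates where a single-stranded probe would pair with a given genomic strand. It is a deliberate simplification: only perfect, ungapped complementarity is considered, whereas real hybridization tolerates some mismatches depending on stringency, and the function names and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def hybridization_sites(probe, genomic_strand):
    """0-based start positions on genomic_strand that pair perfectly with probe.

    A probe pairs with the stretch of the strand that equals its reverse
    complement, so that is the pattern searched for. Overlapping sites are
    reported as well.
    """
    target = reverse_complement(probe)
    strand = genomic_strand.upper()
    sites = []
    start = strand.find(target)
    while start != -1:
        sites.append(start)
        start = strand.find(target, start + 1)
    return sites


if __name__ == "__main__":
    # Toy probe and strand, for illustration only.
    print(hybridization_sites("ATGC", "TTGCATTTGCATAA"))
```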

Hunting for regulatory elements

Developmental genes are highly conserved in their coding sequences across taxa; an example is the pax6 gene, which regulates eye development in both D. melanogaster and mammals (Dickmeis & Mueller, 2005). Interestingly, non-coding sequences have also been found to be highly conserved in distantly related species, as described in (Aparicio et al, 1995). These non-coding sequences usually harbour cis-regulatory elements to which regulatory proteins (transcription factors) bind, forming protein complexes such as enhanceosomes (Dickmeis & Mueller, 2005; GuhaThakurta & Stormo, 2007). Cis-regulatory modules may be present within introns and intergenic regions. Their central function is to receive multiple inputs (for example from the binding of transcription factors) and, in turn, modulate gene expression (Zeitlinger & Stark, 2010; Cooper & Sidow, 2003). Hence, it is important to identify sequence similarities in both coding and non-coding sequences between the genome of the segmented worm (non-model organism) and that of D. melanogaster (model organism).


Generally, regulatory motifs, or transcription factor binding sites (TFBSs), are short, about 5-20 nucleotides long (Blanchette & Tompa, 2002; Zhang & Gerstein, 2003). Multiple algorithms and programs have been developed to discover these regulatory motifs. Because TFBS sequences are relatively short, biologists usually employ a position-specific weight matrix (PWM) approach (Zhang & Gerstein, 2003). A PWM is a matrix of scores that assigns, to each position within a DNA motif, a value for each of the four possible nucleotides based on their relative frequencies (Stormo, 2000; Zhang & Gerstein, 2003; GuhaThakurta & Stormo, 2007).
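As an illustration of the PWM idea, the following Python sketch builds a log-odds matrix from a handful of aligned binding sites and slides it along a sequence to find the best-scoring window. The pseudocount of 1, the uniform background frequency of 0.25, the function names and the example sites are assumptions for this sketch rather than values taken from the cited papers, and sequences are assumed to contain only A, C, G and T.

```python
import math


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from equal-length aligned binding sites.

    Each column maps a base to log2(observed frequency / background).
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(length):
        column = [site[pos].upper() for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def score_window(pwm, window):
    """Sum the per-position scores for one window of the same width as the PWM."""
    return sum(column[base] for column, base in zip(pwm, window))


def scan(pwm, sequence):
    """Return (start, score) of the best-scoring window in the sequence."""
    sequence = sequence.upper()
    width = len(pwm)
    scored = [
        (score_window(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    ]
    best_score, best_start = max(scored)
    return best_start, best_score


if __name__ == "__main__":
    example_sites = ["TATAAT", "TATAAT", "TACAAT", "TATGAT"]  # toy sites
    pwm = build_pwm(example_sites)
    print(scan(pwm, "GGCGTATAATGCGC"))
```

A real scan would normally report every window above a score threshold, on both strands, rather than only the single best hit.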

Figure 4. Aligned nucleotides from four species of yeast Saccharomyces are represented both in gene (left) and neighbouring intergenic region (right). Vertically-stacked squares indicate the four different species and are coloured based on their conservation properties; green for conserved sequence, yellow for mutation, grey for gapped portion and red for frameshift mutation (Kellis et al, 2004).

Individual genes from four different species of the yeast Saccharomyces were aligned by anchoring their flanking intergenic regions using CLUSTALW (Kellis et al, 2004). Figure 4 shows the result of the alignment, in which both genic and intergenic sequences are compared. In the context of this investigation, the ftz gene isolated from the segmented worm can be aligned with the ftz gene from D. melanogaster to reveal conserved sequences within the intergenic regions. Relevant regulatory motifs may be present in the neighbouring intergenic regions. This method, however, may not produce accurate results if the species being compared have diverged greatly. Global multiple alignments may overlook a TFBS, which is short relative to the entire regulatory region (about 1000 base pairs long) (Blanchette & Tompa, 2002). Conversely, if the species are too closely related, the whole alignment will be conserved and therefore uninformative (Blanchette & Tompa, 2002).




Web resources

Analyse TF-binding sites through comparative genomics

Scan DNA sequences with a given DNA motif model

Discover novel DNA position weight matrices (PWMs): Gibbs sampler; ALIGNACE ""gnace/alignace.html

Table 1: Examples of web resources available for various DNA motifs or regulatory elements discovery (GuhaThakurta & Stormo, 2007).

An alternative method is the hidden Markov model (HMM), which is used to deduce promoter regions based on a likelihood function of potential transcription factor (TF) binding sites (Buske et al, 2010). The GOMO algorithm predicts potential targets of TFs and has resulted in increased accuracy of regulatory motif discovery in the yeast S. cerevisiae and in humans (Buske et al, 2010). Examples of other available algorithms or programs for identifying regulatory motifs are listed in Table 1.

The most widely used method of cross-species sequence comparison is phylogenetic footprinting (Buske et al, 2010; GuhaThakurta & Stormo, 2007). Using this method, a regulatory module responsible for modulating the IL-5 gene was found to be conserved in both humans and mice and to be located over 120 kilobases away from the promoter site (Loots et al, 2000). This shows that phylogenetic footprinting can identify non-coding, long-range regulatory modules and is suitable for cross-species comparison. Thus, if the non-coding regions of ftz gene orthologues from D. melanogaster and the segmented worm are aligned, conserved regulatory elements can be detected. Using this technique with the assistance of programs such as ConSite, an 85% increase in the accuracy of TFBS prediction was observed compared with using model matrices alone (Zhang & Gerstein, 2003). The principle of phylogenetic footprinting is illustrated in Figure 5 below.
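The core idea of phylogenetic footprinting, looking for short stretches that remain conserved within otherwise diverged non-coding sequence, can be sketched as a sliding-window identity scan over two sequences that have already been aligned. The window size, the identity threshold, the function name and the toy sequences below are assumptions for illustration; the alignment itself is assumed to come from a separate tool, with gaps written as '-'.

```python
def conserved_windows(aligned_a, aligned_b, window=8, min_identity=0.9):
    """Find alignment windows whose fraction of identical columns is high.

    Both inputs must be rows of the same pairwise alignment (equal length).
    Columns containing a gap never count as identical.
    Returns a list of (start, identity) tuples.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    hits = []
    for start in range(len(aligned_a) - window + 1):
        columns = zip(aligned_a[start:start + window],
                      aligned_b[start:start + window])
        matches = sum(1 for x, y in columns if x == y and x != "-")
        identity = matches / window
        if identity >= min_identity:
            hits.append((start, identity))
    return hits


if __name__ == "__main__":
    # Toy alignment: a shared 8-base motif flanked by diverged sequence.
    seq_model = "AAAAA" + "TGACCTGA" + "CCCCC"
    seq_worm = "GGGGG" + "TGACCTGA" + "TTTTT"
    print(conserved_windows(seq_model, seq_worm))
```

Windows that pass the threshold are only candidates; they would still need to be checked against known motif models or tested functionally.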

Figure 5. Phylogenetic footprinting to discover transcription factor binding sites (TFBSs). A hypothetical human gene along with its regulatory region is aligned with orthologs from rat, mouse and chimpanzee. TFBS1 and TFBS4 are conserved in all four mammals; TFBS2' and TFBS2 have diverged between the primate (chimpanzee and human) and rodent (rat and mouse) lineages. TFBS3 is newly acquired in the primate lineage (Zhang & Gerstein, 2003).

Functional analysis

Acquiring genetic information about the candidate ftz gene alone is not enough to give the whole picture of body patterning in the segmented worms. Good scientific practice is to hypothesise the role(s) of the candidate gene, test them by functional analysis and only then annotate the gene with an accurate function. Various functional tests (functional genetic screens, co-clustering on microarrays and gene targeting, to name a few) can be carried out to verify the role of the predicted cis-regulatory elements. As (Noor & Feder, 2006) put it, "the final standard of proof for causality of candidate genes can be confirmed by direct genetic manipulation".

A conventional approach is loss-of-function analysis (Dickmeis & Mueller, 2005), in which the function of a specific gene is blocked (knock-down). Mutagenizing agents such as ionizing radiation and chemical mutagens, or the insertion of transposable elements into the genome (Parinov et al, 2004), produce reduced-function or loss-of-function mutants (Rorth et al, 1998). An alternative refinement is RNA interference (RNAi), in which injected double-stranded RNA silences gene expression through degradation of the endogenous cognate mRNA (Kirby et al, 2002). Introduction of morpholino antisense oligonucleotides into embryos has also proven successful in blocking mRNA translation, resulting in mutant phenocopies (Figure 6) (Nasevicius & Ekker, 2000; Parinov et al, 2004). However, many genes do not exhibit loss-of-function phenotypes under normal conditions (Rorth et al, 1998). This may be due to the presence of closely related sequences with similar functions, which compensate for the loss of the targeted gene.

On the other hand, gain-of-function analysis produces mutants that can be more informative about gene function (Rorth et al, 1998). The method can be used to measure the activity of reporter gene constructs in the presence of the desired regulatory elements, whether in vivo (by producing transgenics) or in vitro (in a cell culture system) (Dickmeis & Mueller, 2005). The principle of this technique is to induce gene expression via reporter constructs that include a promoter, a detectable reporter gene and the desired cis-regulatory modules (Dickmeis & Mueller, 2005; Brand & Perrimon, 1993).

Figure 6: Introduction of morpholinos (MO) caused ubiquitous gene inhibition in zebrafish. Images (e) to (n) show zebrafish embryos at the 28-hour stage. Uninjected sphere-stage (a) and 28-hour (b) embryos viewed under FITC illumination, before the injection of MO. Sphere-stage (c) and 28-hour (d) embryos observed after MO injection. Wild-type embryos: (m) and (n). GFP expression in (e) and (f) is ubiquitous in uninjected embryos. In (g) and (h), 4.5 ng of control MO was injected. In (i) and (j), 4.5 ng of GFP antisense morpholino was injected; GFP expression is similar to that seen in (g) and (h). GFP expression is inhibited in all cells when embryos are injected with 4.5 ng of GFP morpholino, as shown in (k) and (l) (Nasevicius & Ekker, 2000).


GAL4 is a transcriptional activator isolated from the yeast Saccharomyces cerevisiae that can drive transcription in fruit flies from promoters containing GAL4 binding sites (Brand & Perrimon, 1993). It is important to use a transcriptional activator that has no endogenous targets in the organism, to ensure that the target gene is not ectopically expressed (Brand & Perrimon, 1993). Two constructs are prepared: one carries the GAL4 sequence on a P-element, so that GAL4 is expressed under the control of a genomic enhancer, and the other contains a GAL4-dependent target gene (Brand & Perrimon, 1993). When a line of flies bearing the first construct is mated with a line bearing the second, transgenic progeny carrying both constructs are produced (Figure 7). From its genomic site of integration, a P-element can direct transcription of any neighbouring gene (Rorth et al, 1998) and can also be mobilized to new genomic sites (Brand & Perrimon, 1993).

Figure 7. Gene misexpression in Drosophila melanogaster. Left: enhancer-trap GAL4 line, in which a P-element carrying the GAL4 sequence is inserted randomly into the genome so that GAL4 is expressed under the control of a genomic enhancer. Right: UAS-Gene X line containing a GAL4-dependent target gene downstream of the Upstream Activation Sequence (UAS). The two lines are mated to produce progeny that express Gene X in a cell-specific pattern (Brand & Perrimon, 1993).




Figure 8. Examples of expression patterns in different GAL4 lines under different drivers/enhancers. GAL4 expression of ombGAL4 (A) and dppGAL4 (B) are observed in third instar wing and sevGAL4 (C) in eye imaginal discs. The expressions are visualized by X-Gal staining of β-galactosidase, due to the presence of UAS-lacZ transgene in the transgenic larvae construct (Rorth et al, 1998).

Deciding which expression vectors to use is also an important factor in this technique. Phage or bacterial vectors are particularly useful in the functional analysis of long-range regulatory elements (Dickmeis & Mueller, 2005). Figure 8 shows that different expression patterns can be produced by using different GAL4 lines.

There are several ways to conduct an enhancer detection screen. Progeny carrying a UAS-lacZ construct can be assayed for β-galactosidase expression by staining with anti-β-galactosidase antibodies (Brand & Perrimon, 1993); the antibodies can be fluorescently labelled and viewed under a laser confocal or fluorescence microscope for more accurate observation (Richter et al, 1998). Alternatively, a UAS-GFP construct can be easily detected using a fluorescence microscope (Pfeiffer et al, 2010), and in-situ hybridization can also be used (Brand & Perrimon, 1993). Gain-of-function analysis is very useful for observing gene function in a temporal and spatial manner (Figure 9).

By isolating the ftz gene from the segmented worm and mis-expressing it in the model organism D. melanogaster, its spatial and temporal expression can be monitored. These techniques allow our inferences to be confirmed or rejected. If expression of the gene is limited to segments or stripes, it can be deduced that the ftz gene is involved in the regulation of segmentation in the worm. Gene targeting may also reveal novel genes that are involved in modulating segmentation in the organism. For example, when the even-skipped gene was mis-expressed in the fruit fly, selective repression of the wingless gene was observed (Figure 10) (Brand & Perrimon, 1993). The lesson is that, besides determining gene functions, discovering novel genes and their interactions within a regulatory network is equally important.

Evidently, functional analysis has become highly sophisticated, and yet much can still be improved: generating accurate representations of experimental end-products, reducing the cost and time of experimental procedures and producing results efficiently, to name a few. This topic has been discussed extensively in (Pfeiffer et al, 2010). Integrating different analyses can also produce more in-depth results; for example, the GAL4-UAS expression system has been used in conjunction with RNAi-mediated gene targeting (Kirby et al, 2002).

Figure 9: Misexpression of GAL4-dependent even-skipped. Ectopic expression of the even-skipped gene was observed in a temporal and spatial manner, following staining of the transgenic flies' progeny with anti-even-skipped antibodies. Anterior is to the left of the images and ventral is at the bottom. A) Ectopic expression observed in seven stripes along the axis and in the head of a stage 9 embryo. B) Ectopic expression in the muscles of a stage 13 embryo. C) Ectopic expression throughout the central nervous system of a stage 12 embryo (Brand & Perrimon, 1993).


This article has presented several molecular biology techniques for identifying segmentation genes in a non-model organism. Although the techniques are not discussed at length, the main aim of this investigation is to consider the different methods that could be applied. Various other techniques and methods are available but have not been discussed here, simply because many of them require extensive explanation and are out of scope. By applying knowledge of the segmentation network of the model organism Drosophila melanogaster, it is hoped that the techniques mentioned will produce valid results and shed some light on development in the non-model worm. The framework of this investigation is generally applicable to other non-model organisms and hence can benefit other developmental studies.
