Co-orthology of Pax4 and Pax6 to the Fly eyeless Gene


Members of the Pax (paired-box) gene family encode transcription factors that play crucial roles in development (Wehr and Gruss 1996). A milestone in the 1990s that prompted intensive study of Pax genes was the discovery that the Drosophila melanogaster eyeless gene, as well as its mouse ortholog Pax6, can induce eye formation when expressed ectopically in flies (Halder et al. 1995). Pax6/eyeless genes have thus been recognized as master control genes for eye development (Gehring and Ikeo 1999). A recent report on secondary changes in the dipteran lineage shed light on a divergent aspect of the Pax6/eyeless orthology (Lynch and Wagner 2010). It is therefore of interest to ask whether comparable changes occurred in the chordate lineage.

Traditionally, non-phylogenetic classifications have grouped Pax4 with Pax6 because of the absence of a conserved octapeptide in both of them (Wehr and Gruss 1996). The other vertebrate Pax genes are divided into the classes Pax1/9, Pax3/7 and Pax2/5/8, depending on the completeness of the homeodomain (Chi and Epstein 2002). Recent studies suggested that the first wave of the diversification of the Pax gene family dates back to the early metazoan era (Matus et al. 2007). The second wave of the diversification of Pax genes later in the vertebrate lineage is marked by gene duplications between Pax2, -5 and -8 (Kozmik et al. 1999; Bassham et al. 2008; Goode and Elgar 2009), between Pax1 and -9 (Holland et al. 1995; Ogasawara et al. 1999; Mise et al. 2008) and between Pax3 and -7 (Holland et al. 1999). These gene duplications occurred after invertebrate chordates branched off, but most likely before the split between gnathostomes and cyclostomes (McCauley and Bronner-Fraser 2002; O'Neill et al. 2007). This timing matches that of so-called two-round whole genome duplications (2R-WGDs) implicated in early vertebrate evolution (Kuraku et al. 2009; reviewed in Panopoulou and Poustka 2005). However, it has not been explored, in the modern framework of molecular phylogenetics and comparative genomics, whether the Pax4-Pax6 split also coincided with this second wave of diversification (Fig. 1A).

The timing of the gene duplication has significant impacts on our understanding of evolutionary modification of gene repertoires and functions. In fact, Pax4 genes have been reported only for human (Pilz et al. 1993), mouse (Sosa-Pineda et al. 1997) and rat (Tokuyama et al. 1998), suggesting that Pax4 originated from a gene duplication unique to the mammalian lineage (Fig. 1B). However, family-wide phylogenetic analyses performed to date usually suggested an ancient origin of the Pax4 gene early in metazoan evolution (Fig. 1C; Hoshiyama et al. 1998; Wada et al. 1998; Breitling and Gerber 2000). In these studies, invertebrate genes identified as Pax6 orthologs, such as fly eyeless (Bopp et al. 1986) and Caenorhabditis elegans vab-3 (Chisholm and Horvitz 1995; Zhang and Emmons 1995), were shown to be more closely related to vertebrate Pax6 genes, than to Pax4 genes (Fig. 1C). Because critical phylogenetic signals may be obscured by divergent sequences from other Pax classes, the long-standing question regarding the timing of the Pax4-Pax6 split should be addressed using a focused dataset aiming to resolve the Pax4-Pax6 relationship.

Gene duplications are usually followed by interplay between duplicates in terms of their functional differentiation. Thus, a comparison of the regulations and functions of duplicates can also lead to better understanding of gene family evolution. In mammals, in addition to the aforementioned inductive role in eye development, Pax6 is involved in development of the central nervous system (CNS), including the fore- and hindbrain, the neural tube, the pituitary and the nasal epithelium (Walther and Gruss 1991). In mouse, Pax6 is also expressed in all the four cell types (α, β, δ and γ) in the islets of Langerhans, the endocrine part of the pancreas (St-Onge et al. 1997). In zebrafish, a composite expression pattern of pax6a and pax6b highly resembles that of its mouse ortholog (Kleinjan et al. 2008; also see Kinkel and Prince 2009 for a review on zebrafish pancreas development).

In contrast, Pax4, identified only in mammals, has not been implicated in eye development, although it is expressed in the retinal photoreceptor cells (Rath, Bailey, Kim, Coon et al. 2009). In the pancreas, Pax4 is expressed mainly in the β-cells and is necessary for the differentiation of both β- and δ-cell lineages (Sosa-Pineda et al. 1997). A recent study revealed that pancreatic α-cells have the plasticity to transdifferentiate into β-cells (Thorel et al. 2010). Importantly, Pax4 can trigger this transdifferentiation (Collombat et al. 2009; also see Liu and Habener 2009). This aspect of Pax4 function has attracted attention as a potential clinical target for diabetes therapy (Gonez and Knight 2010). Determining whether the regulation of Pax4 expression was altered or conserved during evolution would help to reconstruct how roles became partitioned, or remained redundant, between Pax4 and Pax6 genes. However, a thorough comparative picture has been obscured by our lack of knowledge about non-mammalian Pax4 orthologs.

In this study, we characterized the previously unidentified non-mammalian Pax4 orthologs in teleost fish genomes and performed combinatorial analyses on molecular phylogeny, conserved synteny and gene expression patterns. Our analysis favoured a scenario which postulates the duplication between Pax4 and Pax6 genes in the 2R-WGDs (Fig. 1A). In light of this evolutionary scheme, we conclude that Pax4 secondarily lost its expression in the central nervous system (CNS) after the 2R-WGD early in vertebrate evolution. This could have led to the highly asymmetric evolution between Pax4 and Pax6.

Materials and Methods



Total RNA was extracted from a whole 52 hpf zebrafish embryo. The RNA was reverse transcribed into cDNA with SuperScript III (Invitrogen) using a 3' RACE System (Invitrogen). This cDNA was used as template in the following 3' RACE PCR. The first reaction was performed using the forward primer 5'-GACTGAGGGAATGAGACCAT-3', and the product of this PCR was used as template for the nested PCR with the forward primer 5'-CGCAGAGGAGACAAACCTTT-3'. These primers were designed based on zebrafish transcript sequences in Ensembl (ENSDART00000027919 and ENSDART00000078690). The middle fragment was amplified using the forward primer 5'-ATGATTGAGCTGGCGACTGA-3' and the reverse primer 5'-TCAAACTTTCGCTCCCTCCT-3' in the first PCR and the forward primer 5'-GACTGAGGGAATGAGACCAT-3' and the reverse primer 5'-CCTCATCCTCGCTCTTGATA-3' in a nested PCR. The upstream fragment (covering the start codon) was amplified using the forward primer 5'-TTTCTAGGATGTTCAGCC-3' and the reverse primer 5'-CTCTTGTGCTGAACTATG-3' in the first PCR and the forward primer 5'-CAGCCAATTCTGCATGTA-3' and the reverse primer 5'-TGATGGAGATGACTTCAG-3' in a nested PCR. We concatenated the sequences of these three fragments into one with the full-length open reading frame (ORF) and deposited it in EMBL under the accession number FR727738.

For in situ hybridization to detect zebrafish pax6b transcripts, a fragment covering its 3'-end was isolated with 3' RACE using the forward primer 5'-GTTTCACTGTTTTGCTCG-3' in the first PCR, and the forward primer 5'-ACAGGACAACGGTGGTGAAAA-3' in the nested PCR.
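Basic properties of the primers listed above can be checked with a few lines of code. The following is a minimal sketch, not part of the original protocol: the helper names `gc_fraction` and `reverse_complement` are hypothetical, and the only sequence used is the first-round 3' RACE forward primer quoted in the text.

```python
# Illustrative helpers; these names are assumptions, not from the original study.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_fraction(primer: str) -> float:
    """Fraction of G or C bases in a primer written 5'->3'."""
    primer = primer.upper()
    return sum(base in "GC" for base in primer) / len(primer)


def reverse_complement(primer: str) -> str:
    """Sequence of the opposite strand, also written 5'->3'."""
    return primer.upper().translate(COMPLEMENT)[::-1]


# First-round 3' RACE forward primer quoted in the text.
race_forward = "GACTGAGGGAATGAGACCAT"

print(len(race_forward), gc_fraction(race_forward))
print(reverse_complement(race_forward))
```

The same two functions can be applied to any of the other primers to compare their lengths and base composition before ordering or re-using them.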

In situ hybridization

Two zebrafish pax4 riboprobes were prepared separately using the middle and 3' cDNA fragments described above. Whole-mount in situ hybridization using the pax4 riboprobes labeled with digoxigenin (DIG)-UTP and the pax6b riboprobes labeled with Fluorescein (Roche Applied Science) was performed as previously described (Begemann et al. 2001). Hybridization was detected with alkaline phosphatase (AP)-conjugated anti-DIG antibody (Roche Applied Science) followed by incubation with NBT/BCIP for pax4, and with AP-conjugated anti-Fluorescein antibody (Roche Applied Science) followed by INT/BCIP-based detection for pax6b. In double in situ staining, pax6b transcripts were detected first, and after a washing step in 0.1 M glycine (pH 2.2), pax4 transcripts were detected.

Fluorescent in situ hybridization was performed using the tyramide signal amplification (TSA) system (Invitrogen) as instructed by the manufacturer. DIG-labeled riboprobe was detected with horseradish peroxidase (HRP)-conjugated anti-DIG antibody. After incubating with biotinyl-tyramide, fluorescent signal was detected with streptavidin-488 (Invitrogen).

Retrieval of sequences

Sequences for members of the Pax gene family were retrieved from the Ensembl genome database (version 58; Hubbard et al. 2009) and NCBI Protein database, by performing Blastp searches (Altschul et al. 1997) using mammalian Pax4 and Pax6 peptide sequences as queries. The zebrafish pax4 sequence was curated by aligning the cDNA sequence we isolated in this study with the zebrafish genome assembly Zv8 (Fig. S1).

Molecular phylogenetic analysis

An optimal multiple alignment of 54 collected amino acid sequences (see Table S1) was constructed with the program MAFFT (Katoh et al. 2005). In tree inferences, we used amino acid residues unambiguously aligned with no gaps, which cover both paired domain and homeodomain. Optimal amino acid substitution models were selected by ProtTest (Abascal et al. 2005). The phylogenetic tree inference with the first dataset employed the LG + I + Γ4 model, while the inference with the second dataset (see below) employed the JTT + Γ4 model. Heuristic tree searches with the ML method were performed in PhyML (Guindon and Gascuel 2003) with 100 bootstrap resamplings.

Exhaustive tree searches with the ML method were performed using Tree-Puzzle (Schmidt et al. 2002), where we input all 10,395 possible tree topologies consisting of eight operational taxonomic units (OTUs), namely, (1) mammalian Pax4, (2) teleost Pax4, (3) gnathostome (jawed vertebrate) Pax6, (4) lamprey Pax6, (5) amphioxus Pax6, (6) tunicate Pax6, (7) protostome Pax6/eyeless orthologs (including eyeless and twin of eyeless) and (8) outgroup (putative Nematostella vectensis Pax6 ortholog, Ciona Pax3/7, fly paired, human Pax3 and human Pax7) (for species names and accession IDs, see Table S1). Relationships within these individual OTUs were constrained according to generally accepted species phylogeny (Meyer and Zardoya 2003; Cracraft and Donoghue 2004; Tsagkogeorga et al. 2009; Philippe et al. 2005; Wiegmann et al. 2009). To provide support values, we performed bootstrapping with 100 resamplings by running Tree-Puzzle. Statistical tests to evaluate alternative tree topologies were performed using CONSEL (Shimodaira and Hasegawa 2001). Bayesian inferences were performed in MrBayes (Huelsenbeck and Ronquist 2001), where we ran 10,000,000 generations, sampled every 100 generations and excluded 25% of the sample as burnin.
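The figure of 10,395 topologies follows from the standard count of unrooted, fully bifurcating trees for n labelled tips, (2n - 5)!!, which for n = 8 is 3 × 5 × 7 × 9 × 11. A minimal sketch of this arithmetic is given below; the function name is hypothetical and the code is independent of Tree-Puzzle.

```python
def count_unrooted_topologies(n_taxa: int) -> int:
    """Number of unrooted, fully bifurcating trees for n_taxa labelled tips.

    Uses the double-factorial formula (2n - 5)!! = 3 * 5 * ... * (2n - 5).
    """
    if n_taxa < 3:
        raise ValueError("at least three taxa are required")
    count = 1
    for odd in range(3, 2 * n_taxa - 4, 2):  # 3, 5, ..., 2n - 5
        count *= odd
    return count


# Eight OTUs, as in the exhaustive search described in the text.
print(count_unrooted_topologies(8))
```

The count grows very quickly with the number of tips, which is why grouping the sequences into eight constrained OTUs makes an exhaustive evaluation feasible.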

Identification of conserved synteny

Via the BioMart interface, we downloaded a list of Ensembl IDs of 47 genes located in the genomic region spanning 20 Mb both upstream and downstream of the human Pax6 gene, together with the IDs of paralogs of those genes. Selecting the genes in the Pax6-containing region that also had a paralog on chromosome 7 within 20 Mb upstream or downstream of Pax4 yielded eight cases. For each of these eight cases, we collected homologous sequences in the Ensembl and NCBI Protein databases, and inferred a molecular phylogenetic tree as described above (Fig. S5).
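The selection step described above can be sketched as a simple window filter over a table of gene positions and paralog pairs. The code below is an illustration under stated assumptions only: it does not query BioMart, and every gene name, chromosome label and coordinate in it is invented.

```python
from dataclasses import dataclass

WINDOW_MB = 20.0  # flank size on each side of the anchor gene, as in the text


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    pos_mb: float  # hypothetical midpoint position in megabases


def in_window(gene: Gene, anchor: Gene, window_mb: float = WINDOW_MB) -> bool:
    """True if gene lies on the anchor's chromosome within window_mb of it."""
    return gene.chrom == anchor.chrom and abs(gene.pos_mb - anchor.pos_mb) <= window_mb


def shared_pairs(genes, paralog_pairs, anchor_a, anchor_b):
    """Return paralog pairs with one member near anchor_a and the other near anchor_b."""
    by_name = {g.name: g for g in genes}
    hits = []
    for first, second in paralog_pairs:
        for near_a, near_b in ((first, second), (second, first)):
            if in_window(by_name[near_a], anchor_a) and in_window(by_name[near_b], anchor_b):
                hits.append((near_a, near_b))
    return hits


# Toy data: every name and coordinate below is invented for illustration.
anchor_a = Gene("anchorA", "chrA", 30.0)
anchor_b = Gene("anchorB", "chrB", 120.0)
genes = [
    Gene("fam1_a", "chrA", 45.0),   # inside the chrA window
    Gene("fam1_b", "chrB", 110.0),  # inside the chrB window
    Gene("fam2_a", "chrA", 12.0),   # inside the chrA window
    Gene("fam2_b", "chrB", 60.0),   # too far from anchorB
    Gene("fam3_a", "chrA", 35.0),   # inside the chrA window
    Gene("fam3_c", "chrC", 118.0),  # paralog on a different chromosome
]
paralog_pairs = [("fam1_a", "fam1_b"), ("fam2_a", "fam2_b"), ("fam3_c", "fam3_a")]

print(shared_pairs(genes, paralog_pairs, anchor_a, anchor_b))
```

In this toy input only the first family satisfies both window conditions; in the study, the analogous filter around Pax6 and Pax4 retained eight gene families.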

Survey of potential cis-regulatory elements

To identify conserved non-coding elements (CNEs) shared between Pax4 and Pax6, we used two approaches. First, we aligned the genomic regions containing the two genes using mVISTA (Frazer et al. 2004) under the default conservation parameters (70% identity over 100 bp of alignment length). In the alignment, we included a number of vertebrate species including human, mouse, cow, opossum, platypus, chicken, Xenopus laevis and zebrafish. Second, we implemented an analysis to detect local similarity in non-coding regions that may be obscured by translocation and inversion of cis-regulatory elements. We extracted the intronic sequences, as well as the intergenic sequences extending either to the neighboring genes or over a length of 200 kb, surrounding the two genes on the human chromosomes. To detect local similarities between the two non-exonic regions, one of the sequences was used as a query in a Blastn search against the other.
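The conservation criterion quoted above (70% identity over 100 bp) can be illustrated with a sliding window over a pairwise alignment. The sketch below is a simplified stand-in, not the mVISTA implementation: the function name is hypothetical, the input is assumed to be two pre-aligned strings of equal length with '-' for gaps, and gapped columns are counted as mismatches.

```python
def conserved_windows(aln_a: str, aln_b: str, window: int = 100,
                      min_identity: float = 0.70) -> list:
    """Return start columns of alignment windows that meet the identity threshold."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    # 1 for an identical, non-gap column; 0 otherwise.
    matches = [int(a == b and a != "-") for a, b in zip(aln_a.upper(), aln_b.upper())]
    hits = []
    running = sum(matches[:window])  # matches in the first window
    for start in range(len(matches) - window + 1):
        if start > 0:
            # Slide by one column: add the new column, drop the old one.
            running += matches[start + window - 1] - matches[start - 1]
        if running / window >= min_identity:
            hits.append(start)
    return hits


# Synthetic example: 70 identical columns followed by 30 mismatches.
seq_a = "A" * 100
seq_b = "A" * 70 + "C" * 30
print(conserved_windows(seq_a, seq_b))
```

Alignments shorter than the window yield an empty list, so short fragments are never reported as conserved under this simplified rule.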

To detect CNEs shared between Pax4-containing genomic regions of different species, we retrieved genomic sequences covering the Pax4 locus with 10 kb of flanking sequence on both ends. When the next gene was located closer than 10 kb, only the intergenic region up to that gene was retrieved. Those sequences were compared in mVISTA. We also consulted the VISTA Enhancer Browser, which contains experimentally validated non-coding fragments with transcriptional enhancer activity (Visel et al. 2007), only to find that no Pax4-associated enhancer is registered in this database.

Results


Identification of teleost fish Pax4 genes

As a result of Blastp searches using mammalian Pax4 sequences, we identified Ensembl peptide sequences in the five teleost fish species with sequenced genomes that show higher similarity to Pax4 than to Pax6. Of these, in the Ensembl database, only the zebrafish ones (ENSDARP00000013792 based on the Ensembl gene ENSDARG00000021336 and ENSDARP00000073151 based on the gene ENSDARG00000056224) were not annotated as pax4. As in zebrafish, two peptides similar to pax4, derived from two genes annotated separately, were found in Tetraodon nigroviridis (ENSTNIG00000000660 and ENSTNIG00000011020).

We isolated cDNA fragments of zebrafish pax4 by means of RT-PCR, and compared the resulting concatenated cDNA sequence with those in Ensembl. Our sequence matched both of the two zebrafish Ensembl entries, suggesting that these two were split because of a misidentification of the ORF of a single pax4 gene. We then aligned these sequences with the corresponding region in the genome assembly Zv8, and identified a putative full-length protein-coding sequence (Fig. S1). This comparison revealed a non-canonical splice donor site ('GC' instead of 'GT') (Fig. S1), which was confirmed with our genomic PCR (data not shown). Using the amino acid sequence deduced from the curated zebrafish pax4 ORF, we performed tBlastn searches in the genome assemblies of other teleost fishes in Ensembl, and identified their putative pax4 peptide sequences (Fig. S2). Because the two aforementioned Tetraodon sequences do not share a region homologous to each other and are separated by only a 66-bp stretch in the genome assembly, it is likely that they were also split because of an incorrect annotation of the ORF in the Ensembl database. Overall, in the five teleost fish species with sequenced genomes, we did not find any sequence that would represent a second pax4 paralog derived from the teleost-specific genome duplication (TSGD; Kuraku and Meyer 2009).

Sequence alignment containing the five teleost pax4 genes, other members of the Pax4/6 class, and human paralogs revealed a high level of conservation in the paired domain and in the homeodomain (Fig. S2). Many of the amino acid residues conserved between Pax6 sequences and their invertebrate orthologs were revealed to be altered in Pax4 sequences (Fig. S2).

Expression analysis of zebrafish pax4

Expression patterns of zebrafish pax4 were investigated by in situ hybridization for embryos spanning from 6 hours post fertilization (hpf) to 5 days post fertilization (dpf). Identical expression patterns were observed with both probes (see Materials and Methods).

The earliest signals were detected in the developing pancreas at 13 hpf (Fig. 2A), where expression persisted until 30 hpf. The strongest expression was seen around 24 hpf (Fig. 2B, C, E, and F). To examine the relative localization of the pancreatic expression signals of pax4 to that of pax6b, a marker of early pancreatic endocrine cell development (Biemar et al. 2001), we conducted a double staining of these two genes in 24 hpf zebrafish embryos. We observed partial overlap of pax4 and pax6b expressions (Fig. 2F). Expression of pax4 was nested in the pax6b-expressing domain in the endocrine part of the developing pancreas (Fig. 2D-F).

Expression of pax4 in the stomodeum was detected from 57 hpf to 96 hpf (Fig. 2G-I and not shown). Between 57 and 72 hpf, the expression domain was strongest in the ventrolateral corners of the oral cavity and surrounds the future mouth (Fig. 2G-I). More precisely, the signal in the region of the future lip was restricted to mesectodermal layers of the bilaminar stomodeum. The fluorescent in situ hybridization staining with the TSA-system additionally showed that the signal in the 72 hpf embryo is not restricted to the outer region of the stomodeum, but elongates into the oral cavity along the pharynx (Fig. 2G). At 96 hpf, pax4 expression was detected exclusively in the outer surface of the stomodeum, corresponding to the future lip (data not shown).

Survey of Pax4 orthologs in non-model species

To search for Pax4 orthologs outside the mammalian and teleost lineages, tBlastn searches were performed online using the human Pax4 peptide sequence as a query. First, we performed a search in the NCBI dbEST and nr/nt databases of all vertebrates, specifying 'Craniata' (taxon ID: 89593 in NCBI Taxonomy) while excluding mammalian (taxon ID: 40674) and teleost sequences (taxon ID: 32443). Note that the taxon 'Craniata' adopted in NCBI Taxonomy is incompatible with molecular phylogenetic evidence supporting monophyly of cyclostomes (reviewed in Kuraku 2008). Second, we performed tBlastn searches against nucleotide genomic sequences of species included in the Ensembl Genome Browser. These searches detected no Pax4 sequences in any available vertebrate species outside Teleostei and Mammalia, such as Xenopus tropicalis, chicken, zebra finch, and anole lizard. Similarly, invertebrate species were found to have no Pax4/6 sequences other than those already recognized as Pax6 orthologs.

Our additional search in Mammalia detected Pax4 orthologs in non-eutherians (platypus, ENSOANG00000000819; opossum, ENSMODG00000015218), and early-branching eutherians (two-toed sloth, ENSCHOG00000009265; African elephant, ENSLAFG00000005297, and rock hyrax ENSPCAG00000016257). Overall, our effort to find additional Pax4 orthologs, substantiated by available whole genome sequences, strongly suggested the restricted phylogenetic distribution of Pax4 orthologs to Mammalia and Teleostei. Our attempt with RT-PCR to identify Pax4 in cyclostomes, chondrichthyans and non-teleost actinopterygian fishes resulted in no additional orthologs, which should be confirmed with anticipated whole genome sequences of species in those missing lineages.

Molecular phylogeny of Pax4 and Pax6

Our molecular phylogenetic analysis employed two sequence datasets. The first dataset included diverse invertebrates as well as vertebrates (see Table S1). Heuristic ML tree search and Bayesian inference produced consistent results on several points (Fig. 3). The putative Nematostella vectensis (starlet sea anemone) Pax6 ortholog was placed outside the monophyletic group of bilaterian sequences. Inside the Pax6 group of bilaterians, however, the resultant tree topology with many low support values was largely inconsistent with generally accepted species phylogeny. For this reason, this phylogenetic analysis did not provide sufficient resolution to evaluate the alternative scenarios introduced in Figure 1, although the overall tree topology vaguely supported the scenario that the gene duplication giving rise to Pax4 occurred after the cnidaria-bilateria split, but before the deuterostome-protostome split (bootstrap probability in the ML analysis, 58). In contrast, the closest relationship between mammalian Pax4 and teleost fish pax4, as well as monophylies of these two individual groups, were relatively strongly supported (Fig. 3; bootstrap probability in the ML analysis, 94; Bayesian posterior probability, 1.00). twin of eyeless (toy) and eyeless (ey) genes of arthropods were closely related to each other, possibly because of a gene duplication in the arthropod lineage (Punzo et al. 2004; Lynch and Wagner 2010).

To perform a more focused assessment of the alternative scenarios, we prepared the second sequence dataset. In the previous dataset, there were four Branchiostoma floridae sequences (designated AmphiPax6) with polymorphic non-synonymous changes (Glardon et al. 1998) as well as a B. belcheri sequence (Fig. 3). The differences between these sequences were thought to have been introduced in the amphioxus lineage, because their monophyly was strongly supported (Fig. 3; bootstrap probability in the ML analysis, 94; Bayesian posterior probability, 1.00). Of those, we selected only one B. floridae sequence (CAA11366) with no such lineage-specific substitution. We excluded Dugesia japonica and Caenorhabditis elegans because of the long branches leading to these sequences (Fig. 3). As jawed vertebrates, we retained human, opossum, Xenopus laevis and both pax6a and pax6b of zebrafish, Takifugu rubripes and stickleback. Loligo opalescens Pax6 was removed because its sequence was identical to Euprymna scolopes Pax6. We also excluded Saccoglossus kowalevskii Pax6, echinoderm Pax6 (Paracentrotus lividus and Metacrinus rotundus) and medaka pax4. Using this second dataset of selected sequences, we performed a heuristic ML analysis. As in the analysis employing the first dataset (Fig. 3), this analysis produced highly ambiguous results (data not shown).

To statistically evaluate all possible tree topologies with this selected dataset, we performed an exhaustive ML analysis. To focus on the relationships of Pax4 genes with Pax6 and protostome Pax6 orthologs, we classified the sequences into eight operational taxonomic units (OTUs) with their internal relationships constrained according to generally accepted species phylogeny (see Materials and Methods).

This analysis resulted in three tree topologies supported with the identical, highest likelihood value (Table S2). Comparing the likelihood of each tree topology with that of the ML tree topology revealed as many as 336 tree topologies that were not rejected within 1σ of the log-likelihood difference (ΔlogL/σ < 1). The clustering of teleost Pax4 and mammalian Pax4 genes was relatively strongly supported (bootstrap probability in the ML analysis, 98; Bayesian posterior probability, 1.00). The tree topology violating this cluster had a significantly lower likelihood (ΔlogL = 18.81 ± 8.22). Among the three ML tree topologies, no substantial difference was observed in the levels of support based on the approximately unbiased (AU) test (Shimodaira 2000), the Shimodaira-Hasegawa (SH) test (Shimodaira and Hasegawa 1999) and resampling of estimated log-likelihoods (RELL) bootstrap probability (Kishino et al. 1990; Table S2).
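The one-standard-error criterion used above amounts to comparing a topology's log-likelihood difference from the ML tree with the standard error of that difference. A minimal sketch follows; the function names are hypothetical, and the only values taken from the text are the difference and standard error reported for the topology that breaks the Pax4 cluster.

```python
def sigma_ratio(delta_logl: float, sigma: float) -> float:
    """Log-likelihood difference from the ML tree in units of its standard error."""
    if sigma <= 0:
        raise ValueError("standard error must be positive")
    return delta_logl / sigma


def within_one_sigma(delta_logl: float, sigma: float) -> bool:
    """True if the topology is not rejected by the one-standard-error criterion."""
    return sigma_ratio(delta_logl, sigma) < 1.0


# Values reported in the text for the topology violating the Pax4 cluster.
print(sigma_ratio(18.81, 8.22), within_one_sigma(18.81, 8.22))
```

With these numbers the difference exceeds two standard errors, consistent with the statement that this topology has a significantly lower likelihood, whereas the 336 topologies mentioned above each fall below a ratio of one.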

Notably, apart from the position of pax4 genes, all of the three ML tree topologies as well as those supported with similar likelihood values (Table S2) showed large inconsistency with the generally accepted species phylogeny, when we assume orthology between Pax6/eyeless genes of diverse bilaterians. Thus, in order to assess alternative scenarios in a probabilistic framework based on the species phylogeny, we limited our targets of the CONSEL analysis to six tree topologies varying only the position of vertebrate Pax4 (Fig. S4). These six included those introduced in Figure 1 and the one vaguely supported in Figure 3. As a result, these tree topologies were revealed to be almost equally probable (Table 1). It was also notable that when we compared these six tree topologies with the ML tree in the heuristic analysis, all of the six were ranked below 1σ in likelihood values (data not shown).

Examination of the scale of the Pax4-Pax6 duplication

If the Pax4-Pax6 split took place in the vertebrate lineage (Fig. 1A), it is likely that it was part of the 2R-WGDs. In this scenario, similar arrays of genes should be found between genomic regions containing Pax4 and Pax6. Analyzing phylogeny of those genes may allow us to date the timing of the duplication event. We performed a comprehensive search of conserved synteny by comparing gene compositions in 40 Mb genomic stretches (20Mb on both ends) containing Pax4 and Pax6 in the human genome (see Materials and Methods). The search resulted in eight gene families whose members were shared between the two stretches (Fig. S5).

One of these eight gene families included the mitochondrial inner membrane protease subunit 1 (IMMP1L) gene on chromosome 11 and the IMMP2L gene on chromosome 7. This family experienced a gene duplication before the split between the animal and plant lineages (Fig. S5A). Except for this case, all the other seven shared genes were shown to have been duplicated in the vertebrate lineage, before the radiation of jawed vertebrates. In all cases where a cartilaginous fish sequence was available, it firmly clustered with a particular group of bony vertebrate orthologs (e.g., CREB3L1, LRRC4; Fig. S5B and C). Similarly, although not unambiguously supported, sea lamprey sequences also clustered with a particular group of jawed vertebrate orthologs (e.g., LRRC4, HIPK2, DGKZ ; Fig. S5C, E and F), suggesting that duplications of these genes occurred before the cyclostome-gnathostome split.

In spite of the wide scope (40 Mb) of our comparison, the seven genes spanned only 15.9 Mb (on chromosome 11) and 12.1 Mb (on chromosome 7), with both of Pax6 and Pax4 residing on the end of the shared gene arrays, respectively (Fig. 4). Our comprehensive survey of similar sequences in animals and molecular phylogenetic analysis detected additional paralogs that duplicated at the same evolutionary timing. Leucine-rich repeat containing 4B (LRRC4B) and Reticulocalbin 3 (RCN3) both on chromosome 19 were revealed to be paralogs of the genes identified above on chromosome 7 and 11 (Fig. 4; Fig. S5C and D). In addition, homeodomain interacting protein kinase 1 (HIPK1), paralogous to HIPK2 and HIPK3, was found on chromosome 1 (Fig. 4; Fig. S5E).

Comparison of non-coding regions of Pax4 and Pax6 genes

It seemed possible that some of the expression domains shared between Pax4 and Pax6 genes (see Table S3) are driven by cis-regulatory elements shared between these two genes. To examine this, we downloaded genome sequences containing Pax4 and Pax6 genes in diverse vertebrates. We employed two different approaches to identify non-coding sequences shared between Pax4-containing and Pax6-containing genomic regions (see Materials and Methods). However, neither approach revealed any significant hit (data not shown).

We identified upstream non-coding sequences conserved within mammalian Pax4 (Fig. S6A), and within teleost fish pax4 (Fig. S6B). However, no non-coding sequences flanking Pax4 were found to be conserved between mammalian Pax4 and teleost fish pax4 (Fig. S6A and B).

Discussion


Pax4 and Pax6 repertoires in vertebrates

Our survey based on available large-scale genomic and transcriptomic sequences indicated the absence of Pax4 genes in sauropsids (birds and reptiles) and amphibians. It is very likely that Pax4 genes were lost in these lineages independently. We also failed to identify Pax4 genes in early-branching vertebrate lineages, such as chondrichthyans and cyclostomes, for which the Pax6 gene has already been reported. Interestingly, our phylogenetic analysis did not necessarily rule out the possibility that the dogfish and lamprey Pax6 sequences are orthologous to Pax4 (Fig. 3; Table S2). However, expression of these early vertebrate Pax6 genes in the CNS (Murakami et al. 2001; Derobert et al. 2002), as well as a high level of conservation of amino acid sequences between them and osteichthyan Pax6 (Fig. S2), suggest their orthology to osteichthyan Pax6 genes. Taken together, Pax4 genes have so far been identified only in mammals and teleost fishes.

Phylogenetic origin of Pax4

Identification of Pax4 orthologs in teleost fishes supported the improbability of the scenario in Figure 1B, namely a gene duplication specific to the mammalian lineage. It was recognized very early that Pax6 sequences exhibit an extremely high level of sequence similarity among them, while those of Pax4 are very divergent (Balczarek et al. 1997). To accommodate this rate heterogeneity in the dataset, we mainly adopted the ML method which is known to be less prone to artifacts such as long branch attraction (Philippe et al. 2005). The analysis significantly supported the orthology of teleost pax4 to mammalian Pax4 (Fig. 3; Fig. S3; also see Results). However, regarding the timing of the Pax4-Pax6 split, our phylogenetic analysis did not provide unambiguous results (Table 1). It remained unclear which of the alternative hypotheses in Figure S4 (including those in Figure 1A and 1C) delineates the timing of the Pax4-Pax6 duplication. Since our dataset already contains representative species from the major chordate lineages, it does not seem likely that further identification of Pax4/6-related sequences will largely improve the resolution. The unreliable molecular phylogeny described so far urged us to focus on a different aspect of the evolution of Pax4 and Pax6 genes.

Genomic background of the Pax4-Pax6 duplication

To examine the timing of the duplication between Pax4 and Pax6, we referred to the chromosomal locations of these genes and their neighbors. By detecting similar arrays of genes shared between chromosomes (conserved synteny) in a genome and reconstructing the evolutionary history of the harbored gene families, we can date the timing of large-scale duplications. In the human genome, several quartets of chromosomes showing conserved synteny have been detected (Kasahara et al. 1996). Some of these served as initial convincing evidence of intra-genome duplications (Lundin 1993; Holland et al. 1994; Spring 1997). However, it is also expected that chromosomal rearrangements accelerated the decay of ancestral gene order during evolution. Although some effort has been made to reconstruct the ancestral vertebrate karyotype (Nakatani et al. 2007; Putnam et al. 2008), only a small fraction of all genes in sequenced genomes is implicated in those highly conserved syntenic regions.

Our analysis detected eight gene families whose members are co-localized inside 40 Mb genomic regions containing Pax4 and Pax6 on chromosome 7 and 11, respectively (Fig. 4). Except for only one case, molecular phylogenetic analyses suggested that the duplications between genes on chromosome 7 and 11 occurred early in vertebrate evolution (Fig. S5). This implies a large-scale duplication between these chromosomal regions. So far, no large-scale duplication event before the split between teleost and tetrapod lineages, other than the 2R-WGDs, has been documented (Van de Peer et al. 2009). Thus, it is likely that the Pax4-Pax6 split was caused by the 2R-WGDs early in vertebrate evolution (Fig. 1A).

Role of Pax4 and its evolutionary change

We showed that zebrafish pax4 is expressed in the developing pancreas and the stomodeum (Fig. 2). The pax4 expression in the pancreas, nested in the broader pax6b expression (Fig. 2, D-F), is concordant with the pattern in mouse, where Pax4 expression is restricted to β-cells, while Pax6 is expressed in all the four cell types of the endocrine pancreas (St-Onge et al. 1997; Biemar et al. 2001; Delporte et al. 2008). This similarity indicates their common ancestry at the base of the Osteichthyes.

Our comparison of non-coding genomic sequences containing Pax4 orthologs detected several conserved elements within mammals and within teleost fishes (Fig. S6). This included the only upstream enhancer characterized to date which is responsible for the pancreatic expression of Pax4 in mouse (Brink et al. 2001). However, none of these potential cis-regulatory elements were shared between mammals and teleost fishes with a comparable level of similarity (Fig. S6). Our intensive search for conserved non-coding elements shared between Pax4 and Pax6 also failed to detect potential cis-regulatory elements commonly retained between these duplicates (see Materials and Methods).

Expression in the stomodeum, the other pax4-positive domain in zebrafish, has never been described for either mammalian Pax4 or Pax6 genes. Thus, this expression domain was most likely gained in the teleost fish lineage. On the other hand, expression in the pineal gland and the retina, described for mammals (Rath, Bailey, Kim, Coon et al. 2009; Rath, Bailey, Kim, Ho et al. 2009), was not detected in zebrafish (Fig. 2). Expression in the retina and the pineal gland has also been reported for Pax6 in many vertebrates (Walther and Gruss 1991; Kawakami et al. 1997; Derobert et al. 2002; Navratilova et al. 2009). Interestingly, even the amphioxus Pax6 ortholog, AmphiPax6, is expressed in the lamellar body, which is homologous to the pineal gland (Glardon et al. 1998). With a few exceptions [absence of zebrafish pax4 expression in the retina and pineal gland and absence of Xenopus Pax6 expression in the pineal gland (Hirsch and Harris 1997)], Pax4 and Pax6 genes are generally expressed in the retina and pineal gland, suggesting an ancient origin of these expression domains before the Pax4-Pax6 duplication.

While Pax4 and Pax6 seem to have retained a subset of expression domains, such as the pancreas, retina and pineal gland, after the gene duplication, one striking feature of Pax4 is the absence of its expression in the central nervous system, including the eye and olfactory placode (Fig. 2; Table S3). Pax4 genes seem to have evolved relatively rapidly, based on long branches in molecular phylogenetic trees (Fig. 3 and S3), experienced more dynamic secondary modification of expression patterns, and may have been lost in the bird and amphibian lineages (Fig. 5). In contrast, Pax6 genes have highly conserved coding sequences (Fig. 3 and S3), experienced fewer changes in their highly pleiotropic expression, and have been retained in all species studied to date (Fig. 5). The asymmetric fates of Pax4 and Pax6 illustrate the potential of gene duplications to elaborate the gene regulatory networks governing vertebrate embryogenesis.

Acknowledgements


This study was supported by the Young Scholar Fund, University of Konstanz to SK, grant support from the German Research Foundation (DFG) to SK (KU2669/1-1), the Konstanz Research School Chemical Biology (KoRS-CB) to TM, and the International Max Planck Research School (IMPRS) for Organismal Biology to NF. We thank Nicola Blum, Silke Pittlik, Adina J. Renz, Ursula Topel, and Elke Hespeler for technical support in cDNA cloning, handling of zebrafish embryos and in situ hybridization.