Oomycetes Comprise A Diverse Group Of Organisms Biology Essay



Oomycetes comprise a diverse group of organisms that morphologically resemble fungi but belong to the stramenopile lineage within the supergroup of chromalveolates. Recent studies have shown that plant pathogenic oomycetes have expanded gene families that are possibly linked to their pathogenic lifestyle. We analyzed the protein domain organization of 67 eukaryotic species including four oomycete and five fungal plant pathogens. We detected 246 expanded domains in fungal and oomycete plant pathogens. The analysis of genes differentially expressed during infection revealed a significant enrichment of genes encoding expanded domains as well as signal peptides linking a substantial part of these genes to pathogenicity. Overrepresentation and clustering of domain abundance profiles revealed domains that might have important roles in host-pathogen interactions but, as yet, have not been linked to pathogenicity. The number of distinct domain combinations (bigrams) in oomycetes was significantly higher than in fungi. We identified 773 oomycete-specific bigrams, with the majority composed of domains common to eukaryotes. The analyses enabled us to link domain content to biological processes such as host-pathogen interaction, nutrient uptake, or suppression and elicitation of plant immune responses. Taken together, this study represents a comprehensive overview of the domain repertoire of fungal and oomycete plant pathogens and points to novel features like domain expansion and species-specific bigram types that could, at least partially, explain why oomycetes are such remarkable plant pathogens.

Domain Analysis in Oomycete Plant Pathogens


Oomycetes are a diverse group of organisms that live as saprophytes or as pathogens of plants, insects, fish, vertebrates, and microbes (Govers & Gijzen 2006). The numerous plant pathogenic oomycete species cause devastating diseases on many different host plants and have a huge impact on agriculture. A prominent example is Phytophthora infestans, the causal agent of late blight of potato (Solanum tuberosum) and tomato (Solanum lycopersicum) and responsible for the Irish potato famine in the 19th century. Plant pathogenic oomycetes include a large number of different species that vary in their lifestyle, from obligate biotrophic and hemibiotrophic to necrotrophic. In addition, they show great differences in host selectivity, ranging from broad to very narrow (Erwin & Ribeiro 1996; Agrios 2005). Oomycetes have morphological features similar to filamentous fungi, and the two groups exploit common infection structures and mechanisms (Latijnhouwers & Govers 2003). Together with diatoms, brown algae, and golden-brown algae, oomycetes are classified as stramenopiles, a lineage that is united with alveolates in the supergroup of chromalveolates (Baldauf et al. 2000; Yoon et al. 2002). The monophyly of this supergroup, however, is under debate (Baurain et al. 2010). The genomes of oomycetes sequenced so far are variable in size and content, ranging from 65 Mb in Phytophthora ramorum to 240 Mb in P. infestans (Haas et al. 2009), and only include plant pathogenic species. Analysis of these genomes revealed that several gene families facilitating the infection process are expanded (Martens et al. 2008). Extreme examples are gene families encoding cytoplasmic effector proteins such as RXLR effectors, which share the host cell-targeting motif RXLR and suppress defense responses in the host, and the necrosis-inducing proteins classified as Crinklers (Crn; Haas et al. 2009). 
To date, a few oomycete genomes have been sequenced, which enables a comprehensive comparison of genomic features, such as gene families and protein domains, among oomycetes, fungi, and other eukaryotic species. Experimentally derived functional knowledge of the majority of oomycete gene products, at a depth comparable to that available for model species like Saccharomyces cerevisiae and Arabidopsis (Arabidopsis thaliana), will likely not be accessible in the near future. Hence, comparative genomics provides an important framework to functionally characterize oomycete gene products and generate hypotheses on the basic cellular functions as well as the complex interactions of these plant pathogens with their hosts and environment.

In this study, we focus on protein domains because these are the basic functional, evolutionary, and structural units that shape proteins (Rossmann et al. 1974; Orengo et al. 1997; Vogel et al. 2004). Domains function independently in single-domain proteins or synergistically in multidomain proteins (Doolittle 1995; Vogel et al. 2004; Bashton & Chothia 2007). Accordingly, some domains always occur with a defined set of functional partners, whereas others are highly versatile and form combinations of two consecutively occurring domains (also called bigrams) with different N- or C-terminal partners (Marcotte et al. 1999; Basu et al. 2008). Here, we analyzed the domain repertoire predicted from the genome sequences of 67 eukaryotic species and compared filamentous plant pathogens with other eukaryotes with a special emphasis on oomycetes. We show how differences in the domain repertoire of oomycetes, especially in the expansion of certain domain families and the formation of species-specific bigram types, can be linked to the biology of this group of organisms. This allowed the generation of candidate sets of proteins and domains that are likely to play roles in the lifestyle of oomycetes or their interaction with plants.


The Domain Repertoire of Oomycete Plant Pathogens and its Comparison with Other Eukaryotes

We analyzed the domain architecture of the predicted proteomes in 67 eukaryotes covering all major groups of the eukaryotic tree of life with the exception of the supergroup Rhizaria (Figure 2-1A; Supplementary Table S2-1). We included seven stramenopiles, four of which are plant pathogenic oomycetes, namely the obligate biotrophic downy mildew Hyaloperonospora arabidopsidis and three hemibiotrophic Phytophthora species. The selection also contained five fungal plant pathogens, including rice (Oryza sativa) blast fungus (Magnaporthe grisea) and corn (Zea mays) smut (Ustilago maydis), both species with a (hemi)biotrophic lifestyle comparable to the oomycete plant pathogens used in the analysis (Figure 2-1B).

Figure 2-1 Phylogenetic relationships of the analyzed species.

(A) The major eukaryotic groups considered in the analysis and the number of species represented in every group. For the exact species used in the analysis, see Supplemental Table S2-1. The tree is adapted from Simpson and Roger (2004) and incorporates the phylogeny for the stramenopiles based on Blair et al. (2008). (B) Fungal and oomycete plant-pathogenic species used in this analysis. The plant pathogens include species with different lifestyles (biotroph, hemibiotroph, or necrotroph), indicated by the symbol following the species name. The phylogeny for the fungi is based on James et al. (2006).

Panel A labels: Metazoans, 19 species; Fungi, 22 species; Plants, 9 species; Excavates, 4 species; Chromalveolates, 13 species (Haptophytes, 1 species; Alveolates, 5 species; Stramenopiles, 7 species, of which Oomycetes, 4 species). Stramenopile species shown: Phytophthora infestans, Phytophthora ramorum, Phytophthora sojae, Hyaloperonospora arabidopsidis, Aureococcus anophagefferens, Thalassiosira pseudonana, and Phaeodactylum tricornutum.

Panel B labels: oomycete plant pathogens (Phytophthora infestans, Phytophthora ramorum, Phytophthora sojae, Hyaloperonospora arabidopsidis) and fungal plant pathogens (Eremothecium (Ashbya) gossypii, Ustilago maydis, Magnaporthe grisea, Sclerotinia sclerotiorum, Fusarium graminearum).

The domain architecture of all 1,250,996 predicted proteins in the 67 eukaryotic genomes was analyzed using HMMER (Eddy 1998) and a local Pfam-A database (Finn et al. 2010). Overall, 59% (737,851) of all proteins have one or more predicted domains. We detected a total of 1,464,807 domains in all species, 80,180 within the stramenopiles and 51,030 in oomycetes.

Figure 2-2 Description of different metrics used in this study.

In the example shown, we observe five different domain types (A to E) in the proteome of species X. The abundance of a domain type is defined as the number of occurrences of the individual entity within the species (e.g. domain type B has an abundance of two). The versatility is defined as the number of different directly adjacent N- or C-terminal neighbors. We distinguish between N- and C-terminal partners (e.g. the versatility of domain type C is three). A bigram is a set of two directly adjacent domains, and we also consider two entities of the same domain a bigram (e.g. we observe nine different bigram types in the proteome, of which three have an abundance of two (right panel)). Bigrams in the example: 2x(E|C), 2x(E|E), 2x(E|D), 1x(C|D), 1x(D|C), 1x(D|E), 1x(D|B), 1x(E|B), 1x(B|D).

In order to characterize the domain repertoire of eukaryotes, we used two metrics: the number of domain types and the number of different combinations of adjacent domains, also called bigrams (Figure 2-2). In total, 13,994 bigram types were identified in the 67 eukaryotic genomes, consisting of 6,356 different domain types. As described by Basu et al. (2008), the number of bigram types increases superlinearly relative to the number of domain types, with the highest numbers in multicellular organisms (Figure 2-3). We observed separate clusters for metazoans, fungi, and plants (including land plants and mosses). Oomycetes and fungi have similar numbers of domain types, ranging from 2,000 to 2,500; however, oomycetes, in particular Phytophthora species, contain significantly more bigram types. The three analyzed Phytophthora species appeared to have approximately 50% more bigram types compared with other organisms that have similar numbers of domain types (Figure 2-3; P = 0.0019, by one-sided Wilcoxon rank-sum test). This even holds when we apply a more conservative approach by discarding all domain and bigram types that occur once in each predicted proteome (Supplementary Figure S2-1A). We observed that the number of domain types as well as the number of bigram types increases with proteome size and reaches saturation for larger proteomes (Supplementary Figure S2-1, B and C; Cosentino Lagomarsino et al. 2009).
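The metrics defined in Figure 2-2 can be illustrated with a minimal Python sketch. The toy proteome below is hypothetical (it is not the example drawn in the figure), and the function names are illustrative assumptions rather than the code used in this study; each protein is represented simply as its domains in N- to C-terminal order.

```python
from collections import Counter, defaultdict

# Hypothetical toy proteome: each protein lists its domains from N- to C-terminus.
proteome = [
    ["A", "B"],
    ["A", "B", "C"],
    ["C", "A"],
    ["B", "B"],
    ["D"],
]

def domain_abundance(proteins):
    """Count every occurrence of each domain type across all proteins."""
    return Counter(d for protein in proteins for d in protein)

def bigram_counts(proteins):
    """Count ordered pairs of directly adjacent domains (N-terminal, C-terminal)."""
    return Counter(
        (left, right)
        for protein in proteins
        for left, right in zip(protein, protein[1:])
    )

def versatility(proteins):
    """Distinct direct neighbors per domain, keeping N- and C-terminal partners apart."""
    partners = defaultdict(set)
    for left, right in bigram_counts(proteins):
        partners[left].add(("C", right))   # 'right' is a C-terminal partner of 'left'
        partners[right].add(("N", left))   # 'left' is an N-terminal partner of 'right'
    return {domain: len(found) for domain, found in partners.items()}

abundance = domain_abundance(proteome)
bigrams = bigram_counts(proteome)
vers = versatility(proteome)

print("domain types:", len(abundance))
print("bigram types:", len(bigrams))
print("versatility:", vers)
```

Note that a repeated domain such as B|B counts as a bigram here, in line with the definition in Figure 2-2, and that single-domain proteins contribute to abundance but not to versatility.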


Figure 2-3 Dependence of the number of domain and bigram types observed in the analyzed species.

The average number of different bigrams of species that have between 2,000 and 2,500 different domain types is indicated with the bottom horizontal red bar. The top horizontal red bar indicates the average number of different bigrams for Phytophthora species. The full species names corresponding to the abbreviations can be found in Supplemental Table S2-1. A magnification of the area encompassing the oomycete and fungal plant pathogens is shown; the species of interest are highlighted. The dots are colored according to the major eukaryotic groups as indicated in the text box.

Although oomycetes, and in particular Phytophthora species, contain a similar number of domain types as fungi, they have a larger predicted proteome (Supplementary Figure S2-1B). They contain more bigram types than fungi but fewer than other species with predicted proteomes of similar size (e.g. Drosophila melanogaster; Supplementary Figure S2-1C).

Domain Overrepresentation Provides a Snapshot of Pathogen-Host Interaction

Apart from a wide and abundant repertoire of domains related to transposable elements (Haas et al. 2009), the most abundant domain types in oomycetes are similar to those in other eukaryotes (Supplementary Table S2-2). Hence, absolute domain abundance alone is not indicative enough to correlate domains to the lifestyle of both fungal and oomycete plant pathogens. Instead, we identified domains that are overrepresented in plant pathogens relative to other eukaryotes (Figure 2-1B).

Our analysis inferred 246 overrepresented domains in plant pathogens that are observed in 24,970 proteins (P < 0.001, by Fisher's exact test; a selection of well-described overrepresented domains is depicted in Figure 2-4A; Supplementary Table S2-3). Since we analyzed the expansion in plant pathogens at the level of a group rather than an individual species, domains that are reported as being expanded in the group are not necessarily expanded in all species of the group or may even be absent (Supplementary Table S2-3). For example, secreted proteins encoding carbohydrate-binding family 25 domains (IPR005085) are only found in Phytophthora species and not in fungal plant pathogens, whereas secreted proteins containing the Cys-rich domain (CFEM; IPR008427) are only observed in fungal pathogens (Kulkarni et al. 2003).

Figure 2-4 Overrepresentation of selected, well-described domains involved in plant-pathogen interaction and establishing or maintaining infection.

(A) The log2-fold overrepresentation of the domains in plant pathogens is shown in the bar chart (x axis: fold increase, log2). The absolute number of occurrences in plant pathogens and the percentage of all predicted domains in plant pathogens are displayed in the bars, and the corrected P values are shown at the tip of the bars. The fold overrepresentation and the P value for the Kazal protease inhibitor domain were based on the overrepresentation in oomycetes compared with plant pathogens (indicated by the white bar and asterisks). (B) The overrepresented domains described in (A) are depicted in their possible cellular role during infection of the plant host.

Many proteins involved in host-pathogen interaction are secreted into the apoplast or, like the RXLR effector proteins, translocated into host cells following their secretion from the pathogen (Haas et al. 2009). Hence, we also predicted the presence of potential N-terminal signal peptide sequences in the whole proteomes of the analyzed species. The combined secretome encompasses 100,521 potentially secreted proteins, of which 11,352 are predicted in plant pathogens (Supplementary Figure S2-2). Approximately 20% (2,478) of these proteins contain overrepresented domains; hence, proteins containing overrepresented domains are 1.85-fold enriched in the predicted secretome of the analyzed plant pathogens (P = 2.57 × 10^-231, by Fisher's exact test).
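An enrichment test of this kind can be sketched as a one-sided Fisher's exact test on a 2x2 contingency table, computed from the hypergeometric distribution. The sketch below is an illustration only: the layout of the table, the choice of a one-sided alternative, and the counts in the example are assumptions, since the full contingency tables are not given in the text.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test (enrichment) for the 2x2 table

                        with domain   without domain
        secreted             a              b
        not secreted         c              d

    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1 = a + b                 # e.g. all secreted proteins
    col1 = a + c                 # e.g. all proteins with the domain
    total = a + b + c + d
    upper = min(row1, col1)      # largest value the top-left cell can take
    tail = sum(
        comb(col1, k) * comb(total - col1, row1 - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(total, row1)

# Hypothetical counts: 8 of 10 secreted proteins carry the domain,
# against 20 of 100 proteins overall.
p_value = fisher_one_sided(8, 2, 12, 78)
print(f"one-sided P = {p_value:.3g}")
```

Exact integer binomials keep the sketch free of dependencies; for proteome-scale counts, a log-space implementation or a statistics library would be the more practical choice.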

Oomycete proteins with significantly expanded domains are prime candidates for being pathogenicity associated. To assess this hypothesis, we tested whether P. infestans genes that are differentially expressed during infection of the potato host are enriched for the aforementioned expanded domains. For this, we utilized NimbleGen microarray data that include genome-wide expression levels of P. infestans genes at different days post inoculation (dpi) of potato leaves as well as from mycelium grown in vitro on different media (Haas et al. 2009). We identified in total 1,584 genes that are significantly induced or repressed in P. infestans during infection (differentially expressed for at least one of the time points 2-5 dpi) compared with mycelium grown in vitro (three different growth media; P < 0.05, q < 0.05, by t test; Supplementary Table S2-4A). Of the 1,584 differentially expressed genes, 259 encode proteins containing significantly expanded domains (Supplementary Table S2-4B), which is 1.2-fold more than expected (P = 8.8 × 10^-5, by Fisher's exact test). Moreover, 44 of these 259 genes also encode proteins with a predicted signal peptide, which is a significant enrichment (1.8-fold; P = 4.38 × 10^-5, by Fisher's exact test). The majority (41) of these 44 genes are differentially expressed early in infection (2 dpi; Figure 2-5A). All genes differentially expressed at 3 dpi are also differentially expressed at 2 dpi (Figure 2-5, A and B). Consequently, the 44 differentially expressed genes coding for proteins with both predicted signal peptides and overrepresented domains are promising candidates for pathogenicity-associated proteins, of which several will be discussed in detail below.

Figure 2-5 Gene expression analysis of P. infestans genes encoding overrepresented domains and a predicted N-terminal signal peptide.

Genes with significant gene expression changes at different time points after infection (2-5 dpi) relative to the expression intensities of different growth media are displayed (P < 0.05, q < 0.05, by t test). Heat maps show the significantly differentially expressed genes at different time points relative to growth media. Genes were clustered using Spearman rank correlation and average linkage clustering. Gene identifiers as well as domain descriptions are displayed. Gene expression profiles are displayed for the expression intensities relative to the average intensities of the growth media for each time point after infection. Heat maps and expression profiles of the significantly differentially expressed genes relative to the growth media are shown for individual time points as follows: 2 dpi (A), 3 dpi (B), 4 dpi (C), and 5 dpi (D). Heat map columns are the microarray samples GSM361599 to GSM361604 and GSM361630 to GSM361637, grouped as growth media (Pea, RS, V8) and 2, 3, 4, and 5 dpi; rows are P. infestans gene identifiers (PITG) with their domain descriptions; the color scale and the profile axes show log2-fold change.

For several groups of overrepresented domains, a direct or indirect role in host-pathogen interaction and/or plant pathogen lifestyle has already been hypothesized or demonstrated (Dean et al. 2005; Tyler et al. 2006; Haas et al. 2009). Nearly 18% of the 246 overrepresented domains belong to three groups of domains: (1) hydrolase domains; (2) domains involved in substrate transport over membranes, such as the general ATP-binding cassette (ABC) transporter-like domain (IPR003439) but also more specialized transporters of sulfate (IPR011547) and amino acids (IPR004841/IPR013057); and (3) domains present in peptidases, such as the metalloprotease-type M28 domain (IPR007484) found in many secreted proteins. Of the hydrolases, which encompass 9% of the overrepresented domains, the majority is present in enzymes that hydrolyze glycosidic bonds. An example is the glycoside hydrolase (GH) family 12 domain (IPR002594). This domain is observed 34 times in plant pathogens, which overall contain 91,747 domains, and 43 times in all eukaryotes, which have a total of 1,464,807 domains, and hence is 12.62-fold (3.66 log2-fold) enriched in the plant pathogens. This domain is mainly observed in secreted proteins (27 out of 34; SignalP prediction). The majority (79%) of the GH-12 domains are found in oomycete plant pathogens, and the expression of two of these hydrolase genes in P. infestans (PITG_08944 and PITG_16991) is significantly induced during infection of potato (Figure 2-5; Supplementary Table S2-4). In total, 33 differentially expressed genes during plant infection in P. infestans encode proteins that contain GH domains, including GH-17 (IPR000490) in endo-1,3-β-glucosidases and GH-81 (IPR005200) in β-1,3-glucanases as well as several members of GH-28 (IPR000743), a domain involved in soft rotting of host tissues and described in both fungal and bacterial plant pathogens (He & Collmer 1990; Ruttkowski et al. 1990).

Twenty-eight P. infestans genes coding for domains involved in transmembrane transport are differentially expressed during plant infection (Supplementary Table S2-4). Examples of genes encoding domains involved in substrate transport over the membrane are PITG_04307, which encodes an ABC-2-type transporter (IPR013525), PITG_12808, which encodes an amino acid transporter (IPR013057), as well as PITG_22087, a gene encoding both ABC-like (IPR003439) and ABC-2-type domains (Supplementary Table S2-4).

Extracellular degrading enzymes like cutinases contain an overrepresented domain (IPR000675; P = 3.72 × 10^-61). This domain is observed 65 times in plant pathogenic species, corresponding to a 13.3-fold (3.73 log2-fold) enrichment (Figure 2-4A). In total, 61 proteins in plant pathogens predicted to possess this domain are potentially secreted. Another overrepresented domain that is present in secreted proteins and involved in maceration and soft rotting of plant tissue is the pectate lyase (IPR004898). This domain is 15.34-fold (3.94 log2-fold) enriched in plant pathogens and mainly found in oomycetes. Five genes in P. infestans encode this domain as well as a predicted N-terminal signal peptide and are differentially expressed (Figure 2-5).
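The fold enrichment reported for the GH-12 domain can be reproduced from the counts given in the text: the relative frequency of the domain among all domains in plant pathogens is divided by its relative frequency among all domains in all eukaryotes. The short Python sketch below shows this arithmetic; the function name is an illustrative assumption.

```python
from math import log2

def fold_enrichment(count_group, total_group, count_all, total_all):
    """Relative frequency in the group divided by relative frequency overall."""
    return (count_group / total_group) / (count_all / total_all)

# GH-12 (IPR002594): 34 of 91,747 domains in plant pathogens,
# 43 of 1,464,807 domains in all analyzed eukaryotes.
gh12_fold = fold_enrichment(34, 91_747, 43, 1_464_807)
gh12_log2 = log2(gh12_fold)

print(f"{gh12_fold:.2f}-fold ({gh12_log2:.2f} log2-fold)")
# prints "12.62-fold (3.66 log2-fold)"
```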

Novel Candidate Domains Significantly Expanded in Plant Pathogens

Next to domains that were already directly or indirectly implicated in host-pathogen interaction, we identified novel candidates that are also expanded in plant pathogens, several of which are encoded in P. infestans genes differentially expressed during infection of the host. Genes encoding the significantly expanded zinc-binding alcohol dehydrogenase (IPR013149) and GroES-like alcohol dehydrogenase (IPR013154) domains are ubiquitous in all analyzed eukaryotes, and the combination of these two domains is also present in all species with only a few exceptions. Nine of these genes in P. infestans are induced during infection (Supplementary Table S2-4). Sixty-five genes in plant pathogens encode proteins with FAD-linked oxidase (IPR006094) and berberine/berberine-like (BBE) domains (IPR012951), of which three out of six in P. infestans are induced during infection (PITG_02928, PITG_02930, and PITG_20764). The BBE domain is involved in the biosynthesis of the alkaloid berberine (Facchini et al. 1996). The genes encode a predicted N-terminal signal peptide, although molecular analysis of proteins containing these domains in plants indicated that at least some of these are not secreted but instead are targeted to specialized vesicles (Amann et al. 1986; Kutchan & Dittrich 1995; Facchini et al. 1996). Moreover, Moy et al. (2004) observed induced expression of a soybean (Glycine max) gene (BE584185) containing these two domains shortly after infection with Phytophthora sojae. A recent analysis by Raffaele et al. (2010) focusing solely on the secretome in P. infestans corroborates our results and also concludes that proteins with BBE and FAD-linked oxidase domains are candidate virulence factors. Three genes encoding secreted metallophosphoesterases (IPR004843; PITG_20454, PITG_07720, and PITG_10322) show induced gene expression. These metallophosphoesterase domains are found in phosphatases and hence are involved in the regulation of protein activity, since they work as antagonists of kinase activity.

For approximately 6% of all overrepresented domains, no or limited functional information is available in Pfam. These are the so-called DUFs: domains of unknown function. Given their expansion in plant pathogens and the fact that other overrepresented domains are known to function in diverse aspects of plant-pathogen interactions, these DUFs are also likely to play a role in the lifestyle of plant pathogens and hence are promising targets for further experimental validation (Supplementary Table S2-3). Secreted proteins containing a combination of two overrepresented DUFs, DUF2403 (IPR018807) and DUF2401 (IPR018805), are exclusively found in fungi and in oomycetes, with the majority (approximately 75%) in oomycetes. The N-terminal DUF2403 contains a Gly-rich region without further functional annotation, whereas five highly conserved Cys residues characterize the C-terminal DUF2401. Proteins containing both DUFs have been characterized in S. cerevisiae and in Candida albicans as being covalently linked to the cell wall (Terashima et al. 2002; Yin 2005; Klis et al. 2009). Another overrepresented DUF within plant pathogens and mainly found in oomycetes is DUF953 (IPR010357). This domain is present in several eukaryotic proteins with thioredoxin-like function, and two genes in P. infestans containing this domain are differentially expressed during infection (PITG_07008 and PITG_07010). DUF590 (IPR007632), which is ubiquitous in nearly all eukaryotes, is observed in proteins containing eight putative transmembrane helices. These proteins exhibit calcium-activated ion channel activity and are involved in diverse biological processes (Yang et al. 2008). The P. infestans gene PITG_06653 that contains the DUF590 domain is differentially expressed during infection, and this provides further support for a role in host-pathogen interaction. The exemplified DUFs as well as other overrepresented domains with little or no functional annotation are interesting candidates for further functional studies to decipher their precise role in plant pathogens.

Domain Overrepresentation in Oomycete Plant Pathogens

Since the previous analysis grouped both fungal and oomycete plant pathogens, domains specifically enriched in oomycetes were not directly discernible. Hence, we compared the relative domain abundance predicted in plant pathogens (Figure 2-1B) with the aim to identify domains specifically enriched in oomycetes. Of the 75 domains that are overrepresented in oomycetes, 20 are not observed in any fungal plant pathogen and therefore can be considered oomycete specific within plant pathogens (Supplementary Table S2-5). In general, the abundance of expanded domains in Phytophthora species is higher than in H. arabidopsidis. A well-described example is the NPP1 domain (IPR008701) that is present in secreted (SignalP: 122) necrosis-inducing proteins. It shows a significant overrepresentation in oomycetes (1.68-fold [0.75 log2-fold] enriched), in particular in Phytophthora species, but is also observed 10 times in fungal plant pathogens as well as in a few cases in nonpathogenic fungi as noted before (Gijzen & Nürnberger 2006). Four P. infestans genes encoding this domain are induced early during infection (2-3 dpi), whereas a single gene (PITG_18453) is induced late (5 dpi). Several peptidases (e.g. containing the peptidase S1/S6 and C1A domains) are overrepresented compared with other plant pathogens. S1/S6 (IPR001254; 1.6-fold [0.74 log2-fold]) is predicted in 91 proteins, of which 67 have a predicted secretion signal, while C1A (IPR000668; 1.79-fold [0.85 log2-fold]) is predicted in 78 proteins, of which 31 are potentially secreted. C1A is present in several eukaryotic species, but within the plant pathogenic group it is exclusively found in oomycetes.

Several secreted protease inhibitors of the Kazal family containing the Kazal I1 (IPR002350) and Kazal-type (IPR011497) domains are significantly expanded in oomycetes and, within the group of analyzed plant pathogens, are specific to oomycetes. This suggests that they provide an increased level of protection of the pathogen against host-encoded defense-related proteases (Tyler et al. 2006). Another domain that is oomycete specific within the plant pathogens is the Na/Pi cotransporter (IPR003841) involved in the uptake of phosphate. Several other transporters that have already been described as being overrepresented in plant pathogens (e.g. the ABC-2-type transporters) are significantly expanded within oomycete plant pathogens, since these species are the major contributors to the overall abundance of this domain in plant pathogens. The abundance of predicted Ser/Thr-like kinase domains (IPR017442) compared with other plant pathogenic species is surprisingly high, and this domain is specifically expanded in the Phytophthora species. Although several expanded domains are observed in both oomycete and fungal plant pathogens, the exploration of domains primarily expanded in oomycetes (e.g. certain transporter families and defense- and signaling-related domains) highlights functional entities that discriminate between these groups of plant pathogens.

Clustering of Abundance Profiles Reveals Additional Potential Pathogenicity Factors

We extended the set of candidate domains that might be important for host-pathogen interaction beyond overrepresented domains by searching for additional domains that show presence, absence, and expansion profiles similar to overrepresented domains, since these domains are likely to be functionally linked or involved in similar biological processes (Pellegrini et al. 1999). We calculated a normalized profile of domain abundance and clustered similar abundance profiles using hierarchical clustering (Supplementary data S2-1). Several clusters contained a mix of significantly overrepresented domains and domains whose expansion in plant pathogens is not significant. We exemplify this with three clusters that contain 20% of all overrepresented domains in plant pathogens (Figure 2-6).
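The profile clustering described above can be sketched in a few lines of Python. The normalization (scaling each profile to sum to 1), the distance measure (1 minus the Pearson correlation), and the average-linkage rule are all assumptions chosen for illustration, since the text does not specify these details for the domain profiles; the counts are hypothetical.

```python
from math import sqrt

# Hypothetical domain counts in four imaginary species (one column per species).
counts = {
    "domain_1": [10, 12, 0, 1],
    "domain_2": [20, 25, 1, 2],
    "domain_3": [0, 1, 9, 11],
    "domain_4": [1, 0, 18, 20],
}

def normalize(profile):
    """Scale a profile so its values sum to 1 (assumed normalization)."""
    total = sum(profile)
    return [x / total for x in profile] if total else list(profile)

def pearson_distance(p, q):
    """1 - Pearson correlation; close to 0 for profiles of the same shape."""
    n = len(p)
    mean_p, mean_q = sum(p) / n, sum(q) / n
    cov = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    sd_p = sqrt(sum((a - mean_p) ** 2 for a in p))
    sd_q = sqrt(sum((b - mean_q) ** 2 for b in q))
    if sd_p == 0 or sd_q == 0:
        return 1.0  # flat profile: treat as uncorrelated
    return 1.0 - cov / (sd_p * sd_q)

def average_linkage(profiles, n_clusters):
    """Merge the two clusters with the smallest mean pairwise distance
    until only n_clusters remain."""
    clusters = [[name] for name in profiles]

    def dist(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(pearson_distance(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

profiles = {name: normalize(values) for name, values in counts.items()}
clusters = average_linkage(profiles, n_clusters=2)
print(clusters)
```

In this toy example, domains that are abundant in the same species end up in the same cluster, which is the property used above to associate non-significant domains with overrepresented ones.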

In the first cluster (Figure 2-6), domains are mainly expanded in oomycete plant pathogens. The abundance of some domains in plant pathogens is too low to be identified as being overrepresented. For example, the PcF domain (IPR018570), which is present in a small, approximately 50-amino acid necrosis-inducing protein found in various Phytophthora species (Orsomando et al. 2001; Liu et al. 2005), was not identified in the initial overrepresentation analysis. Also in this cluster is the sugar fermentation stimulation domain (IPR005224), which is mainly found in bacteria and involved in the regulation of maltose metabolism (Kawamukai et al. 1991). In this first cluster, we observed a high number (approximately 40%) of domains without functional characterization that are mainly present in bacteria. An example is DUF1949 (IPR015269), a domain that is only found in the three analyzed Phytophthora species. This domain is observed in functionally uncharacterized bacterial proteins like YIGZ in Escherichia coli K12 and adopts a ferredoxin-like fold (Park et al. 2004). The Phytophthora and bacterial proteins containing DUF1949 also contain a second, N-terminal uncharacterized protein family, UPF (UPF00029, IPR001498). This domain is also found in the human protein Impact and is conserved from bacteria to eukaryotes (Okamura et al. 2000). The P. infestans gene (PITG_00027) containing both domains is induced early in infection (Supplementary Table S2-4B). Since these DUFs cluster with overrepresented domains, they are promising candidates for further study.

[Figure 2-6 image not reproduced: rows are the clustered domains, labeled by InterPro and Pfam identifiers; columns are the 67 analyzed species; the abundance scale runs from 0.00 to 0.10.]

Figure 2-6 Average linkage clustering of normalized domain profiles using Spearman rank correlation as a distance measurement.

The species tree for all eukaryotic species is depicted on top, with the color code of their supergroup as introduced in Figure 2-1. Plant pathogens are marked with stars, and the arrowheads highlight domains identified as overrepresented in plant pathogens.

The domains in the second cluster are mainly expanded in both fungal and oomycete plant pathogens. This cluster contains, for example, cell wall-degrading domains like cutinases, pectate lyases, and other hydrolases and also the NPP1 domain that is found in necrosis-inducing proteins. The glycosyl hydrolase family 88 comprises unsaturated glucuronyl hydrolases thought to be involved in biofilm degradation and is mainly found in bacteria and fungi (Itoh et al. 2006). Interestingly, homologs are also observed in plant pathogenic bacteria (e.g. Pectobacterium atrosepticum), in fungi (e.g. M. grisea), and in all three Phytophthora species.

The third cluster contains domains that are not exclusively found in plant pathogens but have a broader abundance profile. This cluster includes a variety of overrepresented hydrolases, epimerases, and the ABC-2-type transporter domain (IPR013525) that is observed nearly 500 times in plant pathogenic species. Another domain that is found in this cluster is the dienelactone hydrolase domain (IPR002925), observed in all plant pathogens and also in other eukaryotic species, with a high abundance in plants as well as in fungi. This domain hydrolyzes dienelactone to maleylacetate in bacteria (Pathak et al. 1991) and is also detected in a putative 1,3:1,4-β-glucanase from P. infestans that is proposed to be involved in cell wall metabolism (McLeod et al. 2003).

Quantification of Oomycete-Specific Bigrams

Domains generally do not act as single entities in proteins but rather synergistically with other domains in the same protein or with domains in interacting proteins (Park et al. 2001; Vogel et al. 2004). Domains involved in signaling, sensing, and generic interactions are versatile and form combinations with several different partner domains (Supplementary Table S2-6). As described by others (Vogel et al. 2005), we observed that the versatility of domains is proportional to their abundance (Supplementary Figure S2-3). Hence, we applied a weighted bigram frequency that corrects for abundance to detect domains that are promiscuous or prone to form combinations with different partners (Basu et al. 2008). The average number of promiscuous domains in oomycetes is 424 and in Phytophthora is 464. This is higher than the average number of promiscuous domains (357) in all other species (Supplementary Table S2-7).
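The relation between versatility and abundance can be made concrete with a small sketch. It assumes that a bigram is an ordered pair of adjacent domains in a protein and that versatility is the number of distinct partner domains. The correction shown here (partners per occurrence) is a deliberately simple stand-in and is not the weighted bigram frequency of Basu et al. (2008), whose exact form is not given in this section. The proteome below is hypothetical toy data.

```python
from collections import defaultdict

def bigrams(architecture):
    """Ordered pairs of adjacent domains in one protein (assumed definition)."""
    return list(zip(architecture, architecture[1:]))

def partner_sets(proteome):
    """Map each domain to the set of distinct domains found adjacent to it."""
    partners = defaultdict(set)
    for architecture in proteome.values():
        for left, right in bigrams(architecture):
            partners[left].add(right)
            partners[right].add(left)
    return partners

def domain_abundance(proteome):
    """Count how often each domain occurs across all proteins."""
    counts = defaultdict(int)
    for architecture in proteome.values():
        for domain in architecture:
            counts[domain] += 1
    return counts

# Hypothetical proteome: protein name -> domains ordered from N to C terminus.
proteome = {
    "protein_1": ["FYVE", "GAF"],
    "protein_2": ["Myosin_head", "FYVE", "GAF"],
    "protein_3": ["PH", "Kinase"],
    "protein_4": ["DEP", "Kinase"],
    "protein_5": ["Kinase"],
    "protein_6": ["Kinase"],
}

partners = partner_sets(proteome)
counts = domain_abundance(proteome)

# Raw versatility: number of distinct partners per domain.
versatility = {d: len(p) for d, p in partners.items()}

# Illustrative abundance correction (assumption, not the published formula).
partners_per_occurrence = {d: versatility[d] / counts[d] for d in versatility}
```

In the toy data, FYVE and Kinase have the same raw versatility, but Kinase is twice as abundant, so its corrected score is lower. This is the kind of abundance effect that a weighted measure is meant to remove before calling a domain promiscuous.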

We observed that oomycetes have a higher number of bigram types than species with a comparable number of domain types (Figure 2-3). We identified in total 13,994 different bigram types throughout the 67 analyzed species. The majority of these bigram types (i.e. 7,724, or 55.2%) are predicted in only a single species. In oomycetes, bigram types formed by domains that are associated with transposable elements showed a high abundance (Supplementary Tables S2-8 and S2-9). We identified 1,107 bigram types occurring exclusively in plant pathogens, the majority of which (773) are only observed in the analyzed oomycetes (Supplementary Table S2-10). These oomycete-specific bigram types are identified in total 1,511 times in 1,375 predicted proteins. Of the 773 oomycete-specific bigram types, 53 are present in all oomycetes (Figure 2-7A). The biggest overlap in oomycete-specific bigram types is observed between the Phytophthora species, especially between P. ramorum and P. sojae. A recent analysis of domain combinations in P. ramorum and P. sojae already revealed several proteins involved in metabolism and regulatory networks containing novel bigrams (Morris et al. 2009). We additionally observed in total 43 bigram types that are shared either between P. infestans and P. sojae or between P. infestans and P. ramorum. However, the majority of oomycete-specific bigrams (467) are specific for a single species. The number of oomycete-specific bigram types greatly exceeds the number of oomycete-specific domain types (41). Interestingly, only six of the oomycete-specific domains participate in forming the specific bigrams. Therefore, common domain types form the majority of the observed species-specific domain combinations, emphasizing the importance of novel domain combinations rather than novel domain types as a source for species-specific functionality. Even when we restrict the analysis to bigrams that occur at least twice in the same proteome or once in at least two different proteomes, we still observe 320 bigram types that are specific to oomycetes and occur in 982 predicted proteins.
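The selection of group-specific bigram types and the subsequent support filter can be sketched as set operations. The sketch assumes that a bigram is an ordered pair of adjacent domains and that a bigram type is group specific if it occurs in at least one member of the group and in no species outside it. The species codes follow the abbreviations used in Figure 2-6, but the domain architectures are hypothetical toy data, and the function names are introduced here for illustration only.

```python
from collections import Counter

def bigram_counts(proteome):
    """Count each ordered pair of adjacent domains in a list of architectures."""
    counts = Counter()
    for architecture in proteome:
        counts.update(zip(architecture, architecture[1:]))
    return counts

def group_specific_bigrams(counts_by_species, group):
    """Bigram types found in the group and in no species outside the group."""
    inside, outside = set(), set()
    for species, counts in counts_by_species.items():
        (inside if species in group else outside).update(counts)
    return inside - outside

def is_supported(bigram, counts_by_species, group):
    """True if seen at least twice in one proteome or in at least two proteomes."""
    per_species = [counts_by_species[s][bigram] for s in group]
    in_two_proteomes = sum(1 for c in per_species if c >= 1) >= 2
    return max(per_species) >= 2 or in_two_proteomes

# Hypothetical architectures (domains ordered from N to C terminus).
proteomes = {
    "PHYINF": [["FYVE", "GAF"], ["FYVE", "GAF"], ["PH", "Kinase"]],
    "PHYSOJ": [["FYVE", "GAF"], ["DEP", "Kinase"], ["GAF", "PDE"]],
    "NEUCRA": [["GAF", "PDE"], ["PH", "Kinase"]],
}
oomycetes = {"PHYINF", "PHYSOJ"}

counts_by_species = {s: bigram_counts(p) for s, p in proteomes.items()}
specific = group_specific_bigrams(counts_by_species, oomycetes)
supported = {b for b in specific if is_supported(b, counts_by_species, oomycetes)}
```

With these toy proteomes, two bigram types are specific to the group, but only one passes the support filter. The filter discards types seen once in a single proteome, which are the ones most likely to stem from an annotation error.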

Approximately 8% of the proteins containing an oomycete-specific bigram have a predicted secretion signal (9.2% of all oomycete proteins contain a predicted secretion signal). An example that is observed in a secreted putative Cys protease present in all analyzed oomycetes is the combination of the peptidase C1A domain (IPR000668) and the ML domain (IPR003172). The ML domain is known to be involved in lipid binding and innate immunity and has been observed in plants, fungi, and animals (Inohara & Nuñez 2002). The proteins containing this bigram also have an N-terminal cathepsin inhibitory domain (IPR013201) that is often found next to the peptidase C1A domain and prevents access of the substrate to the binding cleft (Groves et al. 1996). Another bigram that is found in secreted proteins predicted in the analyzed Phytophthora species is the combination of the carbohydrate-binding domain family 25 (IPR005085; CBM25) with a GH-31 domain (IPR000322) as well as the tandem combination of CBM25 domains N-terminal to the glycosyl hydrolase domain. The presence of the secreted CBM25 and GH-31 combination has recently been noted in Pythium ultimum (Lévesque et al. 2010). We further examined whether proteins containing oomycete-specific bigrams carry RXLR or Crn motifs. We predicted the presence of these motifs using individual HMMER models for both the RXLR and the Crn motif (see Materials and Methods). Overall, we predicted 746 proteins containing an RXLR motif and 99 proteins with a Crn motif. None of these proteins is predicted to contain an oomycete-specific bigram type.

The most abundant oomycete-specific bigram type that occurs in 64 proteins is a combination of the phosphatidylinositol 3-phosphate-binding zinc finger (FYVE type) and the GAF domain. The presence of this oomycete-specific bigram in P. ramorum and P. sojae has been noted before (Morris et al. 2009). The GAF domain is described as one of the most abundant domains in small-molecule-binding regulatory proteins (Zoraghi et al. 2004). It is present in a large number of different proteins with a wide range of cellular functions, such as gene regulation (Aravind & Ponting 1997) and light detection and signaling (Sharrock & Quail 1989; Montgomery & Lagarias 2002). In a typical eukaryotic domain composition, the GAF domain is N-terminal to the 3',5'-cyclic phosphodiesterase domain found in phosphodiesterases that regulate pathways with cyclic nucleotide monophosphates as second messengers (Sharrock & Quail 1989; Martinez et al. 2002). This organization is observed in total 111 times, and five times in oomycetes (Figure 2-7B). The GAF-FYVE bigram is either observed as a single bigram (in 53 proteins) or in combination with other domains (in 11 proteins), for example with myosin (Richards & Cavalier-Smith 2005). In P. infestans, two genes (PITG_07627 and PITG_09293) encoding proteins with this combination are induced early during infection of the plant (Supplementary Table S2-4B). A phylogenetic analysis of the GAF domain in eukaryotes and prokaryotes showed that all GAF domains in oomycetes that are involved in the fusion with FYVE exclusively cluster with prokaryotic GAF domains, whereas other GAFs also cluster with eukaryotes. This suggests a horizontal gene transfer from bacteria to oomycetes of those GAF domains that are involved in the fusion with FYVE (Figure 2-7C; see Materials and Methods). The FYVE-type zinc finger is not identified in prokaryotic species; hence, we suggest two independent events, namely a horizontal gene transfer of the GAF domain from bacteria to oomycetes and subsequently a fusion to the zinc finger domain. Horizontal gene transfer seems to play an important role in the evolution of eukaryotes (Keeling & Palmer 2008), and recent evidence indicates that these events also contribute significantly to the genome content of protists and oomycetes, as they received genetic material from different sources (Richards & Talbot 2007; Martens et al. 2008; Morris et al. 2009). Because GAF domains are known to be involved in many different cellular processes, we can only speculate about the biological function of proteins harboring the GAF-FYVE bigram. A possible function is the targeting of proteins to lipid layers by the zinc finger domain in response to second messengers sensed by the GAF domain.

Figure 2-7 Quantification of oomycete-specific bigrams

(A) Venn diagram depicting the presence of oomycete-specific bigram types in the analyzed oomycete proteomes and indicating the number of shared bigram types between different proteomes. The total number of oomycete-specific bigram types in each proteome is shown in parentheses. The Venn diagram was produced using Venny (Oliveros, 2007). (B) Domain architecture of example proteins containing a GAF domain. The top two architectures resemble common protein architectures: the cGMP-dependent 3',5'-cyclic phosphodiesterase (observed 111 times in eukaryotes and five times in oomycetes) and phytochrome A (observed 21 times in eukaryotes). The bottom two architectures depict oomycete-specific architectures: the FYVE-GAF fusion is observed 53 times independent of other domains, and the myosin motor head in combination with the FYVE-GAF fusion is observed four times, a single copy in each of the oomycetes included in this study. aa, amino acids. (C) Simplified evolutionary tree based on the phylogenetic analysis of the GAF domain in prokaryotes and eukaryotes. GAF domains from proteins with a FYVE-GAF fusion are exclusively found to be close to bacterial GAF domains. Other oomycete proteins that only contain the GAF domain without the FYVE domain also cluster with other eukaryotic sequences.

Domains shown in (B): PAS fold-2 (IPR013654), zinc finger FYVE-type (IPR000306), GAF (IPR003018), phytochrome central region (IPR013515), signal transduction histidine kinase phosphoacceptor domain (IPR003661), ATPase-like ATP binding (IPR003594), 3',5'-cyclic phosphodiesterase (IPR002073), myosin head motor domain (IPR001609), and PAS fold (IPR013767).

Several domains involved in phospholipid signaling were found to be overrepresented in the filamentous plant pathogens and in particular in oomycetes. These included the phosphatidylinositol 3-/4-kinase, PIK (IPR000403), the phosphatidylinositol 4-phosphate 5-kinase domain, PIPK (IPR002498), as well as the phosphatidylinositol 3-phosphate-binding FYVE. Novel domain compositions in proteins involved in phospholipid signaling and metabolism in Phytophthora species have been reported previously (Meijer & Govers 2006). Signaling domains like the FYVE and the PIK, as well as domains like the IQ-calmodulin-binding domain (IPR000048) and the phox-like domain (IPR001683), form highly abundant oomycete-specific bigram types (Supplementary Table S2-10). Moreover, other domains, like the Ser/Thr protein kinase-like (IPR017442), pleckstrin homology (IPR001849), and DEP (IPR000591) domains, are involved in several oomycete-specific bigram types (e.g. the DEP-Ser/Thr protein kinase-like domain fusion is predicted in the proteomes of all analyzed oomycetes). Additionally, domains that are components of the histone acetylation-based regulatory system form oomycete-specific bigrams, such as the AP2 (IPR001471) and the histone deacetylase (IPR000286) domain combination (Iyer et al. 2008), which is observed in P. ramorum as well as in P. sojae.


We predicted the domain repertoire encoded in the genomes of four oomycete plant pathogens and compared it with a broad variety of eukaryotes spanning all major groups, including several fungal plant pathogens that have a similar morphology, lifestyle, and ecological niche as oomycete plant pathogens. We quantified and examined domain properties observed in oomycetes and especially emphasized differences and common themes within fungal and oomycete plant pathogens and their probable contribution to a pathogenic lifestyle.

We observed that oomycete plant pathogens, in particular Phytophthora species, have significantly higher numbers of unique bigram types compared with species with a similar number of domain types (Figure 2-3). Oomycetes have on average 50% more predicted genes than most of the analyzed fungi, yet they encode a comparable number of domain types and hence exhibit similar domain diversity (Supplementary Figure S2-1B). The high number of genes observed in oomycetes suggests greater complexity compared with fungi, which is not directly obvious from the domain diversity but instead from the number of unique bigram types (Supplementary Figure S2-1C). This observation has two possible explanations: (1) the larger number of genes predicted from oomycete genomes provides the flexibility to form new domain combinations based on a limited set of already existing domains that are in quantities similar to fungi; (2) the domain models that cover specific domains are incomplete and therefore do not provide the required sensitivity for oomycete genomes, so that we would underestimate the number of observable domain types (and to a certain extent the number of predicted bigram types). Additionally, oomycetes, especially Phytophthora species, do not follow the observed trend that organisms with a higher number of genes (proteins) contain a larger number of domain types. Consequently, they are shifted when comparing the number of predicted domain and bigram types. Nevertheless, both possible explanations and the observed numbers allow us to conclude that oomycete genomes, especially those of Phytophthora species, harbor a large repertoire of genes encoding different bigram types compared with species of comparable complexity and, in the case of filamentous fungi, even similar morphology.

Oomycetes and fungal plant pathogens seem to be very similar to other eukaryotes with respect to absolute domain abundance (Supplementary Table S2-2), and this metric is hence not sufficiently indicative to correlate domains directly or indirectly with the pathogenic lifestyle. Therefore, we predicted overrepresented domains in plant pathogens and identified 246 domains that are significantly expanded (Supplementary Table S2-3). Proteins containing overrepresented domains are significantly enriched in the predicted secretome of the analyzed plant pathogens, corroborating the idea that expanded domain families are involved in host-pathogen interaction and that these proteins are mainly acting in the extracellular space. It has to be noted that the presence of a predicted signal peptide does not necessarily mean that these proteins are found extracellularly, since some proteins are retained in the endoplasmic reticulum/Golgi and hence are not secreted (Bendtsen et al. 2004).

Since we anticipate that proteins that are directly involved in host-pathogen interaction are differentially regulated upon infection, we utilized the NimbleGen microarray data of P. infestans (Haas et al. 2009) and identified 259 induced/repressed genes encoding proteins containing overrepresented domains. Genes containing overrepresented domains are significantly enriched within the set of differentially expressed genes containing a predicted domain. Moreover, this subset contains a significantly higher abundance of genes with a predicted N-terminal signal peptide than expected. These observations highlight and corroborate the initially emerging link between domain expansion and host-pathogen interaction.
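An enrichment statement of this kind can be illustrated with a one-sided hypergeometric test. This is an assumed stand-in: the statistical test actually used in the study is not specified in this section. The gene counts below are hypothetical toy values and are not taken from the microarray data.

```python
from math import comb

def hypergeom_upper_tail(population, successes, draws, observed):
    """P(X >= observed) when drawing `draws` items without replacement
    from `population` items, of which `successes` carry the feature."""
    upper = min(successes, draws)
    total = comb(population, draws)
    favorable = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return favorable / total

# Hypothetical toy numbers (assumptions for illustration only):
genes_with_domain = 20        # all genes with a predicted domain
with_overrepresented = 8      # of these, genes with an overrepresented domain
differentially_expressed = 5  # genes induced or repressed during infection
overlap = 4                   # differentially expressed AND overrepresented

p_value = hypergeom_upper_tail(
    genes_with_domain, with_overrepresented, differentially_expressed, overlap
)
```

A small p-value indicates that genes with overrepresented domains occur among the differentially expressed genes more often than expected by chance. The same calculation applies to the enrichment of predicted signal peptides once the counts are replaced accordingly.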

The majority of the 246 expanded domains are present in proteins that are involved in general carbohydrate metabolism, nutrient uptake, signaling networks, and suppression of host responses and hence might contribute to establishing and maintaining pathogenesis (Figure 2-4). The variety of overrepresented domains involved in substrate transport over membranes is of special interest. Filamentous plant pathogens and especially oomycetes exhibit a complex and expanded repertoire of these domains, enabling them to absorb nutrients from their environment and host. The expression of P. infestans genes encoding ABC-2-like transporters, amino acid transporters, and Na/Pi cotransporter is induced early in infection of the plant, suggesting that these proteins act during the biotrophic phase of infection. Several other genes encoding proteins with a predicted extracellular localization are induced during infection and contain overrepresented domains. For example, three P. infestans genes encoding the predicted N-terminal signal peptide as well as FAD-linked oxidase and BBE domains are induced during infection. The BBE domain is involved in the biosynthesis of the alkaloid berberine (Facchini et al. 1996). Moy et al. (2004) showed that a soybean homolog of this gene is induced after infection with P. sojae. Molecular studies of proteins containing BBE domains in plants have indicated that several proteins containing these domains are in fact not secreted but instead targeted to specific alkaloid biosynthetic vesicles where the proteins accumulate (Amann et al. 1986; Kutchan & Dittrich 1995; Facchini et al. 1996). The expansion of domain families with potential direct or indirect roles in host-pathogen interaction in filamentous plant pathogens strongly suggests adaptation to their lifestyle at the genomic level.

In addition to known domains, the set of overrepresented domains also revealed domains that, as yet, have neither been implicated in pathogenicity nor been functionally characterized. An example is the DUF953 domain, which, within plant pathogens, is mainly found in oomycetes. This domain is observed in eukaryotic proteins with a thioredoxin-like function, and P. infestans genes encoding these domains are differentially expressed during infection. The significant expansion of these domains in plant pathogens, and the fact that other well-described domains with a function in plant pathogenicity are also overrepresented, make proteins containing poorly described but expanded domains interesting candidates for deciphering their role in filamentous plant pathogens in general and oomycetes in particular.

We determined domain overrepresentation on the basis of species groups (plant pathogens and oomycetes) rather than on the level of individual species. We are aware that, as a consequence of this approach, we might have identified domains as being overrepresented in one group even if they are not present or expanded in all the members (Supplementary Tables S2-3 and S2-5). Hence, we might falsely extrapolate the functional role of a domain in a subset of species to the whole group (e.g. a domain that is exclusively found in plant pathogenic fungi and not in oomycetes would still be overrepresented in the plant pathogenic group). Especially when comparing oomycete with fungal plant pathogens, the dominant expansion of domain families within Phytophthora species over families in H. arabidopsidis might bias the inferred overrepresented domains (Supplementary Table S2-5). Since we in general want to identify candidate domains that might be directly or indirectly involved in host-pathogen interaction, either at the level of filamentous plant pathogens or oomycetes, we think our group-based approach is appropriate to establish a set of candidate proteins and domains.

Moreover, the clustering of presence, absence, and expansion patterns of domains known or implicated to be involved in a plant pathogenic lifestyle with domains that have no known or direct connection to host-pathogen interactions aids in expanding this set of novel candidate domains (Figure 2-6). For example, DUF1949 is within our species selection exclusively found in Phytophthora species and adopts a ferredoxin-like fold. The N-terminal region of proteins containing this domain shows similarity to another domain (UPF00029) that has been found in the human Impact protein. The P. infestans gene containing both domains is induced early during infection of the plant, providing additional, independent evidence for the possible role of genes encoding this uncharacterized domain in host-pathogen interaction. However, domains that are also abundant in nonpathogenic species (e.g. other stramenopiles) might not be related to or only indirectly involved in pathogenicity. Hence, the exact nature of the contribution of these domains to pathogenesis or to general lifestyle requires more in-depth experimental studies of the candidate domains and genes predicted to contain these functional entities.

Protein domains generally do not act as single entities but in synergy with other domains in the same protein or with other domains in interacting proteins. We identified 773 oomycete-specific bigrams, of which 53 are observed in all analyzed oomycetes (Figure 2-7A; Supplementary Table S2-10). Based on our species selection, we cannot conclude that the oomycete-specific bigrams are common to all oomycetes, since they might only be specific for plant pathogenic oomycetes or even for the selected oomycetes analyzed in this study. The majority of the 773 bigrams, however, are specific for a subset of the tested oomycete species or even a single species. The 320 bigram types that are observed in more than a single species or twice in the same proteome are observed in 982 predicted proteins. These bigrams are less likely to be the result of a wrong gene annotation and include already well-described examples of oomycete-specific domain combinations, such as the FYVE-PIK bigram observed in Phytophthora phosphatidylinositol kinases (Meijer & Govers 2006), the AP2-histone deacetylase bigram that is specifically found in P. ramorum and P. infestans (Iyer et al. 2008), and the myosin head domain-FYVE bigram as well as the FYVE-GAF bigram found in myosin proteins in all analyzed oomycetes (Richards & Cavalier-Smith 2005). Still, some of the bigrams could be artificial due to false negatives or false positives in the domain predictions. The remaining, species-specific bigrams could be the result of artificial fusion of genes due to wrong gene annotation or an actual biological signal in one of the analyzed oomycete species. The derived results are not only dependent on the quality of the genome sequences of the analyzed oomycetes but also on that of the other eukaryotes. Wrong predictions of bigrams in these species would lead to false negatives in oomycetes. Hence, the number of derived oomycete-specific bigrams is only an approximation, and the true set of oomycete-specific bigrams needs to be further analyzed. Recent analyses of the underlying molecular mechanisms of domain gain in animals have shown that gene fusion, tightly linked with gene duplication, is in fact the major mechanism that shaped novel protein architecture (Buljan et al. 2010; Marsh & Teichmann 2010). The contributions of this mechanism in forming lineage- or even species-specific bigrams in oomycetes and the probable role of the flexible genomes have to be further analyzed. The bigrams presented here form a comprehensive starting point for an in-depth bioinformatic and experimental analysis of promising gene families encoding novel domain combinations.

Common domain types form the majority of the observed oomycete-specific bigrams, emphasizing the importance of novel combinations rather than novel domain types as a source for species-specific functionality. Only a minority of proteins containing oomycete-specific bigrams are secreted, and none of these proteins is predicted to contain an RXLR or Crn motif. We are aware that the total number of predicted proteins containing the RXLR or Crn motif is lower than reported in other studies where those were predicted using multiple complementary methods (Haas et al. 2009). However, when directly comparing the number of proteins predicted to contain the RXLR motif by HMMER alone, the reported numbers are similar to our predictions. Together with the observation that RXLR proteins do not contain know