Analysis Of Glycosylation And Glycosyltransferases In Bacteria And Archaea Biology Essay


Glycosylation in prokaryotes has been investigated in detail, but many questions remain unanswered. The glycosyltransferase that mediates glycosylation shows a preference not only for particular glycosylation sites in its target proteins but also for the type of glycosylation, i.e. N-linked or O-linked. There is strong evidence for a conserved glycosylation operon, known as pgl, in Campylobacter jejuni and many other bacteria. Proteins encoded by the pgl locus carry out functions ranging from the synthesis of structural components, i.e. carbohydrate moieties, to functional molecules, i.e. the enzymes involved in the glycosylation cascade. In this study we carried out a bioinformatics analysis of PglB, one of the key enzymes of the pgl locus, which is widely distributed in Bacteria and Archaea. We also asked why not all Asn-X-Ser/Thr sequons have an equal chance of being glycosylated, by examining the influence of the neighboring amino acids.

Glycosylation is the addition of a carbohydrate moiety to a protein molecule. Adding a carbohydrate moiety to the side chain of a residue in a protein chain affects the physicochemical properties of that protein. Glycosylation alters properties such as proteolytic resistance, protein solubility, stability, local structure and immunogenicity [1]. Glycoproteins are key molecules in the innate and adaptive immune responses [2].

Two important types of glycosylation are O-linked and N-linked glycosylation. N-linked glycosylation is a co-translational process involving the transfer of a precursor oligosaccharide to asparagine residues in a protein chain. The asparagine usually occurs in the sequon Asn-X-Ser/Thr, where X is any amino acid other than proline [3]. This consensus is, however, not fully specific, since not all such sequons are modified in the cell. O-linked glycosylation involves the post-translational transfer of an oligosaccharide to a serine or threonine residue [4]. In this case there is no well-defined motif for the acceptor site [5] other than the close vicinity of proline and valine residues.
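The sequon rule described above can be expressed as a short pattern scan. The following is a minimal Python sketch, not part of the original study; the function name `find_sequons` and the toy sequence are assumptions made for illustration. It only locates candidate Asn-X-Ser/Thr sites and says nothing about whether a site is actually modified.

```python
import re

# N-X-S/T where X is any residue except proline. The lookahead lets
# overlapping sequons (e.g. "NNSS") be reported individually.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence):
    """Return (1-based Asn position, tripeptide) for every N-X-S/T sequon."""
    seq = sequence.upper()
    return [(m.start() + 1, seq[m.start():m.start() + 3])
            for m in SEQUON.finditer(seq)]


# Hypothetical toy sequence, not taken from the study.
print(find_sequons("MKNASLLNPTQNGTA"))  # [(3, 'NAS'), (12, 'NGT')]
```

The Asn at position 8 in the toy sequence is skipped because it is followed by proline.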

A large number of proteins contain the N-X-S/T motif and are thus potential glycoproteins [6]. A protein can contain several of these tripeptide motifs, but only a few selected ones are glycosylated, and it has been suggested that this selection is controlled by the neighboring amino acids. Glycosylation was once thought to be restricted to eukaryotes, but both O-linked and N-linked glycosylation pathways have now been studied in detail in Campylobacter jejuni and are very similar to the corresponding pathways in eukaryotes [7].

This pathway in C. jejuni is encoded by the pgl gene cluster. One protein from this cluster, PglB, is considered to be the oligosaccharyl transferase because of its homology with Stt3p, a subunit of the yeast oligosaccharyl transferase complex. N-linked protein glycosylation is a very common modification in eukaryotes [8]. PglB and Stt3p share a conserved WWDYG signature that has been shown to be essential for activity in vivo. PglB has substrate flexibility and can accept multiple peptide substrates [9].

Glycosylation in Bacteria and Archaea

1.1.1 Glycosylation in Bacteria

Through advances in analytical methods and genome sequencing, there have been increasing reports of both O-linked and N-linked protein glycosylation pathways in bacteria, particularly among the mucosal-associated pathogens. Studying glycosylation in relatively simple bacterial systems provides the opportunity to exploit glycoprotein biosynthetic pathways. For example, C. jejuni has been established as an excellent model for an N-linked glycosylation pathway in bacteria: the products of the characterized pgl (protein glycosylation) gene cluster assemble a known heptasaccharide and transfer it from a membrane-anchored undecaprenylpyrophosphate-linked donor to an asparagine residue in proteins at the classic Asn-X-Ser/Thr motif [1].

The central enzyme of the C. jejuni pgl operon is PglB. PglB is homologous to a protein from Methanobacterium that also shares significant homology with an oligosaccharide transferase involved in protein glycosylation in yeast. It transfers the heptasaccharide GalNAc-α1,4-GalNAc-α1,4-(Glc-β1,3)-GalNAc-α1,4-GalNAc-α1,4-GalNAc-α1,3-Bac, where Bac is 2,4-diacetamido-2,4,6-trideoxy-D-Glc, from an undecaprenylpyrophosphate carrier to specific Asn residues on acceptor proteins. PglB in bacteria forms N-glycosidic bonds between an Asn side chain and the modified GlcNAc, bacillosamine, or other acetamido sugars, unlike the eukaryotic system, which is specific for GlcNAc [10].

1.1.2 Glycosylation in Archaea

AglB is the archaeal homologue of the bacterial oligosaccharyl transferase PglB [11]. The genetic pathways for the assembly and attachment of N-linked glycans are far better understood in bacterial systems than the comparable processes in Archaea. The recent characterization of a novel trisaccharide [β-ManpNAcA6Thr-(1-4)-β-GlcpNAc3NAcA-(1-3)-β-GlcpNAc] N-linked to asparagine residues in Methanococcus voltae flagellin and S-layer proteins affords new opportunities to investigate N-linked glycosylation pathways in Archaea [12].

2. Approach

The glycosylation operon was first reported in Campylobacter jejuni. The pglB gene in the pgl operon is responsible for glycosylation in C. jejuni, and the glycosylation system in this bacterium resembles that of eukaryotic organisms [13].


The protein sequence encoded by the pglB gene of Campylobacter jejuni (accession number: Q9S4V7) was retrieved from UniProt and used to search for homologs in archaeal and bacterial proteomes.


The Basic Local Alignment Search Tool (BLAST) was used to retrieve PglB homologs in Bacteria and Archaea [14]. First, PglB was searched against the bacterial resource using BLOSUM62 as the scoring matrix and an expect threshold of 0.1, with the search set to report only the best 100 hits. Sixty-three (63) hits were reported; these were shortlisted on the basis of function (i.e. glycosylation), percentage identity to the input PglB protein, and the presence of the WWDYG motif. Twenty (20) proteins were thus shortlisted for further analysis.
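The three shortlisting criteria named above (annotated function, percentage identity, presence of WWDYG) can be sketched as a simple filter. This is an illustrative Python sketch under stated assumptions: the `Hit` record, the `shortlist` function, the 25% identity cutoff and the `glycosyl` keyword are all hypothetical, since the essay does not state the cutoffs it applied.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    accession: str    # hypothetical identifiers in the example below
    identity: float   # percent identity to the query PglB
    annotation: str   # free-text function description
    sequence: str


def shortlist(hits, min_identity=25.0, keyword="glycosyl", motif="WWDYG"):
    """Keep hits that pass all three criteria named in the text.

    min_identity and keyword are illustrative assumptions; the essay does
    not state the values it used.
    """
    return [
        h for h in hits
        if h.identity >= min_identity
        and keyword in h.annotation.lower()
        and motif in h.sequence.upper()
    ]


# Made-up records for demonstration only.
hits = [
    Hit("HYP_001", 42.0, "Oligosaccharyl transferase, glycosylation", "MKTWWDYGAL"),
    Hit("HYP_002", 18.0, "Glycosyltransferase", "MKTWWDYGAL"),       # identity too low
    Hit("HYP_003", 55.0, "Putative glycosyl transferase", "MKTAAAAGAL"),  # motif absent
]
print([h.accession for h in shortlist(hits)])  # ['HYP_001']
```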

Second, PglB was searched against the archaeal database at UniProt using the BLOSUM62 scoring matrix and an expect threshold of 10, chosen to admit more results, with the search again set to report only the best 100 hits. Thirty-seven (37) hits were reported and sixteen (16) were shortlisted using the criteria above.


The selected sequences were supplied in a single file to generate the multiple sequence alignment (MSA). MUSCLE, a fast iterative program with good accuracy, was used to generate the alignment [15]. It produces biologically meaningful alignments of divergent sequences, aligning them so that identities, similarities and differences can be seen.

The alignment file generated by MUSCLE was given as input to BioNJ [16] to compute a distance tree representing the phylogenetic relationships between the bacterial and archaeal enzymes. BioNJ was run with 1000 bootstrap replicates and the tree was visualized using TreeDyn [17] (Fig. 1, Fig. 2 and Fig. 3). The trees show two clusters, one for Bacteria and one for Archaea, with a clear separation between the bacterial and archaeal enzymes. Bootstrap values are shown at every branch. MrBayes was also used to confirm the results (Fig. 4).
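To make the bootstrap step concrete, the sketch below shows in Python how one bootstrap replicate of an alignment is drawn (columns resampled with replacement) and how a simple pairwise distance can be computed from it. It is not BioNJ itself: the tree-building step is omitted, the uncorrected p-distance is a stand-in chosen for brevity, and the function names and toy alignment are assumptions for illustration.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate).

    alignment maps sequence name -> aligned sequence; all rows must have
    equal length.
    """
    lengths = {len(s) for s in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must have equal length")
    n_cols = lengths.pop()
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[c] for c in cols)
            for name, seq in alignment.items()}


def p_distance(a, b):
    """Fraction of differing residues over columns where neither row has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


# Hypothetical toy alignment, not data from the study.
toy = {"bac_1": "WWDYGA", "bac_2": "WWDYGS", "arc_1": "WWDNG-"}
replicate = bootstrap_alignment(toy, random.Random(0))
print(p_distance(toy["bac_1"], toy["arc_1"]))  # 0.2
```

In a full analysis, a tree would be built from each of the 1000 replicates, and the bootstrap value of a branch is the fraction of replicate trees in which that branch recurs.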

Relationships within the bacterial and archaeal enzymes can be seen separately in Fig. 5 and Fig. 6. The bacterial phylogram shows close clustering of the bacterial glycosyltransferases, which indicates a high level of sequence homology among them, whereas the archaeal tree shows conservation of function with looser clustering of the enzymes, and four sub-clusters can easily be seen.

3. Materials and Methods

3.1 Protein Models Tested for Glycosylation Site Prediction

Autotransporter and GFP proteins were retrieved from NCBI. The autotransporter proteins were classified into five classes on the basis of their functional domains. Two tools were used for this task: Pfam [18] and the Conserved Domains Database (CDD) [19].

This classification of Autotransporter proteins was used for further analysis of glycosylation sites among these five classes.

3.2 Prediction of Glycosylation Sites

Both N-linked and O-linked glycosylation require a signal peptide in the protein that is to be glycosylated; in its absence the protein will not be glycosylated in vivo. We used two online glycosylation site prediction servers, NetOGlyc 3.1 [5] and NetNGlyc 1.0. NetOGlyc 3.1 predicts O-linked glycosylation sites in proteins, whereas NetNGlyc 1.0 predicts N-linked sites.

3.2.1 Prediction of O-Glycosylation Sites in Autotransporters

The NetOGlyc 3.1 server, available at the Center for Biological Sequence Analysis (CBS) prediction server, was used to predict O-glycosylation sites in the protein sequences. It evaluates every serine and threonine residue in a protein sequence and reports which sites have the potential to be glycosylated. It also generates a graphical view of the query sequence, which helps to visualize the potential glycosylation sites.

3.2.2 Prediction of N-Glycosylation Sites in Autotransporters

The NetNGlyc 1.0 server predicts N-glycosylation sites in proteins using artificial neural networks that examine the sequence context of Asn-Xaa-Ser/Thr sequons. It is accessible at the CBS prediction server. It was also used here for prokaryotic proteins, because the N-glycosylation machinery in prokaryotes resembles that of eukaryotes [13]. It evaluates every Asn-Xaa-Ser/Thr sequon for N-glycosylation. The graphical output shows only the positions that have potential for N-glycosylation (Table 1).

3.3 Development of Database

The database of PglB protein sequences was developed in MySQL. MySQL is a widely used open-source database server and is very commonly used together with PHP scripts to create dynamic server-side applications.

3.4 Software Development

A tool for finding specific motifs in glycosyltransferases was developed using PHP as the scripting language in Macromedia Dreamweaver. PHP is used for text processing and for displaying the results. The tool is named OTPT version 1.0. It checks for the presence of specific motifs in query sequences (Fig. 7).


Protein sequences of PglB from C. jejuni and its archaeal homologs were stored in a database. The database holds the accession number, protein length, protein name and the specific motif essential for N-linked glycosylation. All the protein sequences contain the common motif, with a few exceptions in archaeal proteins that carry a single amino acid change: Y→N, Y→F, Y→Q or Y→W. We designed software to detect such motifs.
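A motif search of this kind can be sketched with a single regular expression that accepts WWDYG together with the four tyrosine substitutions listed above. The original tool was written in PHP and its source is not shown in this essay, so the Python below is an independent illustration; the function name `find_motifs` and the test sequences are assumptions.

```python
import re

# WWDYG plus the single-residue substitutions at the Tyr position that the
# text reports for some archaeal proteins (Y->N, Y->F, Y->Q, Y->W).
MOTIF = re.compile(r"WWD([YNFQW])G")


def find_motifs(sequence):
    """Return (1-based start, matched motif, is_canonical) for each hit."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(0), m.group(1) == "Y")
            for m in MOTIF.finditer(seq)]


# Hypothetical toy sequences, not taken from the PglB database.
print(find_motifs("AAWWDYGAA"))  # [(3, 'WWDYG', True)]
print(find_motifs("mkwwdngl"))   # [(3, 'WWDNG', False)]
```

Returning the `is_canonical` flag separately makes it easy to list the variant-carrying archaeal sequences apart from those with the canonical WWDYG.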

4.1 Phylogenetic Analysis of PglB Sequences

Sequences of PglB and its homologues were retrieved from UniProt and subjected to multiple sequence alignment (MSA) using MUSCLE. The MUSCLE results show that there are conserved motifs among all the PglB homologs in archaeal species.

Twenty (20) bacterial enzymes and sixteen (16) archaeal enzymes were used for the phylogenetic studies. The bacterial tree shows close grouping of the bacterial enzymes (Fig. 5) and the archaeal tree shows clustering of the archaeal enzymes (Fig. 6).

All thirty-six (36) sequences were then used to build the final phylogenetic tree. BioNJ was chosen as the tree-building algorithm and was run with 1000 bootstrap replicates. The resulting trees were visualized using TreeDyn and drawn in three formats: circular, phylogram and radial (Fig. 1, Fig. 2 and Fig. 3). The phylogenetic trees show two clusters, one for Bacteria and one for Archaea.

4.2 Retrieval of Autotransporter Proteins

Approximately 700 autotransporter protein sequences were collected and checked for the presence of glycosylation sites. For this search the proteins were classified into five main classes on the basis of the functional domains found in them. Pfam and CDD classify these proteins into five classes: Adhesins, Lipolytic, Proteases, Toxins and SPATEs. In total, three hundred and fifty-six (356) autotransporter proteins were classified, of which 298 were Adhesins, 30 IgA1 proteases, 12 Lipolytic, 16 SPATEs and 4 Toxins. The AIDA-I precursor, outer membrane autotransporter and the lipase/esterase EstA are some examples of these autotransporters.

4.3 Glycosylation Site Prediction Results

O-linked glycosylation sites in the autotransporter proteins and GFPs were predicted with NetOGlyc. O-glycosylation was predicted more frequently in autotransporters, possibly because autotransporter proteins are membrane proteins exposed on the cell surface.

N-linked glycosylation sites in the autotransporter proteins were predicted using the NetNGlyc 1.0 server. The NetNGlyc results show a low level of N-glycosylation in autotransporter proteins, because most of them lack the signal peptide for N-glycosylation. The results of NetOGlyc and NetNGlyc are given in Table 1. The graph shows the occurrence of both types of glycosylation in Adhesins (Fig. 8).

After identification, the potential glycosylation sites from the archaeal and bacterial glycosyltransferases were aligned, together with their flanking sequences, using MUSCLE. The purpose of the alignment was to examine the amino acids around the consensus sequence Asn-X-Ser/Thr among the proteins predicted in silico to be glycosylated. No general pattern was observed in the aligned sequences (Fig. 9).
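Collecting each predicted site with its flanking residues can be sketched as follows. The essay aligned the flanking sequences with MUSCLE; this Python sketch instead uses fixed-width windows centered on the site, a simpler substitute that already lines the sites up column by column. The function name `flanking_windows`, the flank width and the toy sequence are assumptions for illustration.

```python
def flanking_windows(sequence, positions, flank=5, pad="-"):
    """Return a fixed-width window around each 1-based site position.

    Windows that run past either end of the sequence are padded so that all
    windows have length 2 * flank + 1 and line up column by column.
    """
    seq = sequence.upper()
    padded = pad * flank + seq + pad * flank
    windows = []
    for pos in positions:
        if not 1 <= pos <= len(seq):
            raise ValueError(f"position {pos} outside sequence")
        # The site sits at index pos - 1 + flank in the padded string, so
        # the window spans padded[pos - 1 : pos + 2 * flank].
        windows.append(padded[pos - 1:pos + 2 * flank])
    return windows


# Hypothetical toy sequence with predicted sites at Asn 3 and Asn 12.
print(flanking_windows("MKNASLLNPTQNGTA", [3, 12], flank=2))
# ['MKNAS', 'TQNGT']
```

With the windows stacked, each column can be inspected for over-represented residues upstream or downstream of the modified asparagine.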

4.4 Tool Development

Software was developed to identify the conserved motifs in glycosyltransferases responsible for N-linked glycosylation. We tested a total of 90 proteins from Archaea and Bacteria for motifs in their sequences. The results were saved in the PglB database. A WWDYG motif found in one of the query proteins is shown in Fig. 7.

5. Discussion

Bioinformatics makes it possible to gather, interpret and represent large volumes of data quickly and efficiently. Computational tools can greatly accelerate the research process. For example, where true homologs exist, their evolutionary relationships can be inferred and their structures and functions proposed. One expectation of this technology is a better understanding of the mechanisms of bacterial pathogenicity. Post-translational modification is an important factor in determining bacterial pathogenicity toward hosts such as humans [20].

The first objective of this study was to analyze the evolutionary relationships among oligosaccharyl transferases from Archaea and Bacteria. Approximately one hundred (100) sequences of PglB and its homologs were retrieved from UniProt for analysis, and 36 were shortlisted for phylogenetic studies. The low level of sequence homology among the archaeal AglB homologs suggests that their protein sequences have diverged while they retain a common function. The alignment of the bacterial sequences shows a high level of sequence homology. The phylogenetic tree shows a clear separation of the homologs from the two domains, as expected (Fig. 1).

We predicted potential glycosylation sites among bacterial autotransporter proteins, starting from our database of 700 autotransporter proteins. Further shortlisting was carried out to remove redundancy, which left 356 autotransporter proteins for the glycosylation site prediction study.

Among the autotransporters, members of the Adhesin subfamily are more frequently glycosylated than members of the other subfamilies in the group. The glycosylation site predictions gave an apparently contradictory result: many of the autotransporter proteins without a conventional signal peptide are also predicted to be glycosylated (Fig. 8).

From the NetOGlyc results we conclude that O-linked glycosylation occurs relatively more often in the Adhesin (Fig. 8) and Lipolytic (Fig. 10) classes of autotransporter proteins. Adhesins showed a trend toward more frequent glycosylation, perhaps because we tested more members of this class and the data set for Adhesins is larger. They are also surface exposed, which helps them to be glycosylated by glycosyltransferases.

The output of the NetNGlyc N-linked glycan prediction tool suggested that N-linked glycosylation occurs more frequently overall in autotransporter proteins than O-linked glycosylation. Members of the Adhesin subfamily appear to be the most frequently N-glycosylated. This may be because the conserved motif for N-linked glycosylation occurs more often in Adhesins than in any other class of autotransporter proteins (Table 1).

After predicting the potential glycosylation sites in autotransporters, the amino acid sequences of these sites from the original 36 input proteins were aligned to look for a conserved amino acid pattern. The alignment showed that (1) there is no significant conserved pattern at the predicted glycosylation sites of autotransporter proteins, and (2) the flanking sequences do not show any major sequence similarity in these proteins (Fig. 9).

A database of PglB homologs was developed, containing 90 enzyme sequences. It holds information about the source of each protein, its length and the five-amino-acid sequence important for N-linked glycosyltransferases, which is conserved in most of the sequences. The database of autotransporter proteins holds the protein sequences and the occurrence of glycosylation sites in these proteins.
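A table holding the fields listed above might look like the following sketch. The original database was built in MySQL and its schema is not given in the essay; this example uses Python's built-in `sqlite3` module as a stand-in so that it runs without a server. The table name, column names and all records are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL database in the text.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE pglb_homolog (
        accession TEXT PRIMARY KEY,
        name      TEXT NOT NULL,
        source    TEXT NOT NULL,
        length    INTEGER NOT NULL,
        motif     TEXT
    )
    """
)

# Made-up records for demonstration only.
records = [
    ("HYP_001", "PglB-like protein", "hypothetical bacterium A", 700, "WWDYG"),
    ("HYP_002", "PglB-like protein", "hypothetical bacterium B", 720, "WWDYG"),
    ("HYP_003", "AglB-like protein", "hypothetical archaeon C", 850, "WWDNG"),
]
conn.executemany("INSERT INTO pglb_homolog VALUES (?, ?, ?, ?, ?)", records)

# Count how many stored sequences carry each motif variant.
motif_counts = dict(
    conn.execute(
        "SELECT motif, COUNT(*) FROM pglb_homolog GROUP BY motif ORDER BY motif"
    )
)
print(motif_counts)  # {'WWDNG': 1, 'WWDYG': 2}
```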

As mentioned above, a specific five-amino-acid sequence, WWDYG, in archaeal and bacterial oligosaccharyltransferases is important for the proper function of these proteins. A motif search tool was developed in the PHP programming language. With this tool we can check for the presence of this particular amino acid motif in a given protein sequence.

6. Conclusion

In this study we compared and analyzed the oligosaccharyl transferases from different species of Bacteria and Archaea. We observed that proteins from several species, especially some of the bacterial species, share a conserved motif essential for N-linked glycosylation. Software was developed for efficient searching of these motifs in a query sequence. The protein sequences of PglB and its homologs were stored in a database, the "Database of PglBs". Glycosylation sites in autotransporter proteins and green fluorescent proteins were predicted using online tools. Adhesins showed a higher predicted glycosylation frequency than the other classes of autotransporter proteins. The patterns around the predicted glycosylation sites, however, are not conserved in either GFPs or autotransporter proteins. Detailed verification of this analysis is required before new information can be derived from it or new directions planned for further study.