Identification Of Homologues Of Known Food Allergens


Food allergens are proteins that trigger an immune response in humans. The objective of this report is to determine the usefulness of bioinformatics in the identification of homologues of known food allergens, such as the major peanut allergen Ara h 1. The allergen data were identified from published journals and from the Medline and Embase databases, which were searched using a combined text word and MeSH heading strategy. The report focuses on various food allergen databases and bioinformatics tools that are useful in the identification of potential food allergens. Bioinformatics tools for comparing protein structures are becoming more important because more structural data become available each day. Bioinformatics is used to investigate the structural and functional relationships between known food allergens, and those relationships can be used to identify novel food allergens. Bioinformatics is not 100% accurate in finding novel allergens. However, with the use of bioinformatics, cross-reactions between proteins can be analysed and immunotherapy could be developed.

Allergens are proteins that are resistant to heat during cooking, to stomach acid and to digestive enzymes. Allergen proteins can therefore enter the bloodstream and cause an immune response. Food allergy is a hypersensitivity state. Contact with a food allergen activates B lymphocytes, which differentiate into plasma cells that produce allergen-specific immunoglobulin E (IgE) antibodies. IgE molecules then attach to the surface of mast cells, which are specialised cells containing histamine and heparin. The binding of IgE to mast cells creates an immunological memory: when the person is exposed to the same allergen a second time, the allergen binds to the IgE antibodies on the surface of the mast cells, which become activated and release histamine, the substance responsible for the symptoms of food allergy (Kindt, T.J. et al., 2007). The symptoms caused by exposure to food allergens include itching, swelling of the lips, coughing and a runny nose; allergens can also cause asthma and can result in anaphylaxis, which involves a sudden drop in blood pressure (Bupa, 2008).

Food allergy is a major and increasing health problem in both children and adults. Common food allergens are derived from peanuts, wheat, milk, eggs and soy. These allergens affect "8% of infants and young children" (Samson, H., 2005). According to Jenkins et al. (2005), Ferreira et al. (2004) and Mari, A. (2001), epitopes shared between different sources may result in clinical food allergies. For example, Ara h 1 is a major peanut allergen whose IgE-reactive epitopes were identified from its peptide sequence by Burks et al. in 1997. Subsequently, comparison with the IgE epitopes of Ara h 1 helped to identify similar allergen epitopes in other foods such as tree nuts and legumes (Lopez-Torrejen, 2003). Identifying similar allergenic epitopes by comparing candidate allergens with known food allergens is very important because allergens have conserved sequences within their epitopes. For this reason, once reactive epitopes have been identified, bioinformatics tools can be used to identify other related proteins associated with similar allergic reactions and to define more specific treatments. For example, in the study by Bolhaar (2004), patients with an allergy to apples were also sensitive to birch pollen. This is due to the similarity between the birch pollen allergen Bet v 1 and the Mal d 1 protein from apple. Specific immunotherapy with Bet v 1 extracts is able to decrease sensitivity to the apple protein Mal d 1.

The aim of this report is to identify food homologues of allergenic proteins by comparing the sequences, epitopes and structures of known allergens using bioinformatics tools.


The relevant food allergy studies and food allergen databases were identified from the Medline and Embase databases. Medline and Embase are bibliographic databases which contain over 16 million and 20 million journal articles respectively. The databases were searched using a combined text word and MeSH heading search strategy. The text words were "allergens", "food allergens" and "food allergens databases". The search was also supplemented by examining specific review articles and bibliographies, and further articles were identified from the retrieved articles.

Table 1 shows the flow chart of the search strategy. Using this strategy, a total of 34226 hits were obtained for the search term "allergens". Searching for "food allergens" returned 6085 hits, "allergen database" returned 196 hits and "food allergen database" returned 66 hits. The results were then screened to exclude non-bioinformatics material, which left 14 relevant databases and 22 relevant articles.

Table 1: Flow chart of search strategy.

Allergens: Total hits 34226

Food allergens: Total hits 6085

Allergens databases: Total hits 196

Food allergens databases: Hits 66

Databases with relevant outcome: 14

Articles with relevant data: 22


Allergen Databases

Fourteen databases were identified which contain sequences and information about allergenic proteins. Table 2 shows these databases.

Table 2

| Website name | Web-site link | Information available |
|---|---|---|
| All Allergy | | Genbank accession numbers of allergens |
| | | Genbank accession numbers of allergens |
| | | Names of allergens, and links to PubMed & sequence databases |
| CSL | index.htm | Names of food allergens with links to Genbank |
| | | Names of food allergens with links to Genbank |
| | | Biochemical, structural, and clinical data |
| | | Food allergens, epitopes, sequences, links to literature |
| | | Allergens, sequence links to Genbank, and a FASTA search |
| | | Allergen sequences, WHO allergenicity rules using FASTA |
| | | Allergen sequences, protein type, IgE epitopes collection, tools for sequence and epitope comparison |
| | | Allergens, a BLAST search, and implements the WHO allergenicity rules |
| | | Potential allergenicity of proteins using motifs found by a wavelet algorithm |
| | | Predicts allergenicity with MEME/MAST motifs |
| | | WHO allergenicity rules using FASTA |

Table 2 lists the names and links of the allergen databases, together with the bioinformatics tools they provide for determining allergenicity.

These databases give an indication of which proteins are allergenic. For example, the International Union of Immunological Societies (IUIS) database contains the names, Genbank accession numbers and other information about allergens.

Most of these databases do not have cross-indexing, so it is difficult to determine the relationships between proteins. Brusic et al. (2003) also reported on the allergen databases. However, some databases do provide cross-indexing, which is beneficial in the identification of cross-reacting allergens. One example is Allergome, which contains allergenic, clinical, biological and structural data. The Allergome database has no bioinformatics tools but contains allergen MEME "sequence motifs", which are strongly associated with allergens. Swiss-Prot, PIR and Genbank contain protein sequences, and the CSL and Biotechnology Information for Food Safety databases use those sequences to provide lists of allergens.

Some databases provide direct comparison of allergen sequences using bioinformatics tools and permit the use of the WHO guidelines for predicting allergenicity (WHO 2001). Other databases, such as FARRP and ADFS, provide searchable lists of allergens and protein sequences together with a FASTA search for related sequences. The ALLERDB database contains allergen sequences, and its BLAST search can identify sequence similarities.
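As a rough illustration of how a WHO-style check might work, the Python sketch below flags a query protein that shares a run of six contiguous identical amino acids with a known allergen. This is one of the two sequence criteria usually quoted from the FAO/WHO 2001 consultation; the other, more than 35% identity over an 80-residue window, needs a proper alignment tool such as FASTA and is not attempted here. The function names and both toy sequences are made up for the example and are not real allergen data.

```python
def kmers(sequence, k=6):
    """Return the set of all contiguous k-residue words in a sequence."""
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def shared_words(query, allergen, k=6):
    """Return the k-residue words present in both sequences, sorted."""
    return sorted(kmers(query, k) & kmers(allergen, k))


# Toy sequences invented for this example (not real allergen data).
known_allergen = "MKTLLVAGCSRPQEWNDYHHAFK"
query_protein = "GGSRPQEWNDAAKLT"

hits = shared_words(query_protein, known_allergen)
print(hits)  # a non-empty list means the six-residue criterion is met
```

A hit from such a word match only marks a protein for closer inspection; it does not show that the protein binds IgE.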

SDAP Database

SDAP (Structural Database of Allergenic Proteins) is one of the biggest food allergen databases. SDAP differs from other databases in that it uses bioinformatics tools to compare allergenic proteins. Some of these tools compare protein sequences to identify potential allergens through the similarity of their IgE epitopes to those of known allergens (Schein, 2005). By searching the allergenic proteins in SDAP, novel allergenic proteins could be identified. The search is rapid and depends on sequence similarities, 3D structure and known allergenic epitopes.

The SDAP database compares allergen sequences using several methods. The in-house bioinformatics methods identify sequence similarities and provide links to other large databases (Swiss-Prot and Genbank). Using the BLAST and FASTA protein search tools, similarities between the amino acid sequences of allergens can be determined. Pfam grouping is also available for the allergens in SDAP, which identifies protein similarities. The bioinformatics tools identify related IgE epitopes, so that the user can map the peptides containing IgE epitopes onto 3D models of allergens.

SDAP also contains the IgE-binding epitopes of allergenic proteins. The epitope sequences were identified by in vitro binding experiments in which short peptide sequences are bound to a solid phase. The peptides that bind IgE are assumed to be epitopes (Li, 2003). The SDAP database includes information about the IgE epitopes of allergens such as those from peanut (Ara h 1, Ara h 2, Ara h 3), hen egg (Gal d 1), buckwheat (Fag e 1), English walnut (Jug r 1), soybean (Gly m glycinin G1 and G2) and shrimp (Pen a 1, Pen i 1).

Comparing Allergenic Protein Sequences Using FASTA Tool

Similarity between a protein sequence and the sequences of known allergens may lead to cross-reactivity. This similarity can be identified using the bioinformatics tool FASTA (Pearson, 1990). A FASTA run for a query sequence produces an output that lists the similar allergens in SDAP together with their "E-value". Table 3 shows a FASTA search in the SDAP database in which the Jun a 3 allergenic protein sequence was used as the query. The E-value expresses the statistical significance of a hit: it is the number of matches with the same or a better score that would be expected to occur by chance when searching a database of a given size with the same sequence. The lower the E-value, the more significant the match. According to Table 3, Cup a 3, a cypress tree allergen, is the most similar allergen to Jun a 3. Table 3 also shows several vegetable and fruit allergens. Based on the FASTA alignment, a person with a cedar pollen allergy may develop allergic symptoms after eating apples or cherries (Breiteneder and Mills, 2005).

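Table 3: FASTA search using the Jun a 3 protein

| Allergen | Source | Sequence length | Bit score | E score |
|---|---|---|---|---|
| Jun a 3 | cedar pollen | | | |
| Cup a 3 | | | | |
| Cap a 1w | bell pepper | | | |
| Lyc e NP24 | | | | |
| Cap a 1 | bell pepper | | | |
| Mal d 2 | | | | |
| Pru av 2 | | | | |
| Act c 2 | | | | |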

Table 3 shows the FASTA output in SDAP with Jun a 3 used as the query allergen. The results list the proteins similar to Jun a 3, ranked from the lowest "E-score" upwards.
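The link between the bit score and the E score can be sketched with the relation commonly used for BLAST-type database searches, E = m × n × 2^(−S′), where S′ is the bit score, m is the query length and n is the total number of residues in the database. FASTA estimates its statistics from the distribution of scores in the actual search, so the Python snippet below only approximates the idea. The query length, database size and bit scores are invented values, not figures from Table 3.

```python
def expected_hits(bit_score, query_length, database_residues):
    """Approximate E-value: chance matches expected at this bit score or better."""
    search_space = query_length * database_residues
    return search_space * 2.0 ** (-bit_score)


# Invented numbers for illustration only.
query_length = 200            # residues in the query protein
database_residues = 250_000   # total residues in the allergen database

for bits in (30, 50, 100):
    e = expected_hits(bits, query_length, database_residues)
    print(f"bit score {bits:>3}: E = {e:.2e}")
```

Each additional bit halves the expected number of chance matches, which is why a high bit score goes together with a very small E score.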

Pfam Families

Pfam is a database of protein families. Protein families are represented by multiple sequence alignments and Hidden Markov Models (HMMs). Protein function is determined by active domains and the interactions between domains, so identifying the active domains in a protein helps to understand its function. The Pfam database is hosted at the Wellcome Trust Sanger Institute in the UK, the Howard Hughes Janelia Farm Research Campus in the USA and the Stockholm Bioinformatics Centre in Sweden. Pfam has two types of entry: Pfam-A and Pfam-B. Pfam-A contains protein families, and Pfam-B contains additional data which can be used to identify conserved regions when no match is found in Pfam-A.

Sorting different types of allergens into Pfam groups is important because it allows similar proteins with different names to be identified. Pfam data also indicate functional similarities between proteins (Schein et al., 2006). Entries in the SDAP database are assigned to the appropriate Pfam group, and allergens similar to other proteins can be identified from the Pfam grouping.

Table 4 shows 18 common allergenic Pfam families. The largest allergen family is PF00234, the protease inhibitor/seed storage/LTP family, which contains 34 allergens. Novel allergens might be assigned to an existing Pfam family or to a new Pfam family, according to their multiple sequence alignment and HMM profile.

| Family name | Pfam code | Number of allergens |
|---|---|---|
| LTP family/Protease inhibitor/seed storage | PF00234 | 34 |
| | | |
| EF hand | | |
| Pollen allergen | | |
| SCP-like extracellular protein | | |
| Bet v I family (Pathogenesis-related protein) | | |
| | | |
| | | |
| Lipocalin/cytosolic fatty-acid binding protein family | | |
| Rare lipoprotein A (RlpA)-like double-psi beta-barrel | | |
| | | |
| Pectate lyase | | |
| Papain family cysteine protease | | |
| 60s Acidic ribosomal protein | | |
| Subtilase family | | |
| Thaumatin family | | |
| Pollen proteins Ole e I family | | |
| Ribonuclease (pollen allergen) | | |

Table 4 shows Pfam-A allergen families from SDAP.
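Sorting allergens into Pfam groups and counting the members of each family, as in Table 4, amounts to a simple grouping step. A minimal Python sketch follows. The allergen-to-family assignments are a small hand-written sample assumed for the example; they are not an export from SDAP or Pfam and should be checked against those databases before any real use.

```python
from collections import Counter

# Hand-written sample assignments (an assumption for this sketch).
assignments = [
    ("Bet v 1", "Bet v I family"),
    ("Mal d 1", "Bet v I family"),
    ("Jun a 3", "Thaumatin family"),
    ("Cup a 3", "Thaumatin family"),
    ("Mal d 2", "Thaumatin family"),
    ("Ara h 2", "LTP family/Protease inhibitor/seed storage"),
]

# Count how many allergens fall into each family.
family_sizes = Counter(family for _, family in assignments)

# List families from the largest to the smallest.
for family, count in family_sizes.most_common():
    print(f"{family}: {count}")
```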


Cross-reactive proteins have similar sequences and structures but come from different sources. Pfam grouping has a limitation: each family contains both allergenic and non-allergenic proteins, so it is difficult to separate the allergenic proteins from the others. Some cross-reactive allergens do not reach the sequence identity threshold (35%) set in the WHO guidance (WHO guidelines). Conversely, IgE binding can be reduced by mutations in a protein (de Leon et al., 2003). For example, Bet v 1 isoforms with 98% sequence identity do not cross-react (Hartl et al., 1999).

Some bioinformatics tools predict whether a protein is allergenic on the basis of allergenic motifs. For example, WebAllergen identifies potential allergenic motifs (Riaz et al., 2005). IgE binding to a potential allergenic motif must be demonstrated experimentally to confirm the allergen. AlgPred uses a BLAST search and support vector machines (SVM) to identify allergenic epitope motifs. It also permits a search for MEME/MAST allergenic motifs; MAST is able to locate the IgE binding sites on the protein. AlgPred thus allows allergens to be predicted using a combination of SVM, BLAST, MAST and IgE epitope methods.

According to Schein et al. (2006), it is difficult to distinguish allergenic from non-allergenic tropomyosins because their sequences are so similar. Some reports have demonstrated protein allergenicity by combining bioinformatics with experimental procedures.

Motifs are sequence patterns that are conserved in related proteins. IgE binding is characterised by conserved sequence motifs in allergenic proteins (Brusic and Petrovsky, 2003).
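A conserved motif can be written as a pattern and searched for in a protein sequence. The Python sketch below uses a regular expression for this. The motif and the sequence are invented for the example and do not represent a known IgE-binding motif. Tools such as MEME/MAST use scored motif models rather than a fixed pattern, so this is only the simplest form of the idea.

```python
import re

# Hypothetical motif: C, any two residues, C, then D or E.
motif = re.compile(r"C..C[DE]")


def find_motif(sequence):
    """Return (1-based start position, matched residues) for each non-overlapping hit."""
    return [(m.start() + 1, m.group()) for m in motif.finditer(sequence.upper())]


# Invented sequence for illustration.
sequence = "MKCAGCDLLTPCRNCEAV"
print(find_motif(sequence))
```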


Food allergens cause an immune response in humans. Some isoforms of food allergens in which a point mutation has occurred do not show allergenic properties. Bioinformatics tools for comparing protein structures are becoming more important because more structural data become available each day. This report focused on some of the bioinformatics search tools that can be used to investigate the structural and functional relationships between known allergens. Those relationships can be used to identify potential novel allergens. Bioinformatics is not 100% accurate in finding novel allergens. However, with the use of bioinformatics, cross-reactions between proteins can be analysed and immunotherapy could be developed (Schein et al., 2006).