Bioinformatics analysis of a protein implicated in psoriasis


This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.


The skin consists of three layers. The first is the epidermis, the uppermost layer, which is a form of epithelium (the tissue that forms the surface of most structures in the body). The second is the dermis, the fibrous connective tissue of the skin. The third is the subcutis, the fat layer below the dermis and epidermis. The epidermis contains three main cell types, namely keratinocytes (skin cells), Langerhans cells (immune cells) and melanocytes (pigment-producing cells). A fourth cell type, the Merkel cell, is also present but is much less conspicuous. The keratinocytes form four distinct layers: the stratum corneum (horny layer), stratum granulosum (granular layer), stratum spinosum (spinous or prickle cell layer) and stratum basale (basal layer). As keratinocytes move outwards they become more differentiated, accumulate keratin and are eventually shed.

Psoriasis vulgaris is a chronic, non-infectious autoimmune skin disease that causes scaling and inflammation of the skin. The most common form results in red, thick patches of skin covered in silvery scales, typically found on the scalp, lower back, soles of the feet, palms, knees and elbows, although they can appear almost anywhere on the body. The patches, also known as plaques, are often painful and itchy; however, the way the disease manifests varies from individual to individual. In some cases the symptoms are so mild that the disease may go unnoticed, while in others psoriasis can be life-threatening.

The disease is multifactorial, meaning that it has both genetic and environmental components. Epidemiological studies, including genetic studies in twin pairs and siblings, implicate genetic factors in the pathogenesis of psoriasis (Elder et al., 1994). Data from the general population show that if an individual has psoriasis, the risk of their sibling developing the disease is increased four-fold (Camp et al., 1992). Occasionally, psoriasis is observed across many generations of blood relatives. Psoriasis is not curable, but it can be treated. It is a common disease found in all racial groups: it affects 2% of the UK population (Camp et al., 1992), and in Sweden more than 2% of the population is affected (Hellgren et al., 1967).

Psoriasis is a hereditary skin disease driven by T cells, a type of white blood cell of the immune system. T cells normally help protect the body against infection and disease. In people with psoriasis the T cells become overactive, triggering other immune responses that cause swelling and rapid turnover of skin cells (keratinocytes). There are numerous forms of psoriasis: plaque psoriasis, the most common form, which affects 80-90% of people with psoriasis; guttate psoriasis, which often follows a streptococcal infection such as streptococcal pharyngitis; psoriatic arthritis, which causes inflammation of the joints and connective tissue and affects 10-15% of people who already have psoriasis; pustular psoriasis, which presents as raised bumps filled with non-infectious pus (pustules); and erythrodermic psoriasis, which results from very unstable plaque psoriasis and usually follows the abrupt withdrawal of systemic treatment.

Psoriasis susceptibility 1 candidate gene 2 protein (PSORS1C2 / SPR1), also known as C6orf17 (chromosome 6 open reading frame 17), is a protein expressed in skin, heart and skeletal muscle. It is 136 amino acids long, and its gene is located at the chromosome 6p21.3 locus. No isoforms of SPR1 are known, meaning that the protein is only known to exist in a single form.

Richard et al. and Trembath et al. provided strong evidence for a psoriasis susceptibility locus within the MHC region on chromosome 6p21 (Richard et al., 1997; Trembath et al., 1997). It has long been known that the MHC locus on 6p21.3 contains the gene or genes that confer susceptibility to psoriasis, a conclusion based on linkage and association studies in individuals and families with psoriasis (Elder et al., 1994). In diverse ethnic groups, the HLA-C allele Cw*0602 has consistently been found at increased frequency in people with psoriasis.

SPR1 is a small proline-rich protein that is conserved across a diverse range of species, from humans to the duckbill platypus. It has two natural variants, no related structures and no identified domains, and its structure and function are not known. However, it is believed to be implicated in psoriasis because it lies in the main PSORS1 region, next to the HLA-C region, which is the major genetic determinant of psoriasis, is heavily expressed during the disease process and contributes to this chronic skin condition.

There are two other proteins which may cause a psoriatic-type condition and could also be linked with psoriasis: the proline-rich zinc finger protein 828 (C2H2 zinc finger domain) and the trans-sialidase protein. Trans-sialidase is produced by the parasite Trypanosoma cruzi, which is transmitted by an insect vector and causes Chagas' disease.

SPR1 is overexpressed in squamous metaplasia of the bronchial epithelium, and transcriptional dysregulation of this gene has been shown to correlate with multistep bronchial carcinogenesis (Lau et al., 2000). Although the structure and function of the protein are unknown, SPR1 has been implicated in psoriasis because it lies within the PSORS1 locus, next to the HLA-C region, which is believed to contribute to psoriasis (Nair et al., 2006). SPR1 is also overexpressed in primate airway epithelium (An et al., 1992). It is up-regulated by the tumor promoter phorbol ester and down-regulated by vitamin A (An et al., 1993). However, SPR1 expression is markedly attenuated or lost in lung cancer (Hu et al., 1998; DeMuth et al., 1998). A high rate of lung cancer has been observed in patients with psoriasis (Ishioka et al., 2000).

To investigate whether SPR1 plays a role in psoriasis, numerous bioinformatics tools were used to predict the structure and function of the protein. This work indicated that the small proline-rich protein has no stable (ordered) structure, which makes its role in psoriasis hard to determine.

In this study, the SPR1 gene was analyzed further with numerous bioinformatics tools to investigate its role in psoriasis.


Bioinformatics is the application of computer science, mathematics and physics to biological and chemical problems. It is used to collect biologically oriented data and organize them into databases, and these databases and algorithms are developed to advance biological research. The starting point for the project was the UniProt accession number Q9UIG4. This identifier was used to find general information about the protein, and a variety of tools were then used to find links or information that may implicate the SPR1 gene in psoriasis.
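The first step, retrieving the sequence for accession Q9UIG4, can be sketched in Python as below. This is a minimal illustration, not the workflow actually used in the study: it assumes the current UniProt REST endpoint pattern (rest.uniprot.org/uniprotkb/<accession>.fasta), and the helper names uniprot_fasta_url, parse_fasta and fetch_uniprot_sequence are hypothetical.

```python
import urllib.request
from typing import Dict

# Assumed URL pattern of the current UniProt REST API (FASTA format).
UNIPROT_FASTA_URL = "https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def uniprot_fasta_url(accession: str) -> str:
    """Build the download URL for a single UniProtKB entry in FASTA format."""
    return UNIPROT_FASTA_URL.format(accession=accession)


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into a {header: sequence} dictionary."""
    records: Dict[str, str] = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def fetch_uniprot_sequence(accession: str) -> str:
    """Download one entry and return its amino acid sequence (needs network access)."""
    with urllib.request.urlopen(uniprot_fasta_url(accession)) as response:
        text = response.read().decode("utf-8")
    records = parse_fasta(text)
    if not records:
        raise ValueError(f"no FASTA record returned for {accession}")
    return next(iter(records.values()))
```

Calling fetch_uniprot_sequence("Q9UIG4") would return the SPR1 sequence; since the essay reports 136 residues, its length should be 136 unless the entry has since been revised. Parsing is kept separate from downloading so that it can be checked offline.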

Databases - DNA sequence databases


UniProt[ref] (“”) is a universal protein resource that is updated every three weeks. It is freely available and provides a comprehensive, thoroughly classified and carefully annotated protein sequence knowledgebase (UniProtKB) with extensive cross-references and querying interfaces to support biological research (Oxford University Press, 2010). UniProt was formed from the Swiss-Prot, TrEMBL and PIR protein databases. A comprehensive record of all publicly available protein sequences is held in the UniProt Archive (UniParc) (Leinonen et al., 2009). UniProt was used to find general information on the protein, such as its names and origin, attributes, ontologies, sequence and references (PubMed articles about this protein); a BLAST search can also be run from UniProt.


GenBank (Benson et al., 2010) (“”) is provided by the National Center for Biotechnology Information (NCBI) and contains nucleotide and protein sequences for more than 300,000 organisms. The sequence information is obtained from submissions by individual laboratories and batch submissions from large-scale sequencing projects (Benson et al., 2009). GenBank was used to obtain a summary of the gene, its associated genomic regions, its genomic context and PubMed articles.


GeneCards (“”) is a database that is free for the public to use. At present, more than 7000 human genes have an approved gene symbol assigned by the HUGO/Genome Database (GDB) nomenclature committee (Rebhan et al., 1998). GeneCards was used to obtain information including aliases and descriptions, a genomic view, the gene location and alternative locations of the SPR1 gene. It also provides a graph of expression across 12 experimental tissues, such as the thymus and brain, and lists orthologs of the PSORS1C2 gene from two species and PubMed articles for this gene.

The International Protein Index

The International Protein Index (IPI) (“”) is a database for proteomics experiments. IPI gives information on the nucleotide sequences, genomes, protein sequences, small molecules and macromolecular structures related to the PSORS1C2 protein, and it provides database cross-references to UniProtKB/Swiss-Prot, Vega, EnsEMBL, UniProtKB/TrEMBL, H-InvDB, HGNC, UniParc, CleanEx and Entrez Gene. Like UniProt, it also gives sequence information.

It is built automatically from Swiss-Prot, TrEMBL, EnsEMBL and RefSeq (Kersey et al., 2004), and it provides non-redundant data sets representing the main proteomes: human, mouse and rat (Kersey et al., 2004).


EnsEMBL is an automated, freely available web resource; its companion resource Vega provides manually annotated data. The genomes covered include vertebrates such as human, zebrafish and mouse, as well as common model organisms such as Caenorhabditis elegans (the nematode worm), Saccharomyces cerevisiae (yeast) and Drosophila melanogaster (the fruit fly). The EnsEMBL browser (“”) produces gene predictions and annotations, which can also be accessed through the Perl API (Application Programming Interface), BioMart or MySQL (Spudich et al., 2007). Gene predictions are based on mRNA and protein evidence taken from public databases such as RefSeq and UniProt. The annotations include single nucleotide polymorphisms (SNPs), clone sets, insertion-deletion mutations (indels), functional annotations such as Gene Ontology (GO) terms, and protein domains (Spudich et al., 2007). EnsEMBL is currently one of the three most frequently used genome browsers in the world, along with UCSC (University of California, Santa Cruz, “”) and NCBI (National Center for Biotechnology Information, “”). EnsEMBL was used to identify and confirm the position of the SPR1 gene relative to the other genes at the PSORS1 locus on chromosome 6.




DisProt (Sickmeier et al., 2007) is a database of protein disorder that provides information on intrinsically disordered proteins (IDPs), which “fail to form a fixed three-dimensional (3D) structure, under physiological conditions”. It is freely available at “”. IDPs carry out critical biological functions, such as signaling, regulation and control. DisProt supports a range of biological and computational analyses, including homologous protein sequence retrieval, download of disordered region sequences, graphical maps of ordered and disordered regions, functional narratives, author-verified entries, isoform display and a comprehensive bibliography of disordered proteins (Sickmeier et al., 2007).

Several other tools were used alongside DisProt to check whether they gave consistent results and so increase confidence in the predicted disordered regions: DomTHREADER, a domain prediction server; SPRITZ, a protein disorder prediction server; DisEMBL, an intrinsic protein disorder prediction server; DRIPPRED, an order/disorder prediction server; DISOPRED, a protein disorder prediction server; FoldIndex, a tool that predicts whether a protein is folded or unfolded; GlobPlot, a predictor of intrinsic protein disorder, domains and globularity; and FoldUnfold, a protein disorder prediction server.
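Checking whether several disorder predictors agree can be sketched as a per-residue majority vote. This is an illustration only: it assumes each server's output has already been converted into a string of "D" (disordered) and "O" (ordered) labels of equal length (the servers' native output formats differ and the conversion is not shown), and the function names and tool labels are hypothetical.

```python
from typing import Dict, Optional


def consensus_disorder(predictions: Dict[str, str], min_votes: Optional[int] = None) -> str:
    """Majority-vote consensus of per-residue disorder labels ('D' or 'O')."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    lengths = {len(labels) for labels in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all predictions must cover the same sequence length")
    length = lengths.pop()
    # Default: a residue is called disordered if more than half of the tools say so.
    threshold = min_votes if min_votes is not None else len(predictions) // 2 + 1
    consensus = []
    for i in range(length):
        votes = sum(1 for labels in predictions.values() if labels[i] == "D")
        consensus.append("D" if votes >= threshold else "O")
    return "".join(consensus)


def full_agreement(predictions: Dict[str, str]) -> float:
    """Fraction of residues on which every tool gives the same label."""
    columns = zip(*predictions.values())
    flags = [len(set(column)) == 1 for column in columns]
    return sum(flags) / len(flags) if flags else 0.0
```

For three hypothetical tools giving "DDOO", "DOOO" and "DDDO", the consensus is "DDOO" and the tools agree fully on half of the residues. Raising min_votes makes the consensus more conservative about calling disorder.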




The Protein Data Bank (PDB) is a collection of known protein structures, solved primarily by X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy. Together with structural classifications such as SCOP and CATH, it is a resource for studying biological macromolecules, protein function and evolution. No homologues of SPR1 were found in the PDB, so homology modeling programs such as Modeller and WHAT IF could not be used. The protein was therefore studied further by secondary structure analysis.



MUSCLE/ T-Coffee/ ClustalW2

MUSCLE, T-Coffee and ClustalW2 are three algorithms that were used to build multiple sequence alignments for the SPR1 protein. MUSCLE stands for MUltiple Sequence Comparison by Log-Expectation; it has been reported to be more accurate and faster than the other two programs, T-Coffee and ClustalW2. All three were used to check that the alignments were consistent. The sequences submitted were those of SPR1, trans-sialidase and zinc finger protein 828, and the tools were used to compare these three sequences, identify conserved regions and assess how similar they are.
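Once an alignment has been produced by MUSCLE, T-Coffee or ClustalW2, the similarity and conserved regions can be quantified with a few lines of Python. This sketch makes stated assumptions: identity is computed over ungapped columns only (other definitions divide by alignment or sequence length and give different numbers), the aligned sequences are assumed to be already read into strings, and the function names are hypothetical. ClustalW2 itself marks fully conserved columns with "*" in its output.

```python
from typing import List


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # gapped columns are excluded from the denominator
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def conserved_columns(alignment: List[str]) -> List[int]:
    """Indices of columns where all sequences carry the same residue and no gap."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for index, column in enumerate(zip(*alignment)):
        if "-" not in column and len(set(column)) == 1:
            conserved.append(index)
    return conserved
```

For example, percent_identity("PP-GA", "PPAGS") compares four ungapped columns, three of which match, giving 75.0.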



Determining the secondary structure of a protein helps in finding its overall three-dimensional (3D) structure: if the 3D structure cannot be found by threading the sequence obtained from UniProt, the secondary structure can be used to find known structures that share the same fold. Current alignment search procedures use automatically generated, profile-based protein family alignments, and many groups have developed profile-based database searches. Prediction performance improved with the introduction of PSI-BLAST and PSI-PRED (Altschul et al., 1997 & Karplus et al., 1998). David Jones was the first to use the evolutionary information in a PSI-BLAST PSSM (position-specific scoring matrix) to predict secondary structure from sequence (Jones et al., 1992). Other groups have proposed their own methods, such as SAM-T99sec (Karplus et al., 1999), which uses hidden Markov models to build profiles of evolutionarily divergent sequences. Cuff and Barton used PSI-BLAST alignments to create JPred2 (Cuff et al., 2000). Predicting very short segments was difficult, but this was addressed by SSpro, which uses multiple alignments and a bidirectional recurrent neural network (Bahar et al., 1997).

EVA is an automatic, globally distributed server that continuously evaluates other automatic prediction servers (Baldi et al., 1999). It takes newly released experimental structures from the PDB, sends their sequences to the prediction servers and collects the results (Berman et al., 2000). EVA concluded that PROF, PSI-PRED and SSpro give the most accurate results, and that α-helix and β-sheet prediction is the most advanced part of these tools, which increases their accuracy. However, proteins also contain other small, stable structures such as hairpin loops and turns, which are difficult to predict, and research in this area is not yet sufficient to give accurate predictions. PSI-PRED also gave inconsistent results, as shown in the results section below.
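Server evaluations such as EVA commonly summarize secondary structure accuracy with Q3, the percentage of residues assigned correctly to one of three states (helix H, strand E, coil C). A minimal sketch, assuming both strings already use the three-state alphabet (mapping from the eight DSSP states is not shown); the function names are hypothetical.

```python
from typing import Dict

STATES = "HEC"  # helix, strand, coil (three-state reduction)


def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose three-state label is predicted correctly."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(1 for p, o in zip(predicted, observed) if p == o)
    return 100.0 * correct / len(observed)


def per_state_accuracy(predicted: str, observed: str) -> Dict[str, float]:
    """For each observed state, the percentage of its residues predicted correctly."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must be of equal length")
    result = {}
    for state in STATES:
        positions = [i for i, o in enumerate(observed) if o == state]
        if positions:
            hits = sum(1 for i in positions if predicted[i] == state)
            result[state] = 100.0 * hits / len(positions)
    return result
```

For a predicted string "HHHECC" against an observed "HHCECC", Q3 is about 83.3%, while the per-state breakdown shows the single error falls on a coil residue. Per-state figures matter because a high overall Q3 can hide poor performance on rarer states.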


PSI-PRED is a protein structure prediction server that lets users submit a protein sequence and choose among three methods: PSI-PRED, which predicts secondary structure accurately for globular proteins; MEMSAT 2, a newer prediction method for membrane proteins based on a commonly used transmembrane topology prediction method; and GenTHREADER, a fold recognition method based on sequence profiling (McGuffin et al., 2000).


YASPIN (Lin et al., 2005) is a hidden neural network (HNN) tool, combining a hidden Markov model and a neural network, that predicts the secondary structure of a sequence, using the PSI-BLAST algorithm “to produce a PSSM for the input sequence” (Lin et al., 2005). Secondary structure consists of four types of element: α-helix, β-sheet, loop and turn. The core region of a protein is composed of α-helices and β-sheets.



PSORT is a freely available web server (“”) that predicts the subcellular localization sites of proteins. WoLF PSORT is an updated version of PSORT II for animal, fungal and plant sequences. It converts protein amino acid sequences into numerical localization features based on amino acid composition and functional motifs such as DNA-binding motifs (Horton et al., 2007).
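The idea of turning a sequence into numerical features and classifying it by its nearest neighbors can be sketched as below. This covers only the composition part of such a feature set: WoLF PSORT also uses sorting signals and motifs and is reported to use a k-nearest-neighbor classifier, but its real features, weights and training data are not reproduced here. The function names and the labelled examples in the note are hypothetical toy data.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence: str) -> List[float]:
    """20-dimensional amino acid composition (fractions), ignoring non-standard letters."""
    counts = Counter(r for r in sequence.upper() if r in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def knn_predict(query: Sequence[float],
                training: List[Tuple[Sequence[float], str]],
                k: int = 3) -> str:
    """Majority label among the k training vectors closest (Euclidean) to the query."""
    if not training:
        raise ValueError("training set is empty")
    ranked = sorted(training, key=lambda item: math.dist(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

With toy training vectors built from "KKKR" and "KRKK" (labelled nucleus) and "LLLA" (labelled membrane), a lysine-rich query such as "KKRK" is assigned to the nucleus class. Composition alone is a crude signal; it is shown here only because the text names it as one of the feature types.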



FoldIndex is a freely available algorithm that uses the average residue hydrophobicity and the net charge of the query protein sequence to predict whether the sequence is intrinsically unfolded. It is reported to be reliable, with an error rate similar to that of more advanced fold prediction methods (Prilusky et al., 2005). FoldIndex was used to predict whether the SPR1 protein is predominantly folded.
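The FoldIndex calculation can be sketched from the formula published by Prilusky et al. (2005), FoldIndex = 2.785<H> − |<R>| − 1.151, where <H> is the mean Kyte-Doolittle hydropathy rescaled to the range 0 to 1 and <R> is the mean net charge; positive values suggest a folded sequence and negative values an intrinsically unfolded one. This sketch scores the whole sequence rather than the sliding window plotted by the web server, and it assumes a simplified charge model at neutral pH (K and R count +1, D and E count −1, histidine is treated as neutral). The function name is hypothetical.

```python
# Kyte-Doolittle hydropathy scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def fold_index(sequence: str) -> float:
    """Whole-sequence FoldIndex: positive suggests folded, negative suggests unfolded."""
    residues = [r for r in sequence.upper() if r in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    n = len(residues)
    # Rescale hydropathy from [-4.5, 4.5] to [0, 1] before averaging.
    mean_h = sum((KYTE_DOOLITTLE[r] + 4.5) / 9.0 for r in residues) / n
    # Simplified net charge at neutral pH: K/R positive, D/E negative.
    net_charge = sum(r in "KR" for r in residues) - sum(r in "DE" for r in residues)
    mean_r = abs(net_charge) / n
    return 2.785 * mean_h - mean_r - 1.151
```

A pure poly-proline stretch scores about −0.25 (2.785 × 0.322 − 1.151), so a very proline-rich sequence tends towards the unfolded side, whereas a poly-leucine stretch scores strongly positive.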


The ExPASy (Expert Protein Analysis System) proteomics server (“”), run by the Swiss Institute of Bioinformatics (SIB), was the first WWW server in the field of life sciences when it began operating in 1993 (Gasteiger et al., 2003). It provides access to a range of databases and analytical tools dedicated to proteins and proteomics. ExPASy[ref] includes SWISS-PROT[ref], TrEMBL[ref], PROSITE[ref], SWISS-2DPAGE[ref] and the SWISS-MODEL repository (Gasteiger et al., 2003).


Threading, also known as fold recognition, takes a sequence of unknown structure and searches for a known structure (fold) that is compatible with it. This is done in four stages. First, a library of known folds is assembled, and the target sequence, whose fold is unknown, is threaded onto each of them. Second, a scoring function is defined so that the threaded folds can be compared with one another; the two most common approaches are contact potentials and sequence profiles. Third, a search strategy is used to find the best alignment of the sequence to each fold; this search algorithm is the vital part of threading. Finally, the best-scoring fold is selected as the prediction.
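The four stages can be illustrated with a deliberately simplified threading sketch. Every scoring detail here is an assumption made for illustration: each template is reduced to a string of buried ("B") and exposed ("E") positions, the score is a toy two-state hydrophobic preference rather than a real contact potential or sequence profile, and the search is an exhaustive ungapped placement. Real methods such as GenTHREADER allow gaps and use far richer scores. All names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Toy residue classes; real threading scores use statistical potentials or profiles.
HYDROPHOBIC = set("AVILMFWC")


def environment_score(residue: str, environment: str) -> float:
    """Toy preference: hydrophobic residues favour buried ('B') sites, polar ones exposed ('E')."""
    hydrophobic = residue in HYDROPHOBIC
    if environment == "B":
        return 1.0 if hydrophobic else -1.0
    return -0.5 if hydrophobic else 0.5


def thread(query: str, template_env: str) -> Tuple[float, Optional[int]]:
    """Stage 3 (search): best ungapped placement of the query on one template."""
    best_score, best_offset = float("-inf"), None
    for offset in range(len(template_env) - len(query) + 1):
        score = sum(environment_score(r, template_env[offset + i])
                    for i, r in enumerate(query))
        if score > best_score:
            best_score, best_offset = score, offset
    return best_score, best_offset


def rank_folds(query: str, library: Dict[str, str]) -> List[Tuple[float, str, Optional[int]]]:
    """Stages 1 and 4: thread onto every template in the library, best score first."""
    results = []
    for name, template_env in library.items():
        if len(template_env) < len(query):
            continue  # template too short for an ungapped placement
        score, offset = thread(query, template_env)
        results.append((score, name, offset))
    return sorted(results, reverse=True)
```

For the toy query "LLKK", a template with environments "BBEE" scores 3.0 and one with "EEBB" scores −3.0, so the first would be predicted as the fold. If no template scores well, as reported for SPR1, no reliable fold can be assigned.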


GenTHREADER is a fold recognition server for protein sequences. Traditional sequence alignment algorithms are used to produce the alignments used in the threading process. Each threaded model is then evaluated by a neural network, which produces a single measure of confidence in the proposed prediction. It is a reliable and fast server for the automatic prediction of protein structure, particularly for proteins from translated bacterial genomes (Jones et al., 1999).

GenTHREADER was used to find a suitable template structure for this small proline-rich protein, but no suitable template was found. As a result, homology modeling programs such as Modeller, one of the most commonly used, or SWISS-MODEL, which can analyze a number of proteins at once, could not be used. The conFunc server and the CBS WWW server were also tried, and neither returned any predictions.


Databases - DNA sequence databases


UniProt was used to obtain the SPR1 sequence, which was then used in all the other protein analysis tools; this confirmed that SPR1 has 136 amino acids. UniProt also indicated that the protein could have a signal peptide, although no experimental data support this. SPR1 also has two single nucleotide polymorphisms (SNPs). The first changes residue 25 from glycine, a non-polar hydrophobic amino acid, to aspartate, a negatively charged, acidic, medium-sized amino acid. The second changes residue 84 from proline to leucine, which are both non-polar, hydrophobic, medium-sized amino acids, so this substitution preserves similar physico-chemical properties.
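The two amino acid changes can be written in the common one-letter shorthand as G25D and P84L, and their effect on side-chain class can be checked programmatically. This is a sketch: the class table follows one common textbook grouping (conventions differ, for example for glycine, cysteine, histidine and tyrosine), and the function name is hypothetical.

```python
import re
from typing import Dict, Union

# One common grouping of side-chain properties (conventions differ, e.g. for G, C, H, Y).
SIDE_CHAIN_CLASS = {
    "G": "non-polar", "A": "non-polar", "V": "non-polar", "L": "non-polar",
    "I": "non-polar", "M": "non-polar", "F": "non-polar", "W": "non-polar",
    "P": "non-polar",
    "S": "polar", "T": "polar", "C": "polar", "Y": "polar", "N": "polar", "Q": "polar",
    "D": "acidic", "E": "acidic",
    "K": "basic", "R": "basic", "H": "basic",
}


def classify_substitution(variant: str) -> Dict[str, Union[str, int, bool]]:
    """Describe a missense change written as e.g. 'G25D' (from-residue, position, to-residue)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", variant)
    if match is None:
        raise ValueError(f"unrecognised variant notation: {variant}")
    before, position, after = match.group(1), int(match.group(2)), match.group(3)
    if before not in SIDE_CHAIN_CLASS or after not in SIDE_CHAIN_CLASS:
        raise ValueError(f"non-standard residue in {variant}")
    return {
        "position": position,
        "from_class": SIDE_CHAIN_CLASS[before],
        "to_class": SIDE_CHAIN_CLASS[after],
        "class_changed": SIDE_CHAIN_CLASS[before] != SIDE_CHAIN_CLASS[after],
    }
```

With this grouping, G25D is flagged as a class change (non-polar to acidic), while P84L stays within the non-polar class, matching the description above.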


The results presented by EnsEMBL are shown below.

This shows the PSORS1 locus which is next to the HLA-C region. The SPR1 protein (PSORS1C2) is highlighted in green.

The PSORS1 locus, located on chromosome 6p21.3 near the HLA-C region, is the major susceptibility locus for psoriasis. Linkage and association studies have shown that the major histocompatibility complex (MHC) is the main genetic determinant of psoriasis susceptibility, and the HLA-Cw6 allele within the MHC shows the strongest association with psoriasis (Martinez-Borra et al., 2003). PSORS1 is the psoriasis susceptibility locus within the MHC. The association of SPR1 and several other genes in this region with psoriasis is due to linkage disequilibrium, the process by which specific alleles at two or more loci tend to be inherited together (Elder, 2006).

Linkage disequilibrium is the non-random association of alleles at two or more loci; the loci do not have to be on the same chromosome. It can result from a number of processes, including epistatic natural selection (the most commonly cited), random drift, mutation, genetic hitchhiking and gene flow.
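This non-random association is usually quantified with the disequilibrium coefficient D = p_AB − p_A × p_B, where p_AB is the frequency of the haplotype carrying alleles A and B, together with its normalized forms D′ and r². A minimal sketch for two biallelic loci, assuming the haplotype and allele frequencies are already known (estimating them from genotype data is not shown); the function name is hypothetical.

```python
from typing import Tuple


def linkage_disequilibrium(p_ab: float, p_a: float, p_b: float) -> Tuple[float, float, float]:
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A at locus 1 and allele B at locus 2
    """
    for p in (p_a, p_b):
        if not 0.0 < p < 1.0:
            raise ValueError("allele frequencies must lie strictly between 0 and 1")
    d = p_ab - p_a * p_b  # deviation from random (independent) association
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max  # normalized to the range [-1, 1]
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r_squared
```

With both allele frequencies at 0.5 and haplotype AB at 0.5 (the alleles always inherited together), D = 0.25 and D′ = r² = 1; with AB at 0.25 (random association), all three measures are 0.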

Patterns of linkage disequilibrium across the genome have been mapped by the HapMap Project, which revealed that the recombination rate in the major histocompatibility complex class I subregion containing HLA-C and the ten genes next to it is 2.3 times lower than the genomic average (Walsh et al., 2003). Because of this strong linkage disequilibrium, all the genes in this region are candidates for involvement in psoriasis (Elder, 2006). SPR1 and SEEK1, its neighboring gene on the opposite strand, have risk-specific mRNA alleles, whereas CDSN and HLA-C have risk-specific protein alleles (Elder, 2006). Since SPR1 lies within the 300-kb risk interval of the main disease-associated PSORS1 locus, linkage disequilibrium implicates it as a candidate gene for psoriasis.



Several tools were used to predict the disordered regions of SPR1, as shown below, so that their predictions could be cross-checked.


GlobPlot and DisEMBL were both used because they approach disorder prediction differently and so complement each other. DisEMBL is more accurate than GlobPlot at predicting coils. GlobPlot is an easy-to-use server that displays all its information clearly, and it is used to find repeats, domain boundaries and unstructured regions.


The results from DISOPRED show that this small proline-rich protein is primarily disordered. Disordered proteins, a small fraction of all proteins, do not form a rigid 3D structure, either throughout the entire protein or in large segments of it. Unlike ordered proteins, they do not tend to maintain long-range interactions. Some function mainly as entropic tethers or linkers (linker-type disordered proteins), while others fold in the presence of their protein partners (binding-type disordered proteins) (Brown et al., 2010). Binding-type disordered regions, also known as molecular recognition elements, have been shown to have a proline content approximately 50% greater than that of ordered proteins, while their frequency of aromatic residues is similar to that of ordered proteins (Brown et al., 2010). Given its high proline content, SPR1 may therefore be a binding-type disordered protein. The high aromatic content and the increased content of non-polar proline are important for forming the interface with protein-binding partners, because it is these residues that form the interface (Brown et al., 2010).



Proteins are synthesized step by step, with amino acids added one at a time starting from the NH2 terminus of the chain (Canfield & Anfinsen, 1963). The information needed for folding is contained in the amino acid sequence and its surroundings; this is also true of the unstructured, unaggregated polypeptides produced by denaturing native molecules (Anfinsen, 1966). Protein folding is the conversion of a newly synthesized or denatured polypeptide chain into a specific three-dimensional conformation.

Following the multiple sequence alignment, the PDB was searched for homologues. No homologues were found, so a secondary structure prediction was carried out.

GenTHREADER was used to identify proteins of known structure with sequences similar to SPR1. No protein of known structure showed sufficient sequence similarity, so no template was obtained; this is consistent with SPR1 lacking a stable folded structure, and its structure could not be analyzed further. The proline content of the sequence is very high, which suggests that SPR1 could form a poly-proline type II (PPII) helix.
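How proline-rich a sequence is, and where, can be checked with a simple scan. This is an illustrative sketch only: the window size of 10 and threshold of 0.4 are arbitrary defaults, not values from the study, and runs of three or more consecutive prolines are treated only as a crude hint of PPII-forming stretches, since PPII propensity also depends on sequence context. The function names are hypothetical.

```python
import re
from typing import List, Tuple


def proline_fraction(sequence: str) -> float:
    """Fraction of residues that are proline."""
    sequence = sequence.upper()
    return sequence.count("P") / len(sequence) if sequence else 0.0


def proline_rich_windows(sequence: str, window: int = 10,
                         threshold: float = 0.4) -> List[Tuple[int, float]]:
    """(0-based start, proline fraction) for every window at or above the threshold."""
    sequence = sequence.upper()
    hits = []
    for start in range(len(sequence) - window + 1):
        fraction = sequence[start:start + window].count("P") / window
        if fraction >= threshold:
            hits.append((start, fraction))
    return hits


def proline_runs(sequence: str, min_length: int = 3) -> List[Tuple[int, int]]:
    """(start, end) spans of consecutive prolines at least min_length long (end exclusive)."""
    pattern = re.compile("P{%d,}" % min_length)
    return [(m.start(), m.end()) for m in pattern.finditer(sequence.upper())]
```

Applied to the SPR1 sequence retrieved from UniProt, the windowed scan would show whether the prolines are spread evenly or concentrated in segments long enough to adopt a PPII conformation.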

Proline is a unique, naturally occurring imino acid (an amino acid containing a secondary amine group), whose side chain is cyclized onto the backbone, which greatly limits its conformational freedom. Depending on the isomerization state of the prolyl bond, there are two types of poly-proline helix: the poly-proline type I helix (PPI), in which all the peptide bonds are in the cis conformation, and the poly-proline type II helix (PPII), in which all the peptide bonds are in the trans conformation. Poly-prolines are very stable and stiff and are well known for adopting the PPII helical structure, a bent worm-like chain of constant length similar to the structure of the fibrous protein collagen, which is composed mainly of proline, hydroxyproline and glycine (Doose et al., 2007). The PPII helix has an open structure with no internal hydrogen bonding, because the amide nitrogen and carbonyl oxygen atoms are too far apart and incorrectly oriented; it therefore differs from the standard secondary structures, such as the α-helix, β-helix and π-helix, found in other proteins. Because proline-rich sequences are important in the assembly of multi-protein complexes (multimers), the proline-rich SPR1 may play a role in such assembly.

Antimicrobial peptides are an important element of the host defense system and, in addition to their microbicidal properties, have numerous effects on host cells. Proline-rich antimicrobial peptides affect both eukaryotic and prokaryotic cells. Cathelicidins, a family that includes proline-rich peptides, are important in the skin during wound repair and are expressed in diseases such as contact dermatitis and psoriasis (Chan et al., 2001). Moreover, the PPII helix is associated with the lipid destabilization and microbicidal effects of proline-rich peptides, including Bac 5, human salivary mucin glycoprotein-2 and PR-39 (Chan et al., 2001).

T cells and antigen-presenting cells (APCs) drive the development of epidermal hyperproliferation. Increased levels of inflammatory cytokines potentiate T-cell activation and cause the hyperproliferation and accelerated differentiation of keratinocytes seen in lesional psoriatic epidermis. Activated T cells play a critical role in triggering the disease and causing it to persist. T-cell activation is regulated by the B7 family of molecules on APCs, which transmit antigen-independent stimulatory signals through CD28 and inhibitory signals through CD152 (cytotoxic T lymphocyte-associated antigen-4 [CTLA-4]); these signals are critical in mediating both T-cell activation and suppression (Abrams et al., 1999). Both CD28 and CD152 have a poly-proline motif in their ligand-binding region. CTLA4Ig is a soluble chimeric protein consisting of the extracellular domain of human CD152 fused to a fragment of the Fc portion of human IgG1 (the hinge, CH2 and CH3 domains) (Abrams et al., 1999). By binding to the B7-1 (CD80) and B7-2 (CD86) molecules on APCs, CTLA4Ig blocks the costimulatory signal for T-cell activation from CD28. CTLA4Ig therefore prevents the initiation of autoimmune processes and also restrains autoimmune disease activity later in its course (Abrams et al., 1999). The CD28/CD152 pathway is thus very important in chronic T cell-mediated diseases, especially psoriasis (Abrams et al., 1999). Defects in proteins that regulate the behavior of activated T cells therefore tend to lead to autoimmunity.

Poly-proline helices can assemble into multimers, so this small proline-rich protein could potentially form a multimer. The only feasible computational approach would be comparative modeling based on known structures with similar sequences; however, because GenTHREADER returned no results, no structural analysis could be carried out, and laboratory work is needed.


Results of the BLAST search from UniProt.

Protein name | Organism | E-value
Psoriasis susceptibility 1 candidate gene 2 protein | Homo sapiens (Human) | 7e-85
Psoriasis susceptibility 1 candidate 2 | Homo sapiens (Human) | 7e-85
Psoriasis susceptibility 1 candidate 2 | Homo sapiens (Human) | 1e-83
Psoriasis susceptibility 1 candidate gene 2 protein homolog | Pan troglodytes (Chimpanzee) | 2e-81
Putative uncharacterised protein PSORS1C2 | Macaca mulatta (Rhesus macaque) | 3e-79
Putative uncharacterised protein SPR1 | Sus scrofa (Pig) | 4e-67
Psoriasis susceptibility 1 candidate 2 | Bos taurus (Bovine) | 4e-66
Psoriasis susceptibility 1 candidate 2 | Sus scrofa (Pig) | 1e-62
Putative uncharacterised protein | Ailuropoda melanoleuca (Giant panda) | 6e-60
Putative uncharacterised protein | Mus musculus (Mouse) | 4e-58
- | Ornithorhynchus (Duckbill platypus) | 1e-57
Psoriasis susceptibility 1 candidate gene 2 protein homolog | Mus musculus (Mouse) | 2e-57

The table shows that the small proline-rich protein is conserved in many species, ranging from close to distant evolutionary relatives of humans: it is present in the chimpanzee, a close relative, and also in the duckbill platypus, a distant one.

Five iterations of PSI-BLAST were performed. Two of the proteins matching SPR1 were trans-sialidase (Trypanosoma cruzi strain CL Brener), with an E-value of 7e-12, and zinc finger protein 828 (Homo sapiens), with an E-value of 9e-12; both E-values are better than (below) the threshold.
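The threshold comparison amounts to keeping hits whose E-value is at or below a cut-off, where smaller is better. This is a sketch under stated assumptions: the threshold actually used in this analysis is not given, so the default of 0.005 below is an assumption (the value set in the PSI-BLAST run should be used instead), and the function names are hypothetical. The parser also accepts the spaced form ("7 e-85") seen in the BLAST table above.

```python
from typing import List, Tuple


def parse_evalue(text: str) -> float:
    """Convert an E-value string such as '7e-12' or '7 e-85' to a float."""
    return float(text.replace(" ", ""))


def hits_below_threshold(hits: List[Tuple[str, float]],
                         threshold: float = 0.005) -> List[Tuple[str, float]]:
    """Keep hits whose E-value is at or below the threshold, best (smallest) first."""
    kept = [(name, evalue) for name, evalue in hits if evalue <= threshold]
    return sorted(kept, key=lambda hit: hit[1])
```

Given the two reported hits plus a hypothetical unrelated hit with E = 0.3, the filter keeps trans-sialidase (7e-12) and zinc finger protein 828 (9e-12), in that order. Note that E-values depend on database size, so they are only comparable within a single search.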

The Trypanosoma cruzi trans-sialidase (TS) is a unique enzyme with both neuraminidase and sialic acid transfer activities, and it is critical for the infectivity of the parasite that causes Chagas' disease. Chagas' disease is a chronic, debilitating condition caused by the obligate hemoflagellate protozoan parasite Trypanosoma cruzi (T. cruzi). It is one of the leading causes of death in endemic areas, affecting 16-20 million people in Latin America, where it is prevalent (Hoft et al., 2007). Infection in humans is transmitted by triatomine insects (subfamily Triatominae), also known as kissing bugs. When an infected bug bites a human, it defecates on the skin, mainly around the face near the eyes and lips, and its faeces carry T. cruzi; a chagoma (a red swelling) can form at the site of entry. The host rubs the infected faeces into the skin while scratching the itch, which eventually leads to Chagas' disease. In the acute phase of the infection, the parasites are easily observed in the bloodstream, as they spread throughout the mammalian host by replicating within cells (Milla & Kahn, 2000).

Purified trans-sialidase was found to trigger the selective release of interleukin-6 (IL-6), a cytokine with both pro-inflammatory and anti-inflammatory actions. IL-6 is produced by T cells and macrophages and initiates the immune response to trauma, especially burns and other tissue damage that lead to inflammation. Sera from humans with acute Chagas' disease, and from animals experimentally infected with T. cruzi, invariably showed elevated levels of IL-6 (Saavedra et al., 1999). Both immune and non-immune mechanisms are known to up-regulate IL-6 in Chagas' disease, and IL-6 is also involved in many viral infections; for example, IL-6 production is increased in normal peripheral blood mononuclear cells infected with HIV. In psoriasis, IL-6 has been suggested to be the main mediator of keratinocyte proliferation and inflammation in the host's response to tissue injury (Grossman et al., 1989).

Psoriasis is an autoimmune, T cell mediated disorder defined by increased activation of CD4+ T lymphocytes and over-expression of pro-inflammatory cytokines, namely interleukin-2 (IL-2), IL-6, interferon gamma (IFN-γ) and tumor necrosis factor alpha, which suggests that the immune component of psoriasis is T helper 1 (Th1) mediated. Cytokines associated with the type 1 (Th1 and Tc1) T cell subsets, such as IFN-γ and IL-2, have been reported to predominate within psoriatic lesions (Mee et al., 2007). It has also been suggested that the psoriatic process is initiated and maintained by cytokines released by activated T lymphocytes, which stimulate keratinocyte proliferation (Jain et al., 2009). Bacterial super-antigens have also been found to play a critical role in the initiation and/or aggravation of psoriasis (Jain et al., 2009). However, the exact mechanism by which T cells drive psoriasis has not yet been established.

T. cruzi TS stimulates polyclonal B cell activation and induces non-specific immunoglobulin (Ig). TS increases the ratio of B cells to T cells, and it stimulated the proliferation of B cells, including B cells deficient in IL-6 or CD40 (Gao et al., 2002). CD40 is an important molecule for the delivery of T cell help during cognate T cell and B cell interaction (Gao et al., 2002).

The second protein is zinc finger protein 828, which contains a nucleic acid binding zinc finger domain of 25-30 amino acid residues. The domain contains two conserved cysteine residues towards one end and two conserved histidine residues towards the other, which together tetrahedrally coordinate a zinc atom. A schematic representation of the zinc finger domain is shown below.

The only structural feature that can be inferred from the BLAST results is this zinc finger motif; no other structures could be predicted.
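The cysteine and histidine spacing of a classic C2H2 zinc finger can be searched for with a sequence pattern. The sketch below translates a PROSITE-style pattern, C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H, into a Python regular expression; to the best of our knowledge this corresponds to PROSITE entry PS00028, but that should be checked against PROSITE itself. The test sequence is synthetic, not zinc finger protein 828.

```python
import re

# PROSITE-style C2H2 zinc finger pattern written as a regex:
# C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H
C2H2_PATTERN = re.compile(r"C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H")


def find_c2h2_motifs(sequence):
    """Return (start, end, matched text) for each non-overlapping match.

    Positions are 1-based and inclusive, as in sequence databases."""
    return [
        (m.start() + 1, m.end(), m.group())
        for m in C2H2_PATTERN.finditer(sequence.upper())
    ]


if __name__ == "__main__":
    # Synthetic sequence with one motif embedded after "MSK".
    print(find_c2h2_motifs("MSK" + "CAACAAALAAAAAAAAHAAAH" + "GG"))
```

A pattern match only shows that the residue spacing fits; it does not show that the segment actually folds around zinc.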


MUSCLE / T-Coffee / ClustalW2

These three algorithms were used to produce a multiple sequence alignment of the three proteins, namely the SPR1 protein, the Trypanosoma cruzi trans-sialidase protein and zinc finger protein 828. The results from MUSCLE are shown below.

Red letters indicate small and hydrophobic amino acids, including the aromatic ones; blue letters show the acidic amino acids; magenta letters show the basic amino acids; green letters show the hydroxyl-, sulfhydryl- and amine-containing amino acids together with glycine and histidine; and grey letters show the other amino acids.

MUSCLE alignment of the two BLAST hits, trans-sialidase and zinc finger protein 828, with the SPR1 protein

ClustalW2 alignment of the BLAST hits and the SPR1 protein
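Progressive aligners such as ClustalW2 start from pairwise alignments of the input sequences. As an illustration of that building block only, the sketch below implements Needleman-Wunsch global alignment with a simple match/mismatch score and a linear gap penalty. The scores (+1, -1, -2) are arbitrary assumptions; real tools use substitution matrices such as BLOSUM and affine gap penalties, and MUSCLE and T-Coffee add further refinement steps not shown here.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of sequences a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(x, y):
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("PPGK", "PPK"))
```

For the toy pair above, the best alignment places a single gap opposite G (PPGK over PP-K), which scores 1.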

Phylogenetic Tree for these results

As can be seen from the results above, the SPR1 gene is conserved across evolutionary lineages in a variety of species, ranging from the closest relative of humans, the chimpanzee, to the distant duckbill platypus.

The phylogenetic tree was built from the BLAST results to see whether a structure could be predicted for SPR1 from the proteins it is related to, including its ancestors. However, none of these related proteins has a predicted structure either, which suggests that some property shared across this protein family leaves all of its members without a structure.
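A phylogenetic tree is built by repeatedly joining the most similar sequences. The sketch below uses UPGMA on a distance matrix purely as an example of that idea; it is an assumption that UPGMA would be appropriate here, and tree tools such as ClustalW2 also offer neighbour-joining, which does not assume a constant rate of evolution. The three taxa and their distances are hypothetical.

```python
def upgma(names, distances):
    """Cluster taxa with UPGMA and return a rooted tree in Newick format.

    names: taxon labels; distances: symmetric matrix (list of lists)."""
    # cluster id -> (newick text, number of taxa, height of the node)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    # Pairwise distances keyed by (smaller id, larger id).
    dist = {
        (i, j): distances[i][j]
        for i in range(len(names))
        for j in range(i + 1, len(names))
    }
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        (i, j), d = min(dist.items(), key=lambda item: item[1])
        text_i, size_i, height_i = clusters.pop(i)
        text_j, size_j, height_j = clusters.pop(j)
        height = d / 2
        merged = "({}:{:.3f},{}:{:.3f})".format(
            text_i, height - height_i, text_j, height - height_j
        )
        # Distance to the new cluster is the size-weighted average.
        new_dist = {}
        for k in clusters:
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            new_dist[k] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        dist = {p: v for p, v in dist.items() if i not in p and j not in p}
        for k, v in new_dist.items():
            dist[(k, next_id)] = v
        clusters[next_id] = (merged, size_i + size_j, height)
        next_id += 1
    text, _, _ = next(iter(clusters.values()))
    return text + ";"


if __name__ == "__main__":
    # Hypothetical distances: human and chimpanzee close, mouse distant.
    names = ["Human", "Chimp", "Mouse"]
    matrix = [[0.0, 0.02, 0.3], [0.02, 0.0, 0.3], [0.3, 0.3, 0.0]]
    print(upgma(names, matrix))
```

The Newick output can be pasted into any tree viewer; branch lengths are half the merge distance minus the height of the child node.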



PSIPRED Results

The initial results showed that the SPR1 protein is almost entirely coil, apart from two β strands, one short and one long. This is consistent with the protein being proline rich, since proline is known to break both α helices and β sheets, for several reasons. Firstly, proline cannot participate in helix stabilization through intramolecular hydrogen bonding, because it lacks an amide proton in an X-Pro peptide bond (X representing any amino acid) (Li et al., 1996). Secondly, proline's bulky pyrrolidine ring places a steric constraint on the conformation of the preceding residue in the helix. Thirdly, as a secondary amide, proline behaves as a relatively polar residue and shows an enhanced tendency to form strong hydrogen bonds in non-periodic structural motifs, namely proline-induced γ turns and β turns (Li et al., 1996). Likewise, proline breaks β sheets because it lacks a hydrogen bond donor and its ϕ angle is incompatible with regular β sheet geometry (Li et al., 1996). However, when the prediction was repeated two months later, the results were completely different, as shown below.

The figure above shows the results produced by PSIPRED two months later. This set of results shows that the small proline rich protein is mainly coil but has one long helix. Together, the two runs show the inconsistency of PSIPRED, indicating, as mentioned before, that it is not a very accurate tool for secondary structure prediction. Therefore, other tools such as Jpred3, ExPASy, YASPIN and WoLF PSORT were also used to obtain more consistent predictions of the secondary structure and other properties of the protein.
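Because proline breaks helices and sheets, it can help to see where prolines cluster along the sequence. The sketch below computes the proline fraction in a sliding window; the window size of 7 and the 0.3 cut-off are arbitrary assumptions, and this is not how PSIPRED works (PSIPRED uses neural networks applied to a PSI-BLAST profile).

```python
def proline_density(sequence, window=7):
    """Return (1-based window start, fraction of prolines) for every window."""
    sequence = sequence.upper()
    if not 1 <= window <= len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    densities = []
    count = sequence[:window].count("P")
    for start in range(len(sequence) - window + 1):
        if start > 0:
            # Slide by one: remove the residue that left, add the one that entered.
            count -= int(sequence[start - 1] == "P")
            count += int(sequence[start + window - 1] == "P")
        densities.append((start + 1, count / window))
    return densities


def proline_rich_windows(sequence, window=7, cutoff=0.3):
    """Window starts whose proline fraction reaches `cutoff` (an assumed value)."""
    return [start for start, frac in proline_density(sequence, window) if frac >= cutoff]


if __name__ == "__main__":
    print(proline_density("PPAAPP", window=2))
```

Regions with a high proline fraction would be expected to disfavour regular helices and strands, which fits the largely coil predictions described above.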

The PROF prediction server was used as an alternative to PSIPRED to predict the secondary structure, but it failed to return a result; it reported that no prediction may have been possible because there were no PSI-BLAST hits for this sequence. The APSSP2 server was then used. Its prediction shows a short strand followed by a long helix from amino acid 5 to amino acid 16, with the rest of the protein entirely coil. Results from YASPIN (as shown below) also place the long helix at exactly this position. This suggests that the small proline rich protein has a long helix. The confidence of the helix prediction in this region is not high on either the YASPIN or the APSSP2 server, whereas the coil predictions are made with very high confidence. However, because PSIPRED also predicted a helix, the agreement between the three tools supports a helix from amino acid 5 to 16.


As partially explained above, these results show the entire SPR1 sequence with its secondary structure prediction beneath it, and the figures below the prediction give the overall confidence of the prediction. The figures next to 'helix', 'strand' and 'coil' give the confidence that each residue is in a helix, strand or coil.
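Agreement between several predictors, as used above to support the helix at residues 5 to 16, can be made explicit with a per-residue majority vote. The sketch below assumes each tool's output has been reduced to a string of H (helix), E (strand) and C (coil); the three example strings are hypothetical and only mimic the pattern described in the text. Treating ties as coil is our own conservative choice.

```python
from collections import Counter


def consensus(predictions):
    """Per-residue majority vote over equal-length H/E/C strings.

    Ties are reported as 'C' (coil) as a conservative choice."""
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must have the same length")
    result = []
    for states in zip(*predictions):
        (top, top_count), *rest = Counter(states).most_common()
        result.append("C" if rest and rest[0][1] == top_count else top)
    return "".join(result)


def segments(ss, state="H"):
    """Return 1-based inclusive (start, end) ranges where `ss` equals `state`."""
    ranges, start = [], None
    for pos, s in enumerate(ss, start=1):
        if s == state and start is None:
            start = pos
        elif s != state and start is not None:
            ranges.append((start, pos - 1))
            start = None
    if start is not None:
        ranges.append((start, len(ss)))
    return ranges


# Hypothetical 20-residue predictions from three tools.
PSIPRED = "C" * 4 + "H" * 12 + "C" * 4
YASPIN = "C" * 5 + "H" * 11 + "C" * 4
APSSP2 = "C" + "EE" + "C" + "H" * 12 + "C" * 4

if __name__ == "__main__":
    print(segments(consensus([PSIPRED, YASPIN, APSSP2])))
```

A majority vote ignores each tool's per-residue confidence; weighting votes by those confidence values would be a natural refinement.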


DisEMBL Results

DisEMBL, a protein disorder prediction server, was used in preference to PONDR, another disorder predictor, because access to raw PONDR predictions is restricted.

This graph, obtained from DisEMBL for the SPR1 protein, shows that the protein is predominantly coil, as shown earlier by the YASPIN, PSIPRED and APSSP2 secondary structure predictions. Protein disorder is generally found within loops, although loops and coils are not in themselves generally disordered. The graph shows that the hot-loop score fluctuates, with a general upward trend after residue 110. Hot loops are loops with high Cα temperature factors (B-factors), that is, loops with a high degree of mobility, and such highly dynamic loops are generally considered to be disordered. Several attempts have been made to use B-factors for protein disorder prediction (Vihinen et al., 1994). The hot-loop profile therefore suggests that the SPR1 protein is disordered. REMARK 465 entries in PDB files list residues whose coordinates are missing from an X-ray structure; such residues, with no assigned electron density, indicate intrinsic disorder and were used in early protein disorder prediction (Li et al., 2000). To test this further, additional disorder predictions were carried out using various bioinformatics servers.
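Missing residues recorded under REMARK 465 can be pulled out of a PDB file to see which stretches lack electron density. The sketch below is a simplified, whitespace-based parser (an assumption, not the official fixed-column reading of the format): explanatory REMARK 465 lines are skipped because their last token is not a residue number, and optional model numbers are ignored. The example lines are synthetic.

```python
import re

_SEQNUM = re.compile(r"^-?\d+[A-Z]?$")  # residue number, optional insertion code
_RESNAME = re.compile(r"^[A-Z0-9]{1,3}$")  # three-letter (or shorter) residue name


def missing_residues(pdb_lines):
    """Return (residue name, chain, residue number) for REMARK 465 records."""
    missing = []
    for line in pdb_lines:
        if not line.startswith("REMARK 465"):
            continue
        tokens = line.split()[2:]  # drop "REMARK" and "465"
        if len(tokens) < 3:
            continue  # e.g. the "MISSING RESIDUES" title line
        res_name, chain, seq_num = tokens[-3:]
        if _RESNAME.match(res_name) and len(chain) == 1 and _SEQNUM.match(seq_num):
            missing.append((res_name, chain, seq_num))
    return missing


# Synthetic excerpt of a PDB header.
EXAMPLE_LINES = [
    "REMARK 465 MISSING RESIDUES",
    "REMARK 465   M RES C SSSEQI",
    "REMARK 465     MET A     1",
    "REMARK 465     PRO A     2",
    "ATOM      1  N   SER A   3      11.104   6.134  -6.504  1.00  0.00           N",
]

if __name__ == "__main__":
    print(missing_residues(EXAMPLE_LINES))
```

Long runs of consecutive missing residues are the kind of signal that early disorder studies treated as evidence of intrinsic disorder.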



The signal peptide is usually cleaved off once the protein reaches its destination. WoLF PSORT uses sorting signals such as signal peptides to predict the most likely subcellular localization of the input sequence, and its results indicated that the SPR1 protein is localized to the extracellular matrix.


This graph shows that this small proline rich protein is predominantly unfolded. The PPII helix produces a circular dichroism spectrum very similar to that of unfolded proteins, which has led to the hypothesis that unfolded proteins contain abundant PPII helical content (Rucker et al., 2002).
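A fold-index style score of the kind referred to later in this report combines mean hydropathy and mean net charge. The sketch below uses the charge-hydropathy relation as we understand it from the FoldIndex method (Uversky's boundary), I = 2.785<H> - |<R>| - 1.151, with Kyte-Doolittle hydropathy rescaled to 0..1; these constants should be verified against the original method. Unlike the server, which scores a sliding window, it scores the whole sequence at once, and histidine is treated as uncharged (an assumption).

```python
# Kyte-Doolittle hydropathy values.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def fold_index(sequence):
    """Charge-hydropathy score 2.785*<H> - |<R>| - 1.151.

    <H>: mean Kyte-Doolittle hydropathy rescaled to 0..1.
    <R>: mean net charge per residue (K, R = +1; D, E = -1).
    Negative values suggest an intrinsically unfolded sequence."""
    residues = [r for r in sequence.upper() if r in KD]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    mean_h = sum((KD[r] + 4.5) / 9.0 for r in residues) / len(residues)
    net_charge = sum(int(r in "KR") - int(r in "DE") for r in residues)
    mean_r = net_charge / len(residues)
    return 2.785 * mean_h - abs(mean_r) - 1.151


if __name__ == "__main__":
    print(fold_index("KKKK"), fold_index("IIII"))
```

A charged, weakly hydrophobic sequence scores below zero, while a purely hydrophobic one scores above zero, matching the idea that low hydropathy and high net charge favour disorder.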


ExPASy shows a signal peptide ending between amino acids 19 and 22, and shows that 30 of the 136 amino acids are prolines (about 22%), meaning that this protein is proline rich.
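The proline content quoted above is a simple ratio, 30/136. The sketch below shows that arithmetic together with a small, general composition function; the function name and the short test sequence are illustrative only, since the SPR1 sequence itself is not reproduced here.

```python
from collections import Counter


def composition(sequence):
    """Percentage of each residue type in `sequence`."""
    sequence = sequence.upper()
    counts = Counter(sequence)
    return {res: 100.0 * n / len(sequence) for res, n in counts.items()}


# Using the counts reported above: 30 prolines out of 136 residues.
proline_percent = 100.0 * 30 / 136

if __name__ == "__main__":
    print(f"{proline_percent:.1f}% proline")  # prints 22.1% proline
```

For comparison, proline typically makes up only a few percent of residues in average proteins, which is why a value above 20% marks SPR1 as proline rich.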

Signal Peptide Prediction Results


These results show two possible signal peptide cleavage sites for the SPR1 protein, one between amino acids 19 and 20 and the other between amino acids 22 and 23.
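Candidate signal peptidase cleavage sites are often checked against von Heijne's (-3, -1) rule, which favours small, neutral residues at positions -1 and -3 before the cut. The sketch below applies a simplified version of that rule; the accepted residue set {A, G, S, C, T} is an illustrative assumption, and real predictors such as SignalP use trained models rather than this single check. The example sequences are synthetic, not SPR1.

```python
# Residues assumed acceptable at positions -1 and -3 (illustrative assumption).
SMALL_NEUTRAL = set("AGSCT")


def passes_minus3_minus1(sequence, cleavage_after):
    """Check a site that cleaves after residue `cleavage_after` (1-based),
    e.g. 19 for a cut between residues 19 and 20."""
    if cleavage_after < 3 or cleavage_after > len(sequence):
        return False
    minus1 = sequence[cleavage_after - 1].upper()  # residue at position -1
    minus3 = sequence[cleavage_after - 3].upper()  # residue at position -3
    return minus1 in SMALL_NEUTRAL and minus3 in SMALL_NEUTRAL


if __name__ == "__main__":
    print(passes_minus3_minus1("MKAGA", 5), passes_minus3_minus1("MKLGL", 5))
```

Applied to the two sites reported above (after residues 19 and 22), the same check would indicate whether either site fits this simple rule.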

Signal peptides are short peptide chains, generally between 2 and 60 amino acids long, that direct proteins synthesized in the cytosol to particular organelles such as the endoplasmic reticulum, the nucleus and the mitochondrial matrix.

The results of TMHMM (TransMembrane prediction using Hidden Markov Models), shown above, indicate a high probability that the protein is extracellular and a very low probability of any intracellular region. This small proline rich protein also has no predicted transmembrane domains.
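TMHMM itself uses a hidden Markov model, which is beyond a short example. As a simpler, older stand-in (a swapped-in technique, not TMHMM), the sketch below uses the Kyte-Doolittle sliding-window heuristic: windows of about 19 residues with a mean hydropathy above roughly 1.6 are flagged as possible transmembrane segments. Both values are the commonly quoted ones and are assumptions here. Non-standard residues are dropped, which shifts positions if any are present.

```python
# Kyte-Doolittle hydropathy values.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Mean hydropathy of every full window, as (1-based start, mean)."""
    seq = [r for r in sequence.upper() if r in KD]
    return [
        (start + 1, sum(KD[r] for r in seq[start:start + window]) / window)
        for start in range(len(seq) - window + 1)
    ]


def candidate_tm_windows(sequence, window=19, cutoff=1.6):
    """Window starts whose mean hydropathy exceeds `cutoff`."""
    return [start for start, h in hydropathy_profile(sequence, window) if h > cutoff]


if __name__ == "__main__":
    # Synthetic sequence: a 19-residue hydrophobic stretch between charged ends.
    print(candidate_tm_windows("D" * 10 + "L" * 19 + "D" * 10))
```

For a proline rich, largely hydrophilic protein like SPR1, one would expect no window to pass the cut-off, consistent with the absence of transmembrane domains reported by TMHMM.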

Bronchial Carcinogenesis

Lung cancer is the leading cause of cancer death in the United States, and 90% of human lung cancer cases are attributed to cigarette smoking. SPR1 is over-expressed in squamous metaplasia of the bronchial epithelium (Lau et al., 2000). Squamous metaplasia of the respiratory epithelium can be caused by vitamin A deficiency, chronic exposure to tobacco smoke or exposure to a carcinogen (Lau et al., 2000). SPR1 is down-regulated by vitamin A and up-regulated by the tumor promoter phorbol ester, which indicates that SPR1 expression is closely linked to early neoplastic transformation of the tracheobronchial epithelium (Lau et al., 2000). In lung cancer, expression of this small proline rich protein is greatly attenuated or lost, probably as a result of dysregulation of the SPR1 promoter. Over-expression of SPR1 acts as a marker of early metaplastic changes, and loss of SPR1 indicates irreversible malignant transformation, which makes SPR1 a possible intermediate biomarker for the initiation and progression of bronchial malignant transformation (Lau et al., 2000). Like loricrin and involucrin, this small proline rich protein contributes to the construction of the cornified cell envelope and hence plays a protective role for cells (Lau et al., 2000).

Loss of SPR1 expression in bronchial carcinogenesis is associated with transcriptional dysregulation. Nevertheless, SPR1 has been identified as a marker for the initiation and progression of bronchial carcinogenesis (Lau et al., 2000).


Despite all the efforts to obtain a predicted structure for the small proline rich protein and to establish its role in psoriasis, only a secondary structure prediction was obtained. It showed that the SPR1 protein is proline rich and lacks a well-defined structure, which suggests that the protein could be a multimer, as collagen (found in bone) is, and that this could contribute to the autoimmune disease psoriasis.

To predict the structure of this protein, several methods were used. PSIPRED was used first, and no well-defined structure was revealed. GenTHREADER was then run to see whether the protein was folded and whether a tertiary structure could be obtained, but no results could be derived. This was followed by a phylogenetic tree, to see whether a structure for the SPR1 protein could be predicted from its ancestors, but this also gave no results. Finally, a fold index was calculated to see whether the protein is folded, and it showed that the SPR1 protein is largely unfolded. Therefore, it can be concluded that no structure can be predicted for the SPR1 protein.

In addition, DomTHREADER, DisEMBL, GlobPlot and DISOPRED showed that the SPR1 protein is almost fully disordered. The SPR1 protein is also conserved in many species and is implicated in bronchial carcinogenesis. A poly-proline II structure prediction was carried out to obtain a predicted structure for the SPR1 protein, but again no results were obtained.

Nevertheless, this protein is believed to contribute to psoriasis because it lies in the PSORS1 region, the major determinant of psoriasis. This small proline rich protein is present in normal skin and is highly expressed in psoriatic skin. Linkage analysis therefore implicates the SPR1 gene in psoriasis, because of its location in the main disease-associated region.


I would like to thank Dr Robert W Janes and Dan Klose for all the aid provided in doing this work.