Proteins are an important group of biomolecules present in all living organisms and perform vital functions in the body. Chemically, a protein is a polymer of amino acids linked by peptide bonds and arranged in a defined sequence. This sequential arrangement of amino acids is referred to as the primary structure. The primary structure of a protein is determined by the gene that encodes it: a specific sequence of nucleotides in DNA is transcribed into mRNA, which is read by the ribosome in a process called translation. The sequence is unique to each protein and defines its structure and function.
Structurally, the polypeptide chain of a protein has an N-terminus and a C-terminus, which are determined by the direction of the peptide linkages between amino acids. The properties of a protein are mainly determined by the types of amino acids present in its primary structure. The sequence of amino acids in a protein or polypeptide chain can be determined by Edman degradation and mass spectrometry.
In recent years, advances in protein sequencing techniques have generated a large amount of data, which is deposited in protein sequence databases. The growth of these databases has led to the development of bioinformatics tools for data analysis. These tools help in understanding the properties of the protein whose sequence is under consideration. Analysis of the primary sequence also helps in planning laboratory experiments for purification and in understanding the physicochemical properties and amino acid composition of the protein.
Retrieving sequences from database/s
Introduction: The sequence data generated by high-throughput techniques are stored in databases so that they are readily available for analysis. The primary data are stored in primary databases. Protein sequence data can be downloaded and analysed with various analysis tools.
Exercise: To retrieve the protein sequence data from NCBI's protein database
Go to NCBI http://www.ncbi.nlm.nih.gov/
Select "protein" from the dropdown menu of databases
Type the name of the protein, e.g. myoglobin
Click on the link provided
Note the details of the protein (locus, accession number) and observe the protein sequence in GenPept format
Click on "FASTA" and observe the sequence in FASTA format
Copy the sequence and paste it into a Word document, OR click on "SEND TO", select "FILE" as the destination, choose the FASTA format and click on "CREATE FILE". Save the file at a specific location for further use
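The manual download steps above can also be scripted. The following is a minimal sketch, not part of the original exercise: it assumes NCBI's E-utilities `efetch` service is reachable, and the helper names `efetch_url`, `fetch_fasta` and `parse_fasta` are hypothetical, introduced only for illustration. The parser is demonstrated offline on a made-up record so that the example runs without network access.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of NCBI's E-utilities efetch service (assumed to be reachable).
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, db="protein"):
    """Build an efetch URL that requests one record as plain-text FASTA."""
    query = urlencode({"db": db, "id": accession,
                       "rettype": "fasta", "retmode": "text"})
    return f"{EFETCH}?{query}"


def fetch_fasta(accession):
    """Download the FASTA text for an accession (requires network access)."""
    with urlopen(efetch_url(accession)) as response:
        return response.read().decode("utf-8")


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Offline demonstration with a made-up record (not real sequence data).
example = ">demo|hypothetical record\nMKVLA\nGGST\n"
print(parse_fasta(example))
```

With network access, `parse_fasta(fetch_fasta("AAL83649.1"))` would be one way to obtain the silk fibroin record used in this exercise.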
Result: Silk emitted by the silkworm consists of two main proteins, sericin and fibroin. The sequence of silk fibroin L-chain was retrieved from the NCBI database, and details such as the accession number and the GenPept format were observed. Its FASTA format is as follows:
>gi|19221230|gb|AAL83649.1| silk fibroin [Bombyx mori]
Translation of DNA / RNA sequences into protein
Introduction: Translate is an online tool for translating DNA/RNA sequences into protein sequences. It is provided on ExPASy (Expert Protein Analysis System), the server of the Swiss Institute of Bioinformatics.
Exercise: You are provided with a gene sequence. Translate it and find the protein product.
>gi|50540477|ref|NM_001002706.1| Danio rerio lysozyme g-like 1 (lygl1), mRNA
Go to http://web.expasy.org/translate/
Paste the given DNA/RNA sequence into the input box
Click on translate sequence
Note down the results
Result: The ExPASy tool examines the input sequence in all six possible reading frames (i.e. reading the sequence from 5' to 3' and from 3' to 5', starting with nt 1, nt 2 and nt 3). The translation of one of these frames is as follows:
5'3' Frame 1
X X X X X X X X X X X X X F S C N H N S T T F S G L T S H S S N I L F C S Q Q L Stop V I Met G I P V I L T Met Y F L A C I Y G D I Met K I D T T G A S E V T A K Q D K L T V K G V E A S K K L A E H D L A R Met E Q Y K S K I L K V A R A K Q Met D P A V I A A I I S R E S R A G A A L K D G W G D H G N G F G L Met Q V D K R Y H K L V G A W D S E E H L T Q G T E I L I G Y I K D I K A K F P T W T K E Q C F K G G I S A Y N A G V K N V Q T Y E R Met D V G T T G G D Y A N D V V A R A Q W F K S K G Y Stop G I N V V Stop C Y F Stop Stop L S L T T D H S F I L Y F V F A G N K Stop N V F I Q K K K K K K K K K K
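The six-frame translation described above can be sketched in Python. This is an illustrative implementation using the standard genetic code, not the code behind the ExPASy tool. The function names are hypothetical, and the frame labels only mimic the style of the output shown above; the tool's exact numbering of the reverse frames is an assumption. Stop codons are written as `*` and codons containing an unrecognised base as `X`.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def to_dna(seq):
    """Upper-case the sequence and treat RNA (U) as DNA (T)."""
    return seq.upper().replace("U", "T")


def translate(seq):
    """Translate codon by codon; '*' is a stop, 'X' an unrecognised codon."""
    seq = to_dna(seq)
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def reverse_complement(seq):
    """Return the complementary strand read in its own 5' to 3' direction."""
    return to_dna(seq).translate(COMPLEMENT)[::-1]


def six_frames(seq):
    """Translate three frames on each strand and return them by label."""
    forward = to_dna(seq)
    reverse = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"5'3' Frame {offset + 1}"] = translate(forward[offset:])
        frames[f"3'5' Frame {offset + 1}"] = translate(reverse[offset:])
    return frames


for label, protein in six_frames("ATGGCCTAA").items():
    print(label, protein)
```

Note that the reverse frames are obtained by translating the reverse complement, which is what reading the opposite strand from 5' to 3' amounts to.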
Finding the isoelectric point and molecular weight
Introduction: Compute pI/Mw is a tool that calculates the estimated pI and Mw of a specified Swiss-Prot/TrEMBL entry or of a user-entered amino acid sequence. These parameters are useful for locating the approximate region of a 2-D gel where a protein may be found.
Exercise: You are given a protein sequence. Find the theoretical pI and molecular weight of the sequence.
V I M G I P V I L T M Y F L A C I Y G D I M K I D T T G A S E V T A K Q D K L T V K G V E A S K K L A E H D L A R M E Q Y K S K I L K V A R A K Q M D P A V I A A I I S R E S R A G A A L K D G W G D H G N G F G L M Q V D K R Y H K L V G A W D S E E H L T Q G T E I L I G Y I K D I K A K F P T W T K E Q C F K G G I S A Y N A G V K N V Q T Y E R M D V G T T G G D Y A N D V V A R A Q W F K S K G Y
Go to http://web.expasy.org/compute_pi/
Paste the single-letter amino acid sequence of the protein, or upload the sequence from a file or from the UniProt database.
Click on compute pI/MW
Note the results
Result: The theoretical isoelectric point and molecular weight of the given protein sequence were estimated using Compute pI/Mw to be 9.04 and 21859.19 Da, respectively.
10 20 30 40 50 60
VIMGIPVILT MYFLACIYGD IMKIDTTGAS EVTAKQDKLT VKGVEASKKL AEHDLARMEQ
70 80 90 100 110 120
YKSKILKVAR AKQMDPAVIA AIISRESRAG AALKDGWGDH GNGFGLMQVD KRYHKLVGAW
130 140 150 160 170 180
DSEEHLTQGT EILIGYIKDI KAKFPTWTKE QCFKGGISAY NAGVKNVQTY ERMDVGTTGG
Theoretical pI/Mw: 9.04 / 21859.19
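The two quantities reported above can be approximated with a short Python sketch. It is not the algorithm used by ExPASy. The molecular weight is taken as the sum of average residue masses plus one water molecule, and the pI is found by bisection on the net charge. The pKa values in `PK_POSITIVE` and `PK_NEGATIVE` are illustrative textbook-style numbers chosen for this example (an assumption), so the computed pI will generally differ somewhat from the Compute pI/Mw output. All function names are hypothetical.

```python
# Average residue masses in daltons (free amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01524

# Illustrative pKa values (assumption; ExPASy uses its own parameter set).
PK_POSITIVE = {"N_TERM": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
PK_NEGATIVE = {"C_TERM": 2.0, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def clean(seq):
    """Remove whitespace and upper-case the sequence."""
    return "".join(seq.split()).upper()


def molecular_weight(seq):
    """Sum of average residue masses plus one water for the free termini."""
    return sum(RESIDUE_MASS[aa] for aa in clean(seq)) + WATER


def net_charge(seq, ph):
    """Net charge at a given pH from the Henderson-Hasselbalch equation."""
    seq = clean(seq)
    positive = [("N_TERM", 1)] + [(aa, seq.count(aa)) for aa in "KRH"]
    negative = [("C_TERM", 1)] + [(aa, seq.count(aa)) for aa in "DECY"]
    charge = 0.0
    for group, n in positive:
        charge += n / (1 + 10 ** (ph - PK_POSITIVE[group]))
    for group, n in negative:
        charge -= n / (1 + 10 ** (PK_NEGATIVE[group] - ph))
    return charge


def isoelectric_point(seq, tol=1e-4):
    """Bisect between pH 0 and 14 for the pH at which the net charge is zero."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid    # still positively charged: the pI lies at higher pH
        else:
            high = mid
    return round((low + high) / 2, 2)


print(round(molecular_weight("GG"), 2), isoelectric_point("GG"))
```

Bisection works here because the net charge decreases monotonically as the pH rises, so it crosses zero exactly once.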
Study of peptides
PeptideCutter predicts potential sites in a given protein sequence that are cleaved by proteases or chemicals. It returns the query sequence with the possible cleavage sites mapped onto it and/or a table of cleavage site positions.
PeptideCutter searches a protein sequence from the Swiss-Prot and/or TrEMBL databases, or a user-entered protein sequence, for protease cleavage sites. A single protease or chemical, a selection, or the whole list of proteases and chemicals can be used. Different forms of output are available: tables of cleavage sites, grouped either alphabetically by enzyme name or sequentially by amino acid number, and a map of cleavage sites. In the map, the sequence and the cleavage sites mapped onto it are grouped in blocks, the size of which can be chosen by the user to give a convenient print-out.
The program accepts the complete input as one single sequence, even if several are entered.
Numbers and space characters are ignored.
If a sequence in FASTA format is entered, the first line is ignored during further steps of the program.
If letters are entered that do not correspond to a specific amino acid (B, J, X or Z), the user is asked to correct them.
The program is case insensitive.
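The input rules listed above can be mirrored in a small pre-processing function, which is useful for cleaning a sequence before pasting it into any of these tools. This is a sketch of the described behaviour, not PeptideCutter's own code. The name `normalise_input` is hypothetical, and rejecting every non-standard letter with an error (rather than only B, J, X and Z) is an assumption made for simplicity.

```python
import re

# The twenty standard amino acids in one-letter code.
VALID = set("ACDEFGHIKLMNPQRSTVWY")


def normalise_input(text):
    """Clean pasted sequence text following the rules listed above."""
    lines = text.strip().splitlines()
    # A FASTA header on the first line is dropped.
    if lines and lines[0].startswith(">"):
        lines = lines[1:]
    # Everything that remains is treated as one single sequence.
    joined = "".join(lines)
    # Digits and whitespace are discarded; case does not matter.
    cleaned = re.sub(r"[\d\s]", "", joined).upper()
    # Ambiguous or unknown letters (e.g. B, J, X, Z) must be corrected.
    bad = sorted(set(cleaned) - VALID)
    if bad:
        raise ValueError(f"please correct these characters: {', '.join(bad)}")
    return cleaned


print(normalise_input(">example header\n 1 mesl kklf 10\n"))
```

The tool itself asks the user for a correction interactively; raising an exception is simply the closest equivalent in a script.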
Paste the given sequence
Select the enzyme or chemical to be used for the cleavage
Click on perform
Result: PeptideCutter predicted 9 potential CNBr cleavage sites in the given protein sequence.
10 20 30 40 50 60
MESLKKLFQP VHEKVDETWS KVTIVGVGQV GMAAAFSMLT QNVTNNIALV DMMADKLKGE
70 80 90 100 110 120
MMDLQHGSAF MRNAKIQSST DYSITAGSKI CVVTAGVRQR EGESRLDLVQ RNTDVLKQII
130 140 150 160 170 180
PQLIKYSPDT ILVIASNPVD ILTYVTWKIS GLPKHRVIGS GTNLDSARFR YLLSDRLGIA
190 200 210 220 230 240
TTSCHGYIIG EHGDSSVPVW SAVNIAGVRL SDLNNQIGTD DDPENWKELH ENVVKSAYEV
250 260 270 280 290 300
IKLKGYTSWA IGLSLAQIVR AILTNANSVH AVSTYLKGEH GIEDEVFLSL PCVLSHCGVS
310 320 330
DVIRQPLTEL EVAQLRKSAK VMAKVQNDIK F
The sequence is 331 amino acids long.
Name of enzyme    No. of cleavages    Positions of cleavage sites
CNBr              9                   1 32 38 52 53 61 62 71 322
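The CNBr result can be checked by hand, because cyanogen bromide cleaves on the C-terminal side of methionine, so each cleavage position is simply the position of a Met residue. The sketch below applies that rule to the 331-residue sequence shown above. The function names are hypothetical, and real digests can deviate from this idealised rule.

```python
# The 331-residue query sequence from the result above, in blocks of ten.
SEQUENCE = "".join("""
MESLKKLFQP VHEKVDETWS KVTIVGVGQV GMAAAFSMLT QNVTNNIALV DMMADKLKGE
MMDLQHGSAF MRNAKIQSST DYSITAGSKI CVVTAGVRQR EGESRLDLVQ RNTDVLKQII
PQLIKYSPDT ILVIASNPVD ILTYVTWKIS GLPKHRVIGS GTNLDSARFR YLLSDRLGIA
TTSCHGYIIG EHGDSSVPVW SAVNIAGVRL SDLNNQIGTD DDPENWKELH ENVVKSAYEV
IKLKGYTSWA IGLSLAQIVR AILTNANSVH AVSTYLKGEH GIEDEVFLSL PCVLSHCGVS
DVIRQPLTEL EVAQLRKSAK VMAKVQNDIK F
""".split())


def cnbr_cleavage_sites(seq):
    """1-based positions of Met; CNBr cleaves just after each of them."""
    return [i for i, aa in enumerate(seq.upper(), start=1) if aa == "M"]


def cnbr_fragments(seq):
    """Split the sequence after every Met residue."""
    fragments, start = [], 0
    for position in cnbr_cleavage_sites(seq):
        fragments.append(seq[start:position])
        start = position
    if start < len(seq):
        fragments.append(seq[start:])
    return fragments


print(len(SEQUENCE), cnbr_cleavage_sites(SEQUENCE))
```

Adjacent methionines (positions 52 and 53, and 61 and 62) produce single-residue fragments, which is why the number of fragments exceeds what the peptide map might suggest at first glance.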
Studying of physical and chemical properties of proteins
ProtParam is a tool that computes various physical and chemical parameters for a protein stored in Swiss-Prot or TrEMBL or for a user-entered sequence. The computed parameters include the molecular weight, theoretical pI, amino acid composition, atomic composition, extinction coefficient, estimated half-life, instability index, aliphatic index and grand average of hydropathicity (GRAVY).
Go to http://web.expasy.org/protparam/
Enter the sequence provided
Click on compute parameters
Analyse and record the results.
Result: The physical and chemical properties of the given protein sequence were computed by ProtParam. Some of them are
Number of amino acids: 331
Molecular weight: 36362.8
Theoretical pI: 6.76
Amino acid composition:
Ala (A) 23 6.9%
Arg (R) 13 3.9%
Asn (N) 15 4.5%
Asp (D) 19 5.7%
Cys (C) 4 1.2%
Gln (Q) 14 4.2%
Glu (E) 16 4.8%
Gly (G) 22 6.6%
His (H) 9 2.7%
Ile (I) 25 7.6%
Leu (L) 31 9.4%
Lys (K) 21 6.3%
Met (M) 9 2.7%
Phe (F) 6 1.8%
Pro (P) 9 2.7%
Ser (S) 29 8.8%
Thr (T) 19 5.7%
Trp (W) 5 1.5%
Tyr (Y) 8 2.4%
Val (V) 34 10.3%
Pyl (O) 0 0.0%
Sec (U) 0 0.0%
(B) 0 0.0%
(Z) 0 0.0%
(X) 0 0.0%
Total number of negatively charged residues (Asp + Glu): 35
Total number of positively charged residues (Arg + Lys): 34
Atomic composition:
Carbon C 1609
Hydrogen H 2603
Nitrogen N 443
Oxygen O 487
Sulfur S 13
Total number of atoms: 5155
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 39670
Abs 0.1% (=1 g/l) 1.091, assuming all pairs of Cys residues form cystines
Ext. coefficient 39420
Abs 0.1% (=1 g/l) 1.084, assuming all Cys residues are reduced
The N-terminal of the sequence considered is M (Met).
The estimated half-life is: 30 hours (mammalian reticulocytes, in vitro).
>20 hours (yeast, in vivo).
>10 hours (Escherichia coli, in vivo).
The instability index (II) is computed to be 15.96
This classifies the protein as stable.
Aliphatic index: 102.72
Grand average of hydropathicity (GRAVY): -0.028
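Several of the ProtParam values above follow from simple formulas on the amino acid composition, and the sketch below recomputes them. It is an illustration rather than ProtParam's own code, and the function names are hypothetical. It assumes the Kyte-Doolittle hydropathy scale for GRAVY, molar extinction values of 1490 (Tyr), 5500 (Trp) and 125 (cystine) at 280 nm, and the aliphatic index formula Ala + 2.9 x Val + 3.9 x (Ile + Leu) in mole percent. With the composition listed above (8 Tyr, 5 Trp, 4 Cys) these coefficients give 39420 and 39670, in agreement with the reported extinction coefficients.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values, used for the GRAVY score.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(seq):
    """Map each residue to (count, percentage of the sequence)."""
    counts = Counter(seq)
    return {aa: (n, 100 * n / len(seq)) for aa, n in sorted(counts.items())}


def charged_residues(seq):
    """Return the counts (Asp + Glu, Arg + Lys)."""
    return seq.count("D") + seq.count("E"), seq.count("R") + seq.count("K")


def extinction_coefficients(seq):
    """Extinction at 280 nm as (all Cys pairs as cystine, all Cys reduced)."""
    reduced = 1490 * seq.count("Y") + 5500 * seq.count("W")
    return reduced + 125 * (seq.count("C") // 2), reduced


def mole_percent(seq, aa):
    """Percentage of residues in the sequence that are the given amino acid."""
    return 100 * seq.count(aa) / len(seq)


def aliphatic_index(seq):
    """Relative volume occupied by aliphatic side chains (Ala, Val, Ile, Leu)."""
    return (mole_percent(seq, "A")
            + 2.9 * mole_percent(seq, "V")
            + 3.9 * (mole_percent(seq, "I") + mole_percent(seq, "L")))


def gravy(seq):
    """Grand average of hydropathicity: mean hydropathy per residue."""
    return sum(HYDROPATHY[aa] for aa in seq) / len(seq)


demo = "MESLKKLFQP"  # first ten residues of the query sequence
print(charged_residues(demo), extinction_coefficients(demo))
```

The absorbance of a 1 g/l solution then follows by dividing the extinction coefficient by the molecular weight, e.g. 39670 / 36362.8 for the cystine case.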
Peptide Primary structure
Introduction: PepDraw is a tool that was developed to facilitate the study of the chemical structure and properties of peptides. It allows users to draw the primary chemical structure of an amino acid sequence and predict some chemical properties such as mass, charge, and hydrophobicity. PepDraw was designed to be a powerful yet user-friendly tool for peptide analysis. It is especially useful for teaching students about the structure and properties of the amino acids.
Paste the given sequence
Click on draw peptide
Record the properties.
Result: The peptide structure and its properties, as analysed using PepDraw, are as follows:
Isoelectric point (pI):
Hydrophobicity: +46.33 Kcal * mol -1
Extinction coefficient 1: 4595 M-1 * cm-1
Extinction coefficient 2: 4470 M-1 * cm-1
Peptide structure image:
Random protein sequence generation
Introduction: RandSeq is a tool that generates a random protein sequence. The sequence can be generated with equal proportions of all amino acids or with user-specified amino acid percentages. The resulting random sequences can then be analysed using other tools.
Select the parameters/ composition of each amino acid
Click on submit
Analyse the results
Result: A random protein sequence with equal composition of all amino acids, generated for further analysis, is as follows:
Virtual Sequence: RND29006
ID RND_29006 Unreviewed; 200 AA.
DE Randomly generated sequence, created by ExPASy WWW server tool
DE RandSeq for 18.104.22.168.
CC -!- MISCELLANEOUS: This sequence was generated using equal composition for all amino acids.
SQ SEQUENCE 200 AA; 23795 MW; FED65773033E0235 CRC64;
WFWYDMPEME QDMDSKQVYM GRGKDDIICT INNRYPAFHC LNCPNMQMTE NNRFGRCRDS
TLWWSQHASA NCPQMYRCKP NGEAHIWEEY VCNWTWKKIK GFPGMVYKIP WPDHSITLFI
DMELGLQCLT KSSHAFPLMV PFARGHYETS WHHGYCQVGT VVDQFAWSQQ TCFEAHVIFI
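A comparable random sequence can be produced locally. The sketch below is not RandSeq's algorithm: it draws each residue independently, so a requested composition is matched only on average, and whether RandSeq behaves the same way is an assumption. The names `random_protein` and `format_blocks` are hypothetical.

```python
import random

# The twenty standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_protein(length, composition=None, seed=None):
    """Draw a random sequence; composition maps residue -> relative weight.

    With composition=None every amino acid is equally likely.
    """
    rng = random.Random(seed)  # a seed makes the result reproducible
    weights = None
    if composition is not None:
        weights = [composition.get(aa, 0.0) for aa in AMINO_ACIDS]
    return "".join(rng.choices(AMINO_ACIDS, weights=weights, k=length))


def format_blocks(seq, block=10, per_line=6):
    """Lay the sequence out in blocks of ten, six blocks per line."""
    blocks = [seq[i:i + block] for i in range(0, len(seq), block)]
    lines = [" ".join(blocks[i:i + per_line])
             for i in range(0, len(blocks), per_line)]
    return "\n".join(lines)


print(format_blocks(random_protein(200, seed=1)))
```

Passing, for example, `composition={"A": 50, "G": 50}` would restrict the draw to alanine and glycine in roughly equal amounts.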