Molecular modeling combines science and art: it studies molecular structure and function by using computational techniques to generate a realistic model of a molecule based on its properties and behavior. A dedicated computer graphics application and viewer are required to examine images of molecular structures and chemical processes. Molecular modeling is commonly applied in drug design and computational biology, fields that focus on molecular structure, dynamics, properties, and biological activity (e.g. protein folding, protein stability, protein recognition, structure prediction, and structure determination), as well as on the design of new molecular systems. Modeling is valuable here because it provides a systematic way to investigate molecular structure, flexibility, and function.
The computational approaches used in molecular modeling are molecular mechanics, quantum mechanics, and molecular simulation. Molecular mechanics models a molecular system with classical mechanics and empirically derived parameters; for instance, the potential energy is calculated with a force field. Quantum mechanics describes the system with a wave function and is important for understanding how atoms bond covalently to form molecules; examples are ab initio and semi-empirical methods. Molecular simulation is a further computational technique that includes molecular dynamics and Monte Carlo methods. Today, however, the term molecular modeling is used more broadly for any combination of the methods above that is used to deduce atomic-level information about a system.
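To make the force field idea concrete, here is a minimal sketch of two energy terms that commonly appear in molecular mechanics: a harmonic bond-stretch term and a 12-6 Lennard-Jones term for non-bonded atom pairs. The function names and all parameter values are illustrative assumptions and are not taken from any published force field.

```python
def bond_energy(r, r0, k):
    """Harmonic bond-stretch term: E = 0.5 * k * (r - r0)^2."""
    return 0.5 * k * (r - r0) ** 2


def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones term for one non-bonded atom pair."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 ** 2 - sr6)


# Illustrative parameters only (assumed values, arbitrary units).
stretch = bond_energy(r=1.6, r0=1.5, k=300.0)
# At r = 2^(1/6) * sigma the Lennard-Jones term reaches its minimum, -epsilon.
contact = lennard_jones(r=2 ** (1 / 6) * 3.4, epsilon=0.2, sigma=3.4)
print(stretch, contact)
```

A real force field sums many such terms (bonds, angles, torsions, electrostatics) over every atom in the system; the two terms here only show the general shape of the calculation.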
To understand molecular modeling as it is practiced today, it helps to know some of its history. In 1858, Archibald Scott Couper, Friedrich August Kekulé von Stradonitz, and Aleksandr Mikhailovich Butlerov introduced the concept of chemical structure, with rules for how carbon links to other atoms. Louis Pasteur modified the theory of molecular structure in 1860. In 1865, August Wilhelm Hofmann introduced the ball and stick model and its color scheme. In 1959, Alder and Wainwright performed simulations of a hard-sphere model. Lifson, Scheraga, Allinger, Levitt, Warshel, and others introduced the force field concept in the 1960s. In 1970, Rahman and Stillinger described a molecular dynamics simulation of water. The Protein Data Bank (PDB) was established in 1971. Water force fields were developed by Berendsen in 1977 and by Jorgensen in 1984. Around the same time, Warshel and colleagues published their concepts of protein electrostatics and enzyme-substrate complexes. In the late 1980s, high-speed computers, new programs, technological innovations, and a variety of algorithms were introduced.
The timeline above covers only key years between the late 1850s and the 1980s. From the 1970s onwards, the development of the field can be described by the expectation curve shown in Figure 1.1. As computational methods and technology became more widely available, expectations for biomolecular modeling continued to rise, especially from the 1980s to the 1990s. Structure-based rational drug design was introduced and was expected to replace less efficient methods. However, these unrealistically high expectations were followed by disappointment. The field then went through a recovery phase, during which the Human Genome Project was launched and new technologies and fast workstations became available. Since then, productivity has progressed steadily toward realistic expectations. The latest trend in molecular modeling is the development and use of virtual reality to enhance three-dimensional visualization.
Molecular modeling has changed the way research is conducted: experiments with the highest probability of success can be selected before they are performed. This creates demand for better and faster programs. It is also worth noting that a molecular model need not be commercial or costly to prepare; inexpensive materials can also produce useful models and results.
Question 2: Molecular graphics and molecular viewer
If molecular modeling is the creation of computational models based on molecular properties and behavior, then molecular graphics is the graphical depiction of those models. Molecular graphics is the discipline of studying molecules and their component parts through visualization. It refers to three-dimensional depictions of molecules that are made to examine and understand their behavior during reactions and interactions. Molecular graphics has replaced some functions of physical molecular models because it is portable and makes it easy to interact with analysis results.
In the early days of molecular graphics, computer graphics tools were dominated by vector representations based on calligraphic display technology, which could show only lines and dots. A mainframe computer was required to manipulate the molecular structure before it was sent to the graphics hardware. After the Molecular Graphics Society (known today as the Molecular Graphics and Modeling Society) was founded in 1983 and the Journal of Molecular Graphics appeared, new graphics techniques, hardware devices, and graphical software were introduced. These developments made interactive space filling molecular models possible.
Molecular objects are three-dimensional, so molecular graphic representation (MGR) is concerned with multi-dimensionality in order to convey more molecular information. Graphical excellence serves as a guideline for MGR development: a well-designed presentation of data that communicates complex ideas with clarity, precision, and efficiency. A good graphical display shows the data, gives the viewer the greatest number of ideas, makes large data sets coherent, encourages the eye to compare different pieces of data, integrates the statistical and verbal descriptions of a data set, and reveals the data at several levels of detail.
Several types of molecular model combine computational technique with graphic art. A few commonly used types are described briefly here.
The ball and stick model is the most widely used molecular model. It displays the three-dimensional positions of atoms and the bonds that link them. Typically, each atom is represented by a sphere of a specific color and each bond by a rod. The rods can be rotated, which gives insight into bond flexibility. Figure 2.1 shows proline as a ball and stick model together with its structural formula. Black represents carbon, white represents hydrogen, blue represents nitrogen, and red represents oxygen.
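As a rough illustration of how a program might prepare a ball and stick drawing, the sketch below assigns each atom the color given in the text and decides where to draw rods by a simple distance test. The 1.7 angstrom cutoff, the function name, and the three-atom fragment are assumptions made for this example; real viewers use element-specific covalent radii or explicit bond records.

```python
import math
from itertools import combinations

# Color scheme as described in the text.
ELEMENT_COLORS = {"C": "black", "H": "white", "N": "blue", "O": "red"}


def find_bonds(atoms, max_length=1.7):
    """Return index pairs of atoms closer than max_length (a crude heuristic)."""
    bonds = []
    for (i, a), (j, b) in combinations(enumerate(atoms), 2):
        if math.dist(a[1], b[1]) <= max_length:
            bonds.append((i, j))
    return bonds


# Hypothetical fragment: (element, (x, y, z)) with coordinates in angstroms.
atoms = [("N", (0.0, 0.0, 0.0)), ("C", (1.47, 0.0, 0.0)), ("O", (3.5, 0.0, 0.0))]

for element, _ in atoms:
    print(element, ELEMENT_COLORS[element])   # sphere color for each atom
print(find_bonds(atoms))                       # rods to draw between spheres
```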
The stick model (see Figure 2.2) is similar to the ball and stick model but has no balls: all atoms and bonds are represented by rods, colored according to the atom type.
The space filling model (see Figure 2.2) is a three-dimensional molecular model in which atoms are represented by van der Waals spheres of different colors that join directly to one another. The spheres are drawn in proportion to the actual sizes of the atoms. The model shows the space that the atoms occupy rather than the chemical bonds.
The wire frame model (see Figure 2.3) is similar to the stick model: it shows the connections between atoms with a color scheme, but with thinner bonds (connectors). This model is convenient for drawing large molecules.
The chicken wire model (see Figure 2.4) visualizes a molecule by drawing a polygon mesh over its surface. The mesh resembles the regular hexagonal pattern of chicken wire.
The ribbon model (see Figure 2.5) is a three-dimensional schematic model used to represent protein structure. It shows the path and organization of the protein backbone and serves as a visual framework on which details of the atomic structure can be placed. Coiled ribbons represent α-helices, arrows represent β-strands, and thin tubes represent loops. It conveys the basic visual features of a molecular structure, such as twists and folds.
A molecular viewer is molecular graphics software used to visualize molecular structures. A viewer is chosen according to the size of the molecule and the task at hand. A few commonly used viewers are described briefly here.
RasMol is a powerful research tool for visualizing proteins, nucleic acids, and small molecules. It is easy to use yet able to produce high-quality three-dimensional images. It is a free viewer for PDB coordinate files.
Chime is a molecular viewer based on modified RasMol code that allows molecules to be visualized in a web browser. It can display interactive three-dimensional molecular models within a web page.
Jmol is a Java-based molecular viewer that can visualize molecules either in a web browser (like Chime) or as a stand-alone application (like RasMol). It can create molecular models from many different file formats, including the PDB format.
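The first step any of these viewers must take is to read atom coordinates from a file. The following sketch reads ATOM and HETATM records using the fixed column layout of the PDB format. The function name is my own, and the sample record uses made-up coordinates; it is built with a format string only so that the column positions are explicit.

```python
def parse_pdb_atoms(text):
    """Extract atoms from ATOM/HETATM records using fixed PDB columns."""
    atoms = []
    for line in text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            atoms.append({
                "name": line[12:16].strip(),      # columns 13-16: atom name
                "residue": line[17:20].strip(),   # columns 18-20: residue name
                "chain": line[21],                # column 22: chain identifier
                "res_seq": int(line[22:26]),      # columns 23-26: residue number
                "xyz": (float(line[30:38]),       # columns 31-38: x
                        float(line[38:46]),       # columns 39-46: y
                        float(line[46:54])),      # columns 47-54: z
            })
    return atoms


# Hypothetical record (coordinates invented for illustration).
sample = "{:<6}{:>5} {:<4}{:1}{:>3} {:1}{:>4}{:1}   {:8.3f}{:8.3f}{:8.3f}".format(
    "ATOM", 1, " N", "", "PRO", "A", 1, "", 11.104, 6.134, -6.504)

print(parse_pdb_atoms(sample))
```

A viewer would then pass the resulting coordinates to its rendering code, for example to draw spheres and rods for a ball and stick model.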
Cn3D is NCBI's three-dimensional structure viewer. It displays three-dimensional structures, sequences, and sequence alignments from NCBI's Entrez, and offers annotation and alignment editing features.
Swiss PDB Viewer, also known as DeepView, is a molecular viewer with a user-friendly interface that allows several proteins or multiple models to be analyzed at the same time. It supports alignment based on individual residues, the main chain, or the entire protein, and it can also align different homologous protein structures. It can be used as a helper application for a browser.
Visual Molecular Dynamics (VMD) is designed for visualizing large molecular structures such as proteins and lipids. Besides visualization, VMD can analyze molecular dynamics simulations, act as a graphical front end that displays and animates molecules, and browse sequences.
Question 3: Protein structure with reference to hemoglobin and the prion protein
Protein structure is the biomolecular structure of a protein molecule, which is made up of amino acid polymers. The structure is critical to the activity and biological function of the protein. X-ray crystallography and NMR spectroscopy are techniques used to determine protein structure. Protein structure is described at four levels of organization (see Figure 3.1).
The primary structure of a protein is the linear sequence of amino acid units held together by peptide bonds. It starts at the amino terminus (N) and ends at the carboxyl terminus (C). A change at a single amino acid position can alter the function and activity of the protein, and some such changes cause disease.
The secondary structure of a protein is the local conformation of the polypeptide chain. It is defined by the pattern of hydrogen bonds formed between backbone peptide groups. When the polypeptide folds locally into stable structures, it forms α-helices, β-pleated sheets, and turns.
The tertiary structure of a protein is the global three-dimensional structure formed when the secondary structure elements fold in three-dimensional space. This folding is driven by hydrophobic interactions and stabilized by hydrogen bonds, van der Waals interactions, disulfide bonds, and charge-charge interactions. A tertiary structure can comprise more than one domain, and a domain can consist of α-helices, β-sheets, or a mixture of both. Tertiary structure refers both to each individual domain and to the complete configuration of the whole protein. A motif, by contrast, is a small structural unit that plays an important role in protein structure prediction.
The quaternary structure of a protein is the regular association of two or more folded polypeptide chains (subunits) into a complex. The subunits can be identical or different. Quaternary structure is thus the stable three-dimensional structure of a multi-subunit protein.
Hemoglobin, which is found in red blood cells and acts as an oxygen transport molecule, is an example of a protein with quaternary structure. It carries oxygen from the lungs and releases it in the tissues, and it binds carbon dioxide in the tissues and releases it in the lungs. This process depends on cooperative interactions between the polypeptide chains (subunits), which change their structure so that hemoglobin can function properly.
Hemoglobin (Figure 3.2) is an allosteric protein. It is a tetramer made up of two types of subunit, two α-chains and two β-chains, held together by non-polar interactions and hydrogen bonds. Each subunit consists of α-helical segments and forms a hydrophobic cleft that holds a heme prosthetic group, whose iron atom acts as the oxygen-binding site. Each subunit can carry one molecule of oxygen.
There is little contact between chains of the same type, that is, between the two α-chains or between the two β-chains. The contact regions lie between α-chains and β-chains, namely α1β1 and α1β2. The α1β2 contact region acts as a switch between the deoxy (T) structure and the oxy (R) structure. In the T structure, or tense state, oxygen binding is difficult. In the R structure, or relaxed state, oxygen binding is favored, and binding one oxygen molecule increases the affinity of hemoglobin for the next. The transition from the T structure to the R structure is triggered by stereochemical changes at the heme group, as shown in Figure 3.3.
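One common way to put a number on this cooperative behavior is the Hill equation, Y = pO2^n / (P50^n + pO2^n), which the discussion above does not use but which illustrates the effect. The sketch below assumes typical textbook values (P50 of about 26 mmHg and a Hill coefficient n of about 2.8); both values and the function name are assumptions for illustration only.

```python
def fractional_saturation(p_o2, p50=26.0, n=2.8):
    """Hill equation: fraction of oxygen-binding sites occupied at pressure p_o2.

    p50 is the oxygen pressure at half saturation; n > 1 indicates
    cooperative binding. The defaults are assumed textbook-style values.
    """
    return p_o2 ** n / (p50 ** n + p_o2 ** n)


# Compare cooperative binding (n = 2.8) with non-cooperative binding (n = 1).
for pressure in (10, 26, 40, 100):
    cooperative = fractional_saturation(pressure)
    independent = fractional_saturation(pressure, n=1.0)
    print(pressure, round(cooperative, 3), round(independent, 3))
```

With n greater than 1 the curve is sigmoidal: saturation stays low at low oxygen pressure and rises steeply once some oxygen has bound, which is the behavior the T to R switch produces.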
Like other proteins, hemoglobin is encoded by genes in DNA. A change in its amino acid sequence can cause a blood disorder such as sickle cell anemia, which results from a mutation at the sixth residue of the β hemoglobin chain.
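The sickle cell change can be written out as a single substitution in the primary structure. In this small sketch, the helper name is hypothetical, and the sequence is only the first eight residues of the mature human β chain in one-letter code; the mutation replaces glutamate (E) with valine (V) at residue 6.

```python
def point_mutation(sequence, position, new_residue):
    """Return a copy of sequence with the residue at a 1-based position replaced."""
    index = position - 1
    if not 0 <= index < len(sequence):
        raise ValueError("position is outside the sequence")
    return sequence[:index] + new_residue + sequence[index + 1:]


# First eight residues of the mature human beta-globin chain.
normal = "VHLTPEEK"
sickle = point_mutation(normal, 6, "V")   # glutamate (E) to valine (V)
print(normal, "->", sickle)
```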
A prion (proteinaceous infectious particle) is an infectious protein that causes neurodegenerative diseases such as scrapie and bovine spongiform encephalopathy (BSE) in animals, and Creutzfeldt-Jakob disease (CJD) and kuru in humans. A prion is an infectious agent that spreads without transferring a nucleic acid genome; instead, it causes other copies of the protein to misfold.
The prion protein (PrP) is a normal cellular protein of around 250 amino acids that is found in the body and is involved in the spread of prion disease. The prion theory states that PrP is the sole causative agent of prion diseases. In its normal, stable shape (PrPc), the protein does not cause disease. When it refolds into an abnormal conformation (PrPsc), it causes disease because it induces other PrP molecules to change their conformation and become PrPsc as well. This conversion is an exponential process in which α-helical and coil structures are refolded into β-sheet.
PrPc is the endogenous form of the prion protein (PrP), while PrPsc is the misfolded form. PrPc and PrPsc are isoforms that differ in tertiary structure: they are the same protein in different conformations, folded differently as shown in Figure 3.4. PrPc contains more α-helical and coil structure, whereas PrPsc contains more β-sheet.
The prion structure is extremely stable. It is highly resistant to denaturation by heat, ultraviolet light, or radiation, which makes prions difficult to eliminate. Furthermore, prions come in different strains, each with a slightly different structure. For these reasons, there is currently no effective treatment for prion diseases.
Question 4: Protein databases
A database is an organized collection of data that gives users convenient access. Databases can be classified into many types; here the focus is on protein databases. There are three types of protein database: protein structure databases, protein sequence databases, and protein motif databases. Before discussing these three types, I briefly describe the Protein Data Bank (PDB), which is the primary protein structure database.
The Protein Data Bank (PDB) is a worldwide repository of three-dimensional structural data of biological macromolecules. As of March 12, 2013, the PDB held 88837 structures, of which 82224 (92.56%) were protein structures. The structures are obtained by X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy, cryo-electron microscopy (cryoEM), hybrid techniques, and other experimental methods.
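The protein share quoted above follows directly from the two counts. The short check below uses only the figures given in the text; the variable names are my own.

```python
total_structures = 88837      # all PDB structures as of March 12, 2013
protein_structures = 82224    # of which protein structures

share = protein_structures / total_structures * 100
print(f"{share:.2f}% of PDB structures are proteins")
```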
From 1971, the PDB was managed by Brookhaven National Laboratory. In 1999, the Research Collaboratory for Structural Bioinformatics (RCSB PDB) took over this task. Since 2003, the Worldwide PDB (wwPDB) has maintained and formalized the international collaboration, with RCSB PDB (USA), PDB Japan (PDBj), PDB Europe (PDBe), and the Biological Magnetic Resonance Data Bank (BMRB) as its members. They act as distribution centers for PDB data.
The PDB is a very important resource for organizing and sharing molecular structure data, especially in structural biology. It serves the global community by allowing scientists to share their research. Currently, a newly determined protein structure is deposited in the PDB before the corresponding scientific paper is published. Besides being a three-dimensional structure database, the PDB is the primary database for protein structure information. Derived, or secondary, databases take the PDB data and categorize it according to their own classification schemes.
Protein structure databases group proteins by structural similarity and common evolutionary origin. The Structural Classification of Proteins (SCOP), CATH, and the DALI Domain Dictionary (DDD) are the three main protein structure classification databases. Only SCOP and CATH are discussed here.
SCOP is a repository that organizes protein structures hierarchically according to their structure and evolutionary origin. Classification in SCOP is done manually, with computer tools assisting in visualizing and comparing protein structures for consistency. The latest version of SCOP is 1.75, released in June 2009 with 38221 PDB entries and 110800 domains. The protein structures come from the Protein Data Bank.
SCOP has six levels of classification: class, fold, superfamily, family, protein, and species. There are eleven classes in the SCOP hierarchy, distinguished by fold type: all alpha (α), all beta (β), alpha and beta (α/β), alpha plus beta (α+β), multi-domain proteins, membrane proteins, small proteins, coiled coil proteins, low resolution structures, peptides, and designed proteins. Of the eleven, only the first seven are true classes; the others serve as placeholders for protein domains that have not been classified. Proteins are grouped according to their structural similarity.
The unit of classification is the protein domain, and the shape of a domain is known as its fold. Proteins share a common fold if they have the same major secondary structures in the same arrangement with the same topological connections. A superfamily groups proteins that probably have a common evolutionary origin: they share a common fold and perform similar functions. A family groups proteins with a clear evolutionary relationship, typically with more than 30% sequence identity. The protein level links proteins with similar function and structure. The last level, species, groups entries by unique sequence.
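The upper four levels of this hierarchy can be read from the short dot-separated classification strings that SCOP attaches to each domain, such as a.1.1.2 (class a, fold a.1, superfamily a.1.1, family a.1.1.2). The sketch below splits such a string into its levels; the type and function names are my own and are not part of SCOP.

```python
from typing import NamedTuple


class ScopId(NamedTuple):
    scop_class: str
    fold: str
    superfamily: str
    family: str


def parse_sccs(sccs):
    """Split a string such as 'a.1.1.2' into cumulative hierarchy levels."""
    parts = sccs.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields, got {sccs!r}")
    return ScopId(parts[0], ".".join(parts[:2]), ".".join(parts[:3]), sccs)


def same_superfamily(a, b):
    """True if two classification strings fall in the same superfamily."""
    return parse_sccs(a).superfamily == parse_sccs(b).superfamily


print(parse_sccs("a.1.1.2"))
print(same_superfamily("a.1.1.2", "a.1.1.1"))
```

Because each level is a prefix of the next, two domains can be compared at any level simply by comparing the corresponding prefixes.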
CATH is a semi-automated protein structure classification in which protein domains are classified by class (C), architecture (A), topology (T), and homologous superfamily (H). The latest version of CATH is 3.5, released on September 20, 2011 with 51334 PDB entries, 173536 CATH domains, and 2626 CATH superfamilies.
CATH has four classes: alpha, beta, alpha and beta, and few secondary structures. The class of a protein domain is determined by its secondary structure composition. The architecture level describes the overall shape of the domain structure, as determined by the orientation of its secondary structures. Domains are then grouped at the topology level by the shape and connectivity of their secondary structures. Finally, the homologous superfamily level groups protein domains that have similar structure and function and share a common ancestor.
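CATH writes these four levels as a dot-separated numeric code, for example 1.10.490.10, where each additional number moves one level down from class to homologous superfamily. The sketch below decodes such a code; the dictionary of class names follows the four classes listed above, and the function name is an assumption for illustration.

```python
CATH_CLASSES = {
    1: "alpha",
    2: "beta",
    3: "alpha and beta",
    4: "few secondary structures",
}
LEVELS = ("class", "architecture", "topology", "homologous superfamily")


def parse_cath(code):
    """Map a code such as '1.10.490.10' to its C, A, T, and H levels."""
    numbers = code.split(".")
    if len(numbers) != len(LEVELS):
        raise ValueError(f"expected four dot-separated numbers, got {code!r}")
    return {level: ".".join(numbers[:i + 1]) for i, level in enumerate(LEVELS)}


levels = parse_cath("1.10.490.10")
print(levels)
print(CATH_CLASSES[int(levels["class"])])
```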
Protein sequence databases can be divided into manually annotated and automatically annotated databases. UniProtKB/SwissProt and the Protein Information Resource-International Protein Sequence Database (PIR-PSD) are examples of manually annotated databases. UniProtKB/TrEMBL and NCBI GenPept are examples of automatically annotated databases. The UniProt Knowledgebase (UniProtKB) is discussed here as an example.
UniProtKB is a database of protein sequences and functional information. The data it captures are known as protein annotation. UniProtKB consists of two sections: UniProtKB/SwissProt, which is reviewed (manually annotated), and UniProtKB/TrEMBL, which is unreviewed (automatically annotated). Most sequences in UniProtKB are derived from the International Nucleotide Sequence Database (INSD), and some come from the PDB. All of these sequences are added automatically to UniProtKB/TrEMBL. TrEMBL records can then be selected manually for integration into SwissProt.
UniProtKB/SwissProt is a highly curated, non-redundant protein sequence database. Release 2013_03 of March 6, 2013 has 539616 sequence entries. Each entry combines experimental results, computational analysis, and scientific literature. An entry provides all relevant information about a protein because sequences from the same gene and the same species are merged into one entry. Entries can be downloaded by the public in file formats such as FASTA.
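A FASTA file is plain text: a header line starting with ">" followed by one or more lines of sequence. The sketch below reads such text and pulls the accession out of a UniProt-style header of the form db|accession|entry_name. The function names are my own, and the example record holds only the first few residues of human hemoglobin subunit beta, truncated for illustration.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def uniprot_accession(header):
    """Read the accession from a 'db|accession|entry_name ...' header."""
    _db, accession, _rest = header.split("|", 2)
    return accession


# Truncated record for illustration; a real entry holds the full sequence.
example = """>sp|P68871|HBB_HUMAN Hemoglobin subunit beta
MVHLTPEEKS
AVTALWGKV
"""

for header, sequence in parse_fasta(example):
    print(uniprot_accession(header), len(sequence), sequence)
```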
UniProtKB/TrEMBL is a computationally analyzed, redundant protein sequence database. In other words, TrEMBL is a computer-annotated supplement to SwissProt that may contain multiple entries for the same protein. Release 2013_03 of TrEMBL, of March 6, 2013, has 32153798 sequence entries. Its entries do not duplicate those in SwissProt, however. The total number of UniProtKB entries in release 2013_03 is therefore the sum of the SwissProt and TrEMBL entries, that is, 32693414.
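The total can be checked from the two release counts quoted in the text. The variable names are my own, and the reviewed share is derived here only as a by-product of the same figures.

```python
swissprot_entries = 539616      # UniProtKB/SwissProt, release 2013_03
trembl_entries = 32153798       # UniProtKB/TrEMBL, release 2013_03

uniprotkb_entries = swissprot_entries + trembl_entries
reviewed_share = swissprot_entries / uniprotkb_entries * 100

print(uniprotkb_entries)
print(f"{reviewed_share:.2f}% of UniProtKB entries are manually reviewed")
```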
Protein motif databases, also known as pattern and profile databases, are secondary databases derived from conserved patterns found in multiple sequence alignments. They are useful for classifying protein sequences into families. PROSITE and BLOCKS are examples of motif-based databases.
PROSITE is a database of protein domains, families, and functional sites that comprises biologically significant sites, patterns, and profiles. It is used to scan protein sequences for known motifs. PROSITE classification is based on observation: similar protein sequences are grouped into families, and protein domains are classified by families that share a common ancestor or a functional attribute. PROSITE records give information on the structure and function of a particular protein. PROSITE is part of the ExPASy proteomics analysis server, and it is used to annotate the domain features of SwissProt entries. The latest version is release 20.91 of March 4, 2013, with 1661 entries, 1308 patterns, 1053 profiles, and 1057 ProRule.
ProRule is a set of rules that provide extra information about functionally and structurally critical amino acids. For instance, a rule contains information on biologically meaningful residues such as active sites, cofactor-binding sites, and post-translational modification sites. ProRule helps to determine protein function, and it can generate annotation automatically from PROSITE motifs.
PROSITE is used to identify the possible function of a newly discovered protein and to determine the activity of a known protein. A signature, or conserved sequence, can also be derived from a group of proteins in order to classify other proteins. Each PROSITE signature is linked to an annotation document that holds all related information on the proteins concerned. PROSITE also offers tools for motif detection and protein sequence analysis. A signature works much like a fingerprint, which serves as evidence to identify an individual: it serves as evidence that a protein belongs to a particular family.
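Motif detection with a PROSITE pattern can be sketched by translating the pattern into a regular expression. The example uses the well-known N-glycosylation site pattern N-{P}-[ST]-{P}, in which braces exclude residues, square brackets list alternatives, and x stands for any residue. The translator below covers only this common subset of the syntax, the function names are my own, and the test sequence is invented.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a Python regular expression."""
    regex = pattern.rstrip(".")
    regex = regex.replace("-", "")                      # element separators
    regex = regex.replace("x", ".")                     # x = any residue
    regex = regex.replace("{", "[^").replace("}", "]")  # {P} = anything but P
    regex = regex.replace("(", "{").replace(")", "}")   # x(2,4) = repetition
    regex = regex.replace("<", "^").replace(">", "$")   # sequence ends
    return regex


def find_motif(pattern, sequence):
    """Return 1-based start positions of all (overlapping) matches."""
    regex = prosite_to_regex(pattern)
    return [m.start() + 1 for m in re.finditer(f"(?=({regex}))", sequence)]


n_glycosylation = "N-{P}-[ST]-{P}"
print(prosite_to_regex(n_glycosylation))
# Hypothetical sequence: the N at position 3 matches, the N at position 8
# does not because it is followed by proline.
print(find_motif(n_glycosylation, "MKNGSAANPTQ"))
```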
Figure 1.1: Expectation curve for biomolecular modeling and simulation
Figure 2.1: Ball and stick model of proline and its structural formula
Figure 2.2: Stick model (left) and space filling model (right) of proline
Figure 2.3: Wire frame model
Figure 2.4: Chicken wire model with stick model
Figure 2.5: Ribbon model
Figure 3.1: Four levels of protein structure
Figure 3.2: Diagrammatic representation of the structure of hemoglobin
Figure 3.3: Conformation transition from T structure to R structure
Figure 3.4: Prion protein in abnormal conformation, PrPsc (left) and prion protein in normal conformation, PrPc (right)