Quantitative Structure-Activity Relationship Biology Essay

CHAPTER 6

Bioinformatics plays a crucial role in structure-based drug and target discovery, and in the diagnosis and analysis of various diseases and their diversity. In particular, it has enormous potential for application in cancer research, which has only been partially exploited so far. Essentially all bioinformatics starts with a database and proceeds to some kind of knowledge discovery and prediction. This chapter covers bioinformatics databases and different types of quantitative structure-activity relationship (QSAR) studies that have either been used in cancer research or have the potential for such application.

Once the molecular mechanism and the chemistry of a disease are understood, the next crucial task is to find a suitable cure for it. A typical requirement is to find a suitable drug target and the drug itself (Brooijmans and Kuntz 2003). Target discovery draws heavily on bioinformatics tools today, and in the case of cancer both DNA/RNA and protein molecules can be potential drug targets (Choudhary et al. 2005; Bandyopadhyaya et al. 2005; Bhongade et al. 2004; Asseffa et al. 2003; Gellert et al. 2005; Khaleque et al. 2006; Yao et al. 2005; McColl et al. 2005).

Drug discovery is a complex, expensive and very time-consuming exercise, because there is no single systematic way to discover a drug automatically, even when the disease and its targets are well understood (Dixit and Mitra 2002). Quantitative structure-activity relationship (QSAR) studies take center stage when a protein (typically an enzyme) is the target and a suitable molecule is needed that can control (inhibit) the activity of that target. The basic principle of such a study is that chemical activity depends on structure. QSAR predates the widespread use of computers, because chemical structure has always been able to explain at least some aspects of chemical properties or biological activity. However, the availability of powerful computers and high-quality databases of molecular libraries and interactions has made QSAR an essential component of drug discovery today.

QSAR-based (in silico) analysis is best regarded as an exercise to screen or filter drug candidates before they are subjected to more intensive calculations such as docking, to experimental measurement of activity (in vitro), and finally to testing under real conditions (in vivo). This step will often pick out a dozen or so drug candidates from a library of millions of well-studied molecules. Traditional QSAR is specific to a particular target or enzyme, and all the screening is performed on drug candidates (ligand molecules). These ligand molecules are very diverse, and in order to screen them suitably we need to describe their structure as well as their chemical nature. This leads to the issue of finding descriptors of the molecular properties of ligands and drugs.

Hundreds of molecular properties or descriptors are used to represent molecules (Labute 2000; Xue and Bajorath 2000; Wildman and Crippen 2002; Gozalbes et al. 2002). These properties may be purely geometric, topological, electromagnetic, classical, or quantum-mechanical. The screening is often carried out by predicting the activity of a protein-ligand combination from the known descriptors of the ligand. Statistical techniques such as principal component analysis (PCA), neural networks, and multivariate correlation are the major techniques used for this purpose.

A large number of molecular descriptors are available and in use (Todeschini and Consonni; Labute 2000; Wildman and Crippen 2002; Hansch et al. 1995; Basak et al. 1980; Gozalbes et al. 2002; Pirard and Picket 2000; Basak et al. 1981; Basak et al. 1982; Kier and Hall 1999; Raevsky 1999; Xue and Bajorath 2000). Molecular descriptors used in QSAR for the unique representation and identification of ligand molecules that are likely to be drug candidates may be classified as follows.

Constitutional descriptors such as molecular weight, van der Waals volume, electronegativities, polarizability, number of atoms, non-H atoms, number of H bonds, multiple bonds, bond orders, aromatic ratio, number of rings, number of double and triple bonds, aromatic bonds, 3 different types of (n-membered) rings, and benzene-like rings.

Topological descriptors such as total structure connectivity index, Pogliani index, ramification index, polarity number, average vertex distance degree, mean square distance index (Balaban), Schultz Molecular Topological Index (MTI), square reciprocal distance sum index, quasi-Wiener index (Kirchhoff number), spanning tree number, hyper distance path index, reciprocal hyper-distance-path index, detour index, hyper-detour index, reciprocal hyper-detour index, distance/detour index, all-path Wiener index, Wiener-type index from Z weighted distance matrix (Barysz matrix), molecular electrotopological variation, E-state topological parameter, Kier symmetry index, eccentricity, mean distance degree deviation, unipolarity, centralization, and variation.

6.2 Descriptors for ligand selection:

Geometrical descriptors such as 3D-Wiener index, 3D-Balaban index, 3D-Harary index, average geometric distance degree, D/D index, average distance/distance degree, gravitational index G1, gravitational index G2 (bond-restricted), radius of gyration (mass weighted), span R, and average span R.

Charge descriptors such as maximum positive charge, maximum negative charge, total positive charge, total negative charge, total absolute charge (electronic charge index, ECI), mean absolute charge (charge polarization), total squared charge, relative positive charge, relative negative charge, submolecular polarity parameter, topological electronic descriptor, topological electronic descriptor (bond restricted), partial charge weighted topological electronic descriptor, and local dipole index.

Molecular properties such as unsaturation index, hydrophilic factor, Ghose-Crippen molar refractivity, and topological polar and non-polar surface area. Many free and commercial software packages also provide a current list of descriptors (e.g. http://www.talete.mi.it/products/dragon_molecular_descriptors.htm and http://preadmet.bm-drc.org/preadmet/query/query1.php, from which many of the descriptors listed above were compiled). Excellent coverage of issues and topics related to QSAR is also provided in the textbook by Gasteiger and Engel (2003).

The online software tools are listed in Table 6.1.1.

Table 6.1.1: Online QSAR and molecular descriptor programs

6.1.2 Types of QSAR models:

Many advanced QSAR techniques, such as Comparative Molecular Similarity Index Analysis (CoMSIA), have been used to study antiviral and anticancer drugs. The principle of CoMSIA is the alignment and comparison of drug molecules by comparing their similarity indices (selected descriptors). A similar approach, called Comparative Molecular Field Analysis (CoMFA), focuses on molecular field descriptors for this purpose (Cramer et al. 1988).

Quantitative models:

Linear Models

The correlation of biological activity with physicochemical properties is often termed an extrathermodynamic relationship. The label is appropriate because it follows in the line of the Hammett and Taft equations, which correlate thermodynamic and related parameters. The first studies that used QSAR notions to explain the biological activity of sets of compounds were published by Kopp (1844), Crum-Brown and Fraser (1868-69), Meyer (1899), and Overton (1901). The Hammett equation represents relationships between the logarithms of rate or equilibrium constants and substituent constants. The linearity of many of these relationships led to their designation as linear free energy relationships. The Hansch approach represents an extension of the Hammett equation from physical organic systems to a biological milieu. It should be noted that the simplicity of the approach belies the tremendous complexity of the intermolecular interactions at play in the overall biological response. Biological systems are a complex mix of heterogeneous phases. Drug molecules usually traverse many of these phases to get from the site of administration to the eventual site of action. Along this random-walk process, they perturb many other cellular components such as organelles, lipids, and proteins. These interactions are complex and vastly different from organic reactions in test tubes, even though the eventual interaction with a receptor may be chemical or physicochemical in nature. Thus, depending on the biological system involved (isolated receptor, cell, or whole animal), one expects the response to be multifactorial and complex. The overall process, particularly in in vitro or in vivo studies, is a mix of equilibrium and rate processes, a situation that defies easy separation and delineation.

Meyer and Overton were the first to attempt to get a grasp on biological responses by noting the relationship between the oil/water partition coefficients of compounds and their narcotic activity. Ferguson recognized that the equitoxic concentration of small organic molecules was markedly influenced by their phase distribution between the biophase and exobiophase. This concept was generalized in the form of Equation 6.1 and extended by Fujita to Equation 6.2 (Janssen, 1960; Fujita, 1990).

$C = kA^{m}$ (6.1)

$\log(1/C) = m \log(1/A) + \text{constant}$ (6.2)
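The step between the two equations is a single logarithm. Reading Equation 6.1 as a power law in A (an interpretation of the printed form, with C, k, m and A as defined in the text), the derivation can be sketched as:

```latex
\begin{aligned}
C &= kA^{m} \\
\log C &= \log k + m \log A \\
\log(1/C) &= -m \log A - \log k = m \log(1/A) + \text{constant},
\qquad \text{constant} = -\log k
\end{aligned}
```

The constant in Equation 6.2 therefore absorbs the system-specific factor k, while the slope m is unchanged.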

C represents the equipotent concentration, k and m are constants for a particular system, and A is a physicochemical constant representative of phase distribution equilibria such as aqueous solubility, oil/water partition coefficient, and vapor pressure. In examining a large and diverse number of biological systems, Hansch and coworkers defined a relationship (Equation 6.3) that expressed biological activity as a function of physicochemical parameters (e.g., partition coefficients of organic molecules) (Hansch, 1995).

$\log(1/C) = a \log P + b$ (6.3)
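As an illustration of how the coefficients a and b of Equation 6.3 are obtained in practice, the following minimal sketch fits a straight line by ordinary least squares. The data points are hypothetical and chosen only to make the arithmetic easy to follow; the function name fit_line is likewise an illustrative choice, not something taken from the studies cited.

```python
# Least-squares fit of log(1/C) = a * log P + b (Equation 6.3).
# All data values below are hypothetical, for illustration only.

def fit_line(x, y):
    """Return slope a and intercept b minimizing sum((y - (a*x + b))**2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

log_p = [0.5, 1.0, 1.5, 2.0, 2.5]        # hypothetical log P values
log_inv_c = [1.1, 1.6, 2.1, 2.6, 3.1]    # hypothetical log(1/C) values

a, b = fit_line(log_p, log_inv_c)

# Predicted activity for a hypothetical compound with log P = 1.8.
predicted = a * 1.8 + b
print(f"a = {a:.3f}, b = {b:.3f}, predicted log(1/C) = {predicted:.3f}")
```

A positive slope a means that activity rises with hydrophobicity over the range studied; the fit says nothing about compounds outside that range.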

Model systems have been devised to elucidate the mode of interactions of chemicals with biological entities.

Nonlinear Models

Extensive studies on the development of linear models led Hansch and coworkers to note that the linear relationship broke down when a wider range of hydrophobicity was assessed, particularly for test molecules at the extreme ends of that range. Hansch et al. therefore suggested that the compounds could be involved in a random-walk process: molecules of low hydrophobicity tended to remain in the first aqueous compartment, whereas highly hydrophobic analogs were sequestered in the first lipoid phase that they encountered. This led to the formulation of a parabolic equation relating biological activity and hydrophobicity (Penniston, 1969).
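The parabolic relationship is not written out in the text. In its commonly quoted form, with a, b and c as generic fitted coefficients (symbols chosen here for illustration) and a > 0, it can be sketched as:

```latex
\log(1/C) = -a(\log P)^{2} + b \log P + c,
\qquad
\frac{d\,\log(1/C)}{d\,\log P} = 0 \;\Rightarrow\; \log P_{0} = \frac{b}{2a}
```

The value log P0 marks the hydrophobicity at which the model predicts maximal activity; compounds on either side of it are predicted to be less active, which is the behaviour the random-walk argument describes.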

6.3 Lead compounds and their QSAR study

An important step in drug design is to find a lead, a compound that binds to the target receptor. Leads can be generated using techniques of de novo drug design or can be discovered by in vitro screening of large corporate libraries. Lead identification is only the beginning of a long and expensive process that eventually yields a commercial drug. A lead may have a low affinity for the target receptor, or may be too unstable in solution, too toxic, too rapidly eliminated, too quickly metabolized, or too difficult or expensive to synthesize in large quantities. Because screening procedures generally give leads that are not suitable as commercial drugs, these compounds have to be optimized using various techniques of computer-aided drug design. The availability of three-dimensional (3D) structural information on biological receptors and their complexes with various ligands can be extremely useful in suggesting ways to improve the affinity of the lead for the target. Because such detailed structural information is still unavailable in many cases, the drug design process must rely on a more indirect approach, the quantitative structure-activity relationship (QSAR) approach.

The objective was to identify lead compounds with improved inhibition of a target common to oral cancer and cervical cancer, and to find the best optimized lead compound for inhibition of the E6 oncoprotein. This was addressed through a 2D QSAR study of the ligands listed in Table 6.3.1.

Table 6.3.1: Ligands selected for QSAR study of ?

S. No.  Ligand
L1      Eugenol
L2      Dibenzoylmethane
L3      Zingerone
L4      Yakuchinone A
L5      Ferulic acid
L6      Caffeic acid
L7      Isoeugenol
L8      Curcumin
L9      Capsaicin
L10     Chlorogenic acid
L11     Quercetin
L12     Bisdemethoxycurcumin
L13     Dehydrozingerone
L14     Piperic acid
L15     Curcumin dipiproyl ester
L16     Cassumunin B
L17     Cassumunin A

Software / programs used for QSAR

Schrödinger Suite: for the 2D-QSAR study

Maestro 9.2: for the docking study

QikProp 3.4.111: for calculating the ligand descriptors

Strike 2.0: for determining the activity of the ligands

2D QSAR

Descriptor selection and linear model for study

A data set of 17 compounds was used to perform the QSAR studies. The chemical structure of a molecule is usually represented by a variety of descriptors. A set of 46 descriptors was calculated using QikProp from the Schrödinger suite, version 9.2. All the descriptors were scaled so that descriptors with large numerical values would not dominate descriptors with small numerical values. Scaling was carried out using the following equation:

Where x represents a particular descriptor's value.
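One reading that fits the tabulated data is z-score standardization, z = (x - mean) / s, with s the sample standard deviation. The sketch below checks this against the scaled #rotor column for all 17 ligands, copied from Tables 6.3.2 and 6.3.3: a column that is already standardized has a mean near zero and a sample standard deviation near one, and standardizing it again leaves it essentially unchanged. Treating the scaling as a z-score is an assumption, and the function name standardize is hypothetical.

```python
from statistics import mean, stdev

def standardize(values):
    """Scale values to zero mean and unit sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 in the denominator)
    return [(v - m) / s for v in values]

# Scaled #rotor values for ligands L1..L17 (Tables 6.3.2 and 6.3.3).
scaled_rotor = [
    -0.92004, -0.92004, -0.70281, 0.383349, -0.70281, -0.70281, -1.13727,
    0.817812, 0.383349, 0.383349, -0.70281, 0.383349, -0.70281, -0.70281,
    0.817812, 2.1212, 1.903969,
]

column_mean = mean(scaled_rotor)
column_sd = stdev(scaled_rotor)

# Re-standardizing an already standardized column should change nothing
# beyond rounding error in the tabulated values.
rescaled = standardize(scaled_rotor)
max_shift = max(abs(a - b) for a, b in zip(scaled_rotor, rescaled))

print(f"mean = {column_mean:.6f}, sd = {column_sd:.6f}, max shift = {max_shift:.6f}")
```

The check only shows that the tabulated column is consistent with this scaling; other descriptors would need the same test before drawing a general conclusion.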

The data was divided into training and test sets in a 75:25 ratio, with the aim of having diverse compounds in the training set. A correlation matrix was therefore constructed between compounds and highly diverse compounds were identified, giving 12 compounds in the training set and 5 in the test set. Compounds L6 (caffeic acid), L7 (isoeugenol), L10 (chlorogenic acid), L16 (cassumunin B) and L17 (cassumunin A) belonged to the test set.

First Method:

Each compound in the training and test sets has 46 descriptors. To obtain a stable and interpretable model, only relevant descriptors should be selected for the QSAR analysis. Feature selection reduces the dimensionality of the data by removing unsuitable descriptors, which improves the learning process. A correlation matrix was constructed in which the correlation of each descriptor with the other descriptors, as well as with the biological activity, was determined. Descriptors with low correlation with biological activity were discarded from the matrix. The top 10% of descriptors were selected, reducing the number of descriptors to four. The selected descriptor set (#rotor, PISA, QPlogPC16 and EA(eV)) was then subjected to multiple linear regression (MLR) analysis, which produced a linear model with a squared correlation coefficient (r²) of 0.36.
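The ranking step described above can be sketched with the four retained descriptors and the EC50 values of the 12 training compounds from Table 6.3.2. This is a minimal illustration, not the procedure as actually run: the full 46-descriptor matrix is not available here, using raw EC50 as the activity is an assumption, and the helper name pearson is hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Training-set values from Table 6.3.2, in row order (L1, L2, L3, L4, L5,
# L8, L9, L11, L12, L13, L14, L15).
ec50 = [2.3, 0.11, 38, 0.96, 10.6, 0.96, 0.2, 10.8, 0.63, 3.55, 1.69, 0.0026]
descriptors = {
    "#rotor": [-0.92004, -0.92004, -0.70281, 0.383349, -0.70281, 0.817812,
               0.383349, -0.70281, 0.383349, -0.70281, -0.70281, 0.817812],
    "PISA": [-0.62763, 1.997265, -1.18952, 1.02181, -0.78752, 0.476086,
             -1.35551, 0.269988, 1.74827, -0.79751, -0.2451, 0.363282],
    "QPlogPC16": [-1.06173, -0.3906, -0.90502, 0.284818, -0.81729, 0.570407,
                  -0.09348, 0.036538, 0.405566, -0.89259, -0.76415, 1.035083],
    "EA(eV)": [-1.483377512, 0.304758097, -1.685719172, -1.542197762,
               0.627093068, 0.657679598, -1.598665202, 0.351814297,
               0.264760327, 0.511805378, 1.297643921, 0.812965059],
}

# Correlation of each descriptor with activity, then rank by magnitude.
correlations = {name: pearson(col, ec50) for name, col in descriptors.items()}
ranking = sorted(correlations, key=lambda name: abs(correlations[name]), reverse=True)

# Keeping the top 10% of 46 descriptors, rounded down, gives four descriptors.
n_keep = int(0.10 * 46)

for name in ranking:
    print(f"{name:10s} r = {correlations[name]:+.3f}")
print("descriptors kept from 46:", n_keep)
```

Ranking by absolute correlation treats strongly negative and strongly positive descriptors alike, which is the usual choice when the sign of the relationship is not known in advance.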

Table 6.3.2: Training set for first method (2D-QSAR)

Molecule                      EC50    #rotor     PISA       QPlogPC16  EA(eV)
L1 Eugenol                    2.3     -0.92004   -0.62763   -1.06173   -1.483377512
L2 Dibenzoylmethane           0.11    -0.92004   1.997265   -0.3906    0.304758097
L3 Zingerone                  38      -0.70281   -1.18952   -0.90502   -1.685719172
L4 Yakuchinone A              0.96    0.383349   1.02181    0.284818   -1.542197762
L5 Ferulic acid               10.6    -0.70281   -0.78752   -0.81729   0.627093068
L8 Curcumin                   0.96    0.817812   0.476086   0.570407   0.657679598
L9 Capsaicin                  0.2     0.383349   -1.35551   -0.09348   -1.598665202
L11 Quercetin                 10.8    -0.70281   0.269988   0.036538   0.351814297
L12 Bisdemethoxycurcumin      0.63    0.383349   1.74827    0.405566   0.264760327
L13 Dehydrozingerone          3.55    -0.70281   -0.79751   -0.89259   0.511805378
L14 Piperic acid              1.69    -0.70281   -0.2451    -0.76415   1.297643921
L15 Curcumin dipiproyl ester  0.0026  0.817812   0.363282   1.035083   0.812965059

Table 6.3.3: Test set for first method (2D-QSAR)

Molecule                      EC50    #rotor     PISA       QPlogPC16  EA(eV)
L6 Caffeic acid               15.3    -0.70281   -0.67666   -0.79468   0.516510998
L7 Isoeugenol                 7.5     -1.13727   -0.77338   -1.09813   -0.876352529
L10 Chlorogenic acid          10.8    0.383349   -0.71253   0.444458   0.91884151
L16 Cassumunin B              19      2.1212     0.397034   2.043577   0.467101987
L17 Cassumunin A              19      1.903969   0.891618   1.997223   0.455337937

Second method:

To reduce the number of features, the WEKA (Waikato Environment for Knowledge Analysis) program was used. In this study, the correlation-based feature selection method was applied with a greedy stepwise search. A 10-fold cross-validation was used in the learning process; this performs variable selection for each cross-validation fold. The selected descriptor set (6 descriptors, namely #stars, #rotor, PISA, QPlogPo/w, CIQPlogS and EA(eV)) was then subjected to multiple linear regression (MLR) analysis, which produced a linear model with a squared correlation coefficient (r²) of 0.5207. The influence of each descriptor was studied by removing the descriptors one at a time. Descriptors whose removal caused only a small or negligible drop in the correlation coefficient were eliminated, leaving a final set of five descriptors (#stars, #rotor, PISA, QPlogPo/w and EA(eV)) with a squared correlation coefficient (r²) of 0.579 on the complete dataset.
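The remove-one-descriptor test described above can be sketched as follows, using the five final descriptors and the 12 training compounds of Table 6.3.4. This is an illustration under stated assumptions rather than a reproduction of the reported model: raw EC50 is taken as the response, the fit is a plain least-squares solution of the normal equations written in pure Python, and the helper names solve and r_squared are hypothetical. The r² values it prints need not match those reported in the text, which refer to different descriptor sets or to the complete dataset.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def r_squared(rows, y):
    """Fit y = b0 + sum(b_j * x_j) by least squares and return r^2."""
    design = [[1.0] + list(row) for row in rows]   # leading 1.0 is the intercept
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, d)) for d in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

names = ["#stars", "#rotor", "PISA", "QPlogPo/w", "EA(eV)"]
# Rows of Table 6.3.4 (L1, L2, L3, L4, L5, L8, L9, L11, L12, L13, L14, L15).
table = [
    [0.517409, -0.92004, -0.62763, 0.063838, -1.48338],
    [-0.58209, -0.92004, 1.997265, -0.04958, 0.304758],
    [-0.58209, -0.70281, -1.18952, -0.47447, -1.68572],
    [-0.58209, 0.383349, 1.02181, 1.214142, -1.5422],
    [-0.58209, -0.70281, -0.78752, -0.6754, 0.627093],
    [-0.58209, 0.817812, 0.476086, 0.094352, 0.65768],
    [-0.58209, 0.383349, -1.35551, 0.410426, -1.59867],
    [-0.58209, -0.70281, 0.269988, -1.27185, 0.351814],
    [-0.58209, 0.383349, 1.74827, -0.00698, 0.26476],
    [-0.58209, -0.70281, -0.79751, -0.49519, 0.511805],
    [-0.03234, -0.70281, -0.2451, -0.23324, 1.297644],
    [-0.03234, 0.817812, 0.363282, 0.215254, 0.812965],
]
ec50 = [2.3, 0.11, 38, 0.96, 10.6, 0.96, 0.2, 10.8, 0.63, 3.55, 1.69, 0.0026]

full = r_squared(table, ec50)

# Drop each descriptor in turn and record how much r^2 falls.
drops = {}
for j, name in enumerate(names):
    reduced = [[v for k, v in enumerate(row) if k != j] for row in table]
    drops[name] = full - r_squared(reduced, ec50)

print(f"r^2 with all five descriptors: {full:.3f}")
for name, drop in sorted(drops.items(), key=lambda item: item[1]):
    print(f"without {name:10s} r^2 falls by {drop:.3f}")
```

A descriptor whose removal barely lowers r² contributes little beyond what the others already explain, which is the criterion the text uses to go from six descriptors to five. With only 12 compounds and five descriptors the training r² is optimistic, so the test set remains the more meaningful check.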

Table 6.3.4: Training set for second method (2D-QSAR)

Molecule                      #stars     #rotor     PISA       QPlogPo/w  EA(eV)     EC50
L1 Eugenol                    0.517409   -0.92004   -0.62763   0.063838   -1.48338   2.3
L2 Dibenzoylmethane           -0.58209   -0.92004   1.997265   -0.04958   0.304758   0.11
L3 Zingerone                  -0.58209   -0.70281   -1.18952   -0.47447   -1.68572   38
L4 Yakuchinone A              -0.58209   0.383349   1.02181    1.214142   -1.5422    0.96
L5 Ferulic acid               -0.58209   -0.70281   -0.78752   -0.6754    0.627093   10.6
L8 Curcumin                   -0.58209   0.817812   0.476086   0.094352   0.65768    0.96
L9 Capsaicin                  -0.58209   0.383349   -1.35551   0.410426   -1.59867   0.2
L11 Quercetin                 -0.58209   -0.70281   0.269988   -1.27185   0.351814   10.8
L12 Bisdemethoxycurcumin      -0.58209   0.383349   1.74827    -0.00698   0.26476    0.63
L13 Dehydrozingerone          -0.58209   -0.70281   -0.79751   -0.49519   0.511805   3.55
L14 Piperic acid              -0.03234   -0.70281   -0.2451    -0.23324   1.297644   1.69
L15 Curcumin dipiproyl ester  -0.03234   0.817812   0.363282   0.215254   0.812965   0.0026

Table 6.3.5: Test set for second method (2D-QSAR)

Molecule                      #stars     #rotor     PISA       QPlogPo/w  EA(eV)     EC50
L6 Caffeic acid               -0.58209   -0.70281   -0.67666   -1.17743   0.516511   15.3
L7 Isoeugenol                 0.517409   -1.13727   -0.77338   0.149621   -0.87635   7.5
L10 Chlorogenic acid          -0.03234   0.383349   -0.71253   -1.65183   0.918842   10.8
L16 Cassumunin B              2.716399   2.1212     0.397034   1.955679   0.467102   19
L17 Cassumunin A              2.166652   1.903969   0.891618   1.93265    0.455338   19

#stars: Number of property or descriptor values that fall outside the 95% range of similar values for known drugs. Outlying descriptors and predicted properties are denoted with asterisks (*) in the .qpsa file. A large number of stars suggests that a molecule is less drug-like than molecules with few stars.

#rotor: Number of non-trivial (not CX3), non-hindered (not alkene, amide, small ring) rotatable bonds.

PISA: π (carbon and attached hydrogen) component of the SASA (Total solvent accessible surface area)

QPlogPo/w: Predicted octanol/water partition coefficient.

6.6 Conclusion:

Two methods were used for this analysis. The data was divided into training and test sets in a 75:25 ratio, with the aim of having diverse compounds in the training set. A correlation matrix was therefore constructed between compounds and highly diverse compounds were identified, giving 15 compounds in the training set and 5 in the test set. The compounds Eugenol (3314), Curcumin (969516), Quercetin (5280343), Piperic Acid (5370536) and Curcumin di piporyl ester (6441419) belonged to the test set.

Multiple linear regression (MLR) analysis produced a linear model with a squared correlation coefficient (r²) of 0.7956. This indicates a significant result for the in silico interaction data analysis of the selected molecules.

----------------------------------------------------- END OF CHAPTER-6 ----------------------------------------------