This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.
Abstract Microtubules play an important role in intracellular transport, mobility, and particularly mitosis. Paclitaxel (Taxolâ„¢) and paclitaxel-like compounds have been shown to be anti-tumor agents useful against various human tumors. Paclitaxel-like compounds operate by stabilizing microtubules via binding at the interface between two ï¢-tubulin monomers in adjacent protofilaments. We present in this study the elucidation of the structural features of paclitaxel and paclitaxel-like compounds (e.g., epothilones) with microtubule stabilizing activities, and relate their activities to spatial and chemical features of the molecules. CATALYST was used to generate three-dimensional quantitative structure activity relationships (3D-QSARs) resulting in 3D pharmacophore models (hypotheses) of epothilone- and paclitaxel-derivatives. Pharmacophore models were generated from diverse conformers of these compounds resulting in a high correlation between experimental and predicted biological activities (r = 0.83 and 0.91 for epothilone and paclitaxel derivatives, respectively). On the basis of biological activities of the training sets, five- and four-feature pharmacophore hypotheses were generated. The validation of generated hypotheses was achieved by using twelve epothilone derivatives and ten paclitaxels, respectively, which were not in the training sets. A clustering (grouping) and merging technique was used in order to supplement spatial restrictions of each of hypothesis and to develop more comprehensive models.
This approach may be of use in developing novel inhibitor candidates as well as contributing a better understanding of structural characters of many compounds useful as anticancer agents targeting microtubules.
Keywords: microtubule; 3D-QSAR; pharmacophore; hypothesis; CATALYST; epothilone; paclitaxel; drug design
Tubulin,[  ] a heterodimeric protein composed of globular Î±- and Î²-tubulin subunits, represents the monomeric building block of microtubules. Tubulin polymerizes end to end creating a protofilament comprised of non-covalently interacting Î±- and Î²-tubulin subunits (â€¦Î±/Î²-Î±/Î²-Î±/Î²â€¦). Thirteen protofilaments come together laterally forming a closed tube, called a microtubule. Microtubules are essential for many cellular processes, such as rigidifying the cell and translocation of vesicles, granules, organelles, and chromosomes through special attachment proteins.[  -,  ,  ,  ,  ] The process of cell division makes use of microtubules through their ability to form the mitotic spindle. This results from the dynamic (dis)assembly process of depolymerization and repolymerization.[  ] Microtubules are "dynamically unstable" - tubulin heterodimers continuously attach and detach from the protofilament termini. Detached tubulin heterodimers can be used to form the mitotic spindle, required for mitosis. Rapidly dividing cells, e.g., bone marrow, skin, and carcinogenic, are most sensitive to interference of the dynamic instability of microtubules.
Many natural products such as colchicines,[  ] rhizoxin,[  ] vinblastine,[  ] and vincristine[  ]bind tubulin, blocking microtubule assembly, thereby interfering with the production of the mitotic spindle by destabilizing microtubules. In contrast, paclitaxel[  -,  ,  ] and epothilones[  -,  ,  ] hyperstabilize microtubules by binding to ï¢-tubulin/ï¢-tubulin interfaces in adjacent protofilaments resulting in a reduced ability to form the mitotic spindle due to a diminished pool of disassociated tubulin heterodimers.[,  ] The most well known tubulin-stabilizing agent is paclitaxel (Taxolâ„¢),[  ] which is currently a front-line anticancer agent. However, paclitaxel is not an ideal drug because of its susceptibility to Multi-Drug Resistance (MDR) and its low aqueous solubility, leading to difficulties in administration/formulation and toleration.[  -,  ,  ,  ,  ]
Recently, several non-taxoid compounds including epothilones have been shown to stabilize microtubules. Epothilone A and B, which are more soluble than paclitaxel, can tightly bind to the Î²-tubulin subunits and inhibit cell division by hyperstabilizing microtubules so that they can not readily dissociate to form the mitotic spindle.[  -,  ,  ,  ,  ,  ,  ] Even though a number of selective microtubule-stabilizing agents have recently described and their high-affinity binding sites have been detected on the tubulin protein, their complexed molecular structures are not fully understood except X-ray crystal structure of taxotere.[  ]
Therefore, in order to better understand the pharmacophore elements for epothilones and paclitaxels, and to provide guidelines for creating new drug candidates, a three-dimensional quantitative structure-activity relationship (3D-QSAR) approach, as implemented in CATALYST,[  ] was employed in this study. The aim of this study is the generation of hypotheses (3D-pharmacophore models) in which experimental biological activities are correlated with the 3D-arrangement of chemical features of the compounds. CATALYST is used to generate hypotheses using the 3D distribution and combination of structural features. These hypotheses are used to predict the biological activities of test sets of compounds not included in the training sets in order to allow the rank ordering of synthetic priorities. In addition, they were used as a 3D-query to search compound databases, such as, NCI and ACD in order to identify new candidate drugs.[  ]
Materials and Methods
In order to build a 3D-QSAR model, computational studies were carried out using CATALYST version 4.6 on a Silicon Graphics O2 and Octane workstations.
Selecting Training sets for Epothilone and Paclitaxel Derivatives
In order to generate pharmacophore models of these microtubule anticancer agents, our training sets were divided into two representative series: one for epothilones and one for paclitaxels. For the epothilone derivatives, a training set of 37 compounds, evaluated for cytotoxicity against an ovarian cancer cell line (1A9), was constructed using the SKETCHER module in CATALYST. Each member of the training set was energy minimized to the closest local minimum using the CHARMm forcefield.[  ] For the paclitaxel derivatives, the ï¢-tubulin complex with docetaxel (PDB code: 1TUB)[  ] was used as the initial template structure for the training set of 25 compounds targeting non-small cell lung carcinoma (A549). The structure of the ï¢-tubulin/docetaxel complex was minimized using the CVFF (Consistent Valence Forcefield) forcefield with DISCOVER_3/InsightII.[  ] The minimized docetaxel structure was replaced with paclitaxel and minimized as described above.
Generating Pharmacophores with CATALYST
In the CATALYST program, all conformers of each member of the training sets were generated using the "Best generation method" of the ConFirm module, which created a maximum of 255 conformations within an internal energy range of 20 kcal/mol for each molecule.[,  ] These conformers were built by using the Poling algorithm[-  40]. Hypotheses, ten in total, were generated after considering features (chemical functions) such as hydrogen-bond donors (HBDs), hydrogen-bond acceptors (HBAs), and aliphatic and aromatic hydrophobic (ali-P and aro-P).[  -,  ,  ,  ]
The predictive 3D-pharmacophore models were developed with the HypoGen module in CATALYST, which generated top ten scoring hypothesis and consisted of three phases: (1) the constructive phase in which the hypothesis is generated with the most active compounds, (2) the subtractive phase, in which some of the hypotheses fitting the inactive or weakly active compounds are removed, and (3) the optimization phase, in which all hypotheses are subjected to perturbations such as adding a new feature or deleting an existing feature, and rotating the vectors attached to the features. HypoGen to investigate the trends and variations in biological activity with the chemical structure through investigating the 3D arrangement of the chemical features in the training set compounds.
The parameters for generating hypotheses were kept at the following values: weight variation, 0.302; mapping coefficient, 0; spacing, 2.97 Å; activity uncertainty, 3; and tolerance factor, 1. The weight variation represents the expected standard deviation of the feature weight for good hypothesis. The mapping coefficient is a penalty for hypotheses for which training set molecules that are topologically similar map in different ways to the hypothesis. The default value of 0 means that it is not used in hypothesis generation. The spacing indicates the minimum distance between actual feature of hypothesis. An uncertainty of 3 means that biological activity is existed between activity/3 and activity*3. The tolerance factor is used to select the variable tolerance mode of HypoGen. The default value of 0 means that it uses standard HypoGen mode with no variable tolerance. Setting this value to 1 allows individual feature tolerance during optimization.
Analysis and Refinement of Generated Pharmacophores
The quality of the generated hypotheses was evaluated by considering the cost factor analysis for determining the best hypothesis with the appropriate conformers after generation of al ten hypotheses. The lowest cost hypothesis (1st hypothesis in the list) was considered to be the best due to the nature of the cost factor analysis method (i.e. the highest cost difference, lowest RMS error, and the best correlation coefficient) and is acknowledged to possess the features representative of all of the hypotheses. After generating hypotheses, the three-term cost factors were analyzed.[  ] All cost factors of a hypothesis were calculated by summing these three component cost factors. The units of cost are binary bits. The hypothesis costs are calculated according to the number of bits to describe a hypothesis. Simpler hypotheses require fewer bits for a complete description, and this assumption is made that simpler hypotheses are better.
cost = eE + wW + cC (1)
The weight cost (W) represented in a Gaussian form is expressed by the feature weight deviated from an idealized value of 2. The configuration cost (C) is described by the complexity of the hypothesis in the form of the exponent of the entropy. The error cost (E), which acts major role in decision of overall cost of a hypothesis among three costs, is computed from the deviation between the predicted and experimental activities of the training set compounds.
Two important theoretical hypotheses, the fixed cost hypothesis and the null hypothesis, were computed to assist in the evaluation of each of the ten hypotheses. The fixed cost hypothesis (Hfixed), in which the error cost is minimal and the slope of the activity correlation line was one, was a lower bound on the cost of the simplest possible hypothesis that still fitted to the data perfectly.
fixed cost = eE(x=0) + wW(x=0) + cC (2)
The null cost hypothesis (Hnull) presumes that there is no statistically significant structure in the data and that the experimental activities are normally distributed about their mean. The error cost in the null hypothesis is higher than that in the fixed cost hypothesis and the slope of the activity correlation line in the null hypothesis is zero. Hnull represents the highest cost of a pharmacophore model with no features and estimates the activity to be the average activity of the training set molecules.
null cost = eE(xest=) (3)
The measure of the difference between a null and a fixed cost hypothesis is important in the analysis of the results of the hypothesis generation. The greater the difference in the cost between two hypotheses, the higher the probability for finding useful models. One point that has to be emphasized is that the features of the generated hypothesis may be edited manually by distance constraint, angle constraint, etc. A meaningful hypothesis may result when the difference between two values is large; a value of 40-60 bits for a hypothesis may indicate that it has 75-90% probability for correlating the data.
Validation of Generated Pharmacophores
In order to validate pharmacophore models generated from the training set of compounds, we applied the models to test sets of compounds (i.e. ones that were not included in the initial training set) in order to judge whether the ranked order of activities is accurately predicted for the test sets of compounds. To test how well the compounds matched a proposed hypothesis, we used the 'Compare/Fit' function that evaluates the geometry of the conformers of the compounds. The Fit function is expressed by a numerical value that quantifies the fitting capacity between each compound and generated hypothesis. Perfect mapping indicates that a fit value can be expressed by the sum of the weight of the features in the hypothesis. A higher fit value represents a better fit between the functional groups of the molecule to the appropriate features of the pharmacophore.
where w = adaptively determined weight of the hypothesis function, Disp = the displacement of the feature from the center of the location constraint, and Tol = the radius of the location constraint sphere for the feature (tolerance). Using fast and best fit procedures of Compare/Fit, the test set was fitted to a CATALYST hypothesis to predict the biological activities. The test set consisted of 12 compounds for the epothilone derivatives and of 10 compounds for the paclitaxel derivatives. The fast fit algorithm identifies the fit of the compound to a proposed hypothesis without minimizing the conformers, whereas the best fit algorithm starts with the fast fit and then minimizes with the CHARMm forcefield. The best fit algorithm, while more time consuming, may sample a larger range of conformational space due to the minimization of the distance between the hypothesis features and the atoms to which they are meant to fit. The predictions obtained from the fast and best fit algorithms were compared via a residual that was computed from the difference (in log units) between the predicted and experimental activities. In general, a predicted activity within 1 log unit of the experimental activity is considered to be a valid prediction.[  ]
CATALYST supports clustering and merging of generated hypotheses. By clustering all of the hypotheses, it is possible to divide the hypotheses into several groups (clusters) based on common features. It can also be used to obtain a new hypothesis by merging the analogous series of hypotheses.
These pharmacophore models may be used to search 3D databases for structurally novel candidate molecules. There seems to be two main objects to screen such databases: first is for validating generated hypotheses, and second is for discovering more potential hit compounds than compounds of training set or test set. Two algorithms can be selected for database search: Fast and Best Flexbile Search algorithms.  The Fast Flexible Search algorithm considers existing conformation precalculated during the search. While on the other, the Best Flexible Search algorithm takes transforming the conformation into account within internal energy threshold of 9.5 kcal/mol as default. Our database searching strategy was performed using the "Best Flexible Search" method that manipulates the conformers to minimize the distances between pharmacophore features and mapped atoms in the model. The lowest cost hypothesis and newly merged hypothesis were used as the query, respectively, against three databases in order to discover and compare epothilone-like hit lists: Maybridge, NCI, and ACD. These 3D databases are supplied with CATALYST version 4.6. In a CATALYST-formatted database, compounds are stored as multiple conformers that are representative of their available conformational space.[,] In our study, ACD, NCI and Maybridge database screening were used for designing new hit lists of microtubule stabilizing agents.
Results and Discussion
The ConFirm and HypoGen modules in CATALYST v. 4.6 were used to develop predictive models correlating biological activity and 3D structures for epothilone and paclitaxel derivatives. These results will be used to construct training sets and test sets and to generate their pharmacophore models.
Pharmacophore Generation for the Epothilone and Paclitaxel Derivatives
To define the pharmacophores that aim to identify the best 3D arrangement of chemical features explaining the activity variation among derivatives, two groups of training set based on epothilone- and paclitaxel moiety were selected. One containing 37 epothilone derivatives and another containing 25 paclitaxel derivatives were prepared as shown in Figures 1-1 and 1-2. The activity values for the two training sets enabled the construction of pharmacophores for these classes of microtubule stabilizing agents. The activities of the compounds in the epothilone training set are IC50 and those of paclitaxel derivatives are ED50/ED50(paclitaxel) and range five and six orders of magnitude, respectively, as shown in Tables 1-1 and 1-2.
After generating all ten hypotheses for epothilone and paclitaxel derivatives, the first hypothesis (H1) was selected as the best one as a result of cost factor analysis. The results of this cost factor analysis are given in Tables 2-1 and 2-2. For the epothilone derivatives, the value of the total cost of the hypothesis generated (H1), the null hypothesis, and the fixed hypothesis were 170.30, 210.33, and 143.50 bits, respectively. The corresponding values for the paclitaxel derivatives were, however, 134.68, 208.61, and 102.40 bits. As a result, the difference between the cost of H1 and Hnull (ï„(Hnull - H1)) and the difference between Hfixed and Hnull (ï„(Hnull - Hfixed)) were 40.03 and 66.83 bits for the epothilone derivatives and 73.93 and 106.21 for the paclitaxels. The predicted activity values along with the experimental values for the paclitaxel derivatives demonstrate better correlation (r=0.91) than that of the epothilone derivatives (r=0.83), as presented in Tables 2-1 and 2-2. Their correlation coefficient (r) measurement supports the assessment of the quality of model. The regressions between experimental and predicted activities are plotted in log-scale in Figures 2-1 and 2-2.
For the epothilone derivatives, the lowest cost hypothesis (H1) possessed five features including three ali-Ps and two HBAs. On the other hand, four features with an ali-P, a HBA, and two HBDs were found in the paclitaxel derivatives as shown in Figures 3-1 and 3-2. Three dimensional coordinates and inter-feature distances of H1 for the epothilone and paclitaxel derivatives are given in Tables 3-1 and 3-2. Three ali-P regions and a second HBA are located at distances of 3.9 Å, 5.2 Å, 7.2 Å, and 4.2 Å from the first HBA in the pharmacophore model derived from epothilone derivatives. Similarly, a HBA, an ali-P region, and a HBD are located at distances of 10.7 Å, 10.0 Å, and 10.1 Å, respectively, from the first HBD region for the paclitaxel derivatives.
As a further verification of the generated pharmacophore models, the parent compound for each class of compounds was fitted to H1. Epothilone B (IC50 = 0.13 nM) and paclitaxel (ED50/ED50(paclitaxel) = 1), typical microtubule stabilizing agents, were individually fitted to each lowest cost hypothesis (H1) on behalf of epothilone and paclitaxel derivatives, as shown in Figures 4-1 and 4-2.
In the pharmacophore model derived from epothilone derivatives, the two HBA regions match the epoxide group of C12-C13, which is not essential for tubulin polymerization but is implicated in cytotoxicity and the ether group between C1 and C15 whose stereochemistry is strongly correlated to its activity.[,  ] Two of the three ali-P regions are located at C8 and C4 (4,4-ethano group) and the third feature is located at C16 (bulky group) in the side chain.
The pharmacophore model for paclitaxel derivatives consists of two HBDs, a HBA, and an ali-P region. Two HBDs are located at the amide group near the C3'-phenyl of the side chain and the hydroxyl group at position C7. The HBA matches the acetyl group at position C10 and an ali-P region occupies position C15 (15, 15-ethano group).
Validation of the Pharmacophore Models using Test Sets
After constructing 3D-QSAR models from the training sets, test sets containing twelve and ten compounds, respectively, for epothilone-[,  ,  ] and paclitaxel-derivatives[  -,  ,  ,  ,  ,  ,  ], were generated and used to predict activity values (Figures 5-1 and 5-2). The methods for generating conformational models of the test set compounds were the same as those used in generating the training sets. The predicted activities for the test set compounds were then compared with experimental activities using the fast and best fit methods in the Compare/Fit module in CATALYST (Tables 4-1 and 4-2).
The validation methods used on the training and test sets can be divided into two categories, as shown in Tables 1 and 4. One uses the differences in activity values (residuals in log units) between experimental and predicted values, as shown in the parentheses in Tables 1 and 4. Predicted activities within 1 log unit can be considered to be accurate to within experimental error for data obtained from the same biological experiments. In the other method, we classified the activity scale of compounds into highly active compounds (ï¼‹ï¼‹ï¼‹, lower than 10 nM in activity), moderately active compounds (ï¼‹ï¼‹, 100 nMï¼žactivity â‰¥ 10nM), and weakly active compounds (ï¼‹, more than 100 nM).[  ,  ] In this study, we consider a classification method using the residual of the difference in the activity values as having greater weight than a method directly using an activity scale because we have identified some flaws in the activity scaling method. For example, in the activity scaling method, five compounds in the training set of paclitaxel derivatives (13th, 14th, 15th, 20th, and 22nd compounds) were judged to be beyond the active range by a simple classification (Table 1-2) even though they are within 1 log unit.
The most highly active values in the test set were predicted to have residuals smaller than 1 log unit. We obtained this result using the fast fit method for the epothilone and paclitaxel derivatives. According to this result, ten of twelve epothilone derivatives (83%) and eight of ten paclitaxel derivatives (80%) had residuals smaller than 1 log unit, as shown in Tables 4-1 and 4-2. The plots of correlation of predicted versus experimental activity for the training and test sets using the fast fit method are given in Figures 6-1 and 6-2. In contrast, fitting using the best fit method was very poor for eight out of twelve epothilone derivatives and for seven out of ten of the paclitaxel derivatives. It seems to be caused by the fact that the energy of the conformer used in the fitting with the best fit method is higher than that for the conformer used in the fast fit method because best fit starts from the fast fit conformation and then searches for a conformation with a better fit value (See Materials and Methods).[,  ]
The training sets used for building the pharmacophores generally covered a larger range of activity values (-1.3 to 2.5 log units for the epothilone derivatives and -2.1 to 2.9 for the paclitaxel derivatives) than that of test sets (0.1 to 2.5 log units for epothilone and -0.5 to 1.8 for paclitaxel derivatives). It was illustrated that the prediction of activity value for the test sets was an interpolation from the compounds in the training sets. Based on this judgment, it may be useful to appreciate how well the compound is able to extrapolate values beyond the range of training set. Through reiterative revision of the established pharmacophore model by adding a high quality test set of compounds to the training set, better predicted activity values should be obtained from the new models.
Complementary Pharmacophores using Clustering and Merging Methods
After generation of pharmacophore models from epothilone derivatives, ten hypotheses were grouped with the case of two clusters, in which Hypo 1 and Hypo 3 that are two good quality hypotheses in each cluster were merged with a 0.8 Å distance tolerance (Table 5 and Figure 7). The lowest cost hypothesis (Hypo 1) has five features with two HBAs and three ali-P regions, while the Hypo 3 has four features with two HBDs, one HBA, and one ali-P region. As a result of analysis of the distance tolerance between features of two hypotheses, it was demonstrated that these newly merged hypothesis contains two HBAs, two HBDs, and three ali-Ps (Figure 7), and ali-P3 region was proved to exist in almost same 3D spatial coordinate of the two pharmacophores. Clustering and merging may be used in the following cases: (1) in order to make a more selective model when it is used to search small molecule databases and (2) in order to generate a model that is more detailed.
3D Database Search
To find out new hit compounds for epothilone-like microtubule stabilizing agent, lowest cost hypothesis (H1) and newly merged hypothesis (Hm) were used as the query against three databases: Catalyst/Maybridge2004 (about 55,000 compounds), Catalyst/NCI2000 (about 239,000 compounds), and Catalyst/ACD2001 (about 267,000 compounds). In order to create hit compounds for epothilone-like microtubule stabilizing agent, first, the shape based query (a merged shape and hypothesis query) method for H1 was used because this search should give lower percentage of false positives than using generated pharmacophore only. The bound conformation of epothilone-B was used to create the shape query. Second, merged hypothesis (Hm) with clustering and merging function of Catalyst program was considered as another query of 3D databases to find out more elevated hit compounds for microtubule stabilizing agent.
3D database for H1 was searched using the "Best Flexible Search Databases/Spreadsheets" option to obtain as many hit as possible, and resulting in 159 hits total for Maybridge (0.29% of the whole Maybridge), 626 for NCI (0.26%), and ### for ACD database (&&%). On the other hand, total of ## hit lists for Maybridge (##% of the whole Maybridge), ## for NCI (###%), and ## for ACD (###%) were produced for merged hypothesis (Hm) query. The search results are shown in Table 6. To exclude those hit compounds that are unlikely to be epothilone-like microtubule stabilizing agent because of their high molecular weight, the hit lists were limited less than a 600 g/mol molecular weight cutoff and number of rotatable bonds was restricted less than 15 based on molecular weight (507.68) and number of rotatable bond (11) of epothilone B compound. The final hit lists were considered those with greater than 6.50 of fit value based on the lowest fit value in compounds of training set for lowest cost hypothesis in the Table 1-1 (fit values range from 6.69 to 10.56). For example, finally selected hit lists for lowest cost hypothesis from Maybridge2004 produce 30 compounds with fit value greater than 6.50, while ## compounds for newly merged hypothesis.
Out of 239,000 compounds existed in the NCI, 261 hit lists were extracted with cutoff of molecular weight and number of rotatable bonds (0.11% of the whole NCI). Among these, in consideration of their high fit values, xxx identified structures were selected for testing of their microtubule stabilizing agents.
In summary, the 3D-QSAR method may be applied for elucidating structurally important groups for binding in the active site of a receptor. This study may also be applied to the case of other 3D- or 4D-QSAR methods that require more structural information, such as, descriptor-based inhibitor modeling.[  ,  ] Generated or merged pharmacophore models can be useful for database searching. Designed candidate ligands can be used for testing proposed binding site models.
A number of methods such as CoMFA[,  ] or WHIM (Wheighted Holistic Invariant Molecular descriptor)[,] and many publications were available for carrying out 3D-QSAR studies of ligands or inhibitors against receptors including microtubules  , but the pharmacophore modeling of microtubule stabilizing agents using 3D-QSAR/CATALYST was new attempt. This approach makes it possible to structurally define unique properties of the active site in an unknown receptor by generating pharmacophore models from ligands and their activities.