Data Mining Applications In Anticancer Drug Discovery Biology Essay




Cancer imposes a serious burden on public health and poses a challenge to science. Although the century-long trend of increasing cancer mortality worldwide was reversed in the 1990s, cancer remains the second leading cause of death. In addition, cancer presents a remarkably complex set of difficulties because of its multiple sites and causes, incompletely understood biology, and countless intervention approaches. Notable progress has been made against cancer, owing not only to new findings about its genetics and molecular biology but also to novel therapeutic approaches. Nevertheless, the discovery of novel drugs to treat cancer is a long and difficult process with a very high rate of attrition. Many steps in this lengthy procedure use data generated from various species. One key challenge is to successfully translate the basic findings of target validation and safety studies into the clinical trial stage. Advanced computational evolutionary analysis methods, combined with the increasing accessibility of sequence data, enable the application of systematic evolutionary approaches to targets and pathways of interest to drug discovery. Data mining, as one of the cheminformatics tools, is applicable throughout the drug discovery process to analyze related data from many different sources, classify it, and summarize the relationships identified. In the current review, we discuss data mining applications in the anticancer drug discovery process and how they help to solve the related challenges.


Cancer, heart failure and stroke are among the most common causes of death worldwide. Cancer is the second most common cause of death, following cardiovascular diseases. According to the World Health Organization (WHO), more than 10 million people are diagnosed with cancer yearly. By 2020, the world population is expected to have risen to 7.5 billion; of this number, around 15 million new cancer cases will be diagnosed, and 12 million cancer patients will die [1]. These sobering figures show that cancer remains a serious challenge to human health care and survival. Although we have witnessed the development of many drugs against cancer, the death rates for the most prevalent cancers have not decreased [2].

The high-throughput data collectively referred to as 'omics' data are ubiquitous throughout the drug discovery process, from target identification and validation to the development and testing of novel anticancer drug candidates and the resolution of cancer treatment challenges. These important recent technical advances and discoveries are not without limitations. The generated data are technically and statistically complex; therefore, computational approaches such as bioinformatics and cheminformatics methods have been developed and adapted to facilitate the processing and analysis of the large amounts of resulting data [3].

Data mining, one of the most widely applicable cheminformatics tools, has emerged to identify associations in many types of databases. These methods have been employed to classify, cluster, associate and detect patterns in raw data, leading to the design and discovery of effective targets with a high chance of success in clinical trials (Fig. 1). Data mining methods applied in the drug discovery process generally include artificial neural networks, Bayesian probability approaches, genetic algorithms, decision trees, nearest-neighbor methods, rule induction, novel data visualization and virtual screening techniques [4].
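As an illustration of the nearest-neighbor method mentioned above, the following minimal Python sketch classifies a hypothetical compound from invented, pre-scaled descriptor vectors; the descriptors, labels and values are purely illustrative, not taken from any real screen.

```python
import math

# Hypothetical descriptor vectors (e.g. molecular weight, logP, polar
# surface area, all scaled to [0, 1]) for compounds with known labels.
train = [
    ([0.20, 0.80, 0.30], "active"),
    ([0.25, 0.75, 0.35], "active"),
    ([0.90, 0.10, 0.85], "inactive"),
    ([0.85, 0.15, 0.80], "inactive"),
]

def classify_knn(query, train, k=3):
    """Label a compound by majority vote of its k nearest neighbors."""
    dists = sorted((math.dist(query, vec), label) for vec, label in train)
    top = [label for _, label in dists[:k]]
    return max(set(top), key=top.count)

print(classify_knn([0.22, 0.78, 0.33], train))  # near the actives -> "active"
```

In practice the descriptor vectors would come from a cheminformatics toolkit and the training labels from screening data; the voting logic, however, is exactly this simple.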

In fact, data mining is the procedure of analyzing data from different perspectives and summarizing them into practical knowledge. Recently, drug discovery research worldwide has focused on applying data mining approaches to classify, cluster, associate and detect patterns in the data obtained, in order to design and discover effective targets with a high chance of success in clinical trials [4].

In the current review, we present data mining applications in the drug discovery procedure, including virtual screening and data visualization methods. Since a broad range of the anticancer drugs used in cancer chemotherapy (over 60%) have been derived from natural sources [5], related novel approaches in the natural anticancer drug discovery process are also described.


Presently, large quantities of genomics, proteomics, metabolomics and pharmacogenomics data are being generated both in academia and in industry. Gene expression altered in response to a drug or a toxin is usually measured by microarrays. Changes in global gene expression patterns in animals or cells at multiple dose levels and time points yield 'signature' genes that can be used as predictors or biomarkers in humans. The assessment of the effect of a compound on protein activity and concentration can be very different from, yet complementary to, gene expression data, and more consistent with the overall mechanism of toxicity. Proteomics deals with quantitative and qualitative measurement of protein concentration and/or expression in whole-tissue samples. This is significant because the presence of a mature mRNA transcript is insufficient evidence for a corresponding active protein, owing to post-translational modifications, proteolysis and other dynamic processes causing functional changes. A comprehensive analysis of biological systems requires the integration of all the biological data generated in order to discover molecular biomarkers. Relating experimental data to the large amounts of literature data on transporters, enzymes, channels and receptors that bind small molecules may require interpretation as a network of interactions, enabled by expansions in databases and data annotation, to eventually reflect the response of the whole system as well as provide insight into the functional organization of the cell [6].
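The notion of 'signature' genes can be illustrated with a toy calculation: genes whose expression differs strongly between control and treated samples (here ranked by a Welch-style t statistic) are flagged as candidate biomarkers. All gene names and values below are invented for illustration.

```python
from statistics import mean, stdev

# Toy expression values for three hypothetical genes in control vs.
# treated samples; a large |t| marks a candidate "signature" gene.
expression = {
    "geneA": ([5.1, 5.3, 5.0], [8.9, 9.2, 9.1]),  # strongly induced
    "geneB": ([4.0, 4.2, 4.1], [4.1, 4.0, 4.2]),  # unchanged
    "geneC": ([7.5, 7.7, 7.6], [3.1, 3.0, 3.2]),  # strongly repressed
}

def t_statistic(control, treated):
    """Welch-style t statistic for two small sample groups."""
    n1, n2 = len(control), len(treated)
    se = (stdev(control) ** 2 / n1 + stdev(treated) ** 2 / n2) ** 0.5
    return (mean(treated) - mean(control)) / se

signature = [g for g, (c, t) in expression.items()
             if abs(t_statistic(c, t)) > 4]       # arbitrary cutoff
print(sorted(signature))  # -> ['geneA', 'geneC']
```

Real microarray analyses use thousands of genes, replicate-aware statistics and multiple-testing correction, but the underlying comparison is of this form.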

In combination with whole-genome sequences of human and numerous model organisms, the availability of technologies to perturb and measure cellular responses at the level of individual transcripts and proteins presents opportunities to accelerate the process of drug discovery across the entire pipeline, from disease understanding and target identification through clinical trials, postmarketing surveillance and diagnostics. Multiple novel technologies have recently been developed to improve the analysis of genetic sequences, to rapidly assess RNA or protein levels in relevant tissues, and to validate the function of potential new drug targets. The challenge facing pharmaceutical research is the effective integration of these new technologies in ways that can maximally affect the discovery and development pipeline. Database mining has clearly increased the number of putative targets [7-8].


Scientists in the 21st century have witnessed an explosion of genetic information following the completion of the Human Genome Project. Many genome-based techniques have been applied to document a plethora of potential new drug targets and diagnostic markers, and this trend looks likely to continue. In the future, patients could be offered personalized medicines based on their individual genetic composition; however, this notion comes with many practical issues, which will be debated in the next decade [9].

High-throughput chemogenomics methods have proven to be potent tools for resolving the complex genetic factors involved in chemosensitivity. This is expected to accelerate the drug discovery process because each factor could offer a novel drug target. DNA, protein and tissue microarrays all need to be used in combination with a drug response phenotype, such as in vitro toxicity or clinical outcome in cancer patients. In addition, genotyping of polymorphisms in candidate genes holds promise for refining drug target validation. This should aid the identification of novel genetic factors that are predictive of chemotherapy response and of molecular targets for drug discovery. However, despite the promise of these technologies, there are several obstacles that need to be overcome. For instance, the sensitivity and accuracy of the analytical methods must improve dramatically when searching for candidate genes. These methods must be standardized to allow comparisons across studies. Moreover, relationships established during correlation analysis between drug response phenotype and genetic variations are correlative, not causal. Methods of candidate validation must be improved, with particular attention paid to the microenvironment of the cancer tissue in vivo [10].

With the perception that gene networks rather than individual genes determine chemoresistance, genomics will have a significant role in efforts to unravel how the transcriptome and genome of a tumour cell influence its sensitivity to chemotherapy. Genomics investigations cannot be considered the endpoint of research on tumour drug resistance. Such strategies permit the formulation of hypotheses about the relationship between drug sensitivity and transcriptomic and genomic variation at the level of correlation rather than cause and effect. Ascertainment of biological function requires candidate gene validation via conventional molecular biological approaches. The challenge for functional genomics will be to clarify how validated candidates act and interact as components of the complex gene regulatory networks that determine drug sensitivity in tumours [11].

The NCI Developmental Therapeutics Program human tumor cell line data set is a publicly available database in which nearly 100,000 chemical compounds have been tested for cytotoxicity using a 2-day assay to determine the GI50 in each cell line, with more compounds being tested continually. In addition to the discovery of potential antineoplastic drugs, these 60 cell lines are also being characterized at the molecular level to describe alterations in genes that may contribute to carcinogenesis. The database also contains microarray gene expression data for the cell lines, and so it provides an admirable data resource, particularly for testing data mining methods that bridge chemical, biological and genomic information. Appropriate data mining of such databases may in turn aid in the development of compounds with selective cytotoxicity directed against cancer cells with particular molecular characteristics [12-15].

For the analysis and display of the relevant parts of these large databases, namely [i] anticancer activity data for compounds across the 60 human tumor cell lines; [ii] chemical structure data for the tested compounds; and [iii] data on probable targets or modulators of activity in the 60 cell lines, a discovery program set was developed that maps coherent patterns in the data rather than treating the compounds and targets one pair at a time. The antitumor activity patterns of 112 ellipticine analogues were analyzed using a hierarchical clustering algorithm, and dramatic coherence between molecular structures and activity patterns was observed qualitatively from the cluster tree [16].
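A minimal sketch of the hierarchical (single-linkage) clustering idea used in such analyses is shown below, with invented growth-inhibition profiles standing in for real screening data; real analyses use far larger panels and dedicated tools such as SciPy's clustering routines.

```python
import math

# Hypothetical growth-inhibition profiles (one value per cell line) for
# four compounds; compounds with similar profiles should cluster together.
profiles = {
    "cpd1": [0.90, 0.80, 0.10, 0.20],
    "cpd2": [0.85, 0.75, 0.15, 0.25],
    "cpd3": [0.10, 0.20, 0.90, 0.80],
    "cpd4": [0.15, 0.25, 0.85, 0.75],
}

def single_linkage(profiles, target=2):
    """Greedy agglomerative clustering: repeatedly merge the closest pair."""
    clusters = [[name] for name in profiles]
    while len(clusters) > target:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(profiles[a], profiles[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)   # j > i, so index i is unaffected
    return [sorted(c) for c in clusters]

print(single_linkage(profiles))  # -> [['cpd1', 'cpd2'], ['cpd3', 'cpd4']]
```

The dendrogram produced by a full implementation is simply the record of these successive merges.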

In the first study to integrate large databases on gene expression and molecular pharmacology, cDNA microarrays were employed to assess gene expression profiles in the 60 human cancer cell lines used in a drug discovery screen by the National Cancer Institute. These data were used to correlate gene expression and drug activity patterns in the NCI60 lines. Clustering the cell lines on the basis of gene expression yielded relationships very different from those obtained by clustering the cell lines on the basis of their response to drugs. The results showed that gene-drug relationships for the clinical agents 5-fluorouracil and L-asparaginase exemplify how differences in the transcript levels of particular genes relate to mechanisms of drug sensitivity and resistance. The main limitation of this study is that pharmacologically interesting behaviors are not always reflected at the transcriptional level. It will be essential to assess differences among cells at the DNA and protein levels as well. To achieve this aim, DNA and protein should be collected in parallel with the RNA for cross-indexable characterizations with respect to all three types of molecules, combining these three levels of experiment and analysis [17].
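The core computation in such gene-drug studies is a correlation between a gene's expression profile and a drug's activity profile across the same cell lines. The sketch below computes a Pearson coefficient over invented six-cell-line profiles purely for illustration.

```python
from statistics import mean

# Hypothetical measurements across six cell lines: transcript level of
# one gene and growth-inhibitory activity of one drug in the same lines.
gene_expr = [2.1, 3.5, 1.0, 4.2, 2.8, 3.9]
drug_act  = [0.30, 0.62, 0.11, 0.75, 0.44, 0.70]

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

print(pearson(gene_expr, drug_act))  # strong positive gene-drug correlation
```

A large positive (or negative) coefficient across the panel nominates the gene as a candidate sensitivity (or resistance) marker for that drug, subject to the causality caveats discussed above.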

In another study, a sensitive and reproducible method of measuring mRNA expression was used to compare the basal levels of 10 transcripts in the 60 cell lines of the National Cancer Institute's in vitro anticancer drug screen (NCI-ACDS) under conditions of exponential growth. The mined and analyzed data showed that BCL-X may play a unique role in general resistance to cytotoxic agents, with the cell lines demonstrating relative resistance to 70,000 cytotoxic agents in the NCI-ACDS being characterized by high BCL-X expression [18].

In a similar direction, a novel method was developed to discover local associations within the NCI60 human cancer cell lines in order to determine putative functional relationships between gene expression patterns and drug activity patterns in subsets of cell lines. The relationship of drug-gene pairs is an exploratory way of discovering gene markers that predict clinical tumor sensitivity to therapy. Nine drug-gene networks were discovered, in addition to dozens of gene-gene and drug-drug networks. Three drug-gene networks with well-studied members were discussed, and the literature demonstrates that the hypothesized functional relationships exist. Thus, this method enables the gathering of new data beyond global associations [19].

As noted, genomics and postgenomics data hold great promise for contributing to efficient drug discovery. However, the availability of genomics and applied-genomics platform data, in addition to large-scale genetic association studies, high-throughput screening data and outcomes from animal models, creates a sizeable challenge for data integration and knowledge management. Fulfilling that promise requires attention first at the level of basic data management, indexing, standardization of descriptors and data normalization that will support valid inferences, especially across platforms. This in turn will better facilitate the well-founded integration of 'omic' data within a total biological context, and in particular with the chemical, genetic and clinical data that will most directly support the needs of the drug discovery enterprise [8].


The study of the proteome as it relates to drug discovery is particularly eagerly awaited. Proteins represent the point of interaction with small-molecule drugs and convey far more information than is apparent in the gene sequence. Understanding the three-dimensional structure and associated biological function, post-translational modifications and networks of interacting components are some of the challenges to be faced in the future [7].

Several proteomics technologies, including 2D-PAGE, MALDI-TOF, LC-MS/MS, MudPIT, SELDI-TOF and protein microarrays, are applicable tools for cancer treatment research based on proteome studies. Although current results are promising, the major challenge ahead remains the integration of the various areas of proteomics, such as imaging mass spectrometry, protein-protein interaction mapping and quantitative protein expression profiling, so as to formulate hypotheses on the mechanisms of cancer development. These studies must happen in concert with multicenter clinical investigations on well-matched patient populations before they can be of potential diagnostic, prognostic or therapeutic value. Once all these technological platforms have been integrated and refined, proteomics research will be a driving force for rapid improvement in the fields of basic research, cancer diagnosis and prognosis, cancer therapy and new drug discovery [20].

The ability to discern the full complement of proteins to which a drug will bind and potentially exert its effect is central to researchers' ability to develop a clear understanding of the intimate relationship between compound structure, target affinity and biological response. Despite a broad diversity of approaches, this is the ultimate aim of many, if not all, chemoproteomics technologies. Note that this approach to drug discovery is distinct from the traditional drug discovery process, in which the first step is target selection. In proteome mining, target selection does not take place until after the screen is completed, because this allows hundreds of proteins to be screened simultaneously [21].

As cellular receptors and kinases are the two main groups of the cellular proteome involved in the identification of cancer mechanisms as well as in anticancer drug targeting, researchers have studied and analyzed the related kinome and receptorome. The human kinome is made up of 518 distinct serine/threonine and tyrosine kinases, which are key components of virtually every mammalian signal transduction pathway. Thus, kinases provide a compelling target family for the development of small-molecule inhibitors, which could be used as tools to delineate the mechanism of action of biological processes and potentially be applied as therapeutics to treat cancer. The data generated from large-scale profiling analyses have led to the development of various informatics methods to support the visualization and extraction of meaningful data, including the clustering of compounds and/or kinases based on different similarity measures, together with powerful visualization programs [22].

The receptorome, comprising at least 5% of the human genome, encodes receptors that mediate the physiological, pathological and therapeutic responses to a vast number of exogenous and endogenous ligands. Receptorome screening provides an unbiased and highly efficient approach for molecular target discovery and validation. Development of novel screening technologies and improved chemoinformatics resources will greatly improve researchers' ability to mine the receptorome for therapeutic drug discovery [23].

As the main challenges in cancer drug development are differential responses, efficacy and toxic side effects, the pharmaceutical industry, drug policy makers and administrators are constantly looking for novel pharmacoproteomics studies that might identify potential molecular biomarkers to help solve these problems [24]. Most therapeutic agents were developed without knowledge of their molecular target. This has made the development and production of cancer drugs expensive, because of a lack of target data that could be used to test the efficacy of therapeutics [25]. To increase the efficiency and quality of drug discovery, biomarkers can be employed. Biomarkers can be useful for in vitro evaluation of the hundreds of candidates that are typically screened during the drug development process. Biomarkers can also be used to measure drug toxicity and pharmacokinetics in Phase II clinical trials. Incidentally, the limited number of useful markers has propelled investigators to use high-throughput platforms, such as protein arrays and antibody arrays, and other approaches to discover large numbers of candidate biomarkers. The reason for using high-throughput technologies is that they provide a large amount of correlative data on protein expression in relation to disease. Such data are then analyzed for their association with the disease. The assumption is that multiple variables will provide data on associations more accurately than a single marker. Such potent associations provide the main impetus for molecular profiling approaches to find patterns or profiles for a clinical assay based on high-dimensional gene or protein expression panels [26].

A general review of investigations performed in the omics areas and of microarray analytical methods makes it clear that they have had a major impact on the development of cancer therapeutics. The most obvious related benefits include an enhanced understanding of the global regulatory networks of normal and tumour cells, as well as improved classification of human cancers and an increased capability to predict the outcome of treatment for individual patients. Related to this, gene expression profiling of cancer versus normal tissues has now become a standard approach to the detection and validation of new molecular targets for therapeutic intervention.

As new drugs are developed, gene expression profiling is increasingly applied to explore their mechanism of action and to distinguish on-target versus off-target effects. Transcriptional profiling is being used to advance lead optimization and to characterize clinical development candidates. In addition, there are already numerous examples of the use of microarrays to determine the global genome expression changes induced in cancer tissues by drug treatment in cancer patients. As the technology matures, there is demand for greater sensitivity, reproducibility, robustness and user-friendliness [27].

Bio-cheminformatics is increasingly seen by the user community as the critical bottleneck. Algorithms that integrate measures of statistical confidence are broadly available, and improved platforms for sharing and storing microarray data are being developed. Validation of observations by alternative technologies remains vital, but as the robustness and reproducibility of expression profiles continue to improve, it is increasingly likely that these molecular signatures will be accepted as stand-alone measures of biological function and pharmacological potency. Progress towards this goal will be helped by the development of guidelines for microarray analysis, including its reproducibility and statistical aspects [27].


Cancer research has undergone radical changes in the past few years. Producing data at both the basic and clinical levels is no longer the issue. A novel cross-disciplinary approach has been embraced which encompasses several tightly integrated disciplines: biomathematics and computation, cancer biology, bioengineering and imaging. This approach undertakes to advance our understanding of the requisite parameters and processes in efficient anticancer drug discovery [28].

In the pre-automation era of drug discovery, anticancer compounds were synthesized and tested manually in small numbers. The introduction and development of combinatorial chemistry and high-throughput screening then revolutionized drug discovery by allowing great numbers of chemical compounds to be synthesized and screened in short periods of time. However, this massive growth in the number of compounds screened did not create the expected increase in the number of successfully launched novel drugs [29].

The pharmaceutical industry is rapidly adopting virtual screening techniques aimed at identifying novel anticancer chemical compounds that have the required characteristics to become successful drugs. The need for a high-throughput yet inexpensive evaluation of molecules in silico, before they are tested or even made, is driven by the increasing costs of drug discovery and the current 'drought' in novel drug approvals. The computational filtering stage is especially vital for combinatorial chemistry, where billions of compounds can be synthesized from commodity reagents. Neural networks have a proven ability to model complex relationships between pharmaceutically relevant properties and the chemical structures of compounds, and have the potential to improve the diversity and quality of virtual screening [29].

As noted, virtual screening is an increasingly significant component of the computer-based search for novel lead compounds. There are, essentially, two kinds of virtual screening method: 'virtual screening by docking', which uses knowledge of the 3D structure of the target protein binding site to prioritize compounds by their likelihood of binding to the protein; and 'similarity-based virtual screening', where no data on the protein are necessary and one or more compounds known to bind to the protein are employed as a structural query. The screening procedure extracts compounds from the database based on an appropriate similarity criterion. For the screening process to be effective, this criterion should regard molecules that bind tightly to the same proteins as similar [30].
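A common similarity criterion in ligand-based screening is the Tanimoto coefficient on binary fingerprints. The toy sketch below ranks an invented library against an invented query fingerprint; real workflows use cheminformatics toolkits and fingerprints of hundreds or thousands of bits.

```python
# Toy binary fingerprints, represented as sets of "on" bit positions, for
# a known active query compound and a small screening library; all names
# and bit patterns are invented for illustration.
query = {1, 4, 7, 9, 12}
library = {
    "lib-001": {1, 4, 7, 9, 13},
    "lib-002": {2, 5, 8},
    "lib-003": {1, 4, 9, 12, 15},
    "lib-004": {3, 6, 10, 11},
}

def tanimoto(a, b):
    """Tanimoto coefficient: shared on-bits over total distinct on-bits."""
    return len(a & b) / len(a | b)

# Rank the library by similarity to the query and keep the top hits.
hits = sorted(library, key=lambda n: tanimoto(query, library[n]), reverse=True)
for name in hits[:2]:
    print(name, round(tanimoto(query, library[name]), 2))
```

Compounds above a chosen Tanimoto cutoff (often around 0.7 for common fingerprint types) would be passed on for acquisition or assay.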

Virtual ligand screening (VLS) based on high-throughput flexible docking is a promising technology for rational lead discovery based on receptor structure. The rapid accumulation of high-resolution three-dimensional structures, further accelerated by the structural proteomics initiative, and the developments in docking and scoring knowledge are making VLS an attractive alternative to the traditional methods of lead discovery. VLS can model a virtually infinite chemical diversity of drug-like molecules without synthesizing and experimentally testing every screened molecule. Usually, a corporate high-throughput screening (HTS) compound library ranges from 200,000 to 1,000,000 molecules. Even with shared libraries as large as these, though, experimental HTS often does not result in viable leads. The high cost of such massive experimental testing and its technical complexity are further motivation for the theoretical alternative. Finally, the virtual experiment, as opposed to a high-throughput assay, can easily be designed to select for a particular binding site or receptor specificity [31-34].

Most of the accessible established flexible docking algorithms, for example DOCK, ICM, FlexX, QXP, Ecepp/Prodock, Pro_LEADS, Hammerhead, FLOG, GOLD, LUDI, AutoDock and GREEN, have been under development for years. Most scientists continue to improve their core docking processes or build additional protocols on top of them to answer specific questions. Virtual ligand docking and screening can be used for the selection of individual chemically accessible lead candidates, the selection of side chains for a given scaffold, or the selection of scaffolds themselves. The number of hit stories from VLS is growing quickly [35].

As an example, three different database docking programs (Dock, FlexX, Gold) have been used in combination with seven scoring functions (Chemscore, Dock, FlexX, Fresno, Gold, Pmf, Score) to assess the accuracy of virtual screening methods against two protein targets (thymidine kinase, estrogen receptor) of known three-dimensional structure. In conclusion, the resulting data suggested a two-step protocol for screening large databases: [I] screening of a reduced dataset containing a few known ligands to derive the optimal docking-scoring combination, and [II] applying the resulting parameters to screening of the complete database [36].
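One simple way to combine the outputs of several scoring functions, in the spirit of such multi-program comparisons, is consensus ranking by summed per-function ranks. The sketch below is a generic illustration of that idea, not a reproduction of any published protocol; all ligand names and scores are invented.

```python
# Toy docking scores (lower is better) from three hypothetical scoring
# functions for four candidate ligands.
scores = {
    "ligA": [-9.1, -8.7, -50.2],
    "ligB": [-6.2, -5.9, -30.1],
    "ligC": [-8.8, -9.0, -48.7],
    "ligD": [-4.0, -4.5, -21.3],
}

def consensus_rank(scores):
    """Order ligands by the sum of their ranks under each scoring function."""
    names = list(scores)
    n_funcs = len(next(iter(scores.values())))
    total = {n: 0 for n in names}
    for f in range(n_funcs):
        ordered = sorted(names, key=lambda n: scores[n][f])  # best first
        for rank, name in enumerate(ordered):
            total[name] += rank
    return sorted(names, key=lambda n: total[n])

print(consensus_rank(scores))  # -> ['ligA', 'ligC', 'ligB', 'ligD']
```

Rank-based consensus is robust to the different numeric scales of the individual scoring functions, which is why it is a popular aggregation choice.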

Several investigations applying virtual screening by docking have shown that VS methods support the decision-making procedure in drug discovery through the assessment of large virtual libraries and in silico compound filtering. Automated docking processes have been successfully applied to database screening, de novo design and the analysis of the binding modes of individual molecules. Library design by VS and automated combinatorial docking already play a significant role in current drug discovery projects, and it is anticipated that they will become an indispensable part of future medicinal chemistry [37].

Chemical similarity searching for in silico target fishing is a good first pass, but it is presently not systematic in how targets are ranked, and it does not generally incorporate target class data. Models built with machine learning methods have the ability to rapidly and automatically predict targets. Target prediction by connecting structures to biological activity spectra is a proven approach to performing valuable analysis of experimental data [38].

With the increasing importance of high-throughput chemistry and screening, the consequent increase in data volume requires more effective methods to visualize and structure the data produced in research. Structure-activity relationships (SARs) and quantitative SARs (QSARs), typically relating physico-chemical parameters or three-dimensional molecular fields to activity with statistical methods such as multiple regression or principal-components analysis, have been a key tool for medicinal researchers to visualize and structure their data [39].
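The simplest QSAR of the regression type mentioned above is a one-descriptor linear fit, for example relating a lipophilicity descriptor to measured activity. The sketch below fits such a line by least squares and predicts the activity of a new analogue; every number is invented for illustration.

```python
from statistics import mean

# Toy QSAR training series: hypothetical logP values and measured
# activities (pIC50) for five analogues of a lead compound.
logp  = [1.0, 1.5, 2.0, 2.5, 3.0]
pic50 = [4.1, 4.6, 5.0, 5.6, 6.0]

def fit_line(x, y):
    """Least-squares slope and intercept for a one-descriptor QSAR."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

slope, intercept = fit_line(logp, pic50)
predicted = slope * 2.2 + intercept   # predict activity of a new analogue
print(round(slope, 2), round(predicted, 2))  # -> 0.96 5.25
```

Multi-descriptor QSAR generalizes this to multiple regression or PLS over dozens of descriptors, but the fit-then-predict workflow is identical.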

At the core of the novel approach is the acceptance of computational technologies and the wide adoption of in silico methods to support similarity-based virtual screening of compounds for desired biochemical properties before those compounds are tested, acquired or even made. Similarity-based virtual screening is the term most often used to describe a diversity of computational methods for estimating the biologically relevant properties of compounds. The most common function of similarity-based virtual screening is the prediction of biological activity. These techniques frequently apply QSAR models generated using different statistical and machine-learning methods [29].

One of the main techniques applied in similarity-based virtual screening is artificial neural networks (ANNs), with their perceived ability to mimic activities of the human brain, albeit in a simplistic way. The two types of artificial neural network most often used in the pharmaceutical sciences are three-layer feed-forward neural networks and two-dimensional self-organizing maps (SOM)/Kohonen networks. They have been routinely applied to problems that are beyond current theoretical knowledge, too complex to model using linear statistical techniques, and that require high-throughput predictions. As the in silico screening of chemical compounds becomes standard practice in drug discovery, neural networks will be used to identify structures that have the desired activity, the right pharmacokinetic profile and a low probability of toxicity. In the area of screening of virtual combinatorial libraries, ANNs can make an important contribution by enabling the exhaustive screening of combinatorial libraries of virtually any size. This can be accomplished by circumventing the expensive steps of enumeration and characterization of the individual combinatorial products and by computing their descriptors, or even the properties of interest, using combinatorial neural networks [29].
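The feed-forward idea can be sketched at its smallest scale: a single logistic unit trained by gradient descent to separate invented 'active' from 'inactive' descriptor vectors. Real QSAR networks use hidden layers and far richer descriptors; this is only a conceptual sketch of the training loop.

```python
import math
import random

# Invented, pre-scaled two-descriptor vectors with activity labels
# (1 = active, 0 = inactive); actives have low x0 and high x1.
data = [([0.1, 0.9], 1), ([0.2, 0.8], 1),
        ([0.9, 0.1], 0), ([0.8, 0.2], 0)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

random.seed(0)
w = [random.uniform(-0.5, 0.5) for _ in range(2)]  # weights
b = 0.0                                            # bias
for _ in range(2000):                              # gradient-descent epochs
    for x, target in data:
        out = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
        err = out - target        # cross-entropy gradient for a logistic unit
        for i in range(2):
            w[i] -= 0.5 * err * x[i]
        b -= 0.5 * err

# A new compound resembling the actives should score close to 1.
print(round(sigmoid(w[0] * 0.15 + w[1] * 0.85 + b)))  # -> 1
```

A three-layer network simply stacks many such units and backpropagates the same error signal through the hidden layer.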

A review of the relevant investigations makes it clear that structure-based drug design has become increasingly important for anticancer drug lead discovery and optimization.

The most common causes of late-stage failures in drug development are that the compound has poor pharmacokinetic properties or is toxic. It is impractical and very expensive to execute standard ADMET (absorption, distribution, metabolism, excretion and toxicity) studies on many anticancer compounds during the lead optimization stage. Hence, there is a need to develop alternative approaches to recognize and eliminate toxic compounds, or compounds with poor pharmacokinetic properties, early on. In silico ADMET tools have been developed in an effort to meet this need. Various computational and modeling methods based on data mining techniques, such as partial least squares (PLS), neural networks (NN), Bayesian neural networks (BNN), genetic programming (GP) and semi-empirical molecular orbital methods, have been employed. In the pharmaceutical industry, drug-likeness concepts and metrics have now been incorporated into anticancer drug discovery plans. It is valuable to design informed virtual libraries that contain synthesizable and drug-like compounds. Many pharmaceutical companies currently prefer to model smaller but focused virtual libraries using more sophisticated computational methods to achieve proficient anticancer drugs with the desired properties and minimal undesirable side effects [40].
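A widely used drug-likeness metric of the kind referred to here is Lipinski's rule of five, which can be applied as a simple in silico filter on a virtual library; the candidate names and property values below are invented for illustration.

```python
# Lipinski-style rule-of-five filter. Hypothetical candidates with
# molecular weight (mw), logP, hydrogen-bond donors (hbd) and
# hydrogen-bond acceptors (hba).
candidates = {
    "cand-1": {"mw": 342.0, "logp": 2.1, "hbd": 2, "hba": 5},
    "cand-2": {"mw": 712.0, "logp": 6.3, "hbd": 6, "hba": 12},
}

def passes_rule_of_five(p):
    """At most one violation of the four classic thresholds is tolerated."""
    violations = sum([
        p["mw"] > 500,
        p["logp"] > 5,
        p["hbd"] > 5,
        p["hba"] > 10,
    ])
    return violations <= 1

drug_like = [name for name, p in candidates.items() if passes_rule_of_five(p)]
print(drug_like)  # -> ['cand-1']
```

Such cheap filters are typically applied before the more expensive QSAR or docking stages, so that only synthesizable, drug-like compounds consume modeling effort.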


Perhaps more astonishing is that more than 60% of the anticancer drugs and 70% of the anti-infective antibiotics now in clinical use are natural products or natural-product-based [5].

Most large pharmaceutical companies have cut back the screening of natural products for drug discovery in favor of synthetic compound libraries. The major reasons include the incompatibility of natural product libraries with high-throughput screening and the marginal improvement in natural product screening technology in the late 1980s and early 1990s. More recently, the development of new technologies and methods has improved the screening of natural products. Applying these technologies compensates for the inherent limitations of natural products and offers a unique opportunity to re-establish them as a major source for drug discovery [41].

Natural products are small molecules found in diverse natural sources; they play important roles in cancer treatment and are finding increasing application in drug discovery and development.

Chemically diverse organisms are able to modulate several targets simultaneously in a complex system, so the analysis of gene expression becomes necessary for a better understanding of the molecular mechanisms involved. Conventional approaches to expression profiling are optimized for single-gene analysis; DNA microarrays, by contrast, serve as a high-throughput tool for the simultaneous analysis of multiple genes. The current challenge is to provide standardized, sensitive and reproducible microarray platforms, databases and visualization methods for expression profiles that remain affordable to scientists. With the development of novel and more sophisticated experimental designs, data management systems, statistical tools and data analysis algorithms, DNA microarrays can be applied optimally in herbal drug research. Despite the huge potential offered by microarray technology, the importance of in vitro biological assays, cell-line studies and in vivo animal studies cannot be ignored. A comprehensive strategy that integrates data from diverse scientific experiments and technologies will lead to molecular evidence-based herbal medicine [42-43].
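As a toy illustration of how differentially expressed genes might be extracted from such array data, the sketch below ranks genes by a per-gene Welch t statistic on a synthetic expression matrix; the gene count, array counts and injected effect size are invented for the example and carry no biological meaning.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic expression matrix: 500 genes x (6 treated + 6 control) arrays.
# Genes 0-9 are up-regulated in the (hypothetical) treated samples.
genes, n_t, n_c = 500, 6, 6
control = rng.normal(0.0, 1.0, size=(genes, n_c))
treated = rng.normal(0.0, 1.0, size=(genes, n_t))
treated[:10] += 3.0                      # injected differential signal

# Per-gene Welch t statistic (treated vs control).
mt, mc = treated.mean(axis=1), control.mean(axis=1)
vt = treated.var(axis=1, ddof=1)
vc = control.var(axis=1, ddof=1)
t = (mt - mc) / np.sqrt(vt / n_t + vc / n_c)

# The genes with the largest |t| are the differential-expression candidates.
top10 = np.argsort(-np.abs(t))[:10]
print("top-ranked genes:", sorted(top10.tolist()))
```

Real microarray analysis adds normalization, multiple-testing correction and replicate quality control on top of this ranking step, but the core computation is the same per-gene statistic applied across thousands of genes at once.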

The early days of combinatorial chemistry suffered from an excess of hype, and a major casualty was natural-product screening. Numerous organizations underwent a permanent shift in policy and prematurely discontinued their efforts in this area. As combinatorial chemistry has matured, important and sophisticated design strategies based on natural products have been developed. At one end of the spectrum lies the synthesis of analogues closely related to a natural product. Such compounds are strongly biased towards a specific target, and modern methods for parallel organic synthesis are now powerful enough to apply this approach to fairly complex natural product leads traditionally avoided by medicinal chemists. Midway are strategies that take advantage of the tremendous scaffold diversity present in nature; here, combinatorial examples have already uncovered molecules with biological properties beyond, and probably unrelated to, those of the initial natural product considered. Finally, at the other end of the spectrum lies the construction of entirely synthetic molecules that are intrinsically natural-product-like. Currently, these chemistry-driven initiatives are mainly the province of academia. In future it will be interesting to see whether such diversity-oriented syntheses are adopted by big pharmaceutical companies and commercial suppliers of compound libraries, in an attempt to populate currently unfilled chemical space in HTS collections [44].

In recent research, virtual parallel screening produces a pharmacophoric profile for each natural compound screened. From this pharmacophoric profile, a predicted bioactivity profile can be extrapolated. This multitarget computational approach is useful for prioritizing targets for experimental investigation. It offers the possibility of gaining insight into putative interactions, both with targets of clinical significance and with targets responsible for undesirable side effects. The predicted binding to these so-called antitargets permits an inference of risk at an early stage of drug discovery. Consequently, for all diseases whose molecular targets or molecular ligands are sufficiently well defined to generate reliable computer-assisted tools, virtual parallel screening has the capacity to accelerate drug discovery markedly. The in silico pharmacology paradigm is taking hold in the natural product sciences, and on the basis of the first application scenarios, virtual parallel screening promises to have a substantial impact on the discovery of novel bioactivities and the target-oriented profiling of natural compounds [45].
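One way to picture such a pharmacophoric profile is to reduce each target query to a set of required features and score a compound by the fraction of query features it matches. Everything below (the target names, the feature labels, the compound) is hypothetical and serves only to show the ranking and antitarget-flagging step.

```python
# Hypothetical pharmacophore queries: each target is reduced to a set of
# required features (A = H-bond acceptor, D = donor, R/R2 = aromatic rings,
# H = hydrophobic, P = basic centre). Real queries also encode 3D geometry.
queries = {
    "EGFR kinase": {"A", "D", "R", "H"},
    "topoisomerase": {"A", "R", "R2", "H"},
    "hERG (antitarget)": {"R", "H", "P"},
}
compound = {"A", "D", "R", "H"}   # feature set of one screened natural product

def match_fraction(query, features):
    """Fraction of the query's required features present in the compound."""
    return len(query & features) / len(query)

# Rank all targets by match score; antitarget hits signal potential side effects.
profile = sorted(((match_fraction(q, compound), t) for t, q in queries.items()),
                 reverse=True)
for score, target in profile:
    print(f"{target}: {score:.2f}")
```

The output of a real parallel screen is this same kind of ranked target list, with high-scoring antitargets used as an early risk flag rather than a reason to bind.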

Finally, it is worth noting that drug discovery from natural sources has been estimated to take ten years on average and to cost more than 800 million dollars. This long and costly process comprises four main stages: (I) lead identification; (II) lead optimization (involving medicinal and combinatorial chemistry); (III) lead development (including pharmacology, toxicology, pharmacokinetics, ADME (absorption, distribution, metabolism and excretion) and drug delivery); and (IV) clinical trials. Lead identification is the first stage in this lengthy process and the foundation of successful medicinal plant discovery; target-based bioassays, cell-based bioassays and in vivo bioassays all belong to this stage [46]. In this regard, recent investigations have used a novel data mining approach that combines cheminformatics and intensive literature handling with the correlation of biological data, in order to search for the desired biological activity among natural products not explored before. Such investigations can play an efficient role in saving time and cost at the lead identification stage. The results of cheminformatics-based lead selection, confirmed by cell-based bioassays, show that this methodology can be used successfully in anticancer lead discovery [47].
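A minimal sketch of the kind of cheminformatics-driven lead selection described above, assuming a simple bit-set fingerprint representation and Tanimoto similarity; the fingerprints and compound names are invented placeholders, not real data.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints held as sets of on-bits."""
    return len(a & b) / len(a | b)

# Fingerprint of a known anticancer lead (hypothetical bit positions).
known_active = {1, 4, 7, 9, 12}

# A tiny natural-product library with hypothetical fingerprints.
library = {
    "NP-001": {1, 4, 7, 9, 13},
    "NP-002": {2, 3, 5},
    "NP-003": {1, 4, 8, 9, 12, 15},
}

# Rank the library by similarity to the known active; the top entries are
# the candidates forwarded to cell-based bioassays.
ranked = sorted(library, key=lambda n: tanimoto(library[n], known_active),
                reverse=True)
for name in ranked:
    print(name, round(tanimoto(library[name], known_active), 3))
```

Scaled to thousands of compounds and richer fingerprints, this similarity ranking is what lets a data mining pipeline shortlist a handful of natural products for bioassay instead of testing the whole library.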