Human Disease Insight (HDI) Database Development

Motivation: Human Disease Insight (HDI) is intended not only for researchers and doctors but also for the general public, providing basic information that raises awareness and thereby reduces the suffering caused by ignorance. The bioinformatics tools integrated within the database will enable researchers to compare disease-specific genes, perform protein analysis, search for biomarkers and identify potential vaccine candidates. Ultimately, these tools should help in analyzing facts about diseases.

Results: HDI is a knowledge-based resource of human disease information for both scientists and the general public. Our mission is to provide a comprehensive human disease database containing most of the useful information, with extensive cross-referencing. HDI is a knowledge management system that acts as a central hub for accessing information about human diseases, drugs and the genes involved in various diseases. In addition, HDI contains a well-classified list of bioinformatics tools with descriptions. HDI provides two types of search capability and has provision for downloading, uploading and searching disease-, gene- and drug-related information. The logistics designed for HDI allow the database to be updated regularly.

Availability and implementation: HDI is freely accessible at http://humandiseaseinsight.com through a user-friendly web interface, and is intended to be useful to physicians, researchers, patients and the general public.

Keywords: Database, Knowledge Management System, Relational Database Management System, Three-Tier Architecture, Web Server, MySQL, Disease, Gene, Drug

1. Introduction

Scientists have documented diseases belonging to specific categories in various online databases. Owing to advances in science and technology, especially genomics and information technology, we have entered an exciting era of modern biology. The major challenge presently facing the medical science community is the integration of vast and rapidly growing volumes of information on various diseases into a holistic understanding. Recent progress in disease genetics and genome-related medicine has been considerable, with extensive data being generated. The Human Genome Project, by identifying most of the human genome, transcriptome and proteome and making them publicly available through online databases, has assisted in-depth investigation of disease genetics.

At present, databases containing information about human diseases focus predominantly on a particular category: all known Mendelian disorders (Hamosh, et al., 2005), infectious diseases, rare childhood diseases (http://www.madisonsfoundation.org/index.php), hereditary ocular diseases (https://disorders.eyes.arizona.edu), dermatological diseases (http://www.aocd.org/) and gastrointestinal diseases (http://www.gastro.net.au/). Such resources greatly support efforts related to the prevention, diagnosis and treatment of diseases, and the development of new approaches to alleviate the consequences of life-threatening diseases. However, to date no disease database offers both guidance towards bioinformatics tools and information accessible to the general public. Integrating human diseases from different categories in one place has therefore become an important issue in bioinformatics.

Over time, revisions to diagnostic evaluation and treatment emerge. To provide the community with the latest knowledge of human diseases and of the genes involved in them, we have created a Knowledge Management System (KMS) that includes information on various categories of human diseases, the drugs used to treat them, the genes involved in causing them and bioinformatics tools for analyzing those genes. HDI is thus a comprehensive database of human diseases, classified into various categories and cross-linked to other databases to retrieve detailed knowledge of genes, drugs and tools. HDI has broad utility in that it provides clinical information for physicians, genetic information and a classification of tools for researchers, and a general description of each disease for the general public.


Human Disease Insight (HDI) is an integrated knowledgebase of diseases, genes, drugs and a list of bioinformatics tools, with a user-friendly interface. It is designed to assemble, store, organize and display information about human diseases, the genes associated with them and the drugs used to treat them, in conjunction with a classified list of bioinformatics tools for sequence analysis and structure modeling of genes and proteins. HDI currently includes information about 625 human diseases, 320 drugs, 1440 genes and a classified list of bioinformatics tools (Table 1). Diseases are classified into 12 categories, and each category is populated with disease information that includes synonyms, pathogen, general description, genes, clinical features, pathways, investigations, prevention, treatment, risk factors, prevalence and references (Kanehisa and Goto, 2000; http://www.nlm.nih.gov/medlineplus/; http://www.medscape.com/). Drugs are classified into 26 broad categories. Genes assigned to human diseases are listed with links to NCBI (Maglott, et al., 2007) and UniProt (Wu, et al., 2006) for detailed information. Bioinformatics tools are broadly classified into 3 main categories, each of which is divided into further subcategories. The information collected for diseases, drugs and genes is interconnected in such a way that the disease option retrieves the multiple genes and/or drugs involved in a particular disease, the drug option retrieves the diseases for which a particular drug can be used, and the gene option displays the diseases in which a particular gene is involved. This information can be accessed freely, and it is curated and updated regularly.

3. Database Structure

HDI is a knowledge-based data warehouse that provides an integrated and curated repository of human diseases, drugs and genes reported to be involved in disease pathogenesis, along with links to bioinformatics tools. The classification of bioinformatics tools, with descriptions and links to their respective web pages, assists users in analyzing gene and protein sequences. HDI offers a user-friendly web interface that allows users to retrieve, download and upload information through interactive web forms. A schematic representation of the logistics used in HDI is shown in Figure 1.

3.1. Software design and implementation

The HDI data warehouse is developed and implemented on a three-tier architecture: a user/client tier, a web interface and a relational database management system (RDBMS) backend. The user/client can be a physician, researcher, student or member of the general public. The web interface comprises web pages and web forms, built with HTML5, CSS, PHP, JavaScript, AJAX, jQuery and MySQL queries, to provide a common gateway interface. At the backend we have created data marts holding the various kinds of information pertaining to human diseases. The database is constructed dynamically: web pages and web forms are linked to the backend data warehouse so that it can be queried as instructed by the end user through button clicks and drop-down menus. The backend data warehouse is a relational database managed with MySQL and developed on the Windows operating system. The Apache HTTP web server was used for web services. Data mining was performed to retrieve information on human diseases, genes, drugs and tools from various web resources and textbooks; the data obtained were then curated and uploaded to the database.
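The division of labour between the three tiers can be illustrated with a minimal sketch. It is written in Python with an in-memory SQLite database standing in for the PHP and MySQL stack named above, and is not the HDI implementation. The table layout, function names, disease names and category labels are all assumptions chosen for illustration.

```python
import sqlite3
from html import escape


def build_data_tier() -> sqlite3.Connection:
    """Data tier: an in-memory SQLite database standing in for MySQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE disease (name TEXT PRIMARY KEY, category TEXT NOT NULL)"
    )
    # Illustrative rows only; these are not taken from HDI.
    conn.executemany(
        "INSERT INTO disease VALUES (?, ?)",
        [
            ("Malaria", "Infectious"),
            ("Cholera", "Infectious"),
            ("Asthma", "Respiratory"),
        ],
    )
    return conn


def diseases_in_category(conn: sqlite3.Connection, category: str) -> list[str]:
    """Application tier: turn a menu selection into a parameterized query."""
    rows = conn.execute(
        "SELECT name FROM disease WHERE category = ? ORDER BY name",
        (category,),
    )
    return [name for (name,) in rows]


def render_dropdown(names: list[str]) -> str:
    """Presentation tier: render query results as an HTML drop-down menu."""
    options = "".join(f"<option>{escape(name)}</option>" for name in names)
    return f"<select>{options}</select>"


conn = build_data_tier()
page = render_dropdown(diseases_in_category(conn, "Infectious"))
```

Keeping the query in the middle tier and passing the category as a bound parameter means the presentation code never builds SQL from user input.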

The framework of HDI consists primarily of tables for disease, drug and gene information, together with bioinformatics tools. Diseases are classified broadly into 12 categories, and each category is populated with a number of diseases. Each entry in HDI provides comprehensive information about a human disease, characterized by synonyms, general description, pathogen, genes involved, clinical features, pathways, investigations, prevention, treatment, drugs, prevalence, risk factors and references. Drugs are classified into 26 broad categories; each category is populated with a number of drugs, with their descriptions and links to DrugBank for detailed information. Genes involved in human diseases are collected, and links to NCBI and UniProt are provided in a drop-down menu for more detailed information. For the convenience of users, major bioinformatics tools are classified, with descriptions and links, to guide users in performing specific analyses of a gene or protein. HDI can be publicly accessed from any web browser at http://humandiseaseinsight.com.
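The fields listed for a disease entry can be pictured as a simple record type. The sketch below is an assumption about how such an entry might be modelled in Python, not the actual HDI table definition. The field names follow the list above, while the types, the `DiseaseEntry` class name and the `empty_fields` helper are hypothetical.

```python
from dataclasses import dataclass, field, fields


@dataclass
class DiseaseEntry:
    """One disease record; field names follow the list in the text."""
    name: str
    category: str
    synonyms: list[str] = field(default_factory=list)
    general_description: str = ""
    pathogen: str = ""
    genes: list[str] = field(default_factory=list)
    clinical_features: str = ""
    pathways: str = ""
    investigations: str = ""
    prevention: str = ""
    treatment: str = ""
    drugs: list[str] = field(default_factory=list)
    prevalence: str = ""
    risk_factors: str = ""
    references: list[str] = field(default_factory=list)


def empty_fields(entry: DiseaseEntry) -> list[str]:
    """Hypothetical curation helper: list the fields still left unfilled."""
    return [f.name for f in fields(entry) if not getattr(entry, f.name)]


# Placeholder values only; they do not describe a real HDI record.
entry = DiseaseEntry(name="Disease A", category="Category 1", pathogen="Pathogen X")
todo = empty_fields(entry)
```

A helper of this kind could support curation by showing which parts of an entry still need to be collected before it is uploaded.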

3.2. Data curation

HDI is being enhanced through continued efforts to improve disease knowledge and to interlink the disease, drug and gene tables so as to obtain optimum information. The information made available to the user is the result of an extensive data mining process. The knowledge thus obtained is managed in a relational database through cross-linking, both within the HDI data warehouse to fetch the stored data and with external web resources (NCBI, UniProt and DrugBank). Genes related to human diseases are included in the database and are interlinked with the disease tables so that the names of the diseases associated with a specific gene can be retrieved.
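One common way to interlink gene and disease tables in a relational database is a junction table, which allows lookups in both directions. The sketch below illustrates that pattern in Python with SQLite standing in for MySQL. The source does not describe the HDI schema, so the table names, column names and helper functions are assumptions, and the gene and disease pairs are well-known textbook associations used purely as sample data.

```python
import sqlite3

# Hypothetical schema: two entity tables and one junction table.
SCHEMA = """
CREATE TABLE disease (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE gene    (id INTEGER PRIMARY KEY, symbol TEXT UNIQUE NOT NULL);
CREATE TABLE disease_gene (
    disease_id INTEGER NOT NULL REFERENCES disease(id),
    gene_id    INTEGER NOT NULL REFERENCES gene(id),
    PRIMARY KEY (disease_id, gene_id)
);
"""


def link(conn: sqlite3.Connection, disease: str, symbol: str) -> None:
    """Record that a gene is involved in a disease, creating rows as needed."""
    conn.execute("INSERT OR IGNORE INTO disease (name) VALUES (?)", (disease,))
    conn.execute("INSERT OR IGNORE INTO gene (symbol) VALUES (?)", (symbol,))
    conn.execute(
        """INSERT OR IGNORE INTO disease_gene (disease_id, gene_id)
           SELECT d.id, g.id FROM disease d, gene g
           WHERE d.name = ? AND g.symbol = ?""",
        (disease, symbol),
    )


def diseases_for_gene(conn: sqlite3.Connection, symbol: str) -> list[str]:
    """Gene option: names of all diseases linked to one gene."""
    rows = conn.execute(
        """SELECT d.name FROM disease d
           JOIN disease_gene dg ON dg.disease_id = d.id
           JOIN gene g ON g.id = dg.gene_id
           WHERE g.symbol = ? ORDER BY d.name""",
        (symbol,),
    )
    return [name for (name,) in rows]


def genes_for_disease(conn: sqlite3.Connection, disease: str) -> list[str]:
    """Disease option: symbols of all genes linked to one disease."""
    rows = conn.execute(
        """SELECT g.symbol FROM gene g
           JOIN disease_gene dg ON dg.gene_id = g.id
           JOIN disease d ON d.id = dg.disease_id
           WHERE d.name = ? ORDER BY g.symbol""",
        (disease,),
    )
    return [symbol for (symbol,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
for disease, symbol in [
    ("Breast cancer", "BRCA1"),
    ("Ovarian cancer", "BRCA1"),
    ("Breast cancer", "BRCA2"),
]:
    link(conn, disease, symbol)
```

A second junction table of the same shape would serve the disease and drug relationship, so that one drug can be listed under several diseases and one disease under several drugs.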

3.3. Knowledgebase access

HDI data can be retrieved efficiently through the drop-down menus and search functions provided on each page of the web site. Users can access alphabetically ordered diseases, drugs, genes and tools through drop-down menus. Diseases from different categories can be selected through a drop-down menu, and clicking on a disease displays the stored information about it. Similarly, drugs can be selected from different categories in a drop-down menu; clicking on a drug shows its description, the diseases it can be used to treat and a link to DrugBank for details. For the convenience of users, two different search boxes are provided. The search box on the home page searches the complete data mart of diseases in the HDI data warehouse. To make this search box more usable, code was written to provide auto-complete suggestions, which save searching time and correct spelling. The other search box, present on each page, is a Google search box, which searches for the entered term in the database as well as on the web.

For registered users, the web site has provision for downloading and uploading published articles and e-books related to diseases, drugs and genes. User uploads will be added to the database in a timely manner. A sign-up option is provided for registration, and registered users can log in to download and upload related information. Medical and research-oriented news will be emailed to the address provided by the user. A feedback option is provided so that users can help to improve the database. An advertisement option allows advertisers to display advertisements in the space provided on the web site after filling in a form. The database is also connected to social networking sites to widen its reach.
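The source does not say how the auto-complete and spelling-correction behaviour of the home-page search box is implemented. One plausible approach, sketched here in Python rather than the JavaScript and PHP used by the site, is to offer prefix matches first and fall back to approximate string matching when no prefix matches. The `suggest` function, its parameters and the sample disease names are assumptions, and `difflib.get_close_matches` from the Python standard library is substituted for whatever matching HDI actually uses.

```python
from difflib import get_close_matches


def suggest(query: str, names: list[str], limit: int = 5) -> list[str]:
    """Return up to `limit` suggestions for a partially typed disease name."""
    q = query.strip().lower()
    if not q:
        return []

    # Auto-complete: case-insensitive prefix matches, in alphabetical order.
    prefix_hits = sorted(n for n in names if n.lower().startswith(q))
    if prefix_hits:
        return prefix_hits[:limit]

    # Spelling correction: fall back to approximate matches on the full name.
    by_lower = {n.lower(): n for n in names}
    close = get_close_matches(q, list(by_lower), n=limit, cutoff=0.6)
    return [by_lower[c] for c in close]


# Illustrative names only; these are not taken from the HDI disease list.
DISEASES = ["Asthma", "Cholera", "Malaria", "Measles"]
```

With this sample list, `suggest("m", DISEASES)` returns both names beginning with "m", while the misspelled query `"colera"` has no prefix match and falls through to the approximate match "Cholera".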

4. Database availability

The database can be accessed free of charge to retrieve disease-, drug-, gene- and tool-related information. Free registration is required to download and upload related content.

5. Salient features of the HDI

HDI is a robust knowledge management system that manages data-mined knowledge through cross-linking of the data marts and web resources. This user-friendly, data-intensive repository provides users with a platform to retrieve comprehensive disease-related information and to perform gene and protein sequence-based analysis using direct links to the classified bioinformatics tools. HDI allows users to upload content to improve the database.

6. Future directions

HDI provides information needed for the diagnosis and treatment of various human diseases. It currently holds 625 diseases, 1440 genes, 320 drugs and 39 tools. Because the information in these fields is expanding rapidly, our aim is to collect a complete dataset of human diseases, genes, drugs and tools, and to develop a tool that can identify genes causing human disease. We also aim to integrate various bioinformatics tools to annotate human disease-specific genes. The main future challenge is to keep the dataset up to date with the growing number of diseases, genes, drugs and bioinformatics tools.

7. Conclusions

HDI offers a platform that deals with all aspects of diseases, including history, symptoms, causes, epidemiology, treatment and precautions. Moreover, all diseases have been linked with pharmacology, genomics, proteomics and many other relevant databases. HDI will not only help in gaining a greater understanding of diseases and provide primary data for research, but will also enable users to find interactions between diseases by comparing them using the bioinformatics tools listed in the database. The information provided should lay the foundation for further advances in disease diagnosis and help in the design of novel approaches to diagnosing and treating diseases. We expect that, as the database is enriched, users will be able to obtain information about all human diseases.


