Identification of Gene Expression Profiles Associated with Coronary Artery Disease in Asian Indians Using Bioinformatics Analysis
Purpose: The aim of our study was to identify the hub genes which might participate in biological processes and regulate several interactive pathways associated with coronary artery disease (CAD).
Data and Methods: We downloaded the mRNA expression profile GSE42148 from the GEO (Gene Expression Omnibus) database, comprising 24 samples: 13 from patients with angiographically confirmed coronary artery disease (CAD) and 11 from healthy controls. Differentially expressed genes (DEGs) were then identified with limma and the empirical Bayes (EB) approach. GeneAnswers was used to map and analyze the KEGG pathways enriched in DEGs, and STRING was applied to build the protein-protein interaction (PPI) network. Finally, we used enrichGO in clusterProfiler to analyze the biological functions of the differentially expressed genes.
Results: A total of 494 genes were identified as DEGs between normal and disease samples, including 255 up-regulated genes and 239 down-regulated genes. Up-regulated genes were enriched in 12 KEGG pathways and down-regulated genes in 17 pathways. The PPI network constructed from up-regulated genes included 76 nodes and 111 edges, while the PPI network constructed from down-regulated genes included 49 nodes and 53 edges. Early growth response 1 (EGR1), chemokine (C-X-C motif) receptor 4 (CXCR4) and peroxisome proliferator-activated receptor gamma (PPARG) were hub genes in the network. Based on GO module analysis, we found that these genes were significantly associated with biological process and immune system process.
Conclusion: MDM2, HLA-G, HLA-DOB, EGR1, PPARG and CXCR4 were hub genes and may serve as biomarkers of the progression of CAD.
Keywords: coronary artery disease (CAD);
Coronary artery disease (CAD) is the most common cause of death in the world, placing a great economic and resource burden on patients and public health systems. Until recently, the underlying genetic mechanisms of CAD have been largely unknown, and the genes identified so far account for very little of the disease in the population. An integrated exploration of the pivotal genes and underlying mechanisms in the pathogenesis of CAD would therefore facilitate more effective diagnosis and treatment of CAD and improve patients' quality of life.
Across studies of the genetic mechanisms of CAD, widespread epidemiological data highlight that CAD is a common manifestation of atherosclerosis [3, 4]. Atherosclerosis is considered to be a life-long chronic inflammatory disease initiated at locations of turbulent blood flow, where lipid-laden macrophages accumulate in the arterial wall and finally form mature plaques. Different mechanisms, such as inflammation, myocardial injury and cell apoptosis, psychological stress and oxidative stress, play an important role in the pathogenesis of cardiovascular diseases. However, CAD is a complex disease that results from the interplay of multiple genes. Although the recent accumulation of reliable molecular interaction data has accelerated the identification of susceptibility genes related to disease, it remains a great challenge for biomedical research to identify candidate genes associated with CAD and to elucidate their roles in its pathogenesis.
Recently, protein-protein interaction (PPI) networks, alone or in combination with gene expression profiles, have been widely applied to explore candidate genes [10, 11]. In this study we used a gene expression profile in combination with bioinformatics analysis to identify the hub genes which might participate in biological processes and regulate several interactive pathways associated with CAD. Using this approach, we selected candidate genes and associated pathways that may give a better understanding of how differential gene expression and the underlying pathways relate to CAD.
Material and methods
Data resources and preprocessing
The mRNA expression profile GSE42148 was obtained from the GEO (Gene Expression Omnibus) database at NCBI (National Center for Biotechnology Information) (http://www.ncbi.nlm.nih.gov/geo/). The profile was generated on the GPL13607 Agilent-028004 SurePrint G3 Human GE 8x60K microarray platform. GSE42148 contains 24 samples: 13 from patients aged 40-55 years with angiographically confirmed CAD, and 11 from healthy controls with a normal electrocardiogram (ECG), matched to the cases for age, gender and common risk factors such as diabetes and hypertension.
For the GSE42148 dataset, standardized, preprocessed expression data were provided rather than raw chip data files. A boxplot was therefore used to view the data distribution of each chip and to determine whether the median values were consistent. If they agreed, no further standardization was needed and the expression data were used directly for subsequent analysis.
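The median comparison behind the boxplot check can be sketched numerically. The following is a minimal illustration only, not the procedure used in the study: the function names, the tolerance value and the toy expression values are all assumptions introduced here.

```python
from statistics import median

def sample_medians(expr):
    """Return the median expression value of each sample.

    expr: dict mapping sample name -> list of normalized expression values.
    """
    return {sample: median(values) for sample, values in expr.items()}

def medians_consistent(expr, tolerance=0.1):
    """True if all per-sample medians lie within `tolerance` of each other.

    The tolerance is an illustrative assumption, not a value from the study.
    """
    meds = list(sample_medians(expr).values())
    return max(meds) - min(meds) <= tolerance

# Toy values (hypothetical, not taken from GSE42148).
toy = {
    "CAD_1": [7.9, 8.0, 8.1, 9.5, 6.2],
    "CAD_2": [7.8, 8.0, 8.2, 9.1, 6.5],
    "CTRL_1": [8.0, 8.05, 7.7, 9.9, 6.0],
}
print(sample_medians(toy))
print(medians_consistent(toy))
```

If the medians diverged beyond the chosen tolerance, an additional normalization step would be required before differential expression analysis.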
Differentially expressed genes (DEGs) analysis
In our study, the limma method and the empirical Bayes (EB) approach were used to identify DEGs. The expression data from all conditions were converted into expression estimates, and a linear model was fitted. The p-values for differential expression were then adjusted for multiple testing with the Benjamini and Hochberg (BH) method; an adjusted p-value < 0.05 and |log2 fold change (FC)| > 2 were selected as the thresholds.
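The multiple-testing step and the two thresholds can be illustrated with a short sketch. It implements only the Benjamini-Hochberg adjustment and the filtering; it does not reproduce limma's moderated statistics, and the function names, gene labels and numbers are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def select_degs(genes, pvalues, log2fc, alpha=0.05, lfc_cutoff=2.0):
    """Keep genes with adjusted p < alpha and |log2 FC| > lfc_cutoff."""
    adj = bh_adjust(pvalues)
    return [g for g, q, lfc in zip(genes, adj, log2fc)
            if q < alpha and abs(lfc) > lfc_cutoff]

# Hypothetical inputs for illustration only.
genes = ["g1", "g2", "g3", "g4"]
pvals = [0.01, 0.04, 0.03, 0.005]
lfcs = [2.5, -3.0, 1.0, -2.2]
print(bh_adjust(pvals))
print(select_degs(genes, pvals, lfcs))
```

Here g3 passes the significance threshold but is dropped because its absolute log2 fold change does not exceed the cutoff.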
Pathway enrichment analysis of DEGs
Kyoto Encyclopedia of Genes and Genomes (KEGG) is a collection of online databases dealing with genomes, enzymatic pathways, and biological chemicals. To analyze the metabolic processes of patients with CAD at a functional level, the metabolic pathways were downloaded from KEGG on May 10, 2014. The GeneAnswers software package was used to analyze and visualize the KEGG pathways of up-regulated and down-regulated DEGs. The parameter settings of GeneAnswers were pvalueT = 0.05, testType = “hyperG” and verbose = F.
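The “hyperG” test type refers to a hypergeometric test. As a rough illustration of the idea, and not of the GeneAnswers internals, the enrichment p-value for one pathway can be computed as the probability of seeing at least k pathway genes among n DEGs drawn from a universe of N genes, K of which belong to the pathway. The function name and the toy numbers below are assumptions.

```python
from math import comb

def hypergeom_pvalue(k, K, n, N):
    """Upper-tail hypergeometric probability P(X >= k).

    N: genes in the universe, K: genes annotated to the pathway,
    n: number of DEGs, k: DEGs that fall in the pathway.
    """
    upper = min(K, n)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

# Toy example: 10 genes, 5 in the pathway, 5 DEGs, all 5 in the pathway.
print(hypergeom_pvalue(5, 5, 5, 10))
```

A pathway would be reported as enriched when this p-value falls below the chosen cutoff, 0.05 in the settings above.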
Construction of PPI network
PPI networks in various organisms are increasingly becoming the focus of studies that identify the cellular functions of proteins. The PPI pairs were acquired by directly mapping the up-regulated and down-regulated DEGs to STRING. The PPI network was then constructed from these seed genes, using the species “Homo sapiens” and retaining the interaction pairs with a PPI score > 0.4.
Genes in the same module interact with each other to drive a complete biological process. Gene Ontology (GO) analysis is a commonly used approach for functional module studies of large-scale genomic or transcriptomic data.
In this work, the enrichGO function in the clusterProfiler package was used to annotate and visualize the functions of modules containing DEGs [19, 20]. The parameter settings of clusterProfiler were: ont = “biological process (BP)”, pvalueCutoff = 0.05 and readable = T.
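The filtering step can be sketched as follows. This is a minimal illustration that assumes interaction scores on a 0 to 1 scale and a simple list of (protein A, protein B, score) tuples; the function name, the placeholder gene names and the scores are hypothetical and do not come from STRING.

```python
def filter_ppi(pairs, deg_set, min_score=0.4):
    """Keep undirected interactions between DEGs whose score exceeds min_score.

    pairs: iterable of (gene_a, gene_b, score) tuples.
    deg_set: set of seed genes (the DEGs).
    Returns a sorted list of unique (gene_a, gene_b) edges.
    """
    kept = set()
    for a, b, score in pairs:
        if a in deg_set and b in deg_set and a != b and score > min_score:
            kept.add(tuple(sorted((a, b))))  # treat A-B and B-A as one edge
    return sorted(kept)

# Placeholder data for illustration only.
degs = {"GENE_A", "GENE_B", "GENE_C"}
pairs = [
    ("GENE_A", "GENE_B", 0.90),
    ("GENE_B", "GENE_A", 0.90),  # duplicate in the reverse direction
    ("GENE_A", "GENE_C", 0.30),  # below the score cutoff
    ("GENE_A", "GENE_X", 0.95),  # GENE_X is not a seed gene
]
print(filter_ppi(pairs, degs))
```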
Figure 1 shows the data distribution of each chip. The degree of standardization of the GSE42148 dataset can be judged from the position of the black line in each box, which marks the median of that sample. As shown in Figure 1, the black lines, which indicate the normalized gene expression values after data preprocessing, lie at almost the same level, suggesting good standardization, so the data could be analyzed directly.
Using the limma package in R with an adjusted P value < 0.05 as the threshold, we ultimately obtained 494 differentially expressed genes between the normal and CAD samples, including 255 up-regulated genes and 239 down-regulated genes. Hierarchical clustering analysis of the DEGs and samples is shown in Figure 2. Up-regulated genes slightly outnumbered down-regulated genes.
Pathway enrichment analysis
To identify the biological processes or interactive pathways associated with CAD, we tested the up-regulated and down-regulated genes for enrichment in KEGG pathways. The functional relationships between the identified DEGs involved in the main pathways are shown in Figure 3. Up-regulated genes were enriched in 12 KEGG pathways and down-regulated genes in 17 pathways. The pathways enriched in up-regulated genes were Viral myocarditis, Cell adhesion molecules (CAMs), Chronic myeloid leukemia, Systemic lupus erythematosus, Prostate cancer, Allograft rejection, Graft-versus-host disease, Butirosin and neomycin biosynthesis, Type I diabetes mellitus, Nucleotide excision repair, Carbohydrate digestion and absorption, and Type II diabetes mellitus. The pathways enriched in down-regulated genes were Collecting duct acid secretion, Nitrogen metabolism, Staphylococcus aureus infection, Colorectal cancer, Complement and coagulation cascades, Oxidative phosphorylation, Fatty acid metabolism, Vibrio cholerae infection, Vascular smooth muscle contraction, Epithelial cell signaling in Helicobacter pylori infection, Systemic lupus erythematosus, Progesterone-mediated oocyte maturation, ErbB signaling pathway, Prostate cancer, Phagosome, Rheumatoid arthritis, and Bladder cancer. A heat map of the main KEGG pathways enriched in up-regulated and down-regulated genes is shown in Figure 4.
PPI Network analysis
The PPI network constructed from up-regulated genes included 76 nodes and 111 edges, while the PPI network constructed from down-regulated genes included 49 nodes and 53 edges (Figure 5). Ranked by node degree from high to low, the top genes in the network were early growth response 1 (EGR1) (degree = 11), chemokine (C-X-C motif) receptor 4 (CXCR4) (degree = 8), peroxisome proliferator-activated receptor gamma (PPARG) (degree = 8), early growth response 2 (EGR2) (degree = 8), nuclear receptor subfamily 4, group A, member 2 (NR4A2) (degree = 7), and phosphatase and tensin homolog (PTEN) (degree = 7). All other genes had a node degree of 6 or less.
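Node degree is simply the number of edges that touch a gene, so hub candidates can be ranked with a few lines of code. The sketch below uses a hypothetical edge list with placeholder node names; it is not the network from Figure 5, and the function names are assumptions.

```python
from collections import Counter

def node_degrees(edges):
    """Count how many edges touch each node in an undirected edge list."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree

def top_hubs(edges, top=3):
    """Return the `top` nodes by degree, ties broken alphabetically."""
    degree = node_degrees(edges)
    ranked = sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top]

# Placeholder network for illustration only.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
print(top_hubs(toy_edges, top=2))
```

In this toy network, node A touches three edges and would be the hub candidate.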
Functional modules analysis
Based on functional enrichment analysis of the up-regulated and down-regulated genes, we found that up-regulated genes were significantly enriched in GO terms such as primary metabolic process, biological process, macromolecule catabolic process, immune system process, and cellular process. Down-regulated genes mainly participated in single-organism process, response to chemical stimulus, cellular response to chemical stimulus, biological process, and homeostatic process. The top ten GO terms are listed in Table 1.
CAD is a complex disease caused by multiple genes. It is well known that complex diseases arise from genes that interact and work cooperatively, but how these genes are associated with disease remains unclear. Because CAD is a common disease and a leading cause of death, screening for genes related to CAD has both fundamental and applied relevance. In the present study, we identified 494 DEGs between the normal and CAD samples, including 255 up-regulated genes and 239 down-regulated genes.
Based on functional modules analysis of the identified up-regulated genes involved in GO modules (Table 1) and pathways (Figure 3), we found that MDM2 (oncogene, E3 ubiquitin protein ligase), HLA-DOB (major histocompatibility complex, class II, DO Beta) and HLA-G (histocompatibility antigen, class I, G) were significantly enriched in these pathways associated with biological process and immune system process.
MDM2 encodes a nuclear-localized E3 ubiquitin ligase that plays an important role in inhibiting DNA damage by targeting tumor suppressor proteins such as p53 [21, 22]. Increasing evidence suggests that DNA damage and activation of repair pathways occur in the early stages of atherosclerosis. p53 is a major downstream protein activated following DNA damage induced by a variety of stimuli present in the vessel wall. MDM2 can bind p53 at its transactivation domain with high affinity, negatively modulating its transcriptional activity and stability. Because CAD is a common manifestation of atherosclerosis, our study is in line with findings that p53 and its transcriptional target MDM2 co-localise in atherosclerosis and thus contribute to the progression of CAD. Notably, MDM2 can inhibit the nuclear export of the p53-MDM2 complex and promote the degradation of p53 [23, 26]; MDM2 may therefore be a crucial molecule in the progression of CAD.
HLA encodes the human leukocyte antigens, which include class I and class II molecules. HLA-G belongs to HLA class I, while HLA-DOB belongs to class II. These molecules are located in the major histocompatibility complex (MHC) region, which has a high density of immune-related genes and is a confirmed disease susceptibility region. Recent studies have highlighted the association between HLA-G and immune mechanisms. The nonpolymorphic HLA-G has an important role in mediating major vascular changes, thus protecting the trophoblast from cytotoxic effects mediated by natural killer cells. HLA-DOB is mainly considered to restrict antigen presentation. Studies have also confirmed that HLA-DOB is related to plaque age and linked to atherosclerotic processes. In our study, HLA-DOB and HLA-G were significantly associated with biological process and immune system process. These results suggest that the immune system plays an important role in the development and progression of CAD. Consistent with our results, Alfakry et al. demonstrated that the immune level was increased in patients with ACS compared to normal controls. Thus, harnessing the key factors of the immune system may be helpful in treating CAD.
In addition, the genes located at the central nodes of the PPI network were mainly EGR1, CXCR4 and PPARG. As seen in Figure 5, these genes are highly connected and functionally associated within the network. Furthermore, these genes are all important transcription factors that play a crucial role in regulating the pathways associated with coronary artery disease at the transcriptional level. Because the genes at the central nodes of the network have the most interactions with other genes, we examined each of them in turn.
EGR1 is a zinc finger transcription factor. EGR1 has recently been considered an important regulator of atherosclerosis, and it co-localizes with fibroblast growth factor-2 in endothelial cell microvascular channels in human coronary artery occlusion. It has also been shown that EGR1 can effectively regulate the oxidative stress pathway during disease progression. Vangala et al. found that EGR1 could bind to the promoters of genes encoding cell adhesion proteins, coagulation proteins, inflammation pathway components, the obesity marker leptin, and stress-related biomarkers. These EGR1 targets are crucial genes or proteins reported to be involved in thrombogenesis and arterial blockage through processes such as inflammation, oxidative stress and platelet activation [36-39], which in turn contribute to CAD to some extent. Therefore, EGR1 may be a key factor in the progression of CAD.
PPARG is a member of the nuclear receptor family of ligand-activated transcription factors. PPARG is thought to regulate cell adhesion, coagulation, inflammation, obesity, oxidative stress, and stress. Recent studies have shown that PPARG plays a major role in the development of CAD. In addition, the promoter binding sites of PPARG have been shown to be strongly associated with type 2 diabetes, a major risk factor for CAD. In our study, PPARG is a major hub gene in the network; it may therefore play a very important role in modulating other interacting genes associated with CAD.
CXCR4 encodes a CXC chemokine receptor that interacts with chemokines and is essential for normal cardiovascular development [42, 43]. Chemokines play essential roles in endothelial cells involved in angiogenesis, and they affect the development, homeostasis, and function of the immune system. Increasing evidence suggests that vascular endothelial growth factor promotes breast cancer cell invasion in an autocrine manner by regulating CXCR4, and that CXCR4 plays a key role in the progression of breast cancer [45, 46]. However, there are no reports of its effects on CAD. Further genetic studies are needed to confirm our observation.
Taken together, our study indicates that MDM2, HLA-G, HLA-DOB, EGR1, PPARG and CXCR4 participate significantly in the network and pathways associated with CAD, so these genes may be biomarkers of the progression of CAD. The present findings shed new light on the molecular mechanism of CAD and have implications for future research. However, the microarray data in our study have not yet been analyzed exhaustively, and more high-throughput data analysis and experiments on the DEGs are still needed to confirm the possible biomarkers.