Gene expression profiles on predicting protein interaction network and exploring of new treatments for lung cancer

Purpose: In the present study, we aimed to explore disease-associated genes and their functions in lung cancer. Methods: Gene expression profile GSE4115 was downloaded from the Gene Expression Omnibus database. A total of 97 lung cancer samples and 90 normal samples (without lung cancer) were used to identify differentially expressed genes (DEGs) by t-test and variance analysis with the Significance Analysis of Microarrays (SAM) package in R. Gene Ontology (GO) functional enrichment analysis of the DEGs was performed with DAVID, followed by construction of a protein-protein interaction (PPI) network from HPRD. Finally, network modules were analyzed with the MCODE algorithm to detect protein complexes in the PPI network. Results: A total of 3102 genes were identified as DEGs at FDR < 0.05, including 1146 down-regulated and 1956 up-regulated DEGs. GO functional enrichment analysis revealed that up-regulated DEGs mainly participated in cell cycle and intracellular functions, while down-regulated DEGs might influence cell functions such as cellular adhesion and motility. Of the 39240 human PPI pairs obtained from HPRD, 2429 PPI pairs involving 1342 genes were identified when the 3102 DEGs were mapped to the network. With the MCODE algorithm, 48 modules were selected, including 5 corresponding modules and 3 modules with differences in gene expression profiles. In addition, three DEGs, FXR2, ARFGAP1 and ELAVL1, were discovered as potential lung cancer-related genes. Conclusion: The discovery of featured genes that are probably related to lung cancer is of great significance for studying its mechanism, distinguishing normal from cancer tissues, and exploring new treatments for lung cancer.

Key words: lung adenocarcinoma; differentially expressed genes; protein-protein interaction network


Introduction

Lung cancer is a common malignant tumor and a leading cause of cancer death worldwide [1]. Its incidence is increasing in China and in some other Asian and African countries [2-4]. Surgery has long been used to treat lung cancer, with positive results in early-stage patients [5-7] and in stage III disease [7-9]. While the primary tumor is still confined to the bronchi and pulmonary lobules, before lymph node metastasis and distant metastasis, the 5-year survival rate after surgery can exceed 50%. However, the high mortality rate is partly due to the lack of effective methods for diagnosing the disease at an early stage [10]. Because early lung cancer often causes no significant symptoms, symptom-based diagnosis is not very effective, and the first diagnosis is often not made until the disease has already reached the metastatic phase (stage III or IV) [11]. Accordingly, early and accurate diagnosis and therapy will remain important goals of medical research for a long time to come [12].

With the rapid development of biotechnology and the continual improvement of high-throughput screening (HTS) technology, researchers have returned to studying the nature of lung cancer, including its causes, processes, development and consequences, at the genome level [13]. Gene expression profiling using microarrays is a robust and straightforward way to study the molecular features of different types and subtypes of cancer at a system level [14]. Recent studies have used gene expression profiles to select biomarkers associated with lung cancer and have combined these profiles with biological networks (protein-protein interaction, signaling, regulatory and metabolic networks) to analyze the complex pathogenesis of the disease [15, 16]. This approach has proved helpful for the accurate diagnosis of lung cancer at an early stage [17]. The development of lung cancer is a multi-step, multi-stage process in which many genes participate [11]. Thus, identifying the differentially expressed genes (DEGs) between normal and lung cancer tissues can deepen our understanding of the molecular mechanisms of lung cancer occurrence and development [14].

In this study, we downloaded the GSE4115 dataset, identified DEGs and performed functional enrichment analysis to explore the potential molecular mechanisms of lung cancer. In addition, we constructed a protein-protein interaction (PPI) network and performed module analysis. By detecting network modules, we further analyzed the pathogenesis of lung cancer from a modularity perspective.

Materials and methods

Data preprocessing

The microarray data of GSE4115 [18] were downloaded from the Gene Expression Omnibus (GEO) database, the largest comprehensive public functional genomics data repository. A total of 187 chips derived from the bronchial epithelium of smokers were available, including 90 normal samples (without lung cancer) and 97 lung cancer specimens [19]. The probe-level data were converted into the corresponding gene symbols based on the probe-to-gene mapping of platform GPL96. When several probes mapped to the same gene, their expression values were averaged to give a single value for that gene.
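The probe-to-gene averaging step can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function name, the probe identifiers and the gene symbols are hypothetical, and it assumes that probes without a gene symbol in the platform annotation are simply dropped.

```python
from collections import defaultdict
from statistics import mean


def collapse_probes_to_genes(probe_values, probe_to_gene):
    """Average the expression vectors of all probes that map to the same gene.

    probe_values: dict of probe id -> list of expression values (one per sample)
    probe_to_gene: dict of probe id -> gene symbol (hypothetical annotation table)
    """
    grouped = defaultdict(list)
    for probe_id, values in probe_values.items():
        gene = probe_to_gene.get(probe_id)
        if gene:  # assumption: probes without a gene symbol are skipped
            grouped[gene].append(values)
    # zip(*vectors) walks the samples; mean() averages across probes per sample
    return {
        gene: [mean(column) for column in zip(*vectors)]
        for gene, vectors in grouped.items()
    }


# Toy data: two probes for GENE_A, one for GENE_B, one unannotated probe
probe_values = {
    "p1": [2.0, 4.0],
    "p2": [4.0, 6.0],
    "p3": [1.0, 1.0],
    "p4": [9.0, 9.0],
}
probe_to_gene = {"p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_B"}
gene_values = collapse_probes_to_genes(probe_values, probe_to_gene)
```

With real data, the annotation table would come from the GPL96 platform file, and each list would hold one value per chip.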

Analysis of DEGs

After the gene expression data had been generated with the Affymetrix GeneChip System (Affymetrix, Santa Clara, CA), the SAM algorithm was applied directly. DEGs were identified by t-test and variance analysis. To address the multiple-testing problem and reduce false-positive results, the false discovery rate (FDR) was controlled with the samr package [20] in R. Genes with FDR < 0.05 were considered significant DEGs.

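The relative difference (d) was computed with the following formula:

d = (X1’ - X2’) / (s + s0)

The d-statistic measures the relative difference in the expression of a gene between two conditions. X1’ represents the average expression level of the gene under condition 1, X2’ represents the average under condition 2, s is the standard deviation of the repeated expression measurements for the gene, and s0 is a small positive constant that SAM adds to the denominator to stabilize d for genes with low variability.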
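The d-statistic can be sketched for a single gene as follows. This is a minimal illustration, assuming the standard SAM definition in which the denominator is the pooled gene-specific scatter s plus a small constant s0; the function name, the default s0 and the toy values are hypothetical, and the permutation step that SAM uses to estimate the FDR is not shown.

```python
from math import sqrt
from statistics import mean


def sam_d_statistic(group1, group2, s0=0.0):
    """Relative difference d = (mean1 - mean2) / (s + s0) for one gene.

    group1, group2: expression values of the gene under condition 1 and 2.
    s0: small positive constant (assumption: chosen by the caller; SAM
        normally estimates it from the data).
    Requires len(group1) + len(group2) > 2 and s + s0 > 0.
    """
    n1, n2 = len(group1), len(group2)
    m1, m2 = mean(group1), mean(group2)
    # Pooled sum of squared deviations around each group mean
    ss = sum((x - m1) ** 2 for x in group1) + sum((x - m2) ** 2 for x in group2)
    # Gene-specific scatter s, as in the pooled two-sample t-statistic
    s = sqrt((1.0 / n1 + 1.0 / n2) / (n1 + n2 - 2) * ss)
    return (m1 - m2) / (s + s0)


# Toy example: one gene measured three times in each condition
d_plain = sam_d_statistic([5.0, 6.0, 7.0], [1.0, 2.0, 3.0])
d_damped = sam_d_statistic([5.0, 6.0, 7.0], [1.0, 2.0, 3.0], s0=1.0)
```

With s0 = 0 the value reduces to the ordinary pooled two-sample t-statistic; a positive s0 shrinks d, which matters most for genes whose scatter s is close to zero.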

Functional enrichment analysis

Gene Ontology (GO) functional and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses were performed using the Database for Annotation, Visualization and Integrated Discovery (DAVID) [21, 22]. Fisher's exact test was used to detect the enrichment of a group of genes at each GO function node or KEGG pathway. Under the null hypothesis, the probability of the observed classification of genes is given by the following formula:

p = ((a+b)! (c+d)! (a+c)! (b+d)!) / (a! b! c! d! n!)

where n = a + b + c + d; “a” represents the number of DEGs in the KEGG pathway or GO gene cluster; “b” represents the number of genes that are not DEGs but are in the KEGG pathway or GO gene cluster; “c” represents the number of DEGs that are not in the KEGG pathway or GO gene cluster; and “d” represents the number of genes that are neither DEGs nor in the KEGG pathway or GO gene cluster.
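The enrichment test on the counts a, b, c and d can be sketched as follows. This is a minimal illustration of a plain one-sided Fisher's exact test for over-representation, written with the standard library only; the function name and the toy counts are hypothetical, and the score reported by DAVID itself may differ from this plain version.

```python
from math import comb


def fisher_enrichment_p(a, b, c, d):
    """One-sided Fisher's exact p-value for over-representation.

    a: DEGs in the category          b: non-DEGs in the category
    c: DEGs outside the category     d: non-DEGs outside the category
    Returns P(at least a DEGs fall in the category) with all margins fixed.
    """
    in_category = a + b
    not_in_category = c + d
    degs = a + c
    total = in_category + not_in_category
    # Number of ways to choose which genes are DEGs, ignoring the category
    denominator = comb(total, degs)
    upper = min(in_category, degs)
    # Sum the hypergeometric probabilities of the observed table and of
    # every table with more DEGs in the category
    numerator = sum(
        comb(in_category, k) * comb(not_in_category, degs - k)
        for k in range(a, upper + 1)
    )
    return numerator / denominator


# Toy example: 3 of 4 category genes are DEGs, against 1 of 4 other genes
p_value = fisher_enrichment_p(3, 1, 1, 3)
```

In practice the test is repeated for every GO node or KEGG pathway, so the resulting p-values still need a multiple-testing correction before categories are ranked.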

PPI network construction

HPRD (Human Protein Reference Database) [23] is a protein database accessible through the internet. Compared with other PPI databases such as BIND [24], DIP [25], IntAct [26] and STRING [27], HPRD contains more abundant data on human PPIs and on disease-related genes (Fig. 1; Fig. 2). Therefore, the PPIs from HPRD were collected to construct the differential protein interaction network. The DEGs were mapped to the HPRD database and screened for significant interactions. By integrating these relationships, the interaction network was constructed.
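The mapping of DEGs onto the PPI data can be sketched as follows. This is a minimal illustration under a stated assumption: an interaction is kept only when both partners are DEGs, which the text does not spell out. The function name and the gene labels are hypothetical.

```python
def differential_subnetwork(ppi_pairs, degs):
    """Keep the interactions whose two partners are both DEGs.

    ppi_pairs: iterable of (gene_a, gene_b) pairs, e.g. parsed from an
               HPRD export (assumption: proteins are already labeled by
               their coding gene symbols).
    degs: iterable of DEG symbols.
    Returns (edges, nodes) of the resulting sub-network.
    """
    deg_set = set(degs)
    edges = set()
    for gene_a, gene_b in ppi_pairs:
        if gene_a == gene_b:
            continue  # assumption: self-interactions are ignored
        if gene_a in deg_set and gene_b in deg_set:
            # Sort the pair so that A-B and B-A count as one undirected edge
            edges.add(tuple(sorted((gene_a, gene_b))))
    nodes = {gene for edge in edges for gene in edge}
    return edges, nodes


# Toy data: G4 is not a DEG and G5 only interacts with itself
ppi_pairs = [("G1", "G2"), ("G2", "G1"), ("G2", "G3"), ("G3", "G4"), ("G5", "G5")]
edges, nodes = differential_subnetwork(ppi_pairs, ["G1", "G2", "G3", "G5"])
```

A looser rule that keeps an interaction when only one partner is a DEG would give a larger network, so the node and edge counts depend on which rule is used.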

Identification of candidate network molecules

The MCODE algorithm [28], which uses a vertex-weighting scheme, was applied to detect protein complexes in the large PPI network. The clustering coefficient Ci was used to measure the 'cliquishness' of the neighborhood of a vertex [29] via the formula Ci = 2n / (ki(ki - 1)), where ki denotes the number of nodes directly connected to node i and n denotes the number of edges among these ki nodes. The node with the maximum weight was taken as the seed. Starting from the seed, a neighboring node j was added to the module if its weight ratio (Wj / Wseed) exceeded a given threshold. The search was repeated until no further node j met the threshold, using the MCODE algorithm in the ClusterViz plug-in of Cytoscape [30]. In this way, the densest regions of the network were identified.
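The clustering coefficient Ci = 2n / (ki(ki - 1)) can be sketched as follows. This is a minimal illustration of that one formula only, not of the full MCODE weighting and seed-expansion procedure; the function name and the toy network are hypothetical, and nodes with fewer than two neighbors are assigned a coefficient of 0 by assumption.

```python
from itertools import combinations


def clustering_coefficient(adjacency, node):
    """Ci = 2n / (ki * (ki - 1)) for one node of an undirected network.

    adjacency: dict of node -> set of directly connected nodes.
    ki is the number of neighbors of the node and n is the number of
    edges that exist among those neighbors.
    """
    neighbours = adjacency[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # assumption: undefined cases are reported as 0
    # Count the edges that link pairs of neighbors to each other
    n = sum(1 for u, v in combinations(neighbours, 2) if v in adjacency[u])
    return 2.0 * n / (k * (k - 1))


# Toy network: a triangle A-B-C with one extra node D attached to A
adjacency = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
coefficients = {node: clustering_coefficient(adjacency, node) for node in adjacency}
```

In the toy network, B and C sit inside a complete triangle and score 1, while A scores lower because its neighbor D is not connected to B or C.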


Results

Screening DEGs

A total of 3102 DEGs were selected at FDR < 0.05 between normal and lung cancer samples, including 1146 down-regulated and 1956 up-regulated DEGs.

Functional enrichment analysis

The top 15 categories, ranked by FDR, were selected. The functional enrichment results for up- and down-regulated DEGs are shown in Table 1 and Table 2, respectively. The results suggested that the up-regulated DEGs mainly participated in cell cycle and intracellular functions, while the down-regulated DEGs might influence cell functions such as cellular adhesion and motility.

PPI network construction

A total of 39240 human PPI pairs were obtained from the HPRD database, with each protein labeled by the name of its coding gene. To determine their function in lung cancer, the 3102 DEGs selected from the gene expression profiles were mapped to this PPI network, and the interactions among them formed a sub-network named the differential protein interaction network. This network contained 2429 PPI pairs and 1342 genes (Fig. 3).

Network module detection and analysis

A total of 48 modules were detected in the PPI network, of which 14 contained more than 5 genes. To identify modules related to lung cancer, 219 genes related to lung cancer were downloaded from the CancerResource database. When these genes were mapped to the modules with more than 5 genes, 5 corresponding modules (Fig. 4) and 3 modules with differences in gene expression profiles (Fig. 5) were identified. Disagreement probably indicated that the genes might lead to lung cancer, while agreement suggested that the genes might influence its development. Modules 1 and 2 in Fig. 4 suggested that only the genes related to lung cancer were down-regulated, and a similar pattern of down-regulated genes also appeared in Fig. 5, whose modules contained genes that did not map to known lung cancer genes. In these three modules, FXR2 (fragile X mental retardation, autosomal homolog 2), ARFGAP1 (ADP-ribosylation factor GTPase-activating protein 1) and ELAVL1 (ELAV like RNA binding protein 1) showed differences in expression.
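The step of mapping known lung cancer genes to the larger modules can be sketched as follows. This is a minimal illustration: the function name, the module labels and the gene labels are hypothetical, and the size cut-off of 6 encodes "more than 5 genes".

```python
def modules_with_known_genes(modules, known_genes, min_size=6):
    """Report which sufficiently large modules contain known disease genes.

    modules: dict of module name -> iterable of gene symbols.
    known_genes: iterable of known lung cancer genes (hypothetical list,
                 e.g. one exported from a disease-gene database).
    min_size: smallest module size kept (6 means more than 5 genes).
    Returns dict of module name -> sorted list of known genes it contains.
    """
    known = set(known_genes)
    hits = {}
    for name, genes in modules.items():
        members = set(genes)
        if len(members) < min_size:
            continue  # modules with 5 genes or fewer are not considered
        overlap = members & known
        if overlap:
            hits[name] = sorted(overlap)
    return hits


# Toy data: M2 is too small and M3 contains no known gene
modules = {
    "M1": ["A", "B", "C", "D", "E", "F"],
    "M2": ["A", "X"],
    "M3": ["G", "H", "I", "J", "K", "L", "M"],
}
hits = modules_with_known_genes(modules, ["A", "F", "Z"])
```

Large modules that are absent from the result, such as M3 in the toy data, correspond to modules whose genes do not map to the known gene list.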


Discussion

Because early-stage lung cancer causes no significant symptoms and is hard to detect by conventional chest radiography, novel methods of diagnosis and therapy are urgently needed. Molecular biology sheds light on the pathogenesis underlying this disease. In this study, we analyzed gene expression profiles and selected 3102 DEGs with FDR < 0.05. After functional enrichment analysis, a PPI network was constructed and its modules were analyzed. The resulting differential protein interaction network contained 2429 PPI pairs and 1342 genes.

It is known that the molecules in a cell function differently but cooperatively in regulating cellular activities [32, 33]. By constructing the interaction network and analyzing GO functional enrichment, we were able to learn how the DEGs influence cell activities. The results for up- and down-regulated genes suggest that the up-regulated DEGs mainly participated in cell cycle and intracellular functions, while the down-regulated DEGs might influence cell functions such as cellular adhesion and motility. It has been reported that the cell cycle is closely associated with DNA replication, and that both are related to abnormal repair. These abnormalities can cause growth malformations, mental retardation and even cancer [34]. Accordingly, an imbalance of intracellular and extracellular functions may take part in the onset and progression of lung cancer.

At the protein level, PPI networks reveal the cooperative connections among proteins. The DEGs are associated with the development of various diseases; however, how these DEGs are related to each other cannot be analyzed directly from expression profiles. Accordingly, by constructing PPI networks of DEGs and then detecting and analyzing network modules, the pathogenesis of lung cancer can be explored from the composition of different modules [35]. Cellular activity is largely organized as a network of interacting modules: groups of genes that are co-regulated to respond to different conditions [36]. The differences between gene expression profiles within modules probably account for the development of lung cancer.

In addition, the potential lung cancer-related genes ARFGAP1, ELAVL1 and FXR2 were discovered by comparing modules that contained common profiles. Through its over-expression and invasive actions, ARFGAP1 plays important roles in tumor and malignancy invasion [37]. ARFGAP1 has also been found to be over-expressed in seven cancer cell lines [38]. ELAVL1, also named HuR, has been reported in several articles to be closely related to lung cancer. HuR has been found to be expressed in NSCLC (non-small-cell lung cancer) and to be closely related to lymphangiogenesis and angiogenesis [39]. FXR2 is an RNA-binding protein that is similar to FXR1 and FMRP. FXR1 protein has been found to regulate the expression of tumor necrosis factor at the post-transcriptional level [41]. Therefore, it is reasonable to propose that FXR2 is a potential lung cancer-related gene.

In summary, the discovery of featured genes that are possibly related to the development and progression of lung cancer is supported by previous studies and is also more concrete than those studies. In addition, it is of great significance for studying the mechanism of lung cancer, for distinguishing cancerous tissues from adjacent normal tissues, and for exploring better methods of diagnosis and treatment. However, further experiments are still needed to confirm our results.


Acknowledgements

We wish to express our warm thanks to Fenghe (Shanghai) Information Technology Co., Ltd. Their ideas and help gave a valuable added dimension to our research.


Ethical standards

All human studies have been approved by the China Ethics Committee and performed in accordance with ethical standards.

Conflict of interest

Figures and tables

Fig. 1. Comparison of PPI relationships and protein capacity among 8 PPI databases

Fig. 2. Comparison of gene numbers and disease-related gene numbers among 8 PPI databases

Fig. 3. Differential protein interaction network. The red nodes stand for up-regulated genes and the green nodes stand for down-regulated genes (p-value < 0.05, adjusted p-value < 0.05).

Fig. 4. Five modules mapped to known lung cancer genes. Round nodes represent known lung cancer genes; red and green represent up- and down-regulated genes, respectively.

Fig. 5. Three modules with differences in expression profiles without mapping. Red stands for up-regulated genes and green stands for down-regulated ones.