RNA-Seq analysis of non-small cell lung cancer in female never-smokers reveals candidate cancer-associated long non-coding RNAs
Running title: Identification of lncRNAs related to NSCLC
- Immune response may be a crucial mechanism involved in NSCLC progression.
- LAT, LIME1, SLA2 and DEFB4A may be involved in NSCLC via immune response.
- Lnc-GGPS1, lnc-ZNF793 and lnc-STK4 may play roles in promoting NSCLC progression.
- Down-regulated lnc-LOC284440, lnc-PPIEL, and lnc-ZNF461 may play roles in NSCLC.
Purpose: We aimed to elucidate the potential mechanisms of long non-coding RNAs (lncRNAs) in the progression of non-small cell lung cancer (NSCLC).
Methods: The RNA-seq dataset GSE37764, including 3 primary NSCLC tumors and 3 matched normal tissues isolated from 6 Korean female never-smokers, was downloaded from the Gene Expression Omnibus database. Differentially expressed lncRNAs and mRNAs in NSCLC samples were identified using the NOISeq package. A co-expression network of the differentially expressed lncRNAs and mRNAs was established. Gene Ontology (GO) and pathway enrichment analyses were performed. Finally, lncRNAs related to NSCLC were predicted by blasting the differentially expressed lncRNAs against all predicted lncRNAs related to NSCLC.
Results: In total, 182 differentially expressed lncRNAs (109 up- and 73 down-regulated) and 539 differentially expressed mRNAs (307 up- and 232 down-regulated) were identified. Among them, 4 up-regulated lncRNAs, including lnc-geranylgeranyl diphosphate synthase 1 (GGPS1), lnc-zinc finger protein 793 (ZNF793) and lnc-serine/threonine kinase 4 (STK4), and 4 down-regulated lncRNAs, including lnc-LOC284440, lnc-peptidylprolyl isomerase E-like pseudogene (PPIEL) and lnc-zinc finger protein 461 (ZNF461), were predicted to be related to NSCLC. Lnc-GGPS1, lnc-ZNF793 and lnc-STK4 were co-expressed with linker for activation of T cells (LAT) and Lck interacting transmembrane adaptor 1 (LIME1). Lnc-LOC284440, lnc-PPIEL and lnc-ZNF461 were co-expressed with Src-like-adaptor 2 (SLA2) and defensin beta 4A (DEFB4A).
Conclusions: Immune response may be a crucial mechanism involved in NSCLC progression. Lnc-GGPS1, lnc-ZNF793, lnc-STK4, lnc-LOC284440, lnc-PPIEL, and lnc-ZNF461 may be involved in the immune response that promotes NSCLC progression through co-expression with LAT, LIME1, SLA2 and DEFB4A.
Keywords: non-small cell lung cancer (NSCLC); long non-coding RNAs (lncRNAs); co-expression network; Gene Ontology (GO); pathway enrichment analysis
Lung cancer is the leading cause of cancer-related mortality worldwide, and non-small cell lung cancer (NSCLC) accounts for 80-85% of all lung cancers. Smoking is the main cause of lung cancer; however, a notable prevalence of NSCLC in female never-smokers has been observed, particularly in Asian countries [2, 3]. These epidemiological data suggest that non-smoking-associated lung cancer is a distinct disease entity in which specific genetic and molecular characteristics of tumors are being recognized. Despite recent advances in NSCLC therapies, the high mortality of NSCLC patients has not decreased significantly over the years. Therefore, more effective and safer treatment strategies are urgently needed, and it is of great importance to elucidate the mechanisms involved in NSCLC at the molecular level.
Recently, long non-coding RNAs (lncRNAs) have emerged as drivers of tumor-suppressive and oncogenic functions in various prevalent cancers, such as lung cancer [5, 6]. LncRNAs are mRNA-like transcripts ranging in length from 200 nt to 100 kb that lack significant open reading frames; therefore, they do not function as templates for protein synthesis [7, 8]. In spite of this, accumulating epidemiological studies have suggested that misregulated lncRNA expression may be a major contributor to tumorigenesis across numerous cancer types [8, 9]. For example, the lncRNA metastasis associated lung adenocarcinoma transcript 1 (MALAT1) is thought to enhance the migration of NSCLC cells in vitro by influencing the expression of motility-related genes [10, 11]. The lncRNA HOTAIR is associated with short disease-free survival in human NSCLC, and forced expression of HOTAIR enhances lung cancer cell growth and migration. Knockdown of H19 expression can impair lung cancer cell growth and clonogenicity in model systems in vitro. Therefore, lncRNAs have been considered key regulators in various cancers and are increasingly becoming a new gold mine for cancer diagnostics and therapeutics. However, several major lncRNAs related to NSCLC and their roles in the molecular pathogenesis of NSCLC remain unclear.
Substantial advances in next-generation sequencing technologies have revolutionized omics and biomedical studies, especially in the field of cancer research. Deep sequencing techniques provide a comprehensive understanding of cancer progression at the molecular level. In a previous study, GSE37764 was used to explore DNA copy number variations in female never-smoker patients with NSCLC, dissecting the molecular nature of NSCLC through integration with an array comparative genomic hybridization (array-CGH) study. In the present study, in order to elucidate the roles of lncRNAs in NSCLC progression, lncRNA profiling by high-throughput sequencing (RNA-seq) was used to screen the differentially expressed lncRNAs in female never-smokers with NSCLC. LncRNAs related to NSCLC and their co-expressed mRNAs were then identified using comprehensive bioinformatics approaches. Our study may yield new insights into the pathogenesis of NSCLC in female never-smokers.
Material and methods
Sources of data
The RNA-seq data of GSE37764, including 3 primary NSCLC tumors and 3 matched normal tissues isolated from 6 Korean female never-smokers, were downloaded from the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/). The samples were sequenced on the Illumina Genome Analyzer IIx (Homo sapiens) platform using a paired-end strategy with a read length of 78 nt.
Raw read filtering
The raw reads were first converted into fastq format using the fastq-dump program in sratoolkit, and dirty raw reads were then removed prior to analyzing the data. Three criteria were used to filter out dirty raw reads: (1) remove reads containing sequencing adaptors; (2) remove reads with more than 5% 'N' bases; (3) remove low-quality reads, in which more than 10% of bases have a quality score ≤ 20. Finally, clean reads were acquired for all subsequent analyses.
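The three filtering criteria can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names, the Phred+33 quality encoding, and the plain substring match for adaptors are assumptions made for illustration only.

```python
def phred33(qual_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(c) - 33 for c in qual_string]


def is_clean_read(seq, quals, adaptors=(), max_n_frac=0.05,
                  max_lowq_frac=0.10, q_cutoff=20):
    """Return True if a read passes the three filters described above.

    seq      -- read sequence (string)
    quals    -- per-base quality scores (list of ints, same length as seq)
    adaptors -- adaptor sequences to screen for (hypothetical input)
    """
    if not seq or len(seq) != len(quals):
        return False
    upper = seq.upper()
    # Criterion 1: discard reads that contain any adaptor sequence.
    if any(a and a.upper() in upper for a in adaptors):
        return False
    # Criterion 2: discard reads with more than 5% 'N' bases.
    if upper.count("N") / len(upper) > max_n_frac:
        return False
    # Criterion 3: discard reads in which more than 10% of bases have Q <= 20.
    low_quality = sum(1 for q in quals if q <= q_cutoff)
    if low_quality / len(quals) > max_lowq_frac:
        return False
    return True
```

For paired-end data, one reasonable convention (again an assumption) is to keep a pair only when both mates pass `is_clean_read`.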
Sequence alignment and transcriptome assembly
The University of California Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu) is an online public tool providing access to a growing database of genomic sequences and annotations of various organisms for visualization, comparison and analysis. TopHat and Cufflinks are open-source software tools for gene discovery and comprehensive expression analysis of high-throughput RNA sequencing (RNA-seq) data.
Clean reads were aligned to the reference genome (version hg19) downloaded from the UCSC website using bowtie1 in TopHat. The runtime parameters for the alignment were set as follows: --read-mismatches = 2, --mate-inner-dist = 77; all other parameters were left at their defaults.
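As an illustration of how these settings map onto a command line, the following Python sketch assembles a TopHat invocation for one paired-end sample. The index base name, FASTQ file names, and output directory are hypothetical placeholders, and the exact command used in this study is not given in the text.

```python
import shlex


def tophat_command(index_base, reads_1, reads_2, out_dir="tophat_out"):
    """Build a TopHat argument list using the parameters stated above.

    index_base, reads_1, reads_2 and out_dir are placeholders (assumptions).
    """
    return [
        "tophat",
        "--bowtie1",                  # align with Bowtie 1 rather than Bowtie 2
        "--read-mismatches", "2",     # value reported in the text
        "--mate-inner-dist", "77",    # value reported in the text
        "-o", out_dir,
        index_base, reads_1, reads_2,
    ]


cmd = tophat_command("hg19", "sample_1.fastq", "sample_2.fastq")
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)` on a machine where TopHat and a Bowtie 1 index for hg19 are installed.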
According to the reference transcript annotation information on the UCSC website (version hg19), the transcriptome of each sample was assembled by Cufflinks. The assembled results were then merged using cuffmerge in Cufflinks.
Step1, the assembled transcripts smaller than 200 nt were removed;
Step3, lncRNAs were also screened by blasting the transcripts from Step1 against human lncRNAs extracted from the NONCODEv4 database using blastn in BLAST (http://blast.ncbi.nlm.nih.gov/Blast.cgi). The parameters of the blastn program were set as follows: Expectation value (E) [Real] = 20; -m = 8.
Step4, common lncRNAs from Step2 and Step3 were considered as predicted lncRNAs.
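The length filter of Step1 and the intersection of Step4 can be sketched in Python as follows. Because Step2 is not described here, its output is treated as an opaque set of transcript IDs; all names are hypothetical and the sketch is not the original implementation.

```python
def filter_by_length(transcripts, min_len=200):
    """Step1: drop assembled transcripts shorter than min_len nucleotides.

    transcripts -- dict mapping transcript ID to nucleotide sequence
    """
    return {tid: seq for tid, seq in transcripts.items() if len(seq) >= min_len}


def predicted_lncrnas(step2_ids, step3_ids):
    """Step4: keep only transcripts reported by both Step2 and Step3."""
    return set(step2_ids) & set(step3_ids)


# Toy example with made-up transcript IDs.
assembled = {"T1": "A" * 150, "T2": "C" * 250, "T3": "G" * 900}
long_enough = filter_by_length(assembled)
common = predicted_lncrnas({"T2", "T3"}, {"T3"})
```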
Identification of differentially expressed lncRNAs and mRNA
Co-expression network construction
The absolute value of the Pearson correlation coefficient was used as the co-expression similarity measure. Cytoscape is an open-source software platform for visualizing complex networks and integrating them with any type of attribute data. Differentially expressed lncRNA-mRNA pairs with a Pearson correlation coefficient > 0.85 were screened, and the co-expression network of these pairs was then established using Cytoscape.
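The pair-screening step can be illustrated with a short, dependency-free Python sketch. The expression dictionaries and gene names are hypothetical, and the cutoff is applied to the absolute value of the coefficient, following the similarity measure described above.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: correlation undefined, treat as 0
    return sxy / math.sqrt(sxx * syy)


def coexpressed_pairs(lnc_expr, mrna_expr, cutoff=0.85):
    """Return (lncRNA, mRNA, r) edges with |r| above the cutoff."""
    edges = []
    for lnc, lnc_values in lnc_expr.items():
        for gene, gene_values in mrna_expr.items():
            r = pearson(lnc_values, gene_values)
            if abs(r) > cutoff:
                edges.append((lnc, gene, r))
    return edges


# Toy expression profiles across six samples (made-up numbers).
lnc_expr = {"lncA": [1, 2, 3, 4, 5, 6]}
mrna_expr = {
    "g1": [2, 4, 6, 8, 10, 12],
    "g2": [6, 5, 4, 3, 2, 1],
    "g3": [1, 3, 1, 3, 1, 3],
}
edges = coexpressed_pairs(lnc_expr, mrna_expr)
```

Each returned tuple corresponds to one edge, so the list can be written out as a two-column table with a correlation attribute and imported into a network viewer.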
The GO database is a collection of gene annotation terms for large-scale genomic or transcriptomic data. The Database for Annotation, Visualization and Integrated Discovery (DAVID) is an online tool for systematically relating functional terms to large gene or protein lists. We performed GO-BP (biological process) enrichment analysis of the differentially expressed mRNAs co-expressed with lncRNAs using the DAVID online tool. A p-value < 0.05 was defined as the cutoff value.
The Web-based Gene Set Analysis Toolkit (WebGestalt) is a popular web-based software tool for efficient functional enrichment analysis of gene lists derived from large-scale genomic, transcriptomic, and proteomic studies. In this paper, pathway enrichment analysis of the differentially expressed mRNAs co-expressed with lncRNAs was further performed with WebGestalt. A rawP < 0.01 was set as the threshold value.
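For readers unfamiliar with over-representation analysis, the sketch below computes a one-sided hypergeometric p-value, which is one common formulation of such a test. Whether it matches the exact statistic reported by DAVID or WebGestalt is not stated in the text, so it should be read as an illustration of the idea only; the parameter names are assumptions.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) for a hypergeometric draw.

    N -- number of genes in the background
    K -- background genes annotated to the term or pathway
    n -- genes in the query list
    k -- query genes annotated to the term or pathway
    """
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total
```

A term would then be called enriched when this value falls below the chosen cutoff, for example 0.05 for the GO-BP analysis or 0.01 for the pathway analysis.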
Prediction of lncRNAs related to NSCLC
Firstly, the differentially expressed lncRNAs were screened again based on the hg19 reference genome information on the UCSC website.
Secondly, lncRNAs related to NSCLC were extracted from LncRNADisease. LncRNADisease is a publicly accessible database of lncRNA-disease associations, which collects and curates approximately 480 experimentally validated entries covering 166 diseases.
Thirdly, lncRNAs were obtained by blasting the above differentially expressed lncRNAs against the lncRNAs related to NSCLC using blastn in BLAST. The parameters of the blastn program were as follows: Expectation value (E) [Real] = 10; -m = 8. LncRNAs were then removed based on the following criteria: the lncRNA was smaller than 200 nt, or the BLAST similarity was less than 90%.
Finally, the remaining lncRNAs were considered potential lncRNAs related to NSCLC.
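The post-BLAST filtering can be sketched in Python as below. The sketch assumes the 12-column tab-separated layout produced by the -m 8 option (query, subject, percent identity, alignment length, ..., e-value, bit score), interprets "similarity" as the percent-identity column, and interprets the 200 nt criterion as the length of the query lncRNA. These interpretations, and all names, are assumptions.

```python
def parse_m8(lines):
    """Yield one dict per hit from BLAST tabular (-m 8) output lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        yield {
            "query": fields[0],
            "subject": fields[1],
            "identity": float(fields[2]),   # percent identity
            "length": int(fields[3]),       # alignment length
            "evalue": float(fields[10]),
        }


def keep_queries(hits, query_lengths, min_len=200, min_identity=90.0):
    """Return query IDs that are at least min_len nt long and have at
    least one hit with percent identity >= min_identity."""
    kept = set()
    for hit in hits:
        long_enough = query_lengths.get(hit["query"], 0) >= min_len
        if long_enough and hit["identity"] >= min_identity:
            kept.add(hit["query"])
    return kept


# Toy input with made-up identifiers and values.
toy_lines = [
    "lnc1\tref_A\t95.0\t300\t15\t0\t1\t300\t1\t300\t1e-50\t400",
    "lnc2\tref_B\t85.0\t250\t37\t0\t1\t250\t1\t250\t1e-20\t200",
    "lnc3\tref_A\t99.0\t150\t1\t0\t1\t150\t1\t150\t1e-40\t280",
]
toy_lengths = {"lnc1": 500, "lnc2": 400, "lnc3": 150}
kept = keep_queries(parse_m8(toy_lines), toy_lengths)
```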
Prediction of lncRNA
In our study, a total of 1282 predicted lncRNA sequences were obtained. Their minimum, maximum, and average sequence lengths were 201 nt, 9848 nt, and 843.81 nt, respectively.
Identification of differentially expressed lncRNAs and mRNA
Using the NOISeq package with q = 0.99 as the threshold, we ultimately obtained 182 differentially expressed lncRNAs, including 109 up-regulated and 73 down-regulated ones. In addition, a total of 539 differentially expressed mRNAs (307 up- and 232 down-regulated) were identified.
Prediction of lncRNAs related to NSCLC
By blasting the differentially expressed lncRNAs with all predicted lncRNAs related to NSCLC, we screened 8 differentially expressed lncRNAs related to NSCLC. Among them, lnc-geranylgeranyl diphosphate synthase 1 (GGPS1), lnc-zinc finger protein 793 (ZNF793), lnc-serine/threonine kinase 4 (STK4), and lnc-interferon regulatory factor 1 (IRF1) were up-regulated while lnc-myelin expression factor 2 (MYEF2), lnc-LOC284440, lnc-peptidylprolyl isomerase E-like pseudogene (PPIEL), and lnc-zinc finger protein 461 (ZNF461) were down-regulated.
Co-expression network analysis
A co-expression network of the differentially expressed lncRNAs and mRNAs was established (Figure 1). The results showed that lnc-GGPS1, lnc-ZNF793 and lnc-STK4 were co-expressed with linker for activation of T cells (LAT) and Lck interacting transmembrane adaptor 1 (LIME1); lnc-IRF1 was co-expressed with interferon-inducible guanylate binding protein 1 (GBP1) and GBP2; lnc-MYEF2 was co-expressed with interleukin 20 (IL20) and GLI family zinc finger 4 (GLI4); and lnc-LOC284440, lnc-PPIEL and lnc-ZNF461 were co-expressed with Src-like-adaptor 2 (SLA2) and defensin beta 4A (DEFB4A).
GO and pathway enrichment analysis
We performed GO-BP and pathway enrichment analyses for the differentially expressed mRNAs. The over-represented GO-BP terms were mainly associated with immune response, placenta development and synapsis (Table 1). The significantly enriched pathways included Interferon Signaling, LPA receptor mediated events, and Cytokine Signaling in Immune system (Table 2). Notably, mRNAs such as LAT, LIME1, SLA2, DEFB4A, GBP1 and GBP2 were significantly associated with the function of immune response. These mRNAs were co-expressed with lncRNAs related to NSCLC.
NSCLC ranks among the most frequently diagnosed cancers and the most lethal malignant diseases. Although the functional roles and dysregulation of lncRNAs in cancer development and progression are beginning to be disclosed, research on the mechanisms of action of lncRNAs in NSCLC remains at a preliminary level. In the current study, we used comprehensive bioinformatics approaches to analyze lncRNA profiles obtained by RNA-seq in order to screen the differentially expressed lncRNAs and co-expressed mRNAs in female never-smokers with NSCLC. Strikingly, the up-regulated lncRNAs, including lnc-GGPS1, lnc-ZNF793 and lnc-STK4, and the down-regulated lncRNAs, such as lnc-LOC284440, lnc-PPIEL, and lnc-ZNF461, were considered to be strongly related to NSCLC. In addition, the mRNAs co-expressed with these lncRNAs, such as LAT, LIME1, SLA2 and DEFB4A, were all significantly associated with the function of immune response.
Considerable evidence suggests that the interactions between the host immune system and tumors are closely tied to the process of tumorigenesis, and that intratumoral immune responses can predict the prognosis of patients with NSCLC. Increased infiltration of CD4+/CD8+ T cells and other antigen-presenting cells in NSCLC tissues is independently associated with improved survival [25-27]. Immune cells in the tumour microenvironment can interact intimately with transformed cells to actively promote oncogenesis. Moreover, Dougan et al. reported that alterations in immune response genes could initiate the development of lung cancer in the absence of germline mutations in known oncogenes or tumor suppressors. Therefore, blockade of immune system function may contribute to NSCLC progression, and the initiation of an immune response may be an important indicator of NSCLC.
LAT is a transmembrane protein of 36–38 kDa that plays an important role in T cell activation and immune receptor signaling [30, 31]. The binding of major histocompatibility complex-bound foreign peptides to T cell antigen receptors (TCRs) is a key event in the regulation of the adaptive immune response. LAT may function as a bridge between TCR-initiated, T cell-specific signaling events and general signaling pathways. In addition, LIME1 is a raft-associated transmembrane adaptor phosphoprotein that has been shown to be an organizer of immunoreceptor signaling. It is expressed predominantly in T lymphocytes and has been found to mediate T cell activation. The work of Brdičková et al. revealed that LIME is involved in CD4 and CD8 coreceptor signaling. Given the important functions of LAT and LIME1 in the immune response and the contribution of the immune response to NSCLC progression, we speculate that LAT and LIME1 may play an important role in promoting NSCLC progression by activating the immune response. In our study, lnc-GGPS1, lnc-ZNF793 and lnc-STK4 were co-expressed with LAT and LIME1. Although the roles of these lncRNAs in NSCLC development have not been fully discussed, we speculate that lnc-GGPS1, lnc-ZNF793 and lnc-STK4 may be involved in the immune response that promotes NSCLC progression through co-expression with LAT and LIME1.
SLA2 is one of the Src-like adaptor proteins (SLAPs), and SLAP can down-regulate the T cell receptor on CD4 and CD8 thymocytes. A previous study also revealed that SLAP is a negative regulator of T cell receptor signaling. Moreover, SLA2 shares amino acid and structural homology with SLAP, which has been confirmed to down-regulate TCR-mediated signal transduction [37, 38]. Additionally, DEFB4A is a member of the defensin family, which is involved in the first line of defense in the innate immune response against pathogens. Enhanced expression of DEFB4A is evidence of a proinflammatory and/or innate immune response. Therefore, SLA2 and DEFB4A may be key molecules involved in the immune response and may play important roles in cancer progression, although there has been hardly any research on the roles of SLA2 and DEFB4A in NSCLC. Notably, down-regulated lncRNAs such as lnc-LOC284440, lnc-PPIEL, and lnc-ZNF461 were co-expressed with SLA2 and DEFB4A. Thus, our results further suggest that these lncRNAs may contribute to the development and progression of NSCLC.
In conclusion, our findings suggest that the immune response may be a crucial mechanism involved in NSCLC progression. Lnc-GGPS1, lnc-ZNF793, lnc-STK4, lnc-LOC284440, lnc-PPIEL, and lnc-ZNF461 may be involved in the immune response that promotes NSCLC progression through co-expression with LAT, LIME1, SLA2 and DEFB4A. The present findings shed new light on the molecular mechanism of NSCLC and may lead to novel clinical applications in oncology. However, the lack of experimental validation is a limitation of our study. More work is still needed to explore the potential molecular mechanisms relevant to the diagnosis and treatment of NSCLC.