This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.
Bioinformatics analyses of significant prognostic risk markers for thyroid papillary carcinoma
Running title: Prognostic risk markers for TPC
1. SOX4 was both a prognostic risk gene and a target gene of miRNA-93.
2. BCL2, a target gene of miRNA-342, was enriched in the DO terms of lymphoma.
3. RARA and MAFF were identified to be prognostic risk genes in our study.
Purpose: This study was aimed to identify the prognostic risk markers for thyroid papillary carcinoma (TPC) by bioinformatics analyses.
Methods: The clinical data of TPC and their microRNAs (miRNAs) and genes expression profile data were downloaded from The Cancer Genome Atlas. Elastic net-Cox’s proportional regression hazards model (EN-COX) was used to identify the prognostic associated factors. The receiver operating characteristic (ROC) curve and Kaplan-Meier (KM) curve were used to screen the significant prognostic risk miRNA and genes. Then, their target genes of the obtained miRNAs were predicted followed by function prediction. Finally, the significant risk genes were performed literature mining and function analysis.
Results: Total 1046 miRNA and 20531 genes in 484 cases samples were identified after data preprocessing. From the EN-COX model, 30 prognostic risk factors were obtained. Based on the 30 risk factors, 3 miRNAs and 11 genes were identified from the ROC and KM curves. The target genes of miRNA-342 such as B-cell CLL/lymphoma 2 (BCL2) were mainly enriched in the biological process related to cellular metabolic process and Disease Ontology terms of lymphoma. The target genes of miRNA-93 were mainly enriched in the pathway of G1 phase. Among the 11 prognostic risk genes, v-maf avian musculoaponeurotic fibrosarcoma oncogene homolog F (MAFF), SRY (sex determining region Y)-box 4 (SOX4) and retinoic acid receptor, alpha (RARA) encoded transcription factors. Besides, RARA was enriched in 4 pathways.
Conclusions: These prognostic markers such as miRNA-93, miRNA-342, RARA, MAFF, SOX4 and BCL2 may be used as targets for TPC chemoprevention.
Key words: thyroid papillary carcinoma; prognostic risk gene; prognostic risk microRNA; target gene; function prediction
Thyroid cancer is the most common endocrine malignancy with a rapid rise in incidence all over the world [1, 2]. Thyroid papillary carcinoma (TPC) accounting for more than 90% of newly diagnosed thyroid cancer cases, is the most common type of thyroid cancer and also the predominant cancer type in children with thyroid cancer [3, 4]. TPC is usually characterized by slow growth and an favorable overall prognosis , however, some cases are fatal. The death is either on account of local recurrence or synchronous/late-developing distant metastasis . The exploration of prognostic risk markers may give new insights into the treatment of this disease.
It is well known that human malignancies mostly result from a random accumulation of genetic and epigenetic aberrations , moreover, the prognostic and therapeutic implications associated with these aberrations are becoming more and more apparent . Importantly, previous studies have demonstrated the presence of gene expression changes in TPC [9, 10]. In recent years, the impact of molecular factors is becoming more substantial in TPC . For instance, Wreesmann et al. suggested that overexpression of mucin 1, cell surface associated is significantly associated with the outcome of TPC and may serve as a prognostic marker of this disease . Using the immunohistochemistry staining score, Klein et al. found that the vascular endothelial growth factor was a helpful marker for metastasis spread in TPC . Additionally, v-raf murine sarcoma viral oncogene homolog B1 has been found to have a high frequency of mutation in TPC, and is closely correlated with clinicopathological factors of poor prognosis in TPC . Although several clinicopathological risk factors have been identified, none consistently identifies patients at risk for poor outcome.
In the present study, we downloaded the microRNAs (miRNAs) and genes expression profile data of TPC cases in The Cancer Genome Atlas (TCGA). Elastic net-Cox’s proportional regression hazards model (EN-COX) was used to identify the prognostic risk factors. Then, the receiver operating characteristic (ROC) curve and Kaplan-Meier (KM) curve were used to screen the significant prognostic risk miRNA and genes. The target genes of the obtained miRNAs were predicted and performed functional enrichment analysis. Finally, the significant risk genes were performed literature mining and function analysis. We aimed to identify the candidate genes that may underlie the TPC prognosis.
2. Data and methods
2.1 Data sources
TCGA (https://tcga-data.nci.nih.gov/tcga/tcgaHome2.jsp) database contains clinical information, genomic characterization data, and high level sequence analysis of the tumor genomes. In the present study, the clinical data of 503 cases including 492 TPC cases were downloaded from TCGA database. Among the 492 TPC cases, there were 14 samples without accurate death times and their death times were considered to be the last contact day of the censored data.
Based on the platform of IlluminaHiSeq high-throughput sequencing, we downloaded all the miRNAs expression profile of 573 cases and all the gene expression profile of 572 cases. After sample integration, 1046 miRNA expressing in 486 TPC patients and 20531 genes in 490 TPC patients were obtained. Finally, the TPC patients who were detected both miRNAs and genes were screened.
2.2 Data preprocessing
The expression values of miRNA and genes were estimated by reads per kilobase of exon per million mapped reads (RPKM) . Besides, the different samples were performed data normalization by median method.
2.3 Prognostic risk factor identification
Elastic net (EN)  is a regularization and variable selection method used to deal with collinearity and reduce dimension. The Cox’s proportional regression hazards model is a semiparametric model for the analysis of survival data. In the present study, the EN-COX regression analysis was performed to identify the prognosis risk genes and miRNAs using glmnet package (http://www-stat.stanford.edu/~hastie/glmnet/)  in R. The parameter was the minimum of lambda.
ROC curve is used to evaluate classifiers in bioinformatics applications. At present, based on the obtained risk genes, miRNAs and their expression profile data, ROC curves were generated using the pROC package (http://expasy.org/tools/pROC/)  in R to to determin the cutoff values of cases grouping.
KM curve is used for univariate statistical analysis of the survival data. In our study, KM curves were generated using survival package (http://CRAN.R-project.org/package=survival)  in R and were compared by log-rank test to to determine whether there were significant differences between the two groups of cases (p-value < 0.05). The obtained genes and miRNAs were considered to be significantly associated with the prognosis of TPC.
2.5 Prognostic risk miRNAs function prediction
Based on the obtained risk miRNAs, the regulatory relationship pairs between miRNAs and their target genes in the database of miRecords (http://miRecords.umn.edu/miRecords)  and MiRWalk (http://mirwalk.uni-hd.de/)  were identified. Then, the microRNA regulatory network was constructed based on the regulatory relationship pairs using Cytoscape (http://www.cytoscape.org/)  which is a open source software used for visualizing biological network and integrating data.
Finally, the target genes were performed Kyoto Encyclopedia of Genes and Genomes (KEGG, http://www.genome.jp/)  and REACTOME ((http://www.reactome.org/)  pathway, Gene Ontology (GO, http://www. geneontology.org/)  and Disease Ontology (DO, http://disease-ontology.org) enrichment analyses. The enriched pathways, biological process (BP), cellular component (CC), molecular function (MF) and DO terms were determined by carrying out the hypergeometric test within TargetMine (http://targetmine.nibio.go.jp/)  which is a data warehouse for retrieval of target genes and proteins. The adjusted p-value < 0.05 was set as the threshold value.
2.6 Prognostic risk genes literature mining and function analysis
Based on the predicted target genes of the risk miRNAs and the identified prognostic risk genes, the genes that were both target genes and prognostic risk genes were selected to perform literature mining and function analysis with p-value < 0.05.
Among the selected genes above, the transcription factors (TFs) were detected based on the TRANSFAC® database (http://www.gene-regulation.com) . In addition, transcriptional regulatory network of the detected TFs and their target genes was constructed based on the Human Transcriptional Regulation Interactions database (HTRIdb)  which is freely available at http://www.lbbc.ibb.unesp.br/htri.
3.1 Data preprocessing
After the original data in TCGA were preprocessed, the expression values of 1046 miRNA and 20531 genes in 484 cases samples were obtained.
3.2 Prognostic risk genes and miRNAs identification
The lambda = 0.1536506 was set as the parameter. Based on the parameter, 30 prognostic risk factors including 5miRNAs and 25 genes were obtained.
3.3 Prognostic marker identification
According to the cutoff values of ROC curves, the cases were divided into 2 groups. Specifically, the case with miRNAs or genes expression values below the cutoff value was ranked as group 1 while above the cutoff value was ranked as group 2. Under the threshold of p-value < 0.05, 11 genes and 3 miRNAs including miRNA-342, miRNA-93 and miRNA-3660 were obtained (Table 1). The 14 genes and miRNA were considered to have significant risk on the prognosis of TPC patients,
3.4 Prognostic risk miRNAs function prediction
Based on the database of miRecords and MiRWalk, 1029 target genes of the 3 miRNA were predicted , thereinto, miRNA-342 had 355 target genes, miRNA-93 had 674 target genes and miRNA-3660 did not have target gene (Figure 1).
The target genes of miRNA-342 were mainly enriched in the pathway of axon guidance, BP terms related to cellular metabolic process and DO terms of lymphoma and mastocytosis (Table 2). The target genes of miRNA-93 were mainly enriched in the pathway of G1 phase and GO terms related to nervous system development (Table 3).
3.5 Prognostic risk genes literature mining and function analysis
Among the 1029 target genes, only SRY (sex determining region Y)-box 4 (SOX4) which was a target gene of miRNA-93 belonged to the obtained 11 prognostic risk genes. In addition, among the 11 prognostic risk genes, v-maf avian musculoaponeurotic fibrosarcoma oncogene homolog F (MAFF), SOX4 and retinoic acid receptor, alpha (RARA) encoded TFs and the transcriptional regulatory networks of the 3 TFs were shown in Figure 2.
Additionally, among the 11 prognostic risk genes, RARA was enriched in 4 pathways including retinoid X receptor (RXR) and retinoic acid receptor (RAR) heterodimerization with other nuclear receptor, RARs-mediated signaling, nuclear receptor transcription pathway and acute myeloid leukemia.
TPC is the most common malignant tumor of the thyroid gland. The increased incidence of thyroid cancer over the last decade is primarily due to the increase in TPC . Due to the wide divergence in clinical behavior, the optimal management of TPC is dependent on the assessment of the malignant potential of individual tumors . In our study, 14 prognostic markers including 11 genes and 3 miRNAs were identified. The target genes of the 3 miRNAs were predicted, besides, these target genes were mainly enriched in the DO terms of lymphoma and mastocytosis. Additionally, SOX4 was found to be both prognostic risk gene and target gene of miRNA-93. Among the 11 risk genes, MAFF, SOX4 and RARA encoded TFs, besides, RARA was enriched in 4 pathways. The results suggested that these risk factors may play important roles in the progression of TPC.
In our study, SOX4 was among the 11 prognostic risk gene, besides, it was a target gene of miRNA-93. SOX4 encodes a member of the SOX family of TFs which may function in the apoptosis pathway leading to cell death as well as tumorigenesis . Pramoonjago et al. have suggested that SOX4 induced apoptosis may in part on account of the dysregulation of pathways that promoted the cell cycle in tumor cells . The oncogenic potentials of SOX4 have been demonstrated in knock-in cells resulting in aberrant transformation, while in knockout tumor cells leading to the reduction of proliferation and metastatic capability . Presently, SOX4 is found to be aberrant expressed in prostate cancer, lung cancer and bladder cancer with advanced disease status and poor prognostic features [33-35]. Additionally, miRNA-93 has been found to promote carcinogenic processes in breast epithelial cells including decrease apoptosis, increase cell migration and colony formation, whereas silencing of miRNA-93 inhibites these carcinogenic processes . Taken together, our findings suggest an oncogenic potential of miRNA-93 and its target gene, SOX4, in TPC progression.
In addition, we found that B-cell CLL/lymphoma 2 (BCL2), a target gene of miRNA-342, was enriched in the DO terms of lymphoma. BCL2 encodes an integral outer mitochondrial membrane protein that blocks the apoptotic death of some cells such as lymphocytes and epithelial cells . BCL2 is present in a majority of follicular B cell lines. Most of the human follicular B-cell lymphomas show a translocation of the BCL2 gene . In thyroid cancer, a typical example of epithelial tumours, BCL2 expression is related to cell differentiation [37, 39]. In short, in both human lymphoma and epithelial carcinoma, BCL2 contributes to the neoplastic development by preventing normal turnover due to programmed cell death . Additionally, the regulatory miRNA of BCL2, miRNA-342, has been found to induce apoptosis of the colorectal cancer cells and function as a proapoptotic tumor suppressor . Thus, we speculate that miRNA-342 may affect the tumorigenesis of TPC through regulating BCL2 expression.
Among the 11 prognostic risk genes, RARA encodes a TF and regulates several downstream genes. In addition, RARA was found to enrich in 4 pathways, such as pathways of RXR and RAR heterodimerization with other nuclear receptor and RARs-mediated signaling.RARs and RXRs are members of the intracellular receptor superfamily that bind to their endogenous retinoid ligands with high specificity and affinity . Retinoids exert functions of inducing differentiation and arresting proliferation in every cell and organ system virtually, besides, the retinoid signaling is often compromised early in carcinogenesis . Importantly, nuclear RARs mediate most of the effects of retinoids. There are three distinct subtypes of RARs and RXRs, classified as α (RARA/RXRA), β (RARB/RARB), γ (RARG/RXRG) [42-44]. Among these subtypes, RARA is implicated in regulation of differentiation, apoptosis and transcription of clock genes . RARB and RXRG have been suggested to be differentially expressed in thyroid cancer cell lines and tumor tissue . Therefore, we speculate that RARA may have utility in the diagnose and treatment of TPC.
In addition to RARA, MAFF was also identified to be a prognostic risk gene in our study. Besides, its encoding protein is also a TF which regulated oxytocin receptor (OXTR) in the transcriptional regulatory network. OXTR belongs to the G-protein coupled receptor family and acts as a receptor for oxytocin . OXT mainly functions in reproduction . Recently, OXT is found to regulate cell proliferation, as well as the diffuse expression of OXTR in neoplastic tissue, thus, playing a regulatory role in the tumor growth . Cassoni et al. found the presence of OXTR in breast cancer cell lines . Additionally, Zhong et al. reported that OXTR was detected in tissues from hyperplastic and neoplastic prostate . Thus, we speculate that MAFF may play an important role in the progression of TPC through regulating the function of OXTR.
In summary, our data provide a comprehensive bioinformatics analysis of prognostic risk factors in TPC, which may give new insights into the understanding of the mechanism of this disease. RARA, MAFF, miRNA-93 and its target gene SOX4, as well as miRNA-342 and its target gene BCL2 are associated with the progression of TPC and may serve as prognostic markers and potential therapeutic target in this disease. However, further experimental studies is still needed to confirm our observation.