Gene Expression Analysis in Acute Lymphoblastic Leukaemia


  1. Oncomine Microarray gene expression analysis, Gene Prioritization and Network analysis

Candidate gene selection through Expression Data analysis:

The Oncomine microarray expression database, a comprehensive expression database with considerable statistical validation, was analysed to shortlist genes that were significantly overexpressed in B- and T-ALL. The database was found to contain twenty-three datasets pertaining to ALL, drawn from various expression studies. Of these, only three datasets from three different studies represented differential analyses between leukemic and normal cells, with statistical validation performed by Oncomine via t-tests. These datasets included expression data from:

(i) Studies by Maia et al. (2005) on 20 samples representing patients with B-cell Acute Lymphoblastic Leukemia and 8 normal samples, with expression of 12,624 genes measured using Human Genome U133A Affymetrix Array

(ii) Studies by Andersson et al. (2007) on 87 patients with childhood B-cell and 11 childhood T-cell acute leukemia and 6 normal samples, with expression of 10,735 genes measured

(iii) Studies by Haferlach et al. (2010) on 359 B-Cell Childhood Acute Lymphoblastic Leukemia, 70 Pro-B Acute Lymphoblastic Leukemia, 147 B-Cell Acute Lymphoblastic Leukemia and 174 T-Cell Acute Lymphoblastic Leukemia samples, with expression of 19,574 genes measured using Human Genome U133 Plus 2.0 Array; and Haferlach et al. (2010) dataset-2 on 220 B-Cell Childhood Acute Lymphoblastic Leukemia, 23 Pro-B Acute Lymphoblastic Leukemia, 114 B-Cell Acute Lymphoblastic Leukemia and 79 T-Cell Acute Lymphoblastic Leukemia samples, with expression of 910 genes measured

On setting the Oncomine analysis filter to the top 10% of overexpressed genes, the following numbers of genes were found to be significantly overexpressed: 1072 genes in childhood B-ALL and 1071 genes in childhood T-ALL in the Andersson et al. (2007) dataset; 627 genes in the Maia et al. (2005) B-ALL dataset; 1957 genes each in childhood B-ALL, B-ALL, Pro-B-ALL and T-ALL in the Haferlach et al. (2010) dataset; and 91 genes each in childhood B-ALL, B-ALL, Pro-B-ALL and T-ALL in the Haferlach et al. (2010) dataset-2.

For further analysis, the genes overexpressed in B-ALL were compared across all the datasets, and presence in two out of three studies was considered significant. 237 genes were thus shortlisted for B-ALL. Similarly, for T-ALL, the overexpressed genes in both datasets were compared and those present in both studies were considered significant. Thus, 422 genes were shortlisted for T-ALL. The shortlisted B- and T-ALL genes were then combined and duplicates were removed, which led to an ALL overexpressed gene list of 573 genes, of which 530 genes were mappable to the ENSEMBL database. These genes, designated as candidates, were used for gene prioritization to identify signature genes that may play a significant role in leukemogenesis.
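The overlap rule described above (retain a B-ALL gene if it is overexpressed in at least two of the three studies, retain a T-ALL gene if it appears in both datasets, then merge and de-duplicate) can be sketched with plain set operations. The gene symbols below are placeholders rather than the actual Oncomine output, and the helper name `shortlist` is hypothetical.

```python
from collections import Counter


def shortlist(gene_lists, min_studies):
    """Return the genes that appear in at least `min_studies` of the lists."""
    # set(genes) guards against a gene being counted twice within one study.
    counts = Counter(gene for genes in gene_lists for gene in set(genes))
    return {gene for gene, n in counts.items() if n >= min_studies}


# Placeholder gene lists (illustrative only, not the study's data).
b_all_lists = [
    {"GENE_A", "GENE_B", "GENE_C"},
    {"GENE_B", "GENE_C", "GENE_D"},
    {"GENE_C", "GENE_E"},
]
t_all_lists = [
    {"GENE_C", "GENE_F", "GENE_G"},
    {"GENE_C", "GENE_F", "GENE_H"},
]

b_all = shortlist(b_all_lists, min_studies=2)  # present in >= 2 of 3 studies
t_all = shortlist(t_all_lists, min_studies=2)  # present in both datasets
candidates = b_all | t_all                     # set union removes duplicates
```

Applied to the reported counts, 237 B-ALL genes plus 422 T-ALL genes give 659 entries; a merged list of 573 unique genes would imply that 86 genes were shared between the two shortlists.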

Gene Prioritization and Enrichment using Training genes:

Gene prioritization was performed using ENDEAVOUR, ToppGene and DIR to identify those genes among the overexpressed candidate genes that are more likely to play a role in disease etiology. These tools perform prioritization based on similarity to disease-specific training genes, which were retrieved for the present analysis via a PubMed literature search. The thirty training genes retrieved (listed in Materials and Methods) were reported by their respective studies to play a significant role in disease etiology and were hence considered suitable to train the software to identify potential disease genes from the list of candidate overexpressed genes. Each prioritization tool outputs a ranked list of candidate genes based on its own algorithm. The top 100 ranked genes from the results of each tool were considered significant and were compared to shortlist a final set of prioritized genes. Fifty-four genes were thus determined to be significant prioritized genes. Analysis of the subtype expression of the 54 genes showed that 30 were overexpressed in T-ALL, 13 in B-ALL and 11 in both T- and B-ALL subtypes.
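The comparison of the three top-100 lists can be illustrated as follows. This is a minimal sketch that assumes a gene had to appear in the top-ranked portion of all three tools to be retained; the text does not state the exact comparison rule, and the rankings and the function name `consensus_top_n` are hypothetical.

```python
def consensus_top_n(rankings, n=100):
    """Intersect the top-n entries of several ranked gene lists.

    `rankings` maps a tool name to its gene list, best-ranked first.
    """
    top_sets = [set(ranked[:n]) for ranked in rankings.values()]
    return set.intersection(*top_sets)


# Placeholder rankings (illustrative only, not real tool output).
rankings = {
    "ENDEAVOUR": ["G1", "G2", "G3", "G4", "G5"],
    "ToppGene": ["G2", "G1", "G5", "G6", "G3"],
    "DIR": ["G3", "G2", "G7", "G1", "G8"],
}

# With real data n would be 100; n=3 keeps the toy example readable.
consensus = consensus_top_n(rankings, n=3)
```

A stricter or looser rule (for example, presence in two of three lists) would change the size of the final set, so the threshold is worth stating alongside the result.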

Gene prioritization using housekeeping genes as training genes:

The use of ALL-specific genes as training genes was validated by performing prioritization using bone marrow expressed housekeeping genes as training genes. A set of bone marrow specific housekeeping genes was retrieved from Chang et al. (2011) (see Materials and Methods). ENDEAVOUR, ToppGene and DIR were run on the Oncomine overexpressed genes with the housekeeping genes as training genes. Although some of the genes that were prioritized using ALL-specific training genes also occurred in the housekeeping-prioritized results, their prioritization ranking was vastly different. This prioritization yielded a ranked list of 77 shortlisted genes.

Gene ontology (GO) analysis

Analysis of the functional enrichment of the genes prioritized using ALL-specific training genes, performed with the WebGestalt tool, showed that these genes were part of several significant biological processes such as hemopoiesis (adjP=1.91e-08), regulation of cell proliferation (adjP=7.63e-08), regulation of cellular metabolic processes (adjP=1.17e-08), chromatin modification (adjP=6.72e-08), regulation of transcription (adjP=1.17e-08) and regulation of biosynthetic processes (adjP=3.17e-08) (Figure).

Further, pathway enrichment of the prioritized genes was investigated via the KOBAS server. KEGG Pathway enrichment analysis showed that the 54 prioritized genes were enriched in pathways such as Primary immunodeficiency (corrected P-value = 0.000672), Transcriptional misregulation in cancer (corrected P-value = 0.000672), Adherens junction (corrected P-value = 0.044118), Pathways in cancer (corrected P-value = 0.044118), NF-kappa B signalling pathway (corrected P-value = 0.066325), T cell receptor signalling pathway (corrected P-value = 0.085695), Cell cycle (corrected P-value = 0.125134), Hematopoietic cell lineage (corrected P-value = 0.196626), Notch signalling pathway (corrected P-value = 0.257264), Wnt signalling pathway (corrected P-value = 0.291347) and p53 signalling pathway (corrected P-value = 0.291958), among others.

KEGG pathway analysis of the seventy-seven housekeeping-prioritized genes showed enrichment in: Ribosome (corrected P-value = 0.002727), Spliceosome (corrected P-value = 0.010555), Glycolysis / Gluconeogenesis (corrected P-value = 0.197154), Biosynthesis of amino acids (corrected P-value = 0.220077), Pentose phosphate pathway (corrected P-value = 0.237768) and Galactose metabolism (corrected P-value = 0.244104), among others. Thus, the differences in prioritization ranking and enriched pathways together validate the use of ALL-specific training genes in prioritizing leukemic genes, and also support the prioritization results as a means of identifying ALL-related genes. Hence, the 54 prioritized genes were considered significant and were used for further analysis.
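For readers unfamiliar with how a corrected enrichment P-value of this kind is typically obtained, the sketch below computes a one-sided hypergeometric P-value for a single pathway and then applies the Benjamini-Hochberg correction across pathways. It is an illustration only: the statistical test, correction method and background set actually used by the KOBAS run are not stated in the text, so both choices here are assumptions, and the function names are hypothetical.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): chance of seeing at least k pathway genes in a list of n
    genes drawn from a background of N genes, K of which are in the pathway."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy numbers (assumed, not from the study): 54 query genes, 5 of which fall
# in a 100-gene pathway, against a background of 20,000 genes.
p_raw = hypergeom_pvalue(k=5, n=54, K=100, N=20000)
p_adj = benjamini_hochberg([p_raw, 0.04, 0.03, 0.2])
```

Under this scheme a pathway is usually called enriched only when its corrected P-value falls below a chosen cut-off such as 0.05.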

The results from the GO and KEGG Pathway enrichment analysis indicate the pathways in which the genes prioritized in this study function; these pathways and processes therefore attain significance in ALL etiology and represent potential therapeutic targets. For instance, overexpression of proteins that function in adhesion, as evident from the functional enrichment of adherens junction, may lead to increased accumulation of leukemic cells in the bone marrow and thus lead to aberrations. Enrichment of genes in the Primary immunodeficiency pathway suggests the inability of B and T cells to function in a normal immune response due to loss of differentiation in ALL. The overexpressed genes functioning in the NF-kappa B pathway may aid in dysregulation of the apoptosis pathway, preventing leukemic cell death and thus aiding survival of cancer cells.

Construction of Protein-Protein Interaction and functional association Network:

The 54 prioritized genes and 30 training genes were next used to construct a protein interaction map in the STRING database, in order to assess the interconnectivity between the pathways and processes in which they function and to identify further biomolecules that may associate with these putative disease proteins and abet their contribution to leukemia etiology. The initial network was grown to obtain a dense network comprising 313 interactors with 2405 interactions (Figure), after the removal of isolated, disconnected proteins. Visual analysis of the network showed that the seed proteins, i.e. the prioritized and training proteins, were interspersed throughout the protein interaction network. When the protein interactors were grouped into twelve clusters through the k-means clustering algorithm, a high degree of interconnectivity was observed within and between the clusters (Figure).

Analysis of the KEGG Pathway enrichment of the interactors through the WebGestalt tool showed that the network was functionally enriched in Cell cycle (adjP=1.13e-45), apoptosis regulation (adjP=1.12e-42), p53 signalling pathway (adjP=8.22e-41), T-cell (adjP=2.57e-34) and B-cell (adjP=3.01e-20) receptor signalling pathways, MAPK signalling pathway (adjP=8.47e-32), Focal adhesion (adjP=3.33e-29), Wnt (adjP=4.03e-27), ErbB (adjP=5.87e-27), Notch (adjP=5.61e-26) and TGF-beta (TGFβ) (adjP=2.38e-16) signalling pathways, Hematopoietic cell lineage (adjP=1.23e-14) and cytokine-cytokine receptor interaction (adjP=3.99e-10). Cluster 1 was mainly enriched in T cell receptor signalling pathway proteins, cluster 2 in apoptosis regulators, cluster 3 in cell cycle regulators, cluster 4 in Wnt signalling pathway proteins, cluster 5 in TGF-beta signalling pathway proteins, clusters 6 and 7 in proteins functioning in the Notch signalling pathway, cluster 8 in p53 signalling pathway proteins, cluster 9 in apoptosis regulators, cluster 10 in Insulin signalling pathway and Focal adhesion proteins, cluster 11 in cell cycle proteins and cluster 12 in MAPK signalling pathway proteins. Comparison of the enriched pathways between the ALL-specific training genes and the 54 prioritized genes revealed that they function in common pathways such as apoptosis, cell cycle, Notch, Wnt and p53 signalling pathways.

Analysis of the protein interaction network topology via the NetworkAnalyzer plugin in Cytoscape showed that the constructed network displayed the topology of a scale-free, small-world biological network. Its node degree distribution followed the power law P(k) ~ k^(-γ), with degree exponent γ = 0.923 and goodness of fit of the data points to the curve R² = 0.684. The clustering coefficient was estimated by the software to be 0.434.
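The degree exponent γ and the R² value of a power-law fit can be estimated by ordinary least squares on log-transformed data, since P(k) ~ k^(-γ) becomes a straight line with slope -γ on a log-log plot. The sketch below shows that calculation on synthetic degrees; it is an assumed, simplified stand-in for the fitting done inside the plugin, not a reproduction of it, and the function name `fit_power_law` is hypothetical.

```python
from collections import Counter
from math import log


def fit_power_law(degrees):
    """Least-squares fit of log(count) against log(degree).

    Returns (gamma, r_squared) for the model P(k) ~ k^(-gamma).
    """
    counts = Counter(d for d in degrees if d > 0)  # log(0) is undefined
    xs = [log(k) for k in counts]
    ys = [log(c) for c in counts.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy)
    return -slope, r_squared


# Synthetic degrees whose counts follow 64 / k^2 exactly (illustrative only).
degrees = [1] * 64 + [2] * 16 + [4] * 4 + [8] * 1
gamma, r_squared = fit_power_law(degrees)
```

On real interaction data the points scatter around the fitted line, so R² falls below 1 and indicates how closely the degree distribution follows a power law.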

Identification of hub proteins:

The hub proteins in the network were identified using the cytoHubba plugin in Cytoscape, by applying the degree and betweenness ranking methods. The fifty proteins shortlisted by each of the ranking methods were compared. The proteins from the prioritized list that were common to both ranking methods, and hence considered significant, were SMAD2, CDK9, HDAC1, LCK and MEN1. These proteins normally function in cell cycle regulation, cell differentiation and hematopoiesis. Of these 5 proteins, SMAD2 and CDK9 are novel candidates whose role in ALL has not been reported earlier. The overexpression of these 5 genes in ALL samples, together with the observed interactions of their proteins in the network, suggests that they may play a role in leukemogenesis.
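The two ranking criteria can be made concrete with a small pure-Python sketch: degree is the number of direct interaction partners, and betweenness counts how often a node lies on shortest paths between other nodes (computed here with Brandes' algorithm for unweighted, undirected graphs). This is not the plugin's own code; the toy network, its node names and the helper functions are hypothetical.

```python
from collections import deque


def degree_scores(adj):
    """Number of direct neighbours of each node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def betweenness_scores(adj):
    """Unnormalised betweenness centrality via Brandes' algorithm."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        stack, queue = [], deque([s])
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}   # -1 marks "not yet visited"
        sigma[s], dist[s] = 1, 0
        while queue:                  # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                  # accumulate dependencies, farthest first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected path was counted once from each endpoint.
    return {v: b / 2 for v, b in score.items()}


def top_n(scores, n):
    """The n best-scoring nodes (ties broken alphabetically)."""
    return set(sorted(scores, key=lambda v: (-scores[v], v))[:n])


def hubs(adj, n):
    """Nodes ranked in the top n by both degree and betweenness."""
    return top_n(degree_scores(adj), n) & top_n(betweenness_scores(adj), n)


# Toy undirected network given as an adjacency map (placeholder names).
network = {
    "P1": {"P2", "P3", "P4"},
    "P2": {"P1"},
    "P3": {"P1"},
    "P4": {"P1", "P5"},
    "P5": {"P4"},
}
hub_set = hubs(network, n=2)
```

In the analysis described above, n would be 50 and the resulting set would then be restricted to proteins from the prioritized list.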

Histone deacetylase 1 (HDAC1)

HDAC1, a part of the histone deacetylase complex, regulates eukaryotic gene expression. In the PPI network generated, HDAC1 was found to be part of a cluster comprising many proteins involved in transcription regulation. The upregulation of HDAC1, observed through Oncomine analysis, along with its interactions with proteins such as TAL1 and RUNX1 (Figure), suggests that this protein may contribute to disease etiology by promoting the expression of genes that aid the survival of leukemic T-cells.

Lymphocyte-specific protein tyrosine kinase (LCK)

LCK encodes an SRC family tyrosine kinase that plays a crucial role in the signalling mechanisms involved in the selection, maturation and proliferation of T-cells in the thymus. It also plays an important role in the T-cell receptor signal transduction pathway. LCK has been reported to protect cells from apoptosis induced by glucocorticoids, which are commonly used in the treatment of ALL. The interactors in the LCK-associated network cluster are part of T-cell receptor activation, which regulates signalling pathways that determine cell survival, proliferation and differentiation. Thus, deregulated expression of LCK, along with its interactions with the proteins identified within its cluster, could contribute to apoptosis resistance, favouring leukemic cell survival in patients with T-ALL undergoing therapy.

Multiple Endocrine Neoplasia I (MEN1)

The MEN1 gene functions as a transcriptional regulator and putative tumor suppressor and regulates TGFB1-mediated inhibition of cell proliferation. The MEN1 protein cluster is composed of proteins involved in the Wnt and Notch signalling pathways, which play a crucial role in hematopoiesis. Many of the interactors, especially NOTCH1 (Lin et al., 2012), CCND1 (Aref et al., 2006) and LEF1 (Gutierrez et al., 2010) (Figure), have been reported to be altered in leukemic cells, especially T-leukemia cells. The role of the menin protein in MLL-mediated leukemogenesis, through its association with MLL fusion proteins, has been reported earlier (Ichikawa et al., 2003; Caslini et al., 2007; Grembecka et al., 2010). Thus, the upregulated expression of MEN1, along with its interactions with proteins in this cluster such as NOTCH1 and LEF1, may contribute to its role in T-cell leukemogenesis, and hence this protein may serve as an important target for T-ALL specific therapy.

Mothers Against Decapentaplegic Homolog 2 (SMAD2)

SMAD2 encodes an important transcriptional modulator that, along with the other proteins in the SMAD family, helps regulate numerous processes such as cell proliferation, differentiation and apoptosis through TGFβ-mediated signalling pathways (Blank and Karlsson, 2011). In our study we report for the first time that its expression levels were significantly high in the T-ALL datasets used for analysis. It was also identified as a prioritized hub gene in this study, indicating that it could play an important role in the leukemogenic transformation of T-cells: an altered TGFβ pathway, together with its interactions with the other cluster proteins, may promote leukemic cell survival.

Cyclin-dependent Kinase 9 (CDK9)

The cyclin-dependent kinase CDK9 is part of the largest cluster in the interaction network in this study. The interactors in this cluster function in diverse cellular processes such as DNA repair, genomic stability, chromatin remodelling, regulation of transcription, apoptosis regulation, cell proliferation and differentiation. CDK9 is an essential cell cycle regulator as part of a complex formed with cyclins T and K. It also functions as a transcriptional regulator and promotes recovery of the cell cycle from replication arrest after replication stress. The overexpression of CDK9 in B-ALL indicates that this molecule may contribute to neoplastic transformation of B-cells via upregulation of its normal functions in transcriptional regulation and cell cycle recovery, and also via its interactions with the proteins involved in DNA repair, genomic stability, cell proliferation and differentiation, as observed in our protein network (Figure).

Owing to the significantly deregulated expression of these 5 genes and their interactions with proteins that regulate cell proliferation and hematopoiesis, these genes may serve as novel putative biomarkers of disease etiology and also as therapeutic targets.