Regression analysis of combined gene expression regulation in acute myeloid leukemia

PLoS Comput Biol. 2014 Oct 23;10(10):e1003908. doi: 10.1371/journal.pcbi.1003908. eCollection 2014 Oct.

Abstract

Gene expression is a combinatorial function of genetic/epigenetic factors such as copy number variation (CNV), DNA methylation (DM), transcription factor (TF) occupancy, and microRNA (miRNA) post-transcriptional regulation. With the maturation of microarray and sequencing technologies, large amounts of data measuring the genome-wide signals of these factors have become available from the Encyclopedia of DNA Elements (ENCODE) and The Cancer Genome Atlas (TCGA). However, an integrative model that takes full advantage of these rich yet heterogeneous data is lacking. To this end, we developed RACER (Regression Analysis of Combined Expression Regulation), which fits mRNA expression as the response variable, using as explanatory variables the TF data from ENCODE and the CNV, DM, and miRNA expression signals from TCGA. Briefly, RACER first infers the sample-specific regulatory activities of TFs and miRNAs, which are then used as inputs to infer specific TF/miRNA-gene interactions. This two-stage regression framework circumvents a common difficulty in integrating ENCODE data, measured in generic cell lines, with sample-specific TCGA measurements. As a case study, we integrated Acute Myeloid Leukemia (AML) data from TCGA with the related TF binding data measured in K562 cells from ENCODE. As a proof of concept, we first verified our model formalism by 10-fold cross-validation on predicting gene expression. We next evaluated RACER on recovering known regulatory interactions and demonstrated its superior statistical power over existing methods in detecting known miRNA/TF targets. Additionally, we developed a feature selection procedure, which identified 18 regulators whose activities clustered consistently with cytogenetic risk groups. One of the selected regulators is miR-548p, whose inferred targets were significantly enriched for a leukemia-related pathway, implicating a novel role for it in AML pathogenesis. Moreover, survival analysis using the inferred activities identified C-Fos as a potential AML prognostic marker. Together, we provided a novel framework that successfully integrated TCGA and ENCODE data in revealing an AML-specific regulatory program at the global level.
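The two-stage idea summarized above can be illustrated with a minimal sketch. It is not the authors' RACER implementation: plain ridge regression stands in for their actual fitting procedure, miRNA terms are omitted for brevity, the data are noise-free and synthetic, and all names (`stage1_activities`, `stage2_interactions`, `binding`, and so on) are hypothetical. Stage 1 fits each sample separately across genes, using that sample's CNV and DM together with sample-independent gene-by-regulator features (standing in for TF binding measured in a reference cell line such as K562); the fitted regulator coefficients serve as sample-specific activities. Stage 2 then fits each gene separately across samples, using those activities as inputs, so the fitted weights estimate gene-specific regulator-target interactions.

```python
import random


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination
    with partial pivoting (pure Python, adequate for a few unknowns)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ridge(X, y, lam=1e-6):
    """Ridge regression without intercept: minimise ||y - Xb||^2 + lam*||b||^2
    via the normal equations (X^T X + lam I) b = X^T y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def stage1_activities(expr, cnv, dm, reg_features, lam=1e-6):
    """Hypothetical stage 1: per sample, regress expression across genes.
    expr, cnv, dm are indexed [gene][sample]; reg_features is [gene][regulator]
    and does not depend on the sample. Returns activities[sample][regulator]."""
    n_genes, n_samples = len(expr), len(expr[0])
    activities = []
    for s in range(n_samples):
        X = [[cnv[g][s], dm[g][s]] + reg_features[g] for g in range(n_genes)]
        y = [expr[g][s] for g in range(n_genes)]
        activities.append(ridge(X, y, lam)[2:])  # drop CNV/DM coefficients
    return activities


def stage2_interactions(expr, cnv, dm, activities, lam=1e-6):
    """Hypothetical stage 2: per gene, regress expression across samples on
    its own CNV/DM plus the stage-1 activities. Returns weights[gene][regulator]."""
    n_genes, n_samples = len(expr), len(expr[0])
    weights = []
    for g in range(n_genes):
        X = [[cnv[g][s], dm[g][s]] + activities[s] for s in range(n_samples)]
        weights.append(ridge(X, expr[g], lam)[2:])
    return weights


# Synthetic, noise-free data generated from the assumed linear model.
rng = random.Random(0)
N_GENES, N_SAMPLES, N_REGS = 200, 40, 3
cnv = [[rng.gauss(0, 1) for _ in range(N_SAMPLES)] for _ in range(N_GENES)]
dm = [[rng.gauss(0, 1) for _ in range(N_SAMPLES)] for _ in range(N_GENES)]
binding = [[rng.random() for _ in range(N_REGS)] for _ in range(N_GENES)]
true_act = [[rng.gauss(0, 1) for _ in range(N_REGS)] for _ in range(N_SAMPLES)]
expr = [[0.5 * cnv[g][s] - 0.8 * dm[g][s]
         + sum(binding[g][k] * true_act[s][k] for k in range(N_REGS))
         for s in range(N_SAMPLES)]
        for g in range(N_GENES)]

act = stage1_activities(expr, cnv, dm, binding)    # act[s][k]
weights = stage2_interactions(expr, cnv, dm, act)  # weights[g][k]

act_err = max(abs(act[s][k] - true_act[s][k])
              for s in range(N_SAMPLES) for k in range(N_REGS))
w_err = max(abs(weights[g][k] - binding[g][k])
            for g in range(N_GENES) for k in range(N_REGS))
print(f"max activity error {act_err:.2e}, max weight error {w_err:.2e}")
```

Because the synthetic expression is generated exactly from the assumed model, both stages recover the planted activities and binding weights almost exactly. Real TCGA data would add noise and correlated regulators, where a sparsity-inducing penalty and cross-validated tuning would likely matter far more than in this toy setting.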

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Computational Biology / methods*
  • Gene Expression Profiling / methods*
  • Gene Expression Regulation, Neoplastic / genetics*
  • Gene Regulatory Networks
  • Humans
  • Kaplan-Meier Estimate
  • Leukemia, Myeloid, Acute / genetics*
  • Leukemia, Myeloid, Acute / metabolism*
  • Male
  • MicroRNAs / genetics
  • MicroRNAs / metabolism
  • Regression Analysis

Substances

  • MicroRNAs

Grants and funding

YL is funded by a Natural Sciences and Engineering Research Council (NSERC) Canada Graduate Scholarship, and ZZ is supported by the Ontario Research Fund - Global Leader (Round 2) and a Natural Sciences and Engineering Research Council (NSERC) grant [grant number 327612]. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.