Identification of common prognostic gene expression signatures with biological meanings from microarray gene expression datasets

PLoS One. 2012;7(9):e45894. doi: 10.1371/journal.pone.0045894. Epub 2012 Sep 21.

Abstract

Numerous prognostic gene expression signatures for breast cancer have been generated previously, with little overlap among them and limited insight into the biology of the disease. Here we introduce a novel algorithm named SCoR (Survival analysis using Cox proportional hazard regression and Random resampling), which applies random resampling and clustering methods to identify gene features correlated with time-to-event data. This approach is shown to reduce the overfitting to noise that affects microarray data analysis and to discover functional gene sets linked to patient survival. SCoR independently identified a common poor-prognosis signature composed of cell proliferation genes in six out of eight breast cancer datasets. Furthermore, a sequential SCoR analysis of highly proliferative breast cancers repeatedly identified T/B cell markers as favorable prognostic factors. In glioblastoma, SCoR identified a common good-prognosis signature of chromosome 10 genes in two gene expression datasets (TCGA and REMBRANDT), recapitulating the fact that loss of one copy of chromosome 10 (which harbors the tumor suppressor PTEN) is linked to poor survival in glioblastoma patients. SCoR also identified prognostic genes on sex chromosomes in lung adenocarcinomas, suggesting that patient gender might be used to predict outcome in this disease. These results demonstrate the power of SCoR to identify common and biologically meaningful prognostic gene expression signatures.
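
The abstract does not spell out the SCoR procedure, so the following is only a minimal sketch of the general idea of combining Cox-type survival testing with random resampling, not the authors' implementation. It rests on several assumptions: the function names (cox_score_z, resampled_selection_frequency), the 70% subsample size, the number of rounds, and the |z| >= 2 cutoff are all hypothetical; a univariate Cox score test at beta = 0 stands in for a full Cox regression fit; and the clustering step mentioned in the abstract is omitted. A gene that passes the cutoff in most random patient subsamples is less likely to be an artifact of one particular sample.

```python
import math
import random


def cox_score_z(times, events, x):
    """Univariate Cox score-test z-statistic at beta = 0 (Breslow handling of ties).

    Positive z means higher x is associated with higher hazard (shorter survival).
    Stand-in for a full Cox fit; an assumption of this sketch.
    """
    # Visit subjects from longest to shortest follow-up so the risk set only grows.
    order = sorted(range(len(times)), key=lambda i: times[i], reverse=True)
    n = s1 = s2 = 0.0  # risk-set size, sum of x, sum of x squared
    u = v = 0.0        # score and its variance (information)
    i = 0
    while i < len(order):
        t = times[order[i]]
        j = i
        # Everyone with follow-up time equal to t joins the risk set together.
        while j < len(order) and times[order[j]] == t:
            xi = x[order[j]]
            n += 1.0
            s1 += xi
            s2 += xi * xi
            j += 1
        mean = s1 / n
        var = max(s2 / n - mean * mean, 0.0)
        # Each observed event at time t contributes to the score and variance.
        for k in range(i, j):
            if events[order[k]]:
                u += x[order[k]] - mean
                v += var
        i = j
    return u / math.sqrt(v) if v > 0 else 0.0


def resampled_selection_frequency(times, events, expr, n_rounds=100,
                                  frac=0.7, z_cut=2.0, seed=0):
    """Fraction of random patient subsamples in which each gene has |z| >= z_cut.

    expr maps a gene name to a list of expression values, one per patient.
    All default parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    n = len(times)
    k = max(2, int(frac * n))
    counts = {gene: 0 for gene in expr}
    for _ in range(n_rounds):
        idx = rng.sample(range(n), k)  # subsample patients without replacement
        t = [times[i] for i in idx]
        e = [events[i] for i in idx]
        for gene, values in expr.items():
            z = cox_score_z(t, e, [values[i] for i in idx])
            if abs(z) >= z_cut:
                counts[gene] += 1
    return {gene: c / n_rounds for gene, c in counts.items()}
```

Under these assumptions, genes with a selection frequency close to 1 would be kept as candidate prognostic features, and the sign of z would separate poor-prognosis from good-prognosis genes. Grouping the retained genes by co-expression would be a separate step not shown here.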

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adenocarcinoma / genetics
  • Adenocarcinoma / metabolism*
  • Adenocarcinoma / mortality
  • Adenocarcinoma of Lung
  • Algorithms
  • Biomarkers, Tumor / genetics
  • Biomarkers, Tumor / metabolism*
  • Breast Neoplasms / genetics
  • Breast Neoplasms / metabolism*
  • Breast Neoplasms / mortality
  • Cluster Analysis
  • Female
  • Gene Expression Profiling
  • Glioblastoma / genetics
  • Glioblastoma / metabolism*
  • Humans
  • Kaplan-Meier Estimate
  • Lung Neoplasms / genetics
  • Lung Neoplasms / metabolism*
  • Lung Neoplasms / mortality
  • Male
  • Models, Biological
  • Oligonucleotide Array Sequence Analysis
  • Prognosis
  • Proportional Hazards Models
  • Sex Factors
  • Transcriptome*

Substances

  • Biomarkers, Tumor