An efficient and robust statistical modeling approach to discover differentially expressed genes using genomic expression profiles

Genome Res. 2001 Jul;11(7):1227-36. doi: 10.1101/gr.165101.

Abstract

We have developed a statistical regression modeling approach to discover genes that are differentially expressed between two predefined sample groups in DNA microarray experiments. Our model is based on well-defined assumptions, uses rigorous and well-characterized statistical measures, and accounts for the heterogeneity and genomic complexity of the data. In contrast to cluster analysis, which attempts to define groups of genes and/or samples that share common overall expression profiles, our modeling approach uses known sample group membership to focus on the expression profiles of individual genes in a sensitive and robust manner. Further, this approach can be used to test statistical hypotheses about gene expression. To demonstrate this methodology, we compared the expression profiles of 11 acute myeloid leukemia (AML) and 27 acute lymphoblastic leukemia (ALL) samples from a previous study (Golub et al. 1999) and found 141 genes differentially expressed between AML and ALL at a genome-wide significance level of 1%. Using this modeling approach to compare different sample groups within the AML samples, we identified a group of genes whose expression profiles correlated with that of thrombopoietin, and found that genes whose expression was associated with AML treatment outcome lie in recurrent chromosomal locations. We compare our results with those obtained using t-tests or Wilcoxon rank-sum statistics.
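The abstract benchmarks its regression model against per-gene t-tests and Wilcoxon rank-sum statistics. The following is a minimal sketch of those two baseline statistics only, computed gene by gene for two labeled sample groups; it is not the authors' regression model. The function names, the Welch (unequal-variance) form of the t statistic, and the dict-of-lists data layout are assumptions for illustration. The sketch computes test statistics without p-values or any genome-wide multiple-testing correction.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's two-sample t statistic for one gene (assumed variant; needs >= 2 values per group
    and a nonzero pooled variance, otherwise division by zero)."""
    va = variance(group_a) / len(group_a)
    vb = variance(group_b) / len(group_b)
    return (mean(group_a) - mean(group_b)) / sqrt(va + vb)


def rank_sum(group_a, group_b):
    """Wilcoxon rank-sum statistic: sum of the 1-based ranks of group_a in the pooled sample,
    with tied values given their average rank."""
    pooled = sorted(group_a + group_b)
    ranks = {}
    i, n = 0, len(pooled)
    while i < n:
        j = i
        # Extend j over a run of tied values starting at position i.
        while j + 1 < n and pooled[j + 1] == pooled[i]:
            j += 1
        # Positions i..j hold 1-based ranks i+1..j+1; assign their average.
        ranks[pooled[i]] = (i + j + 2) / 2
        i = j + 1
    return sum(ranks[v] for v in group_a)


def score_genes(expression, labels, group_a="AML", group_b="ALL"):
    """Hypothetical helper: expression maps gene -> list of values, one per sample,
    in the same order as labels. Returns gene -> (Welch t, rank sum for group_a)."""
    results = {}
    for gene, values in expression.items():
        a = [v for v, lab in zip(values, labels) if lab == group_a]
        b = [v for v, lab in zip(values, labels) if lab == group_b]
        results[gene] = (welch_t(a, b), rank_sum(a, b))
    return results
```

For example, `score_genes({"g1": [5, 6, 7, 1, 2, 3]}, ["AML"] * 3 + ["ALL"] * 3)` ranks the three AML values 4, 5 and 6 in the pooled sample, giving a rank sum of 15. The rank-sum statistic uses only the ordering of the values, which makes it less sensitive to outlying expression measurements than the t statistic.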

Publication types

  • Comparative Study
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Acute Disease
  • Gene Expression Profiling / methods*
  • Gene Expression Profiling / statistics & numerical data*
  • Gene Expression Regulation, Neoplastic / genetics
  • Humans
  • Leukemia, Myeloid / genetics
  • Models, Genetic*
  • Models, Statistical*
  • Oligonucleotide Array Sequence Analysis / methods
  • Oligonucleotide Array Sequence Analysis / statistics & numerical data
  • Precursor Cell Lymphoblastic Leukemia-Lymphoma / genetics
  • Regression Analysis