Biological validation of differentially expressed genes in chronic lymphocytic leukemia identified by applying multiple statistical methods to oligonucleotide microarrays

J Mol Diagn. 2005 Aug;7(3):337-45. doi: 10.1016/s1525-1578(10)60562-4.

Abstract

Oligonucleotide microarrays are a powerful tool for profiling the expression levels of thousands of genes. Different statistical methods for identifying differentially expressed genes can yield different results. To our knowledge, no experimental test has been performed to decide which method best identifies genes that are truly differentially expressed. We applied three statistical methods (dChip, t-test on log-transformed data, and Wilcoxon test) to identify differentially expressed genes in previously untreated patients with chronic lymphocytic leukemia (CLL). We used a training set of Affymetrix Hu133A microarray data from 11 patients with unmutated immunoglobulin (Ig) heavy chain variable region (VH) genes and 8 patients with mutated Ig VH genes. Differential expression was validated using semiquantitative real-time polymerase chain reaction assays and by testing models that predict somatic mutation status on an independent test set of nine CLL samples. The methods identified 144 genes that were differentially expressed between cases of CLL with unmutated compared with mutated Ig VH genes. Eighty genes were identified by Wilcoxon test, 60 by t-test, and 65 by dChip, but only 11 were identified by all three methods. Greater agreement was found between the t-test and the Wilcoxon test than between either test and dChip. Differential expression was validated by semiquantitative real-time polymerase chain reaction assays for 83% of individual genes, regardless of the statistical method. However, the Wilcoxon test gave the most accurate predictions on new samples, and dChip, the least accurate. We found that all three methods were equally good for finding differentially expressed genes, but they found different genes. The genes selected by the nonparametric Wilcoxon test are the most robust for predicting the status of new cases. A comprehensive list of all differentially expressed genes can only be obtained by combining the results of multiple statistical tests.
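
The comparison of gene lists described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are simulated, dChip is omitted, the t-test is assumed to be Welch's unequal-variance form, the Wilcoxon rank-sum statistic uses a normal approximation, and genes are selected by taking the top-ranked statistics rather than by a p-value cutoff. Only the group sizes (11 unmutated, 8 mutated) come from the abstract; every other name and parameter is hypothetical.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples (assumed variant)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def rank_sum_z(a, b):
    """Wilcoxon rank-sum statistic for sample a, as a normal-approximation z.

    Tied values receive midranks; the variance is not tie-corrected.
    """
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # midrank, 1-based
        i = j + 1
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    n1, n2 = len(a), len(b)
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mu) / sigma


# Simulated intensities (hypothetical): 200 genes, the first 20 truly shifted.
random.seed(0)
N_GENES, N_TRUE, TOP_K = 200, 20, 20
N_UNMUT, N_MUT = 11, 8  # group sizes taken from the abstract


def simulate_gene(shifted):
    shift = 1.5 if shifted else 0.0
    unmut = [math.exp(random.gauss(5.0 + shift, 0.5)) for _ in range(N_UNMUT)]
    mut = [math.exp(random.gauss(5.0, 0.5)) for _ in range(N_MUT)]
    return unmut, mut


genes = [simulate_gene(g < N_TRUE) for g in range(N_GENES)]

# t statistic on log2-transformed data; the rank-sum statistic is unchanged
# by any monotone transform, so it is computed on the raw intensities.
t_scores = [
    abs(welch_t([math.log2(x) for x in u], [math.log2(x) for x in m]))
    for u, m in genes
]
w_scores = [abs(rank_sum_z(u, m)) for u, m in genes]


def top_k(scores, k):
    """Indices of the k genes with the largest scores."""
    order = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)
    return set(order[:k])


top_t = top_k(t_scores, TOP_K)
top_w = top_k(w_scores, TOP_K)
shared = top_t & top_w
print(f"t-test: {len(top_t)}, Wilcoxon: {len(top_w)}, shared: {len(shared)}")
```

With real data, the two lists would typically be compared on the same footing as here: each method yields a set of gene identifiers, and set intersection and union quantify how much the methods agree.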

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Gene Expression Profiling
  • Gene Expression Regulation, Leukemic*
  • Humans
  • Immunoglobulin Heavy Chains / genetics*
  • Immunoglobulin Variable Region / genetics*
  • Leukemia, Lymphocytic, Chronic, B-Cell / genetics*
  • Leukemia, Lymphocytic, Chronic, B-Cell / metabolism
  • Models, Genetic
  • Models, Statistical
  • Neoplasm Proteins / genetics*
  • Oligonucleotide Array Sequence Analysis*
  • Reverse Transcriptase Polymerase Chain Reaction
  • Somatic Hypermutation, Immunoglobulin

Substances

  • Immunoglobulin Heavy Chains
  • Immunoglobulin Variable Region
  • Neoplasm Proteins