Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection

Proc Natl Acad Sci U S A. 2001 Jan 2;98(1):31-6. doi: 10.1073/pnas.98.1.31.

Abstract

Recent advances in cDNA and oligonucleotide DNA arrays have made it possible to measure the abundance of mRNA transcripts for many genes simultaneously. The analysis of such experiments is nontrivial because of large data size and many levels of variation introduced at different stages of the experiments. The analysis is further complicated by the large differences that may exist among different probes used to interrogate the same gene. However, an attractive feature of high-density oligonucleotide arrays such as those produced by photolithography and inkjet technology is the standardization of chip manufacturing and hybridization process. As a result, probe-specific biases, although significant, are highly reproducible and predictable, and their adverse effect can be reduced by proper modeling and analysis methods. Here, we propose a statistical model for the probe-level data, and develop model-based estimates for gene expression indexes. We also present model-based methods for identifying and handling cross-hybridizing probes and contaminating array regions. Applications of these results will be presented elsewhere.
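The abstract does not spell out the form of the probe-level model. As a rough illustration of what a model-based expression index can look like, the sketch below assumes a multiplicative form: the probe-level value y[i][j] (for example, a PM minus MM difference) for array i and probe j is approximated by theta[i] * phi[j], where theta[i] is an array-specific expression index and phi[j] is a probe-specific sensitivity, with the scale fixed by requiring the squared phi values to sum to the number of probes. The function name, the alternating least-squares fit, the stopping rule, and the toy data are all illustrative assumptions, not the authors' implementation.

```python
import math


def fit_multiplicative_model(y, n_iter=200, tol=1e-12):
    """Fit y[i][j] ~ theta[i] * phi[j] by alternating least squares.

    y is a list of arrays (rows), each a list of probe-level values
    for one probe set. Assumed constraint: sum(phi[j]**2) == n_probes.
    Returns (theta, phi, residuals).
    """
    n_arrays, n_probes = len(y), len(y[0])
    phi = [1.0] * n_probes  # start from equal probe sensitivities

    def update_theta(phi):
        # Least-squares theta[i] for fixed phi (one regression per array).
        denom = sum(p * p for p in phi)
        return [sum(y[i][j] * phi[j] for j in range(n_probes)) / denom
                for i in range(n_arrays)]

    theta = update_theta(phi)
    for _ in range(n_iter):
        denom = sum(t * t for t in theta)
        if denom == 0.0:
            break  # all-zero data: nothing to fit
        # Least-squares phi[j] for fixed theta (one regression per probe).
        raw_phi = [sum(y[i][j] * theta[i] for i in range(n_arrays)) / denom
                   for j in range(n_probes)]
        norm = sum(p * p for p in raw_phi)
        if norm == 0.0:
            break
        # Rescale so that sum(phi**2) equals the number of probes.
        scale = math.sqrt(n_probes / norm)
        phi = [p * scale for p in raw_phi]
        new_theta = update_theta(phi)
        change = max(abs(a - b) for a, b in zip(new_theta, theta))
        theta = new_theta
        if change < tol:
            break

    # Large residuals are candidates for closer inspection as outliers.
    residuals = [[y[i][j] - theta[i] * phi[j] for j in range(n_probes)]
                 for i in range(n_arrays)]
    return theta, phi, residuals


if __name__ == "__main__":
    # Toy data: 3 arrays x 4 probes, made up for illustration only.
    toy = [
        [120.0, 310.0, 55.0, 200.0],
        [240.0, 600.0, 115.0, 410.0],
        [60.0, 150.0, 30.0, 95.0],
    ]
    theta, phi, residuals = fit_multiplicative_model(toy)
    print("expression indexes:", theta)
    print("probe sensitivities:", phi)
```

Under this assumed form, a probe that responds consistently across arrays gets a stable phi[j], while a probe or array whose residuals stay large after fitting stands out as a candidate for the kind of outlier handling the abstract describes.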

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Animals
  • Artifacts
  • DNA Probes
  • Gene Expression Profiling / methods*
  • Humans
  • Mice
  • Models, Statistical*
  • Nucleic Acid Hybridization
  • Oligonucleotide Array Sequence Analysis / methods*
  • RNA, Messenger / analysis
  • RNA, Messenger / genetics
  • Reference Standards
  • Research Design

Substances

  • DNA Probes
  • RNA, Messenger