Dissecting systems-wide data using mixture models: application to identify affected cellular processes

BMC Bioinformatics. 2005 Jul 14;6:177. doi: 10.1186/1471-2105-6-177.

Abstract

Background: Functional analysis of data from genome-scale experiments, such as microarrays, requires an extensive selection of differentially expressed genes. Under many conditions, the proportion of differentially expressed genes is considerable, making the selection criteria a balance between the inclusion of false positives and the exclusion of false negatives.

Results: We developed an analytical method to determine a p-value threshold from a microarray experiment that depends on the quality and design of the data set. To this end, populations of p-values are modeled as mathematical functions whose parameters are estimated in an unsupervised manner. The strength of the method is exemplified by its application to a published gene expression data set of sporadic and familial breast tumors with BRCA1 or BRCA2 mutations.

Conclusion: We present an objective and unsupervised way to set thresholds adapted to the quality and design of the experiment. The resulting mathematical description of the data sets of genome-scale experiments enables a probabilistic approach in systems biology.
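The abstract does not say which mathematical functions are fitted to the p-value populations. As an illustration only, the sketch below assumes a beta-uniform mixture, f(p) = lam + (1 - lam) * a * p^(a - 1), which is one common way to model p-values, and is not necessarily the authors' model. It fits the two parameters by a coarse grid search over the likelihood and derives a p-value threshold for a target false discovery rate. All function names, the simulated data, and the grid-search fitting are assumptions made for this example.

```python
import math
import random


def simulate_pvalues(n, lam, a):
    """Draw n p-values: uniform with probability lam, else Beta(a, 1)."""
    pvals = []
    for _ in range(n):
        u = random.random()
        # Inverse-CDF sampling: the Beta(a, 1) CDF is p**a.
        p = u if random.random() < lam else u ** (1.0 / a)
        pvals.append(min(max(p, 1e-12), 1.0))  # keep p strictly positive
    return pvals


def bum_log_likelihood(pvals, lam, a):
    """Log-likelihood of the density lam + (1 - lam) * a * p**(a - 1)."""
    return sum(math.log(lam + (1.0 - lam) * a * p ** (a - 1.0)) for p in pvals)


def fit_bum(pvals, steps=40):
    """Coarse grid-search maximum likelihood for (lam, a), both in (0, 1)."""
    grid = [(i + 0.5) / steps for i in range(steps)]
    best = max(
        (bum_log_likelihood(pvals, lam, a), lam, a) for lam in grid for a in grid
    )
    return best[1], best[2]


def fdr_threshold(lam, a, fdr=0.05):
    """Largest p-value cutoff whose estimated FDR under the fit is <= fdr."""
    # Upper bound on the null fraction: the density evaluated at p = 1.
    pi_ub = lam + (1.0 - lam) * a
    base = (pi_ub - fdr * lam) / (fdr * (1.0 - lam))
    return min(base ** (1.0 / (a - 1.0)), 1.0)


# Simulated data stands in for a real experiment (hypothetical parameters).
random.seed(0)
pvals = simulate_pvalues(500, lam=0.7, a=0.3)
lam_hat, a_hat = fit_bum(pvals)
tau = fdr_threshold(lam_hat, a_hat, fdr=0.05)
selected = sum(p <= tau for p in pvals)
print(f"lam={lam_hat:.3f} a={a_hat:.3f} threshold={tau:.5f} selected={selected}")
```

Because the threshold is computed from the fitted parameters, it changes with the shape of the p-value distribution of each data set, which is the sense in which such a cutoff adapts to the quality and design of the experiment. A gradient-based optimizer or an EM procedure would be a natural replacement for the grid search on larger data sets.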

MeSH terms

  • Breast Neoplasms / genetics
  • Cell Cycle / genetics
  • Computational Biology / methods*
  • DNA-Binding Proteins / metabolism
  • Gene Expression Profiling / methods*
  • Gene Expression Regulation, Neoplastic / genetics
  • Genetic Testing / methods
  • Humans
  • Models, Genetic*
  • Phosphorylation
  • Predictive Value of Tests
  • Protein Array Analysis / methods

Substances

  • DNA-Binding Proteins