Supervised Bayesian latent class models for high-dimensional data

Stat Med. 2012 Jun 15;31(13):1342-60. doi: 10.1002/sim.4448. Epub 2012 Apr 11.

Abstract

High-grade gliomas are the most common primary brain tumors in adults and are typically diagnosed using histopathology. However, these diagnostic categories are highly heterogeneous and do not always correlate well with survival. In an attempt to refine these diagnoses, we make several immunohistochemical measurements of YKL-40, a gene previously shown to be differentially expressed between diagnostic groups. We propose two latent class models for classification and variable selection in the presence of high-dimensional binary data, fit using Bayesian Markov chain Monte Carlo techniques. Penalization and model selection are incorporated in this setting via prior distributions on the unknown parameters. The methods provide valid parameter estimates under conditions in which standard supervised latent class models do not, and they outperform two-stage approaches to variable selection and parameter estimation in a variety of settings. We study the properties of these methods in simulations and apply them to the glioma study, for which identifiable three-class parameter estimates cannot be obtained without penalization. With penalization, the resulting latent classes correlate well with clinical tumor grade and offer additional information on survival prognosis that is not captured by clinical diagnosis alone. The inclusion of YKL-40 features also increases the precision of survival estimates. Fitting models with and without YKL-40 highlights a subgroup of patients who have a glioblastoma (GBM) diagnosis but appear to have a better prognosis than the typical GBM patient.
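As a rough illustration of the general idea of a latent class model for binary data fit by Markov chain Monte Carlo, the sketch below runs a basic Gibbs sampler on simulated data. It is not either of the two models proposed in the paper: it is unsupervised (no survival outcome), performs no variable selection, and uses plain Beta and Dirichlet priors as a simple stand-in for penalization. The function name `gibbs_latent_class`, the hyperparameters `a`, `b`, and `alpha`, and the simulated data are all assumptions made for this example.

```python
import numpy as np

def gibbs_latent_class(X, n_classes=2, n_iter=200, a=1.0, b=1.0, alpha=1.0, seed=0):
    """Gibbs sampler for a basic latent class model on binary data.

    Illustrative sketch only (hypothetical helper, not the paper's models).
    X is an (n, p) array of 0/1 values. Returns the last draw of the class
    labels z, the item probabilities theta (K, p), and class weights pi (K,).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    z = rng.integers(n_classes, size=n)  # random initial class labels
    for _ in range(n_iter):
        onehot = np.eye(n_classes)[z]    # (n, K) class indicators
        counts = onehot.sum(axis=0)      # subjects per class
        ones = onehot.T @ X              # per-class counts of x_ij = 1
        # Beta(a, b) prior on each item probability: conjugate update.
        theta = rng.beta(a + ones, b + counts[:, None] - ones)
        theta = np.clip(theta, 1e-10, 1 - 1e-10)  # guard the logs below
        # Symmetric Dirichlet(alpha) prior on the class weights.
        pi = rng.dirichlet(alpha + counts)
        # Posterior class membership probabilities for each subject.
        log_post = np.log(pi) + X @ np.log(theta).T + (1 - X) @ np.log1p(-theta).T
        log_post -= log_post.max(axis=1, keepdims=True)
        prob = np.exp(log_post)
        prob /= prob.sum(axis=1, keepdims=True)
        # Draw new labels by inverting each row's cumulative distribution.
        u = rng.random(n)
        z = (prob.cumsum(axis=1) < u[:, None]).sum(axis=1)
        z = np.minimum(z, n_classes - 1)  # guard against rounding in cumsum
    return z, theta, pi

# Simulated example (assumed data): 200 subjects, 20 binary features,
# two well-separated classes with mirrored item probabilities.
rng = np.random.default_rng(1)
true_z = rng.integers(2, size=200)
true_theta = np.where(np.arange(20) < 10, 0.9, 0.1)
probs = np.where(true_z[:, None] == 0, true_theta, 1 - true_theta)
X = (rng.random((200, 20)) < probs).astype(float)

z, theta, pi = gibbs_latent_class(X)
# Class labels are only identified up to permutation (label switching).
agreement = max(np.mean(z == true_z), np.mean(z != true_z))
```

Larger values of `a` and `b` shrink the item probabilities toward 0.5, which is one simple way a prior can act as a penalty when the number of features is large relative to the number of subjects.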

MeSH terms

  • Adipokines / genetics
  • Adipokines / metabolism*
  • Aged
  • Bayes Theorem*
  • Brain Neoplasms / metabolism
  • Brain Neoplasms / mortality*
  • Brain Neoplasms / pathology
  • Chitinase-3-Like Protein 1
  • Computer Simulation / statistics & numerical data
  • Glioma / metabolism
  • Glioma / mortality*
  • Glioma / pathology
  • Humans
  • Lectins / genetics
  • Lectins / metabolism*
  • Markov Chains
  • Models, Statistical*
  • Monte Carlo Method
  • Neoplasm Grading
  • Survival Analysis

Substances

  • Adipokines
  • CHI3L1 protein, human
  • Chitinase-3-Like Protein 1
  • Lectins