Logistic Bayesian LASSO for identifying association with rare haplotypes and application to age-related macular degeneration

Biometrics. 2012 Jun;68(2):587-97. doi: 10.1111/j.1541-0420.2011.01680.x. Epub 2011 Sep 28.

Abstract

Rare variants have been heralded as key to uncovering "missing heritability" in complex diseases. These variants can now be genotyped using next-generation sequencing technologies; nonetheless, rare haplotypes can also arise from combinations of common single nucleotide polymorphisms available from genome-wide association studies (GWAS). The National Eye Institute's data on age-related macular degeneration (AMD) is one such example. Earlier studies of AMD identified potential rare variants; however, for lack of appropriate statistical tools, the effects of individual rare haplotypes were never studied. Here we develop a method for identifying association with rare haplotypes in case-control designs. A logistic-regression-based retrospective likelihood is formulated and regularized using the logistic Bayesian LASSO (LBL). In particular, we penalize the regression coefficients using appropriate priors to weed out unassociated haplotypes, making it possible for the rare associated ones to stand out. We applied LBL to the AMD data and identified both common and rare haplotypes in the complement factor H gene, gaining insights into rare variants' contributions to AMD beyond the current literature. This analysis also demonstrates the richness of GWAS data for mapping rare haplotypes, a largely unexplored potential. Additionally, we conducted simulations to investigate the performance of LBL and to compare it with Hapassoc. Our results show that LBL is much more powerful in identifying rare associated haplotypes when the false positive rates of the two approaches are kept the same.
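The regularization idea in the abstract, penalizing regression coefficients through their priors, can be illustrated with a minimal sketch. It is not the authors' implementation: it uses an ordinary prospective logistic likelihood rather than the retrospective likelihood described above, ignores haplotype phase uncertainty, and assumes the double-exponential (Laplace) prior that is standard in the Bayesian LASSO, since the abstract does not spell out the prior hierarchy. The function names, the rate parameter `lam`, and the coding of `X` as per-subject haplotype counts are all hypothetical choices made for illustration.

```python
import math


def log_laplace_prior(beta, lam):
    """Log density of independent Laplace(0, 1/lam) priors on the coefficients.

    Assumed prior: the abstract only says "appropriate priors".
    """
    return sum(math.log(lam / 2.0) - lam * abs(b) for b in beta)


def log_likelihood(beta0, beta, X, y):
    """Prospective logistic log-likelihood.

    Simplification: the paper formulates a retrospective likelihood instead.
    X is a list of rows (hypothetical haplotype counts), y holds 0/1 case status.
    """
    total = 0.0
    for row, label in zip(X, y):
        eta = beta0 + sum(b * x for b, x in zip(beta, row))
        # Numerically stable evaluation of log(1 + exp(eta)).
        log1pexp = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))
        total += label * eta - log1pexp
    return total


def log_posterior(beta0, beta, X, y, lam):
    """Unnormalized log posterior: likelihood plus the shrinkage prior."""
    return log_likelihood(beta0, beta, X, y) + log_laplace_prior(beta, lam)
```

A larger `lam` concentrates the prior more tightly around zero, so coefficients with weak support in the data are pulled toward zero while strongly supported ones remain distinguishable. In a full Bayesian analysis this log posterior would be explored by Markov chain Monte Carlo rather than evaluated at single points.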

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Aged
  • Bayes Theorem
  • Biometry / methods*
  • Complement Factor H / genetics
  • Computer Simulation
  • Databases, Genetic / statistics & numerical data
  • Genome-Wide Association Study
  • Haplotypes
  • Humans
  • Likelihood Functions
  • Logistic Models
  • Macular Degeneration / genetics*
  • Markov Chains
  • Models, Genetic
  • Monte Carlo Method
  • Polymorphism, Single Nucleotide

Substances

  • Complement Factor H