Kullback-Leibler divergence for detection of rare haplotype common disease association

Eur J Hum Genet. 2015 Nov;23(11):1558-65. doi: 10.1038/ejhg.2015.25. Epub 2015 Mar 4.

Abstract

Rare haplotypes may tag rare causal variants of common diseases; hence, detection of such rare haplotypes may also contribute to our understanding of complex disease etiology. Because rare haplotypes frequently result from common single-nucleotide polymorphisms (SNPs), focusing on rare haplotypes is much more economical than using rare single-nucleotide variants (SNVs) from sequencing, as SNPs are available and 'free' from already amassed genome-wide studies. Further, associated haplotypes may shed light on the underlying disease causal mechanism, a feat unmatched by SNV-based collapsing methods. In recent years, data mining approaches have been adapted to detect rare haplotype association. However, as they rely on an assumed underlying disease model and require the specification of a null haplotype, results can be erroneous if such assumptions are violated. In this paper, we present a haplotype association method based on Kullback-Leibler divergence (hapKL) for case-control samples. The idea is to compare haplotype frequencies for the cases versus the controls by computing symmetrical divergence measures. An important property of such measures is that both the frequencies and logarithms of the frequencies contribute in parallel, thus balancing the contributions from rare and common haplotypes and accommodating both deleterious and protective haplotypes. A simulation study under various scenarios shows that hapKL has well-controlled type I error rates and good power compared with existing data mining methods. Application of hapKL to age-related macular degeneration (AMD) shows a strong association of the complement factor H (CFH) gene with AMD, identifying several individual rare haplotypes with strong signals.
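As an illustration of the symmetrical divergence idea described in the abstract, the following minimal sketch computes the standard symmetrized Kullback-Leibler (Jeffreys) divergence between case and control haplotype frequency vectors. It is not the published hapKL statistic, whose exact form is not given in the abstract. The function name, the pseudo-frequency used to avoid taking the logarithm of zero, and the example frequencies are all assumptions made for illustration.

```python
import math


def symmetric_kl_terms(case_freqs, control_freqs, pseudo=1e-6):
    """Per-haplotype terms of the symmetrized KL (Jeffreys) divergence.

    Each term is (p_i - q_i) * log(p_i / q_i), so the frequency difference
    and the log-ratio both contribute, and each term is non-negative
    whether haplotype i is enriched in cases or in controls.

    `pseudo` is a hypothetical smoothing constant (an assumption of this
    sketch) added to every frequency before renormalizing, so that a
    haplotype absent from one group does not produce log(0).
    """
    if len(case_freqs) != len(control_freqs):
        raise ValueError("frequency vectors must have the same length")

    # Smooth and renormalize so each vector sums to 1.
    p = [f + pseudo for f in case_freqs]
    q = [f + pseudo for f in control_freqs]
    p_total, q_total = sum(p), sum(q)
    p = [x / p_total for x in p]
    q = [x / q_total for x in q]

    return [(pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q)]


def symmetric_kl(case_freqs, control_freqs, pseudo=1e-6):
    """Symmetrized KL divergence: KL(p || q) + KL(q || p)."""
    return sum(symmetric_kl_terms(case_freqs, control_freqs, pseudo))


if __name__ == "__main__":
    # Made-up frequencies: one common haplotype and two rare ones.
    cases = [0.90, 0.07, 0.03]
    controls = [0.90, 0.09, 0.01]
    for i, term in enumerate(symmetric_kl_terms(cases, controls)):
        print(f"haplotype {i}: contribution {term:.5f}")
    print(f"total divergence: {symmetric_kl(cases, controls):.5f}")
```

In practice, significance would have to be assessed separately, for example by permuting case-control labels; the abstract does not say how hapKL does this.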

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Complement Factor H / genetics
  • Computer Simulation
  • Genetic Association Studies*
  • Genetic Predisposition to Disease*
  • Haplotypes / genetics*
  • Humans
  • Linkage Disequilibrium
  • Macular Degeneration / genetics
  • Macular Degeneration / pathology
  • Polymorphism, Single Nucleotide
  • Rare Diseases / genetics*

Substances

  • Complement Factor H