AprioriGWAS, a new pattern mining strategy for detecting genetic variants associated with disease through interaction effects

PLoS Comput Biol. 2014 Jun 5;10(6):e1003627. doi: 10.1371/journal.pcbi.1003627. eCollection 2014 Jun.

Abstract

Identifying gene-gene interactions is an active topic in genome-wide association studies. Two fundamental challenges are: (1) how to efficiently identify combinations of variants that may be associated with the trait from the astronomical number of possible combinations; and (2) how to test for epistatic interaction when all potential combinations are available. We developed AprioriGWAS, which brings two innovations. (1) Building on Apriori, a successful method in the field of Frequent Itemset Mining (FIM) that uses a pattern growth strategy to reduce the search space effectively and accurately, AprioriGWAS can efficiently identify associated genotype patterns. (2) To test hypotheses of epistasis, we adopt a new conditional permutation procedure to obtain reliable statistical inference from Pearson's chi-square test for the [Formula: see text] contingency table generated by the associated variants. By applying AprioriGWAS to age-related macular degeneration (AMD) data, we found that: (1) angiopoietin 1 (ANGPT1) and four retinal genes interact with Complement Factor H (CFH); and (2) the GO term "glycosaminoglycan biosynthetic process" was enriched among AMD interacting genes. The epistatic interactions newly found by AprioriGWAS in the AMD data are likely true interactions, since the genes interacting with CFH are retinal genes, and the GO term enrichment also verified that the interaction between glycosaminoglycans (GAGs) and CFH plays an important role in the disease pathology of AMD. By applying AprioriGWAS to the bipolar disorder data from WTCCC, we found that variants without marginal effects show significant interactions, for example multiple-SNP genotype patterns inside the genes GABRB2 and GRIA1 (AMPA subunit 1 receptor gene). AMPA receptors (AMPARs) are found in many parts of the brain and are the most commonly found receptors in the nervous system. GABRB2 mediates the fastest inhibitory synaptic transmission in the central nervous system. The relevance of GRIA1 and GABRB2 to mental disorders is supported by multiple lines of evidence.
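
As an illustration of the search-space reduction that the abstract attributes to Apriori, the following is a minimal sketch of the classic level-wise Apriori procedure from frequent itemset mining, applied to toy genotype data. It is not the AprioriGWAS implementation: the function name `apriori`, the SNP labels `snp1` to `snp3`, the genotype values, and the support threshold are all assumptions made for this example, and the sketch ranks patterns by raw frequency only, without any case-control association test or permutation procedure.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Classic Apriori: return {pattern: count} for every pattern (frozenset of
    items) contained in at least `min_support` transactions."""
    transactions = [frozenset(t) for t in transactions]

    def count_frequent(candidates):
        # Count how many transactions contain each candidate pattern,
        # then keep only those that reach the support threshold.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    # Level 1: single items.
    singles = {frozenset([item]) for t in transactions for item in t}
    frequent = count_frequent(singles)
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: unions of two frequent (k-1)-patterns that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: a k-pattern can only be frequent if all of its
        # (k-1)-subsets are frequent, so the rest are dropped without counting.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        frequent = count_frequent(candidates)
        result.update(frequent)
        k += 1
    return result


# Hypothetical toy data: each individual is a set of (SNP, genotype) items.
individuals = [
    {("snp1", "AA"), ("snp2", "AG"), ("snp3", "TT")},
    {("snp1", "AA"), ("snp2", "AG"), ("snp3", "CT")},
    {("snp1", "AA"), ("snp2", "AG"), ("snp3", "TT")},
    {("snp1", "AG"), ("snp2", "GG"), ("snp3", "TT")},
]

patterns = apriori(individuals, min_support=3)
for pattern, support in sorted(patterns.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(pattern), support)
```

The prune step is what keeps the search tractable: a multi-SNP pattern is only counted if every smaller pattern it contains already passed the threshold, so most of the possible combinations are never enumerated.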

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Bipolar Disorder / genetics
  • Complement Factor H / genetics
  • Computational Biology
  • Computer Simulation
  • Data Mining / statistics & numerical data
  • Databases, Genetic
  • Epistasis, Genetic*
  • Genetic Predisposition to Disease
  • Genetic Variation*
  • Genome-Wide Association Study / statistics & numerical data*
  • Humans
  • Linkage Disequilibrium
  • Logistic Models
  • Macular Degeneration / genetics
  • Models, Genetic

Substances

  • Complement Factor H

Grants and funding

This work was supported by the National Natural Science Foundation of China (NSFC; http://www.nsfc.gov.cn/e_nsfc/desktop/zn/0104.htm), project numbers 30730057 (JO) and 30700442 (QZ). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.