Accurate prediction of a minimal region around a genetic association signal that contains the causal variant

Eur J Hum Genet. 2014 Feb;22(2):238-42. doi: 10.1038/ejhg.2013.115. Epub 2013 Jun 5.

Abstract

In recent years, genome-wide association studies have been very successful in identifying loci for complex traits. However, these findings typically involve noncoding and/or intergenic SNPs that have no clear functional effect and do not directly point to a gene. Hence, the challenge is to identify the causal variant responsible for the association signal. Typically, the first step is to identify all genetic variation in the locus region, usually by resequencing a large number of case chromosomes. Among all variants, the causal one needs to be identified in further functional studies. Because the experimental follow-up can be very laborious, restricting the number of variants to be scrutinized can yield a great advantage. An objective method for choosing the size of the region to be followed up would be highly valuable. Here, we propose a simple method to call the minimal region around a significant association peak that is very likely to contain the causal variant. We model linkage disequilibrium (LD) in cases from the observed single-SNP association signals, and predict the location of the causal variant by quantifying how well this relationship fits the data. Simulations showed that our approach identifies genomic regions of, on average, ∼50 kb that contain the causal variant with up to 90% probability. We apply our method to two genome-wide association data sets and localize both the functional variant REP1 in the α-synuclein gene, which confers susceptibility to Parkinson's disease, and the APOE gene responsible for the association signal in the Alzheimer's disease data set.
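
The idea of scoring candidate causal positions by how well a model of signal decay fits the observed single-SNP signals can be illustrated with a toy sketch. The abstract does not specify the LD model, so everything here is an assumption made for illustration: the exponential decay of signal with distance, the least-squares fit, the tolerance rule for calling the region, and the names `fit_rss` and `call_region`. It is not the method of the paper.

```python
import math

def fit_rss(positions, signals, center, decay):
    """Residual sum of squares for the ASSUMED model
    signal_i ~ a * exp(-decay * |x_i - center|), with a fitted by least squares."""
    shape = [math.exp(-decay * abs(x - center)) for x in positions]
    a = sum(s * e for s, e in zip(signals, shape)) / sum(e * e for e in shape)
    return sum((s - a * e) ** 2 for s, e in zip(signals, shape))

def call_region(positions, signals, candidates, decays, tol=0.05):
    """Return the best-fitting candidate position and the interval spanning all
    candidates whose fit is within tol * total signal sum of squares of the best.
    The tolerance rule is a hypothetical stand-in for a calibrated threshold."""
    total = sum(s * s for s in signals)
    # For each candidate position, keep the best fit over the decay-rate grid.
    rss = {c: min(fit_rss(positions, signals, c, d) for d in decays)
           for c in candidates}
    best = min(rss, key=rss.get)
    kept = [c for c in candidates if rss[c] <= rss[best] + tol * total]
    # Accepted candidates need not be contiguous; report the enclosing interval.
    return best, (min(kept), max(kept))

# Synthetic, noise-free example: SNPs every 5 kb, simulated peak at 50 kb.
positions = list(range(0, 101, 5))
signals = [10.0 * math.exp(-0.05 * abs(x - 50)) for x in positions]
best, region = call_region(positions, signals,
                           candidates=positions,
                           decays=[0.01, 0.02, 0.05, 0.1, 0.2])
print(best, region)
```

In this sketch a smaller `tol` gives a narrower region at the cost of a lower chance of covering the true position. With real data that trade-off would have to be calibrated by simulation, as the abstract describes for the actual method.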

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Apolipoproteins E / genetics
  • Gene Frequency
  • Genetic Predisposition to Disease
  • Genome-Wide Association Study / methods*
  • Humans
  • Linkage Disequilibrium
  • Models, Genetic*
  • Parkinson Disease / genetics
  • Polymorphism, Single Nucleotide
  • alpha-Synuclein / genetics

Substances

  • Apolipoproteins E
  • SNCA protein, human
  • alpha-Synuclein