Unified tests for fine-scale mapping and identifying sparse high-dimensional sequence associations

Bioinformatics. 2016 Feb 1;32(3):330-7. doi: 10.1093/bioinformatics/btv586. Epub 2015 Oct 12.

Abstract

Motivation: In searching for genetic variants for complex diseases with deep sequencing data, genomic marker sets of high-dimensional genotypic data and sparse functional variants are quite common. Existing sequence association tests are incapable of identifying such marker sets or individual causal loci, although they appeared powerful to identify small marker sets with dense functional variants. In sequence association studies of admixed individuals, cryptic relatedness and population structure are known to confound the association analyses.

Method: We here propose a unified marker wise test (uFineMap) to accurately localize causal loci and a unified high-dimensional set based test (uHDSet) to identify high-dimensional sparse associations in deep sequencing genomic data of multi-ethnic individuals with random relatedness. These two novel tests are based on scaled sparse linear mixed regressions with Lp (0 < p < 1) norm regularization. They jointly adjust for cryptic relatedness, population structure and other confounders to prevent false discoveries and improve statistical power for identifying promising individual markers and marker sets that harbor functional genetic variants of a complex trait.

Results: With large scale simulation data and real data analyses, the proposed tests appropriately controlled Type I error rates and appeared to be more powerful than several prominent methods. We illustrated their practical utilities by the applications to DNA sequence data of Framingham Heart Study for osteoporosis. The proposed tests identified 11 novel significant genes that were missed by the prominent famSKAT and GEMMA. In particular, four out of six most significant pathways identified by the uHDSet but missed by famSKAT have been reported to be related to BMD or osteoporosis in the literature.

Availability and implementation: The computational toolkit is available for academic use: https://sites.google.com/site/shaolongscode/home/uhdset

Contact: wyp@tulane.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Chromosome Mapping
  • Genetic Variation*
  • Genome-Wide Association Study*
  • Genomics / methods
  • Genotyping Techniques
  • High-Throughput Nucleotide Sequencing / methods*
  • Humans
  • Linear Models
  • Osteoporosis / genetics
  • Phenotype
  • Sequence Analysis, DNA / methods*