A non-parametric method for building predictive genetic tests on high-dimensional data

Hum Hered. 2011;71(3):161-70. doi: 10.1159/000327299. Epub 2011 Jul 20.

Abstract

Objective: Predictive tests that capitalize on emerging genetic findings hold great promise for enhanced personalized healthcare. With the emergence of a large amount of data from genome-wide association studies (GWAS), interest has shifted towards high-dimensional risk prediction.

Methods: To form predictive genetic tests on high-dimensional data, we propose a non-parametric method, called the 'forward ROC method'. The method adopts a computationally efficient algorithm to search for environmental risk factors, genetic predictors across the entire genome, and their possible interactions for an optimal risk prediction model, without relying on prior knowledge of known risk factors. An efficient yet powerful procedure is also incorporated into the method to handle missing data.
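
The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea of greedy forward selection guided by the area under the ROC curve (AUC), not the authors' implementation. It assumes categorical predictors (for example genotypes coded 0/1/2) and scores each subject by the case fraction within its joint genotype cell, which is one simple non-parametric way to let interactions enter the model. The function names (`auc`, `cell_scores`, `forward_select`), the stopping tolerance `tol`, and the step limit `max_steps` are all hypothetical choices made for illustration. Missing-data handling is not covered.

```python
import numpy as np

def auc(scores, labels):
    """Empirical AUC via the Mann-Whitney statistic (ties count one half)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]          # all case-control score pairs
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (pos.size * neg.size)

def cell_scores(X, labels, cols):
    """Score each subject by the case fraction of its joint genotype cell.

    Assumption: predictors are categorical, so subjects sharing the same
    values on `cols` fall into the same cell.
    """
    _, cell = np.unique(X[:, cols], axis=0, return_inverse=True)
    cell = cell.ravel()
    cases = np.bincount(cell, weights=labels)   # cases per cell
    total = np.bincount(cell)                   # subjects per cell
    return (cases / total)[cell]

def forward_select(X, labels, max_steps=5, tol=1e-3):
    """Greedily add the predictor that most improves the in-sample AUC."""
    X = np.asarray(X)
    labels = np.asarray(labels, dtype=float)
    selected, best = [], 0.5                    # 0.5 = AUC of a random guess
    for _ in range(max_steps):
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        if not candidates:
            break
        top_auc, top_j = max(
            (auc(cell_scores(X, labels, selected + [j]), labels), j)
            for j in candidates
        )
        if top_auc - best < tol:                # stop when the gain is negligible
            break
        selected.append(top_j)
        best = top_auc
    return selected, best

# Toy usage: disease status depends only on the second predictor.
X = np.array([[0, 0, 1], [1, 0, 0], [0, 1, 1], [1, 1, 0],
              [0, 2, 0], [1, 2, 1], [0, 0, 0], [1, 1, 1]])
y = (X[:, 1] > 0).astype(int)
selected, best_auc = forward_select(X, y)
print(selected, best_auc)
```

Because the AUC here is computed on the same data used to estimate the cell scores, it is optimistic and grows as cells become sparse. Any realistic use of such a sketch would need cross-validation or an independent test set to choose the stopping point.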

Results: Through simulations and real data applications, we found that our proposed method outperformed existing approaches. We applied the new method to the Wellcome Trust rheumatoid arthritis GWAS dataset with a total of 460,547 markers. The results from the risk prediction analysis suggested important roles of HLA-DRB1 and PTPN22 in predicting rheumatoid arthritis.

Conclusion: We proposed a powerful and robust approach for high-dimensional risk prediction. The new method will facilitate future risk prediction that considers a large number of predictors and their interactions for improved performance.

MeSH terms

  • Arthritis, Rheumatoid / genetics
  • Computer Simulation
  • Databases, Genetic / statistics & numerical data*
  • Genetic Predisposition to Disease
  • Genetic Testing / methods*
  • Genetic Testing / statistics & numerical data*
  • Genome-Wide Association Study
  • Humans
  • Predictive Value of Tests
  • ROC Curve
  • Reproducibility of Results
  • Risk Factors
  • Statistics, Nonparametric