Inferring the hosts of coronavirus using dual statistical models based on nucleotide composition

Sci Rep. 2015 Nov 26;5:17155. doi: 10.1038/srep17155.

Abstract

Many coronaviruses are capable of interspecies transmission. Some of them have caused worldwide panic as emerging human pathogens in recent years, e.g., severe acute respiratory syndrome coronavirus (SARS-CoV) and Middle East respiratory syndrome coronavirus (MERS-CoV). To assess their threat to humans, we sought to infer the potential hosts of coronaviruses using a dual-model approach based on nineteen parameters computed from the spike genes of coronaviruses. Both the support vector machine (SVM) model and the Mahalanobis distance (MD) discriminant model achieved high accuracy in leave-one-out cross-validation on training data consisting of 730 representative coronaviruses (99.86% and 98.08%, respectively). Predictions for 47 additional coronaviruses agreed with the conclusions or speculations of other researchers. Our approach is implemented as a web server that can be accessed at http://bioinfo.ihb.ac.cn/seq2hosts.
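
To illustrate the Mahalanobis distance (MD) discriminant idea and the leave-one-out check described above, here is a minimal sketch in Python. It is not the authors' implementation. The abstract does not list the nineteen parameters, so the sketch assumes a stand-in feature vector of just three values (the fractions of A, C and G in a sequence). The small ridge term added to the covariance diagonal, all function names, the toy sequences and the host labels `host_A` and `host_B` are hypothetical and chosen only to keep the example self-contained. The SVM half of the dual-model approach is omitted.

```python
from collections import Counter


def composition(seq):
    """Fractions of A, C and G in seq (T is implied, since the four sum to 1)."""
    seq = seq.upper().replace("U", "T")
    counts = Counter(b for b in seq if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return [counts[b] / total for b in "ACG"]


def mean_and_cov(rows, ridge=1e-3):
    """Sample mean and covariance; ridge (an assumption) keeps cov invertible."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / max(n - 1, 1)
            for j in range(d)] for i in range(d)]
    for i in range(d):
        cov[i][i] += ridge
    return mean, cov


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance (x - mean)^T cov^{-1} (x - mean)."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    return sum(d * y for d, y in zip(diff, solve(cov, diff)))


def predict_host(seq, training):
    """Assign seq to the host group whose centroid is nearest in MD."""
    x = composition(seq)
    scores = {}
    for host, seqs in training.items():
        mean, cov = mean_and_cov([composition(s) for s in seqs])
        scores[host] = mahalanobis_sq(x, mean, cov)
    return min(scores, key=scores.get)


def leave_one_out_accuracy(training):
    """Hold out each sequence in turn, refit on the rest, and score the prediction."""
    hits = total = 0
    for host, seqs in training.items():
        for i, held_out in enumerate(seqs):
            reduced = dict(training)
            reduced[host] = seqs[:i] + seqs[i + 1:]
            hits += predict_host(held_out, reduced) == host
            total += 1
    return hits / total


# Made-up toy sequences, not real spike genes.
TOY_TRAINING = {
    "host_A": ["AATTAATTGC", "ATATATATCG", "AAATTTAAGC"],
    "host_B": ["GGCCGGCCAT", "GCGCGCGCTA", "GGGCCCGGAT"],
}
```

Calling `predict_host("ATTAATTAGC", TOY_TRAINING)` assigns the AT-rich query to `host_A`, and `leave_one_out_accuracy(TOY_TRAINING)` reports the fraction of held-out toy sequences that are assigned back to their own group. A real analysis would replace the three-value feature vector with the paper's nineteen parameters and use full-length spike gene sequences.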

MeSH terms

  • Base Composition*
  • Coronavirus / genetics*
  • Coronavirus Infections / virology*
  • Host-Pathogen Interactions*
  • Humans
  • Models, Statistical*
  • Nucleotides / genetics*
  • Reproducibility of Results
  • Support Vector Machine

Substances

  • Nucleotides