Discrimination of driver and passenger mutations in epidermal growth factor receptor in cancer

Mutat Res. 2015 Oct:780:24-34. doi: 10.1016/j.mrfmmm.2015.07.005. Epub 2015 Jul 20.

Abstract

Cancer is one of the most life-threatening diseases, and mutations in several genes are a key cause of tumorigenesis. Protein kinases play essential roles in cancer progression; in particular, the epidermal growth factor receptor (EGFR) is an important target for cancer therapy. In this work, we developed a method to classify single amino acid polymorphisms (SAPs) in EGFR as disease-causing (driver) or neutral (passenger) mutations, applying machine learning to both sequence- and structure-based features of the mutation site. We compiled a set of 222 features and used feature selection methods to choose the 21 properties that maximized prediction performance. On a set of 540 mutants, we obtained an overall classification accuracy of 67.8% with 10-fold cross-validation using support vector machines. Grouping the mutations into four sets based on secondary structure and accessible surface area raised the classification accuracy to 80.2%, 81.9%, 77.9% and 75.1% for helix, strand, coil-buried and coil-exposed mutants, respectively. On a blind dataset of 60 mutations, the method achieved an average accuracy of 85.4%. These accuracy levels exceed those of other methods available in the literature for EGFR mutants by more than 30%. Moreover, we screened all possible SAPs in EGFR and suggested the probable driver and passenger mutations, which would help in the development of mutation-specific drugs for cancer treatment.
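The grouping step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the accessible surface area cutoff that separates buried from exposed coil residues, so the 0.25 relative-ASA threshold below is an assumption, as are the three-state secondary structure labels, the `Mutation` record and its field names, and the toy records, which are invented for illustration and are not data from the study.

```python
from collections import defaultdict
from dataclasses import dataclass

# Assumed cutoff: the abstract does not state the ASA threshold that was used.
BURIED_REL_ASA_CUTOFF = 0.25


@dataclass(frozen=True)
class Mutation:
    """Hypothetical record for one EGFR mutant (field names are illustrative)."""
    name: str               # placeholder identifier
    sec_struct: str         # "helix", "strand", or "coil"
    rel_asa: float          # relative accessible surface area, 0.0 to 1.0
    is_driver: bool         # known label
    predicted_driver: bool  # classifier output


def structural_group(m: Mutation) -> str:
    """Assign a mutant to helix, strand, coil-buried, or coil-exposed."""
    if m.sec_struct in ("helix", "strand"):
        return m.sec_struct
    if m.rel_asa < BURIED_REL_ASA_CUTOFF:
        return "coil-buried"
    return "coil-exposed"


def accuracy_by_group(mutations):
    """Return the fraction of correct predictions within each structural group."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for m in mutations:
        group = structural_group(m)
        totals[group] += 1
        hits[group] += int(m.is_driver == m.predicted_driver)
    return {group: hits[group] / totals[group] for group in totals}


# Toy records, invented purely to exercise the functions above.
demo = [
    Mutation("m1", "helix", 0.10, True, True),
    Mutation("m2", "helix", 0.40, False, True),
    Mutation("m3", "strand", 0.05, True, True),
    Mutation("m4", "coil", 0.10, False, False),
    Mutation("m5", "coil", 0.60, True, False),
    Mutation("m6", "coil", 0.80, False, False),
]

per_group = accuracy_by_group(demo)
for group, acc in sorted(per_group.items()):
    print(f"{group}: {acc:.2f}")
```

Reporting accuracy per structural group, rather than pooling all mutants, mirrors the way the abstract gives separate figures for helix, strand, coil-buried and coil-exposed mutants.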

Keywords: Driver mutation; EGFR; Machine learning; Passenger mutation.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • ErbB Receptors / genetics*
  • Humans
  • Models, Genetic*
  • Mutation, Missense*
  • Neoplasms / genetics*
  • Polymorphism, Single Nucleotide*
  • Support Vector Machine*

Substances

  • EGFR protein, human
  • ErbB Receptors