Novel analytical methods to interpret large sequencing data from small sample sizes

Hum Genomics. 2019 Aug 30;13(1):41. doi: 10.1186/s40246-019-0235-1.

Abstract

Background: Targeted therapies have greatly improved the prognosis of cancer patients. For instance, chronic myeloid leukemia is now treated effectively with imatinib, a tyrosine kinase inhibitor, and around 80% of patients reach complete remission. However, despite the drug's efficacy, some patients are resistant to it. This heterogeneity in response might be associated with pharmacokinetic parameters that vary between individuals because of genetic variants. To address this question, large panels of genes can be analyzed by next-generation sequencing of patient samples. However, a common problem in pharmacogenetic studies is the limited availability of samples. As a result, large sequencing datasets are obtained from small sample sizes, and classical statistical analyses cannot be applied to identify interesting targets. To overcome this limitation, we describe original and underused statistical methods for analyzing large sequencing data from a restricted number of samples.

Results: To evaluate the relevance of our method, 48 genes involved in pharmacokinetics were analyzed by next-generation sequencing in 24 chronic myeloid leukemia patients, either sensitive or resistant to imatinib treatment. Using a graphical representation, the 708 identified polymorphisms were reduced to a list of 115 candidates. Then, by analyzing each gene and the distribution of variant alleles, several candidates were highlighted, such as UGT1A9, PTPN22, and ERCC5. These genes have already been associated with the transport, the metabolism, and even the sensitivity to imatinib in previous studies.

Conclusions: These tests are useful alternatives to inferential statistics, which are not applicable to next-generation sequencing experiments performed on small sample sizes. These approaches make it possible to reduce the number of targets and to find good candidates for further studies of treatment sensitivity.

Keywords: Chronic myeloid leukemia; Factorial correspondence analysis; Hierarchical clustering on principal components; Next-generation sequencing; Pharmacogenetics; Rank products; Small sample size; Statistics.
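
Of the methods named in the keywords, rank products is the simplest to illustrate. The following is a minimal sketch of the general technique, not the authors' implementation: each feature (for example, a gene) is ranked within every replicate, and the geometric mean of its ranks is taken, so that features consistently near the top get a small rank product. The function names, the example scores, and the choice to rank in descending order are assumptions made for illustration; in practice, significance is usually assessed by permutation, which is omitted here.

```python
import math


def rank_descending(values):
    """Return 1-based ranks (1 = largest value); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next value equals the current one (a tie).
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # average of the 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def rank_products(scores):
    """scores[f][r] is the score of feature f in replicate r.

    Returns the geometric mean of each feature's per-replicate ranks.
    """
    n_features = len(scores)
    n_reps = len(scores[0])
    per_rep_ranks = [
        rank_descending([scores[f][r] for f in range(n_features)])
        for r in range(n_reps)
    ]
    return [
        math.exp(sum(math.log(per_rep_ranks[r][f]) for r in range(n_reps)) / n_reps)
        for f in range(n_features)
    ]


# Hypothetical scores for three features across three replicates (illustrative only).
example_scores = [
    [5.0, 6.0, 7.0],  # feature A: top-ranked in every replicate
    [1.0, 2.0, 3.0],  # feature B
    [3.0, 1.0, 2.0],  # feature C
]
example_rp = rank_products(example_scores)
```

A smaller rank product indicates a feature that ranks highly across replicates; because the statistic depends only on ranks, it makes few distributional assumptions, which is why it is often considered for small sample sizes.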

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Aged
  • Aged, 80 and over
  • Alleles
  • DNA-Binding Proteins / genetics*
  • Drug Resistance, Neoplasm / genetics
  • Endonucleases / genetics*
  • Female
  • Glucuronosyltransferase / genetics*
  • Humans
  • Imatinib Mesylate / administration & dosage
  • Imatinib Mesylate / adverse effects
  • Leukemia, Myelogenous, Chronic, BCR-ABL Positive / drug therapy*
  • Leukemia, Myelogenous, Chronic, BCR-ABL Positive / genetics
  • Leukemia, Myelogenous, Chronic, BCR-ABL Positive / pathology
  • Male
  • Middle Aged
  • Mutation / genetics
  • Nuclear Proteins / genetics*
  • Pharmacogenomic Variants / genetics
  • Prognosis
  • Protein Kinase Inhibitors / administration & dosage
  • Protein Kinase Inhibitors / adverse effects
  • Protein Tyrosine Phosphatase, Non-Receptor Type 22 / genetics*
  • Sample Size
  • Transcription Factors / genetics*
  • UDP-Glucuronosyltransferase 1A9
  • Young Adult

Substances

  • DNA excision repair protein ERCC-5
  • DNA-Binding Proteins
  • Nuclear Proteins
  • Protein Kinase Inhibitors
  • Transcription Factors
  • UGT1A9 protein, human
  • Imatinib Mesylate
  • Glucuronosyltransferase
  • UDP-Glucuronosyltransferase 1A9
  • Endonucleases
  • PTPN22 protein, human
  • Protein Tyrosine Phosphatase, Non-Receptor Type 22