Chromosome-Based Proteomic Study for Identifying Novel Protein Variants from Human Hippocampal Tissue Using Customized neXtProt and GENCODE Databases

J Proteome Res. 2015 Dec 4;14(12):5028-37. doi: 10.1021/acs.jproteome.5b00472. Epub 2015 Nov 16.

Abstract

The goal of the Chromosome-Centric Human Proteome Project (C-HPP) is to provide complete proteomic information for each human chromosome, including novel proteoforms such as protein-coding variants expressed from noncoding genomic regions, alternative splicing variants (ASVs), and single amino acid variants (SAAVs). In 144 LC/MS/MS raw files from human hippocampal tissues (control, epilepsy, and Alzheimer's disease), we identified novel proteoforms with a workflow built on an integrated proteomic pipeline that uses three search engines: MASCOT, SEQUEST, and MS-GF+. With a <1% false discovery rate (FDR) at the protein level, 11 detected peptides mapped to four translated long noncoding RNA (lncRNA) variants against the customized GENCODE lncRNA database, which also mapped to coding proteins at different chromosomal sites. We also identified four novel ASVs against the customized GENCODE transcript database. The target peptides from these variants were validated by comparing their tandem MS fragmentation patterns with those of the corresponding synthetic peptides. Additionally, a total of 128 SAAVs paired with their wild-type peptides were identified with FDR <1% at the peptide level using a customized neXtProt database that includes nonsynonymous single nucleotide polymorphism (nsSNP) information. Among these results, several novel variants related to neurodegenerative disease were identified using this workflow, which could be applicable to other C-HPP studies. All raw files used in this study were deposited in ProteomeXchange (PXD000395).
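The idea of pairing an SAAV peptide with its wild-type counterpart can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the toy sequence, and the digestion rule (trypsin-like cleavage after K or R unless followed by P, with no missed cleavages) are assumptions chosen only for illustration, since the abstract does not state how the customized neXtProt database was constructed.

```python
def tryptic_peptides(sequence):
    """In-silico digest (assumed rule): cleave after K or R unless the next residue is P.

    Returns a list of (start_index, peptide) tuples, with 0-based start indices.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append((start, sequence[start:i + 1]))
            start = i + 1
    if start < len(sequence):
        peptides.append((start, sequence[start:]))
    return peptides


def apply_saav(sequence, position, ref, alt):
    """Substitute one residue at a 1-based position, checking the reference residue."""
    if sequence[position - 1] != ref:
        raise ValueError(f"expected {ref} at position {position}, "
                         f"found {sequence[position - 1]}")
    return sequence[:position - 1] + alt + sequence[position:]


def peptide_covering(sequence, position):
    """Return the digested peptide that contains the given 1-based position."""
    index = position - 1
    for start, peptide in tryptic_peptides(sequence):
        if start <= index < start + len(peptide):
            return peptide
    raise ValueError("position outside sequence")


def saav_peptide_pair(sequence, position, ref, alt):
    """Return (wild_type_peptide, variant_peptide) for one substitution."""
    variant_sequence = apply_saav(sequence, position, ref, alt)
    return (peptide_covering(sequence, position),
            peptide_covering(variant_sequence, position))


# Hypothetical toy protein and variant (Y5C); not data from the study.
toy_protein = "MKTAYIAKQRQISFVK"
wild_type, variant = saav_peptide_pair(toy_protein, 5, "Y", "C")
print(wild_type, variant)
```

In a search workflow of this kind, both peptides of each pair would be present in the database so that spectra can match either form; a substitution that creates or removes a K/R residue changes the peptide boundaries, which is why the variant sequence is digested separately here.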

Keywords: C-HPP; alternative splice variants; hippocampus; novel proteoforms; single amino acid variants; translated lncRNA variants.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Alternative Splicing
  • Alzheimer Disease / genetics
  • Alzheimer Disease / metabolism*
  • Amino Acid Sequence
  • Case-Control Studies
  • Chromatography, Liquid
  • Chromosomes, Human
  • Databases, Genetic
  • Databases, Protein
  • Epilepsy / genetics
  • Epilepsy / metabolism*
  • Genetic Variation
  • Hippocampus / metabolism*
  • Hippocampus / physiology
  • Humans
  • Molecular Sequence Data
  • Polymorphism, Single Nucleotide
  • Proteomics / methods*
  • Software
  • Tandem Mass Spectrometry
  • Workflow