TFpredict and SABINE: sequence-based prediction of structural and functional characteristics of transcription factors

PLoS One. 2013 Dec 12;8(12):e82238. doi: 10.1371/journal.pone.0082238. eCollection 2013.

Abstract

One of the key mechanisms of transcriptional control are the specific connections between transcription factors (TF) and cis-regulatory elements in gene promoters. The elucidation of these specific protein-DNA interactions is crucial to gain insights into the complex regulatory mechanisms and networks underlying the adaptation of organisms to dynamically changing environmental conditions. As experimental techniques for determining TF binding sites are expensive and mostly performed for selected TFs only, accurate computational approaches are needed to analyze transcriptional regulation in eukaryotes on a genome-wide level. We implemented a four-step classification workflow which for a given protein sequence (1) discriminates TFs from other proteins, (2) determines the structural superclass of TFs, (3) identifies the DNA-binding domains of TFs and (4) predicts their cis-acting DNA motif. While existing tools were extended and adapted for performing the latter two prediction steps, the first two steps are based on a novel numeric sequence representation which allows for combining existing knowledge from a BLAST scan with robust machine learning-based classification. By evaluation on a set of experimentally confirmed TFs and non-TFs, we demonstrate that our new protein sequence representation facilitates more reliable identification and structural classification of TFs than previously proposed sequence-derived features. The algorithms underlying our proposed methodology are implemented in the two complementary tools TFpredict and SABINE. The online and stand-alone versions of TFpredict and SABINE are freely available to academics at http://www.cogsys.cs.uni-tuebingen.de/software/TFpredict/ and http://www.cogsys.cs.uni-tuebingen.de/software/SABINE/.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Computational Biology / methods*
  • Nucleotide Motifs
  • Protein Structure, Tertiary
  • Software*
  • Transcription Factors / chemistry*
  • Transcription Factors / metabolism*

Substances

  • Transcription Factors

Grants and funding

The Federal Ministry of Education and Research (BMBF), Germany funded this work in the MedSys project Spher4Sys, (grant number 0315384C), the Virtual Liver Network (grant number 0315756) and a Marie Curie International Outgoing Fellowship within the EU 7th Framework Program for Research and Technological Development (project AMBiCon, 332020) to AD. Further financial support was provided by the German Research Foundation and the Open Access Publishing Fund of Tuebingen. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.