MAPPER: a search engine for the computational identification of putative transcription factor binding sites in multiple genomes

BMC Bioinformatics. 2005 Mar 30;6:79. doi: 10.1186/1471-2105-6-79.

Abstract

Background: Cis-regulatory modules are combinations of regulatory elements occurring in close proximity to each other that control the spatial and temporal expression of genes. The ability to identify them in a genome-wide manner depends on the availability of accurate models and of search methods able to detect putative regulatory elements with enhanced sensitivity and specificity.

Results: We describe the implementation of a search method for putative transcription factor binding sites (TFBSs) based on hidden Markov models built from alignments of known sites. We built 1,079 models of TFBSs using experimentally determined sequence alignments of sites provided by the TRANSFAC and JASPAR databases and used them to scan sequences of the human, mouse, fly, worm and yeast genomes. In several test cases, the method correctly identified experimentally characterized sites, with better specificity and sensitivity than other similar computational methods. Moreover, a large-scale comparison using synthetic data showed that in the majority of cases our method performed significantly better than a nucleotide weight matrix-based method.
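
For orientation, the sketch below illustrates the kind of nucleotide weight matrix scan that the Results use as a comparison baseline. It is not MAPPER's hidden Markov model method, whose details the abstract does not give. The function names (build_log_odds_matrix, scan), the pseudocount, the uniform background frequency and the score threshold are all assumptions made for illustration, and the aligned sites are toy sequences, not taken from TRANSFAC or JASPAR.

```python
import math


def build_log_odds_matrix(sites, pseudocount=1.0, background=0.25):
    """Build a per-column log2-odds matrix from equal-length aligned sites.

    Hypothetical helper: assumes gap-free A/C/G/T sites and a uniform background.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all aligned sites must have the same length")
    matrix = []
    for col in range(length):
        # Start every base at the pseudocount so unseen bases get a finite score.
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[col].upper()] += 1
        total = sum(counts.values())
        matrix.append(
            {base: math.log2((counts[base] / total) / background) for base in "ACGT"}
        )
    return matrix


def scan(sequence, matrix, threshold):
    """Return (start, window, score) for each window scoring at or above threshold."""
    sequence = sequence.upper()
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Skip windows containing ambiguous characters such as N.
        if any(base not in "ACGT" for base in window):
            continue
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


# Toy alignment of binding sites (illustrative only, not real database entries).
toy_sites = ["TATAAA", "TATATA", "TATAAG", "TATAAA"]
toy_matrix = build_log_odds_matrix(toy_sites)
hits = scan("GGCGTATAAAGGCC", toy_matrix, threshold=5.0)
for start, window, score in hits:
    print(start, window, round(score, 2))
```

A weight matrix of this kind scores each position independently. A hidden Markov model built from the same alignment can additionally model insertions and deletions relative to the alignment, which is one plausible reason for the difference in performance reported above, though the abstract does not state the cause.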

Conclusion: The search engine, available at http://mapper.chip.org, allows the identification, visualization and selection of putative TFBSs occurring in the promoter or other regions of a gene from the human, mouse, fly, worm and yeast genomes. In addition, it allows users to upload a query sequence and to build a model by supplying a multiple sequence alignment of binding sites for a transcription factor of interest. Due to its extensive database of models, powerful search engine and flexible interface, MAPPER represents an effective resource for the large-scale computational analysis of transcriptional regulation.

MeSH terms

  • Algorithms
  • Animals
  • Base Sequence
  • Binding Sites
  • Cell Cycle Proteins / genetics
  • Computational Biology / methods*
  • Computers
  • Databases, Genetic
  • Databases, Nucleic Acid
  • Databases, Protein
  • Evolution, Molecular
  • Gene Expression Regulation
  • Gene Library
  • Genome*
  • Humans
  • Internet
  • Markov Chains
  • Programming Languages
  • Protein Binding
  • Sequence Alignment
  • Software
  • Transcription Factors / chemistry*
  • Transcription Factors / metabolism
  • Transcription, Genetic
  • User-Computer Interface

Substances

  • Cell Cycle Proteins
  • MCM5 protein, human
  • Transcription Factors