CvManGO, a method for leveraging computational predictions to improve literature-based Gene Ontology annotations

Database (Oxford). 2012 Mar 20:2012:bas001. doi: 10.1093/database/bas001. Print 2012.

Abstract

The set of annotations at the Saccharomyces Genome Database (SGD) that classifies the cellular function of S. cerevisiae gene products using Gene Ontology (GO) terms has become an important resource for facilitating experimental analysis. In addition to capturing and summarizing experimental results, the structured nature of GO annotations allows for functional comparison across organisms as well as propagation of functional predictions between related gene products. Because these annotations are relevant to many areas of research, ensuring their accuracy and quality is a priority at SGD. GO annotations are assigned either manually, by biocurators extracting experimental evidence from the scientific literature, or through automated methods that leverage computational algorithms to predict functional information. Here, we discuss the relationship between literature-based and computationally predicted GO annotations in SGD and extend a strategy whereby comparison of these two types of annotation identifies genes whose annotations need review. Our method, CvManGO (Computational versus Manual GO annotations), pairs literature-based GO annotations with computational GO predictions and evaluates the relationship of the two terms within GO, looking for instances of discrepancy. We found that this method identifies genes that require annotation updates, an important step towards finding ways to prioritize literature review. Additionally, we explored factors that may influence how effectively CvManGO identifies relevant gene targets, in particular genes that are missing literature-supported annotations, but our survey found no immediately identifiable criteria by which one could enrich for these under-annotated genes. Finally, we discuss possible ways to improve this strategy, and the applicability of this method to other projects that use the GO for curation. DATABASE URL: http://www.yeastgenome.org.
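
The pairing strategy described in the abstract can be illustrated with a minimal sketch. It is not the CvManGO implementation. The abstract does not say which term relationships count as a discrepancy, so this sketch assumes a computational prediction is discrepant when no manual annotation for the same gene is identical to it or more specific than it. The toy hierarchy, the term names, the gene names, the relationship labels and the functions relationship and flag_genes are all hypothetical.

```python
# Toy is_a hierarchy (child -> parents). Hypothetical terms, not real GO IDs.
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalytic": ["root"],
    "kinase": ["catalytic"],
    "protein_kinase": ["kinase"],
}


def ancestors(term):
    """Return all proper ancestors of a term in the toy hierarchy."""
    seen = set()
    stack = list(PARENTS[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def relationship(manual, computational):
    """Classify how a computational term relates to a manual term."""
    if manual == computational:
        return "identical"
    if manual in ancestors(computational):
        return "computational_more_specific"
    if computational in ancestors(manual):
        return "manual_more_specific"
    return "different_lineage"


def flag_genes(manual_annots, computational_preds):
    """Return {gene: [(predicted_term, label), ...]} for discrepant predictions.

    Assumption: a prediction is covered (not flagged) if some manual term
    is identical to it or more specific than it.
    """
    flagged = {}
    for gene, preds in computational_preds.items():
        manual_terms = manual_annots.get(gene, [])
        for pred in preds:
            rels = {relationship(m, pred) for m in manual_terms}
            if rels & {"identical", "manual_more_specific"}:
                continue  # the literature-based annotation already covers it
            if "computational_more_specific" in rels:
                label = "computational_more_specific"
            else:
                label = "different_lineage"
            flagged.setdefault(gene, []).append((pred, label))
    return flagged


# Hypothetical genes and annotations, for illustration only.
manual = {"GENE1": ["kinase"], "GENE2": ["protein_kinase"], "GENE3": ["binding"]}
predicted = {"GENE1": ["protein_kinase"], "GENE2": ["kinase"], "GENE3": ["kinase"]}
result = flag_genes(manual, predicted)
```

In this toy data, GENE1 and GENE3 would be queued for literature review, while GENE2 would not, because its manual annotation is already more specific than the prediction. A real comparison would load the GO graph and the annotation files rather than hard-coded dictionaries.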

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Computational Biology
  • Databases, Genetic*
  • Genes, Fungal
  • Genome, Fungal
  • Molecular Sequence Annotation / methods*
  • Saccharomyces cerevisiae / genetics
  • Saccharomyces cerevisiae Proteins / classification
  • Saccharomyces cerevisiae Proteins / genetics
  • Software*
  • Vocabulary, Controlled*

Substances

  • Saccharomyces cerevisiae Proteins