Resource Disambiguator for the Web: Extracting Biomedical Resources and Their Citations from the Scientific Literature

PLoS One. 2016 Jan 5;11(1):e0146300. doi: 10.1371/journal.pone.0146300. eCollection 2016.

Abstract

The NIF Registry, developed and maintained by the Neuroscience Information Framework, is a cooperative project aimed at cataloging research resources, e.g., software tools, databases and tissue banks, funded largely by governments and available as tools to research scientists. Although originally conceived for neuroscience, the NIF Registry has over the years broadened in scope to include research resources of general relevance to biomedical research. The Registry currently lists more than 13,000 research resources. The broadening in scope to biomedical science led us to re-christen the NIF Registry platform as SciCrunch. The NIF/SciCrunch Registry has been cataloging the resource landscape since 2006; as such, it serves as a valuable dataset for tracking the breadth, fate and utilization of these resources. Our experience shows that research resources like databases are dynamic objects that can change location and scope over time. Although each record is entered manually and human-curated, the current size of the registry requires tools that can aid curation efforts to keep content up to date, including information on when and where such resources are used. To address this challenge, we have developed an open source tool suite, collectively termed RDW: Resource Disambiguator for the Web. RDW is designed to help in the upkeep and curation of the registry as well as in enhancing the content of the registry by automated extraction of resource candidates from the literature. The RDW toolkit includes a URL extractor for papers, a resource candidate screen, a resource URL change tracker, and a resource content change tracker. Curators access these tools via a web-based user interface. Several strategies are used to optimize these tools, including supervised and unsupervised learning algorithms as well as statistical text analysis.
The complete tool suite is used to enhance and maintain the resource registry as well as track the usage of individual resources through an innovative literature citation index honed for research resources. Here we present an overview of the Registry and show how the RDW tools are used in curation and usage tracking.
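
To illustrate the kind of URL extraction from paper text that the abstract mentions, a minimal Python sketch follows. It is not the RDW implementation: the regular expression, the function name extract_urls and the sample sentence are assumptions made for illustration only, and a production extractor would likely need to handle line-wrapped URLs and other cases not covered here.

```python
import re

# Hypothetical pattern (an assumption, not taken from RDW): matches http(s)
# URLs and bare "www." hosts, stopping at whitespace, quotes and brackets.
URL_PATTERN = re.compile(r"(?:https?://|www\.)[^\s<>\"'()\[\]{}]+", re.IGNORECASE)


def extract_urls(text):
    """Return the distinct URLs found in text, in order of first appearance."""
    found = []
    for match in URL_PATTERN.finditer(text):
        # Drop sentence punctuation that often trails a URL in running prose.
        url = match.group(0).rstrip(".,;:")
        if url not in found:
            found.append(url)
    return found


sample = (
    "Data are available at http://example.org/db. "
    "See also (www.example.org/tools), and http://example.org/db again."
)
print(extract_urls(sample))
```

Each extracted URL would then be a candidate for screening against the registry; how that screening is done is described in the paper itself and is not reproduced here.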

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Biomedical Research / methods
  • Biomedical Research / statistics & numerical data
  • Computational Biology / methods*
  • Computational Biology / statistics & numerical data
  • Databases, Factual
  • Humans
  • Information Storage and Retrieval / methods*
  • Information Storage and Retrieval / statistics & numerical data
  • Internet*
  • Neurosciences / methods*
  • Neurosciences / statistics & numerical data
  • Publications / statistics & numerical data
  • Registries / statistics & numerical data
  • Reproducibility of Results
  • Software*