Tuesday, 14 November 2017

Harness the power of text-mining for biomedical discovery: introducing Europe PMC Annotations API

We are excited to announce the launch of Europe PMC Annotations API, which provides programmatic access to annotations text-mined from biomedical abstracts and open access full text articles. The Annotations API is part of Europe PMC’s suite of programmatic tools and is freely available on the Europe PMC website: https://europepmc.org/AnnotationsApi.

The exponential growth in scientific data and scholarly content cannot be addressed by conventional means of information discovery. Text-mining offers a practical way to scale information extraction and advance biomedical research. However, its application is still limited, partly because of the technical know-how needed to set up a text-mining pipeline. Even so, non-specialists can capitalize on text-mining outputs: making them openly available enables a broad audience of researchers and developers to tackle current challenges in biomedical literature analysis. For that reason, Europe PMC has established a community annotation platform. It consolidates text-mined annotations from various providers and makes them available both on the Europe PMC website, as text highlights via the SciLite application, and now programmatically, through the Europe PMC Annotations API.

What types of annotations are available via the API?

The text-mining platform hosts a variety of annotation types, including the following:
  • Named entities: gene/protein names, organisms, diseases, Gene Ontology terms, chemicals and accession numbers.
  • Biological events: protein phosphorylation.
  • Relationships: gene-disease associations.
  • Text phrases: GeneRIF (Gene Reference into Function), as well as protein-protein interactions.

The number and diversity of annotation types available to users grows as new providers join the scheme. Currently, annotations are contributed by a number of providers, including Europe PMC’s own text-mining pipeline for named entities, the Swiss Institute of Bioinformatics for gene function annotations, DisGeNET and the Open Targets Platform for gene-disease relations, IntAct for manually curated protein-protein interactions, and NaCTeM for phosphorylation events.

What can you do using the Annotations API?

The Annotations API supports a range of user workflows. For example, it is possible to retrieve all mentions of chemical entities found in the Results section of articles, or to find all articles that discuss the involvement of the FFA2 protein in diabetes. Users can combine several parameters (such as annotation type, annotation provider, or article section) to refine their query. The filter parameter lets the user switch between two options: retrieving only the specified annotations for each article, or retrieving all the annotations of the articles that contain the specified annotation. Output formats include XML, ID_LIST, JSON and JSON-LD; the last produces a linked data representation of the annotations, which makes them easier to exchange and consume across different platforms.
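
To make the parameter combinations above concrete, here is a minimal sketch that builds a query URL using only the Python standard library. The base URL, the parameter names (type, section, filter, format) and the annotation type value "Chemicals" are assumptions for illustration; please check the documentation at https://europepmc.org/AnnotationsApi for the exact names and accepted values, including which filter value selects which of the two options.

```python
from urllib.parse import urlencode

# Assumed base URL of the Annotations API (verify against the official docs).
BASE_URL = "https://www.ebi.ac.uk/europepmc/annotations_api"

# The four output formats named in the text.
FORMATS = {"XML", "ID_LIST", "JSON", "JSON-LD"}


def build_annotations_url(module, fmt="JSON", **params):
    """Build a GET URL for an Annotations API module.

    Parameter names passed via **params are assumptions for illustration.
    Parameters set to None are dropped from the query string.
    """
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    query = {key: value for key, value in params.items() if value is not None}
    query["format"] = fmt
    return f"{BASE_URL}/{module}?{urlencode(query)}"


# Hypothetical query: chemical mentions in the Results section.
url = build_annotations_url(
    "annotationsBySectionAndOrType",
    type="Chemicals",
    section="Results",
    filter=1,
)
print(url)
```

The same helper can combine other parameters, such as a provider, or request JSON-LD via fmt="JSON-LD"; rejecting unknown formats up front keeps typos from producing silently wrong requests.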

The following modules are available for the Annotations API:
  • get /annotationsByArticleIds: Get the annotations contained in a list of articles specified by the user. The user can specify article IDs, annotation type, annotation provider, or article section.
  • get /annotationsByEntity: Get the annotations of the articles which have at least one annotation tagging the specified entity (e.g. BRCA1, metformin, or cancer).
  • get /annotationsByProvider: Get the annotations of the articles which have at least one annotation provided by the specified provider (e.g. Open Targets).
  • get /annotationsByRelationship: Get the annotations of the articles which have at least one annotation tagging two specified entities related to each other (e.g. NRGN-schizophrenia for a gene-disease relationship).
  • get /annotationsBySectionAndOrType: Get the annotations of the articles which have at least one annotation of a given type inside a given article section (e.g. all organism annotations in the “Materials and Methods” section). The user can also choose to specify only the annotation type or only the article section.
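
Whichever module is used, a JSON response can then be summarized client-side. The sketch below groups the annotated text by annotation type. It assumes the JSON output is a list of article records, each with an "annotations" list whose items carry "type" and "exact" (the matched text) fields; these key names are assumptions, so adjust them to the actual response structure.

```python
import json
from collections import defaultdict
from urllib.request import urlopen


def fetch_json(url):
    """Download and decode a JSON response (performs a network request)."""
    with urlopen(url) as response:
        return json.load(response)


def mentions_by_type(articles):
    """Group distinct annotated text spans by annotation type.

    Assumes each article record has an "annotations" list whose items
    carry "type" and "exact" fields; missing keys fall back to defaults.
    """
    grouped = defaultdict(set)
    for article in articles:
        for annotation in article.get("annotations", []):
            annotation_type = annotation.get("type", "unknown")
            grouped[annotation_type].add(annotation.get("exact", ""))
    # Sort each group so the summary is stable and easy to read.
    return {annotation_type: sorted(texts) for annotation_type, texts in grouped.items()}
```

For example, mentions_by_type(fetch_json(url)) would summarize any query that requests format=JSON. Keeping the download separate from the grouping makes the parsing logic easy to test offline.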

Advancing science with text-mining

Systematic analysis of research literature offers immense potential for advancing scientific knowledge. However, extracting facts and evidence is a difficult and time-consuming task. That’s where text-mining comes into play, helping to speed up the discovery process. Our goal in developing the Annotations API was to make text-mining outputs available to a wider community of biomedical scientists, and we hope that with this new tool anyone will be able to harness the power of text-mining for the benefit of their own research.