Tuesday, 20 August 2019

Using Europe PMC RESTful APIs

What is FAIR-biomed?

FAIR-biomed is a browser extension that aims to facilitate investigative research in biomedicine. It connects users to information stored in specialized databases. For example, when you are reading an online report, FAIR-biomed can help you find, with a few clicks, the relevant information on biological entities mentioned in that report. For a gene, this might include its description, biological functions, and interactors, along with links to the sources in the biomedical literature where the gene is mentioned.
FAIR-biomed retrieves this information from several relevant data sources, including Europe PMC, NCBI, ChEMBL, UniProt, and others. The extension can therefore query entities such as genes, chemicals, pathways, authors, titles of journal articles, and other useful biomedical terms. FAIR-biomed is available on the Chrome web store and its source code is available on GitHub. The tool was developed by Tomasz Konopka from Queen Mary University of London.

As an example scenario, imagine reading a page describing work in which the development of Plasmodium parasites stopped in female mosquitoes treated with an antimalarial drug. If you would like to know what is new in the domain of antimalarial compounds, FAIR-biomed lets you access relevant published articles without switching browser tabs or running a separate search in Google. Select the text “antimalarial compounds”, then either right-click to open the context menu and choose the extension, or press Ctrl+Shift+Z on the keyboard. Voila, a pop-up window will appear, displaying all the relevant data resources, including publications from Europe PMC.


How is Europe PMC API integrated in FAIR-biomed?

FAIR-biomed sends a request to the Europe PMC servers via their RESTful API. The top 8 results, including links to the articles as well as to the full search results, are then retrieved from the Europe PMC API. Integrating this functionality was straightforward, as the API endpoints at Europe PMC have an intuitive structure. The documentation is well written and concise, providing easy-to-use examples with code snippets for integration.


The browser extension is optimised to display a small set of relevant hits and therefore uses some non-default settings, such as “resultType=lite” and “pageSize=8”. These settings, along with API documentation that concisely describes all the advanced options, made the integration with FAIR-biomed a very easy task.


Try it yourself! This is the Europe PMC API query used for finding relevant information on “antimalarial compounds”: https://www.ebi.ac.uk/europepmc/webservices/rest/search?query=antimalarial%20compounds&format=json&pageSize=8&resultType=lite
Technically, this query is a combination of strings. The API URL string is stored within the extension code, and the user query is inserted between “search?query=” and “&format=json”.
The extension displays the exact API call via the “</>” icon in the bottom toolbar, so the results are reproducible! This is handy if you want to use similar API calls in your own code without reading through documentation pages.
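
The string combination described above can be sketched in a few lines of Python. This is only an illustration, not the extension's actual code: the function name build_search_url and its defaults are our own choices, while the endpoint and parameters are copied from the example query.

```python
from urllib.parse import quote

# Endpoint copied from the example query in this post.
BASE_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_search_url(user_query, page_size=8, result_type="lite"):
    """Return a Europe PMC search URL for a free-text query.

    Hypothetical helper for illustration; not taken from FAIR-biomed.
    """
    # quote() percent-encodes spaces as %20, as in the example URL.
    encoded = quote(user_query, safe="")
    return (
        f"{BASE_URL}?query={encoded}"
        f"&format=json&pageSize={page_size}&resultType={result_type}"
    )


print(build_search_url("antimalarial compounds"))
```

For the text “antimalarial compounds”, this prints the same URL as the example query above.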


The extension converts JSON data from the API response into HTML paragraphs. It uses the following fields: “title”, “authorString”, “journalTitle”, “pubYear”, “pubType”. Below is an example of the output obtained from the Europe PMC API.
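
A conversion step of this kind might look roughly like the Python sketch below. It is not the extension's code: the helper hit_to_html, the CSS class names, and the sample record are all invented for illustration, and only the five field names come from the description above.

```python
from html import escape

# The five fields named in the post.
FIELDS = ("title", "authorString", "journalTitle", "pubYear", "pubType")


def hit_to_html(hit):
    """Render one search hit (a dict) as HTML paragraphs, one per field."""
    parts = []
    for field in FIELDS:
        value = hit.get(field)
        if value:  # skip fields that are missing from this record
            # escape() keeps stray markup in the data from breaking the page
            parts.append(f'<p class="{field}">{escape(str(value))}</p>')
    return "\n".join(parts)


# Made-up record for illustration only; this is not real API output.
sample_hit = {
    "title": "An example title <with markup>",
    "authorString": "Doe J, Roe R.",
    "journalTitle": "Example Journal",
    "pubYear": "2019",
}
print(hit_to_html(sample_hit))
```

Skipping absent fields keeps the display compact when a record lacks, say, a publication type.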


By using Europe PMC APIs, FAIR-biomed allows readers to explore biomedical publications such as recent preprints, as well as peer-reviewed articles, and thus provides up-to-date information on the current literature on any biomedical topic.

Tuesday, 9 July 2019

Measuring the value and impact of Europe PMC

The value and impact of Europe PMC were recently evaluated by the independent policy research and consulting organisation, Technopolis, to assess how the platform has been serving the research community. The evaluation was based on document review, surveys, user data and interviews.

Who uses Europe PMC, what for and why?

Europe PMC has a global reach with nearly 12 million unique devices accessing the site in 2018 alone. The users are predominantly researchers, students and health professionals, who mainly use Europe PMC as one of several tools to find scientific publications. In the surveys, 87% of users found the platform useful, the preferred features being ease of use, good coverage in terms of both content and volume, and free access.

What is the economic value of Europe PMC?

Although we know it is positive, it is difficult to put a quantitative value on the benefit of integrating several services in one place. Surveys showed that individual users were willing to pay on average about $28 per year for Europe PMC, while librarians proposed to pay an average fee of around $1 per user per year. Monetising the time spent by users on Europe PMC has to account for the fact that many Europe PMC functionalities could be replicated by other services. This redundancy relates to the inherently 'open' nature of much of the data and content that Europe PMC is built on, which can be freely reused by other tools and services. The final calculation resulted in values between $6 and $31 per user per year (assuming that between 95% and 75% of Europe PMC's functionality could be replicated elsewhere). While these valuations do not represent a genuine market price and the approaches used may be subject to biases, they certainly suggest that Europe PMC presents significantly higher value to users relative to the moderate running cost, which is equivalent to about $0.33 per user per year. The value thus returned to the user community for every dollar invested in Europe PMC is shown in the figure below.
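
As a rough back-of-the-envelope check, the per-user figures quoted above can be divided by the running cost to estimate the value returned per dollar invested. The Python sketch below uses only numbers from this post. The labels and the function name are our own, and the evaluation report may have used different inputs or rounding for its figure.

```python
RUNNING_COST = 0.33  # USD per user per year, as quoted in the post

# Per-user annual values quoted in the post (USD).
estimates = {
    "time-based valuation (low)": 6.0,
    "time-based valuation (high)": 31.0,
    "individual willingness to pay": 28.0,
    "librarian willingness to pay": 1.0,
}


def value_per_dollar(value, cost=RUNNING_COST):
    """Value returned to users for each dollar of running cost."""
    return value / cost


for label, value in estimates.items():
    print(f"{label}: about ${value_per_dollar(value):.0f} per $1 invested")
```

On these inputs, the time-based valuation corresponds to a return of roughly $18 to $94 for every dollar of running cost.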

How does Europe PMC benefit the biomedical research community?

Europe PMC combines several functionalities under one umbrella, such as ORCID linking, a grant finder tool, data behind a paper, open citation counts, a manuscript submission system and a number of APIs and bulk download options to access the content. It has helped further the Open Science agenda by integrating freely available research content, from full text open access research articles to preprints, books, and clinical guidelines. Finally, it provides a mechanism for funders to more readily meet their open access commitments. As such, it could play an important role in the context of Plan S. An additional key benefit of Europe PMC is its autonomy, providing a level of immunity from potential political and commercial priorities. This enables the platform to be genuinely independent and respond quickly to user needs.


Loss of Europe PMC is likely to disproportionately impact under-resourced users (e.g. researchers in developing countries and citizen scientists) who largely rely on open access to keep abreast of the latest research findings. The 29 international funders who use the platform as their open access repository would also incur additional costs in locating and using alternatives. In addition, many third-party developers who use the Europe PMC APIs would have to substantially rebuild their applications, which would require additional time and money.

Take home message

Overall, Europe PMC delivers excellent value – in both monetary and non-monetary terms – to funders as well as the global research community. Loss of such a platform would be a blow to the emerging Open Science agenda, which has gained significant momentum in recent years.

The full report can be accessed here. Please send questions related to the evaluation process to the Open Research team at Wellcome at openresearch@wellcome.ac.uk.

Tuesday, 11 June 2019

Submission system for text-mined annotations

From algorithms to the bench
Text-mining holds the promise of helping researchers to overcome information overload. It is a familiar premise: an avalanche of scientific knowledge is being produced and shared. Teaching machines to “read” might soon be the only manageable way to digest large amounts of information into useful facts.

To extract bits of information, such as biological concepts or relations, from the text, various text-mining tools have been developed in recent years. Even as text-mining technology becomes more widespread, the average researcher will rarely be exposed to its benefits. Scholars get to publications via a few familiar routes, which may not utilise text-mining technologies. In addition, different text-mining platforms may focus on different topics or categories: one suited for molecular interactions, another for gene-disease associations.

To address these issues, Europe PMC has established a platform that consolidates text-mined annotations from different sources and makes them available to the wider research community. The annotated concepts and relations are displayed on article pages via the SciLite tool, and can be retrieved using a RESTful API.


To simplify the process of sharing text-mining results, Europe PMC has developed a dedicated Annotation Submission System. It allows expert text-mining providers to publish their annotations in Europe PMC. The system can also accept relevant statements manually curated by dedicated biocuration groups.

The submission process is straightforward and does not require strong technical skills. It is possible to submit an annotation file either using the web browser or programmatically. Note that for programmatic upload, Cloud Storage System drivers are available in different languages.



Here is how annotation data is represented in the system:


The Europe PMC Annotation Submission System accepts sentence-based annotations and named entity annotations. An example of a sentence-based annotation is a protein-protein interaction, while a chemical name can be represented as a named entity. All annotated concepts must be linked to ontologies and data resources. For instance, gene/protein annotations link to a corresponding UniProt record. Submitters are also asked to specify the precise location of the annotation in the text, using prefix and postfix tags.
Such location information can be used to reciprocally link from the annotation to the relevant sentence of the article in Europe PMC via a link-back mechanism. neXtprot, an online knowledge platform on human proteins, participates in such link exchange. neXtprot entries often have associated functional gene annotations, known as geneRIFs. neXtprot users can now navigate from a neXtprot record directly to the relevant gene function statement found in the literature.
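
To give a feel for how prefix and postfix tags pin down a location, here is a small Python sketch. It is not the submission system's schema: the Annotation class, its field names, the "Gene_Proteins" label, and the example sentence are all assumptions made for illustration. The actual file format is defined in the submission guidelines.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    """Hypothetical record for a named entity annotation."""
    exact: str    # the annotated text itself
    prefix: str   # text immediately before the annotation
    postfix: str  # text immediately after the annotation
    kind: str     # annotation type (illustrative label)
    uri: str      # link to the ontology or data resource record


def locate(annotation, text):
    """Return (start, end) offsets of the annotated span, or None if absent."""
    needle = annotation.prefix + annotation.exact + annotation.postfix
    pos = text.find(needle)
    if pos == -1:
        return None
    start = pos + len(annotation.prefix)
    return start, start + len(annotation.exact)


# Made-up sentence and annotation for illustration only.
sentence = "Mutations in BRCA1 increase the risk of breast cancer."
ann = Annotation(
    exact="BRCA1",
    prefix="Mutations in ",
    postfix=" increase",
    kind="Gene_Proteins",
    uri="https://www.uniprot.org/uniprot/P38398",
)
span = locate(ann, sentence)
print(span, sentence[span[0]:span[1]])
```

Matching on the surrounding text rather than on the term alone disambiguates repeated mentions of the same entity, which is what makes linking back to a specific sentence possible.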


Our aim is to make text-mining advances widely available for the benefit of the research community, and we would not be able to do it without the support of our collaborators and annotation providers. Several text-mining groups have already made their annotations public in Europe PMC using the new submission system. Interested parties can share their results via the Europe PMC platform, provided that they adhere to the ground rules. If you would like to submit annotations, please get in touch via annotations@europepmc.org. For more details, join our free Annotation Submission Webinar on July 9th, 2PM GMT.

Wednesday, 1 May 2019

Making science open with the new Europe PMC plus

We are delighted to announce the launch of the new Europe PMC plus - the manuscript submission system for authors supported by Europe PMC funders.
Europe PMC is the repository of choice for 29 international life sciences funders, who recommend or require that publications arising from the research they fund are made openly available via Europe PMC. Grantees of Europe PMC funders can deposit their manuscripts for inclusion in Europe PMC and PMC USA using the Europe PMC plus submission system.
The new version of Europe PMC plus released on 1st May 2019 has been designed to simplify and streamline the submission process. All that’s needed to complete a submission to Europe PMC plus are citation details, submission files, and funding information. The funding is linked to the publication via an open GRIST database that contains public grant award data from all Europe PMC funders.
The submission process itself is very straightforward and can be completed within 10 minutes. In the best-case scenario, if a manuscript is submitted correctly and is reviewed promptly by the author, it can be made available in Europe PMC in as little as two weeks.
The start of this project was announced just under a year ago. Since then, the new system has been developed as an open source application in partnership with the Collaborative Knowledge Foundation (Coko), eLife, Hindawi, and other community partners.
To learn how to use the new Europe PMC plus, and see what a submission process looks like, please visit our YouTube channel featuring short tutorial videos.

Monday, 25 March 2019

Prepare for the new Europe PMC plus

Europe PMC is getting ready to release an upgraded version of Europe PMC plus, a system for PIs supported by Europe PMC funders to submit accepted manuscripts for inclusion in Europe PMC and PMC. The new version of Europe PMC plus has an improved design, and new features for creating and reviewing manuscript submissions.

The new Europe PMC plus will be released on 1 May 2019. Just three things are needed to complete a submission to Europe PMC plus:

1. Citation details
Manuscripts submitted to Europe PMC plus must be accepted for publication by a peer reviewed journal. At least the article title and journal name are required.

2. Submission files
The manuscript and all related figures, tables, and supplementary materials should be uploaded to Europe PMC plus, and previewed using new tools.

3. Funding information
Researchers submitting manuscripts to Europe PMC plus must be funded by one of the Europe PMC funders. Grant information can be linked to manuscripts through a new simple search.

After a manuscript is submitted to Europe PMC plus, the submission goes through quality assurance before being processed into XML, HTML, and PDF versions for archiving, indexing, and display in Europe PMC and PMC. Before being made publicly available on Europe PMC, these web versions are made available to the reviewing researcher for a final review, to make any needed corrections.

The new submission system offers an updated design, clearer workflows, and new features, including an improved preview of submitted files, the ability to view submitted files and processed web versions side by side, and improved communication tools, which simplify creating and reviewing manuscript submissions.


[Figures: My manuscripts list; Creating a submission; Checking submission input; Reviewing web versions]

Since supporting open science is an important part of Europe PMC’s mission, the new Europe PMC plus is a fully open source application. It is based on PubSweet, a free, open-source framework for building state-of-the-art publishing platforms, designed to be modular and flexible, so that individual components can be easily reused and adapted for various workflows.

The new Europe PMC plus has been developed in collaboration with the Collaborative Knowledge Foundation (Coko) and community partners including eLife and Hindawi. PubSweet community members subscribe to a common vision of creating open technologies that improve the speed of research.


To find out more about Europe PMC’s partnership with Coko to develop a web-based, open-source content and workflow management platform for manuscript ingest and processing, see http://blog.europepmc.org/2018/05/europe-pmc-and-coko-announce-partnership.html.

For more information about the other platforms that are built on top of PubSweet, visit https://coko.foundation/all-the-platforms.

For the latest updates on Europe PMC plus and other news from Europe PMC, follow us on Twitter at @EuropePMC_news.

Tuesday, 5 February 2019

Europe PMC’s response to the implementation guidance of Plan S

Plan S is an initiative for immediate and full open access to scholarly research publications put forward by cOAlition S, an international consortium of research funders.

In November 2018, specific implementation guidance on the Plan S principles was released to the public with the aim of gathering feedback from various Plan S stakeholders, including researchers, publishers, funders, and other interested parties.

Europe PMC’s mission to support innovation based on open access content is well aligned with the fundamental principles of Plan S. Several Europe PMC funders have joined cOAlition S, and we will continue to support their open access policies in line with the Plan S initiative. Our response to the Plan S guidance document is provided below.

1. Is there anything unclear or are there any issues that have not been addressed by the guidance document?


Europe PMC fully supports the mission of Plan S to drive universal open access for research articles. Many of the cOAlition S funders use Europe PMC as their repository of choice for publication outputs from life science funding programmes.

Europe PMC contains over 35 million abstracts and 5 million full text research publications, predominantly from the life sciences. The website is used by millions of people a month, and millions of megabytes of open data are downloaded programmatically via our APIs in the same time period.

Europe PMC meets all the requirements outlined in the implementation plans, including running a help desk and operating an XML (JATS) workflow. We strongly support this technical approach within the context of a large-scale, aggregated document collection such as Europe PMC. This approach provides the best opportunity for discovery, interoperability and reuse of the full text content of research articles, and therefore contributes effectively to open science.

It is not clear, however, how an XML workflow would map effectively to the institutional repository (IR) system as a whole, due to the redundancy across the highly distributed IR community. In a typical green OA workflow, each author of the same research paper would self-archive the paper in their own IR, typically as a Word or PDF document plus structured metadata. Multiple submissions of the same paper, in different locations with different metadata and full text document formats, already cause deduplication-type challenges when aggregating metadata. Generating multiple full text XML versions across different IRs would be a needless cost and would further complicate aggregation.

The generation of XML for a single version of a publication only needs to be done once, and from this, other formats can be generated (e.g. PDF, HTML). A shared service (or services) that could deliver this core requirement could also provide mechanisms for distributing the outputs widely, ensuring the IR community has maximal coverage and discovery of content for their institute or university.

2. Are there other mechanisms or requirements funders should consider to foster full and immediate Open Access of research outputs?


We would like to suggest that the Plan S funders consider an approach exemplified by Europe PMC (described in more detail below) to deliver on their repository requirements. While Europe PMC has a life science focus, the underlying infrastructure is generic, and multidisciplinary science means that the boundaries of what is deposited in Europe PMC are increasingly softened. Indeed, there are already full text articles in Europe PMC that may be considered to come primarily from allied disciplines such as chemistry, computer science, history of medicine, environmental science, health-related social science and so on. It is also conceivable that a small number of high-level disciplinary systems (e.g. physical sciences and SSH in addition to Europe PMC) could coordinate to deliver on the generic technical requirements, yet provide deep disciplinary expertise regarding the staffing of those services. Combining this kind of core technical capability with the networking capabilities of the IR community could be a very effective means of supporting the goals of Plan S.

Please find below more details on how Europe PMC addresses the Plan S repository implementation requirements.

Europe PMC overview

Europe PMC is an open repository of research publications, supporting the Open Access policies of 29 international funders of life sciences research, including several that have joined cOAlition S. The repository is built in collaboration with the PMC archive in the USA, and contains over 5 million full text articles and 35 million abstracts. Incoming full text articles are shared between the two sites daily. Europe PMC also recently started indexing preprint abstracts.

Europe PMC services

The content in Europe PMC is made available via the website (http://europepmc.org). In addition to providing access to the core publications, Europe PMC adds value in a number of ways. For example, Europe PMC is a major integrator of ORCIDs, data, open citations, text-mined concepts and data citations, and grant information. All these integrations are applied to incoming content automatically and all are available via the website and APIs (see below).

Through indexing rich metadata and full text XML, it is possible to search transparently for full text articles in Europe PMC with a CC-BY license, and filter these by funder, publication date, presence of a data availability statement, and so on. Remixing open data allows Europe PMC to, for example, show open access publishing records for ORCIDs (see http://europepmc.org/authors/0000-0002-3897-7955 for example).
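
A filtered search of this kind can be expressed as a fielded query sent to the search API. The Python sketch below shows one way to compose such a query. The field names LICENSE, GRANT_AGENCY and PUB_YEAR, and the values used with them, are assumptions based on our reading of the Europe PMC search syntax and should be checked against the current documentation. The helper fielded_query is hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def fielded_query(filters):
    """Join field:value pairs into a single query string with AND."""
    terms = []
    for field, value in filters.items():
        text = str(value)
        # Quote values containing spaces so they are treated as a phrase.
        if " " in text:
            text = f'"{text}"'
        terms.append(f"{field}:{text}")
    return " AND ".join(terms)


# Field names and values below are assumptions; verify before use.
filters = {
    "LICENSE": "cc by",
    "GRANT_AGENCY": "Wellcome Trust",
    "PUB_YEAR": 2018,
}
query = fielded_query(filters)
url = f"{BASE_URL}?{urlencode({'query': query, 'format': 'json'})}"
print(query)
print(url)
```

Keeping the filters in a dictionary makes it easy to add or drop a criterion without editing the URL by hand.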

A key part of Europe PMC’s mission is to encourage innovation based on open access content. We therefore provide programmatic access to abstracts and full text via APIs, including RESTful (JSON, XML and Dublin Core) and OAI. Bulk download by FTP is also provided (http://europepmc.org/developers). Sharing content as widely as possible is our top priority.

Europe PMC also runs a grants metadata database for the funders (http://europepmc.org/grantfinder), so that incoming publications can be matched to grants.

Europe PMC is part of the global life science data infrastructure. We collaborate and share content with the USA National Library of Medicine, which runs PubMed and PMC. Europe PMC is an ELIXIR Core Data Resource and provides integration with over 40 critical data resources, such as the Protein Data Bank, the European Nucleotide Archive, UniProt and OMIM.

Technical considerations

 

Europe PMC is built on the widely used data standard for research publications, JATS (https://jats.nlm.nih.gov/index.html). While Europe PMC has a life science focus, JATS is not discipline specific and therefore could be used for any research article.

Use of JATS (1) future-proofs the archive for long-term access (proven adherence to standards via validation against the data model and no reliance on proprietary formats) and (2) provides the ability to query across all content in a consistent manner. This is very important for deep queries, third party software development, and text and data mining. JATS, being an XML format, allows specific and important elements of an article (e.g. ORCIDs, licences or article sections such as Data Availability Statements) to be identified. This kind of accurate deep indexing would be impossible for articles archived in an unstructured mixture of formats (PDF, Word, HTML).
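
To illustrate why structured XML makes this kind of deep indexing possible, the Python sketch below pulls ORCIDs and licence links out of a small JATS-style fragment. The fragment is hand-written and heavily trimmed, the ORCID is a placeholder, and the function extract_metadata is our own. Real articles carry far more markup and may record licences differently.

```python
import xml.etree.ElementTree as ET

# Clark-notation prefix for attributes in the XLink namespace.
XLINK = "{http://www.w3.org/1999/xlink}"

# Hand-written, heavily trimmed JATS-style fragment for illustration only.
SAMPLE = """<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid">https://orcid.org/0000-0002-1825-0097</contrib-id>
        </contrib>
      </contrib-group>
      <permissions>
        <license xlink:href="https://creativecommons.org/licenses/by/4.0/"/>
      </permissions>
    </article-meta>
  </front>
</article>"""


def extract_metadata(jats_xml):
    """Collect ORCIDs and licence URLs from a JATS-style document string."""
    root = ET.fromstring(jats_xml)
    orcids = [
        el.text.strip()
        for el in root.iter("contrib-id")
        if el.get("contrib-id-type") == "orcid" and el.text
    ]
    licences = [
        el.get(XLINK + "href")
        for el in root.iter("license")
        if el.get(XLINK + "href")
    ]
    return {"orcids": orcids, "licences": licences}


print(extract_metadata(SAMPLE))
```

The same two lookups work on any document that follows the tagging, which is the consistency argument made above. Nothing comparable is possible for a PDF or Word file without error-prone text extraction.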

Uploading publications


Content is ingested into Europe PMC via two routes: (1) journals that archive content in PMC (either in full or in part); and (2) the Europe PMC or PMC manuscript submission systems. The manuscript submissions are overseen by Helpdesk staff who are trained in how to use the submission system and support authors in the submission process. Simply put, authors upload files (final accepted manuscripts, typically in Word or PDF), which, after various integrity checks, are converted to XML under contract. The returned XML-formatted files are rendered to HTML for QA and sign-off by the contact author. It is possible to hold the article securely until such time as it can be made available, for example when the article is submitted prior to an embargo date. The Helpdesk staff also handle incoming grant data from funders, in order to match incoming submissions with specific grants. These grant data are made public via the Grant Finder tool on the Europe PMC website, and via public APIs.

Europe PMC has recently been collaborating with the Coko Foundation, Hindawi Publishing and eLife to develop an open source submission system for manuscripts. This will be released as a beta version very soon, and in full production shortly thereafter.

Tuesday, 4 December 2018

Bringing PubMed Commons to Europe PMC

Scientific communication does not stop at the moment of publication. Scientific discussions in the form of post-publication peer review can provide valuable insights for published articles, bring up an alternative research perspective, or even present updates for published research. At Europe PMC it’s part of our mission to support innovation in scientific publishing, and we believe that community feedback constitutes an important part of the scholarly communication process. That’s why we are proud to partner with Hypothes.is, a non-profit organisation developing open annotation tools, to display post-publication comments from PubMed Commons on articles in Europe PMC.

What is PubMed Commons?

PubMed Commons was a post-publication peer review forum for authors run by the NLM (National Library of Medicine). It was an important venue for scientific discourse on the published research literature. Following the departure of PubMed Commons in February 2018, comments for over 6000 PubMed-indexed articles have been carefully preserved by Hypothes.is. Every PubMed Commons comment, along with its replies, has been transformed into a Hypothes.is annotation, available for access and reuse.

To ensure that PubMed Commons comments can be publicly accessed and explored in appropriate context, we have exposed available comments on publication pages using an open API provided by Hypothes.is.

How can I view publication comments?

You can see the comments appear on article abstract and full text pages. The link to the comments labeled as “PubMed Commons Archive” is displayed in the right-hand menu, just below the annotations tool.

Example: link to PubMed Commons comments appears on one of the articles in Europe PMC (PMID:28322189)

All comments appear in a sidebar menu in chronological order. They are presented as “page notes” - comments on a document as a whole, in contrast to annotations that are linked to a specific word or sentence in the article.
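
The distinction between page notes and anchored annotations, and the chronological ordering, can be sketched in Python as follows. This is not how the Europe PMC website is implemented. The record layout (a "created" timestamp and a "target" list whose entries may carry a "selector") is modelled on our understanding of the Hypothes.is annotation format and should be treated as an assumption, and the sample rows are invented.

```python
def is_page_note(annotation):
    """True if no target carries a selector, i.e. the note is on the whole page."""
    return not any(t.get("selector") for t in annotation.get("target", []))


def page_notes_in_order(rows):
    """Return page notes sorted oldest first by their 'created' timestamp."""
    notes = [row for row in rows if is_page_note(row)]
    # ISO 8601 timestamps in a uniform format sort correctly as plain strings.
    return sorted(notes, key=lambda row: row["created"])


# Made-up rows for illustration only; these are not real comments.
rows = [
    {"created": "2016-05-02T10:00:00+00:00", "text": "Second comment",
     "target": [{"source": "https://example.org/article"}]},
    {"created": "2015-11-20T08:30:00+00:00", "text": "First comment",
     "target": [{"source": "https://example.org/article"}]},
    {"created": "2017-01-01T00:00:00+00:00", "text": "Inline highlight",
     "target": [{"source": "https://example.org/article",
                 "selector": [{"type": "TextQuoteSelector", "exact": "word"}]}]},
]
for note in page_notes_in_order(rows):
    print(note["created"], note["text"])
```

In this sample, the anchored highlight is filtered out and the two page notes are listed oldest first.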


Comments are publicly visible without an account. They are displayed in view-only mode, and there is currently no option to add new comments in Europe PMC. Only comments from the PubMed Commons archive are displayed on the Europe PMC website, but you can explore and add other public annotations on Europe PMC articles with open tools from Hypothes.is.

We hope that this effort will provide better visibility for scholarly commentaries as an important part of the scholarly record.