Monday, 25 March 2019

Prepare for the new Europe PMC plus

Europe PMC is getting ready to release an upgraded version of Europe PMC plus, a system for PIs supported by Europe PMC funders to submit accepted manuscripts for inclusion in Europe PMC and PMC. The new version of Europe PMC plus has an improved design, and new features for creating and reviewing manuscript submissions.

The new Europe PMC plus will be released on 1 May 2019. Just three things are needed to complete a submission to Europe PMC plus:

1. Citation details: Manuscripts submitted to Europe PMC plus must have been accepted for publication by a peer-reviewed journal. At least the article title and journal name are required.
2. Submission files: The manuscript and all related figures, tables, and supplementary materials should be uploaded to Europe PMC plus, where they can be previewed using new tools.
3. Funding information: Researchers submitting manuscripts to Europe PMC plus must be funded by one of the Europe PMC funders. Grant information can be linked to manuscripts through a new, simple grant search.

After a manuscript is submitted to Europe PMC plus, it goes through quality assurance before being processed into XML, HTML, and PDF versions for archiving, indexing, and display in Europe PMC and PMC. Before these web versions are made publicly available on Europe PMC, they are sent to the reviewing researcher for a final check, so that any necessary corrections can be made.

The new submission system offers an updated design, clearer workflows, and new features that simplify creating and reviewing manuscript submissions. These include an improved preview of submitted files, the ability to view submitted files and processed web versions side by side, and improved communication tools.


My manuscripts list

Creating a submission

Checking submission input

Reviewing web versions

Since supporting open science is an important part of Europe PMC’s mission, the new Europe PMC plus is a fully open source application. It is based on PubSweet, a free, open-source framework for building state-of-the-art publishing platforms. PubSweet is designed to be modular and flexible, so that individual components can easily be reused and adapted for different workflows.

The new Europe PMC plus has been developed in collaboration with the Collaborative Knowledge Foundation (Coko) and community partners including eLife and Hindawi. PubSweet community members subscribe to a common vision of creating open technologies that improve the speed of research.


To find out more about Europe PMC’s partnership with Coko to develop a web-based, open-source content and workflow management platform for manuscript ingest and processing, see http://blog.europepmc.org/2018/05/europe-pmc-and-coko-announce-partnership.html.

For more information about the other platforms that are built on top of PubSweet, visit https://coko.foundation/all-the-platforms.

For the latest updates on Europe PMC plus and other news from Europe PMC, follow us on Twitter at @EuropePMC_news.

Tuesday, 5 February 2019

Europe PMC’s response to the implementation guidance of Plan S

Plan S is an initiative for immediate and full open access to scholarly research publications put forward by cOAlition S, an international consortium of research funders.

In November 2018, specific implementation guidance on the Plan S principles was released to the public with the aim of gathering feedback from Plan S stakeholders, including researchers, publishers, funders, and other interested parties.

Europe PMC’s mission to support innovation based on open access content is well aligned with the fundamental principles of Plan S. Several Europe PMC funders have joined cOAlition S, and we will continue to support their open access policies in line with the Plan S initiative. Our response to the Plan S guidance document is provided below.

1. Is there anything unclear or are there any issues that have not been addressed by the guidance document?


Europe PMC fully supports the mission of Plan S to drive universal open access for research articles. Many of the cOAlition S funders use Europe PMC as their repository of choice for publication outputs from life science funding programmes.

Europe PMC contains over 35 million abstracts and 5 million full text research publications, predominantly from the life sciences. The website is used by millions of people a month, and millions of megabytes of open data are downloaded programmatically via our APIs in the same time period.

Europe PMC meets all the requirements outlined in the implementation plans, including running a help desk and operating an XML (JATS) workflow. We strongly support this technical approach within the context of a large-scale, aggregated document collection such as Europe PMC. This approach provides the best opportunity for discovery, interoperability and reuse of the full text content of research articles, and therefore contributes effectively to open science.

It is not clear, however, how an XML workflow would map effectively onto the institutional repository (IR) system as a whole, given the redundancy across the highly distributed IR community. In a typical green OA workflow, each author of the same research paper self-archives the paper in their own IR, typically as a Word or PDF document plus structured metadata. Multiple submissions of the same paper, in different locations with different metadata and full text document formats, already create deduplication challenges when metadata is aggregated. Generating multiple full text XML versions of the same paper across different IRs would be a needless cost and would further complicate aggregation.

XML for a single version of a publication only needs to be generated once, and other formats (e.g. PDF, HTML) can then be derived from it. A shared service (or services) delivering this core requirement could also provide mechanisms for distributing the outputs widely, ensuring that the IR community has maximal coverage and discovery of content for their institutions.
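
To illustrate the "generate XML once, derive other formats" idea, the following minimal Python sketch renders a tiny JATS-like document to HTML using only the standard library. The sample document and the `jats_to_html` function are hypothetical; a real renderer would handle far more of JATS (nested sections, figures, tables, references) than the top-level sections covered here.

```python
import xml.etree.ElementTree as ET
from html import escape

# A tiny, hand-written JATS-like article (illustrative only).
SAMPLE_JATS = """<article>
  <front><article-meta><title-group>
    <article-title>Example article</article-title>
  </title-group></article-meta></front>
  <body>
    <sec><title>Introduction</title><p>Some text &amp; more.</p></sec>
  </body>
</article>"""


def jats_to_html(jats_xml: str) -> str:
    """Render the article title and top-level body sections as simple HTML."""
    root = ET.fromstring(jats_xml)
    title = root.findtext(".//article-title", default="")
    parts = [f"<h1>{escape(title)}</h1>"]
    for sec in root.iterfind("./body/sec"):
        parts.append(f"<h2>{escape(sec.findtext('title', default=''))}</h2>")
        for p in sec.iterfind("p"):
            # itertext() flattens inline markup (italics, links) to plain text.
            parts.append(f"<p>{escape(''.join(p.itertext()))}</p>")
    return "\n".join(parts)
```

The same parsed tree could equally feed a PDF or plain-text renderer, which is the point of keeping one XML source of truth.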

2. Are there other mechanisms or requirements funders should consider to foster full and immediate Open Access of research outputs?


We would like to suggest that the Plan S funders consider an approach exemplified by Europe PMC (described in more detail below) to deliver on their repository requirements. While Europe PMC has a life science focus, the underlying infrastructure is generic, and because science is increasingly multidisciplinary, the boundaries of what is deposited in Europe PMC are increasingly blurred. Indeed, Europe PMC already contains full text articles that could be considered to belong primarily to allied disciplines, such as chemistry, computer science, history of medicine, environmental science and health-related social science. It is also conceivable that a small number of high-level disciplinary systems (e.g. for the physical sciences and for SSH, in addition to Europe PMC) could coordinate to deliver on the generic technical requirements, while staffing those services with deep disciplinary expertise. Combining this kind of core technical capability with the networking capabilities of the IR community could be a very effective means of supporting the goals of Plan S.

Please find below more details on how Europe PMC addresses the Plan S repository implementation requirements.

Europe PMC overview

Europe PMC is an open repository of research publications, supporting the Open Access policies of 29 international funders of life sciences research, including several that have joined cOAlition S. The repository is built in collaboration with the PMC archive in the USA, and contains over 5 million full text articles and 35 million abstracts. Incoming full text articles are shared between the two sites daily. Europe PMC also recently started indexing preprint abstracts.

Europe PMC services

The content in Europe PMC is made available via the website (http://europepmc.org). In addition to providing access to the core publications, Europe PMC adds value in a number of ways. For example, Europe PMC is a major integrator of ORCIDs, data, open citations, text-mined concepts and data citations, and grant information. All these integrations are applied to incoming content automatically and all are available via the website and APIs (see below).

By indexing rich metadata and full text XML, Europe PMC makes it possible to search for full text articles with a CC-BY license, and to filter these by funder, publication date, presence of a data availability statement, and so on. Remixing open data also allows Europe PMC to, for example, show open access publishing records for ORCIDs (see http://europepmc.org/authors/0000-0002-3897-7955 for example).

A key part of Europe PMC’s mission is to encourage innovation based on open access content. We therefore provide programmatic access to abstracts and full text via APIs, including RESTful (JSON, XML and Dublin Core) and OAI. Bulk download by FTP is also provided (http://europepmc.org/developers). Sharing content as widely as possible is our top priority.
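
For readers planning programmatic access, here is a hedged Python sketch of paging through the RESTful search service. It assumes the service lives at `https://www.ebi.ac.uk/europepmc/webservices/rest/search`, that the JSON response carries results under `resultList.result`, and that it returns a `nextCursorMark` value for cursor-based paging starting from `cursorMark=*`; please check these names against the documentation at http://europepmc.org/developers before relying on them. The HTTP call is passed in as a function so that the paging logic can be exercised without network access.

```python
import json
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def fetch_page(params: dict) -> dict:
    """Perform one real HTTP request and decode the JSON response."""
    with urlopen(SEARCH_URL + "?" + urlencode(params)) as resp:
        return json.load(resp)


def iter_results(query: str, fetch: Callable[[dict], dict] = fetch_page,
                 page_size: int = 100) -> Iterator[dict]:
    """Yield every result record for `query`, following cursorMark pagination."""
    cursor = "*"  # the first page is requested with cursorMark=*
    while True:
        page = fetch({"query": query, "format": "json",
                      "pageSize": page_size, "cursorMark": cursor})
        results = page.get("resultList", {}).get("result", [])
        yield from results
        next_cursor = page.get("nextCursorMark")
        # Stop when a page is empty or the cursor stops advancing.
        if not results or not next_cursor or next_cursor == cursor:
            return
        cursor = next_cursor
```

Cursor-based paging is used here rather than page numbers because it avoids re-counting deep result sets; for very large extractions the bulk FTP download is likely the better route.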

Europe PMC also runs a grants metadata database for the funders (http://europepmc.org/grantfinder), so that incoming publications can be matched to grants.

Europe PMC is part of the global life science data infrastructure. We collaborate and share content with the USA National Library of Medicine, which runs PubMed and PMC. Europe PMC is an ELIXIR Core Data Resource and provides integration with over 40 critical data resources, such as the Protein Data Bank, the European Nucleotide Archive, UniProt and OMIM.

Technical considerations

 

Europe PMC is built on the widely used data standard for research publications, JATS (https://jats.nlm.nih.gov/index.html). While Europe PMC has a life science focus, JATS is not discipline-specific and could therefore be used for any research article.

Use of JATS (1) future-proofs the archive for long-term access (proven adherence to standards via validation against the data model and no reliance on proprietary formats) and (2) provides the ability to query across all content in a consistent manner. This is very important for deep queries, third party software development, and text and data mining. JATS, being an XML format, allows specific and important elements of an article (e.g. ORCIDs, licences or article sections such as Data Availability Statements) to be identified. This kind of accurate deep indexing would be impossible for articles archived in an unstructured mixture of formats (PDF, Word, HTML).
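
As an example of the kind of targeted extraction JATS makes possible, this Python sketch pulls ORCIDs, licence links and the data availability section out of a small hand-written JATS fragment using the standard library. It assumes ORCIDs are tagged as `contrib-id` with `contrib-id-type="orcid"`, licences as `license` elements carrying an `xlink:href`, and the data availability section as a `sec` with `sec-type="data-availability"`; real articles vary, and the sample (which uses ORCID's public example iD) is illustrative rather than taken from Europe PMC.

```python
import xml.etree.ElementTree as ET

XLINK = "{http://www.w3.org/1999/xlink}"  # namespace prefix for xlink:href

SAMPLE_ARTICLE = """<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front><article-meta>
    <contrib-group><contrib contrib-type="author">
      <contrib-id contrib-id-type="orcid">https://orcid.org/0000-0002-1825-0097</contrib-id>
    </contrib></contrib-group>
    <permissions>
      <license xlink:href="https://creativecommons.org/licenses/by/4.0/"><license-p>CC BY</license-p></license>
    </permissions>
  </article-meta></front>
  <body><sec sec-type="data-availability"><title>Data availability</title>
    <p>Data are available from Figshare.</p></sec></body>
</article>"""


def extract(jats_xml: str) -> dict:
    """Return ORCIDs, licence URLs and data availability paragraphs."""
    root = ET.fromstring(jats_xml)
    orcids = [c.text.strip() for c in root.iter("contrib-id")
              if c.get("contrib-id-type") == "orcid" and c.text]
    licences = [lic.get(XLINK + "href") for lic in root.iter("license")]
    statements = []
    for sec in root.iter("sec"):
        if sec.get("sec-type") == "data-availability":
            for p in sec.iter("p"):
                # Collapse internal whitespace left over from the markup.
                statements.append(" ".join("".join(p.itertext()).split()))
    return {"orcids": orcids, "licences": licences, "data_availability": statements}
```

The same lookups against a PDF or Word file would require guessing from layout and wording, which is the contrast the paragraph above draws.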

Uploading publications


Content is ingested into Europe PMC via two routes: (1) journals that archive content in PMC (either in full or in part); and (2) the Europe PMC or PMC manuscript submission systems. Manuscript submissions are overseen by Helpdesk staff who are trained in the use of the submission system and who support authors through the submission process. Simply put, authors upload files (final accepted manuscripts, typically in Word or PDF format), which, after various integrity checks, are converted to XML by a contracted supplier. The returned XML files are rendered to HTML for QA and sign-off by the contact author. The article can be held securely until it can be made publicly available, for example if it is submitted before its embargo date. The Helpdesk staff also handle incoming grant data from funders, in order to match incoming submissions with specific grants. These grant data are made public via the Grant Finder tool on the Europe PMC website, and via public APIs.
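
One way to picture the manuscript submission route described above is as a small state machine. The Python sketch below is a hypothetical simplification: the stage names, the allowed transitions (including the loop back for corrections) and the embargo rule are assumptions made for illustration, not Europe PMC's actual workflow code.

```python
from datetime import date
from enum import Enum, auto
from typing import Optional


class Stage(Enum):
    UPLOADED = auto()
    CHECKED = auto()        # integrity checks passed
    XML_CONVERTED = auto()  # XML returned from conversion
    AUTHOR_REVIEW = auto()  # HTML rendering awaiting author QA and sign-off
    HELD = auto()           # signed off, but held until the embargo date
    PUBLIC = auto()


# Hypothetical allowed transitions; corrections send the article back for reconversion.
TRANSITIONS = {
    Stage.UPLOADED: {Stage.CHECKED},
    Stage.CHECKED: {Stage.XML_CONVERTED},
    Stage.XML_CONVERTED: {Stage.AUTHOR_REVIEW},
    Stage.AUTHOR_REVIEW: {Stage.XML_CONVERTED, Stage.HELD, Stage.PUBLIC},
    Stage.HELD: {Stage.PUBLIC},
    Stage.PUBLIC: set(),
}


def advance(current: Stage, target: Stage) -> Stage:
    """Move to `target`, refusing transitions the workflow does not allow."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.name} to {target.name}")
    return target


def after_sign_off(today: date, embargo_until: Optional[date] = None) -> Stage:
    """Decide where a signed-off manuscript goes next."""
    if embargo_until is not None and today < embargo_until:
        return Stage.HELD
    return Stage.PUBLIC
```

Making the allowed transitions explicit means that an attempt to skip a step, such as publishing before author sign-off, fails loudly instead of silently.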

Europe PMC has recently been collaborating with the Coko Foundation, Hindawi Publishing and eLife to develop an open source submission system for manuscripts. This will be released as a beta version very soon, and in full production shortly thereafter.

Tuesday, 4 December 2018

Bringing PubMed Commons to Europe PMC

Scientific communication does not stop at the moment of publication. Scientific discussions in the form of post-publication peer review can provide valuable insights into published articles, bring up an alternative research perspective, or even present updates to published research. At Europe PMC it’s part of our mission to support innovation in scientific publishing, and we believe that community feedback constitutes an important part of the scholarly communication process. That’s why we are proud to partner with Hypothes.is, a non-profit organisation developing open annotation tools, to display post-publication comments from PubMed Commons on articles in Europe PMC.

What is PubMed Commons?

PubMed Commons was a post-publication peer review forum for authors, run by the NLM (National Library of Medicine). It was an important venue for scientific discussion of the published research literature. When PubMed Commons was discontinued in February 2018, comments on over 6000 PubMed-indexed articles were carefully preserved by Hypothes.is. Every PubMed Commons comment, along with its replies, has been transformed into a Hypothes.is annotation that is available for access and reuse.

To ensure that PubMed Commons comments can be publicly accessed and explored in appropriate context, we have exposed available comments on publication pages using an open API provided by Hypothes.is.

How can I view publication comments?

The comments appear on article abstract and full text pages. A link to the comments, labelled “PubMed Commons Archive”, is displayed in the right-hand menu, just below the annotations tool.

Example: link to PubMed Commons comments appears on one of the articles in Europe PMC (PMID:28322189)

All comments appear in a sidebar menu in chronological order. They are presented as “page notes” - comments on a document as a whole, in contrast to annotations that are linked to a specific word or sentence in the article.
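
For readers curious about how such comments can be retrieved, here is a hedged Python sketch against the public Hypothes.is search API. It assumes the `https://api.hypothes.is/api/search` endpoint accepts `uri`, `sort`, `order` and `limit` parameters and returns annotations as `rows`, each with a `created` timestamp and a `target` list in which page notes carry no `selector`. The function names are hypothetical, and this is not the code Europe PMC uses.

```python
from urllib.parse import urlencode

HYPOTHESIS_SEARCH = "https://api.hypothes.is/api/search"


def comments_url(article_uri: str, limit: int = 200) -> str:
    """Search URL for annotations on one document, oldest first."""
    params = {"uri": article_uri, "sort": "created", "order": "asc", "limit": limit}
    return HYPOTHESIS_SEARCH + "?" + urlencode(params)


def is_page_note(row: dict) -> bool:
    """A page note refers to the whole document, so no target has a selector."""
    return all(not t.get("selector") for t in row.get("target", []))


def page_notes_in_order(rows: list) -> list:
    """Keep page notes only and sort them chronologically.

    Sorting the ISO 8601 strings directly assumes they share one format and offset.
    """
    return sorted((r for r in rows if is_page_note(r)), key=lambda r: r["created"])
```

For example, `comments_url("https://europepmc.org/abstract/MED/28322189")` builds a request for the article in the example above; that URL pattern is an assumption for illustration.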


Comments are visible publicly without an account. They are displayed in view-only mode, and there is currently no option to add new comments in Europe PMC. Only comments from the PubMed Commons archive are displayed on the Europe PMC website, but you can explore and add other public annotations on Europe PMC articles using the open tools from Hypothes.is.

We hope that this effort will provide better visibility for scholarly commentaries as an important part of the scholarly record.

Thursday, 8 November 2018

Mapping out the path to data

Data availability statements in biomedical literature
Every research paper is a story about data. Over 2.5 million articles in Europe PMC contain data of all sorts, from microscopy images to bird song recordings. While in the past, a research paper might have deciphered a single gene sequence, modern experiments often produce gigabytes of information at once. This means that data described in a paper might be spread across several databases, creating a challenge for researchers who want to access and reuse it.

To boost reproducibility and reuse of research datasets, many scientific journals have introduced a data availability statement: a distinct article section that explains how the data can be accessed. As the name suggests, the data availability statement states whether the data is available, outlines any conditions for access, and includes hyperlinks to publicly archived datasets analysed or generated during the study.

In fact, to date over 230,000 full-text publications in Europe PMC contain a dedicated data availability section, with the oldest record dating from 1980. However, only from 2014 onwards do we see a sharp increase in the number of articles that include a data statement, which reached just over a quarter of all full-text articles published so far in 2018.



*data analysis by Michael Parkin

This increase is a very positive trend in support of research reproducibility; however, there is significant variation in how Data Availability sections are included in publications across different journals. For example, they appear in a number of different places within an article, sometimes as a stand-alone section, and sometimes as part of the Methods, Results or Discussion. Many variations in the title of the Data Availability section also hamper discoverability across multiple journals.
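
To make the naming problem concrete, here is a minimal Python sketch that maps a few common stand-alone section titles onto a single Data Availability category. The pattern list is a hypothetical heuristic chosen for illustration; it is not the classifier Europe PMC uses, and it cannot find statements embedded inside Methods, Results or Discussion sections.

```python
import re

# Hypothetical title variants; a real classifier would need many more.
DATA_AVAILABILITY_PATTERNS = [
    r"data\s+availability",
    r"availability\s+of\s+data(\s+and\s+materials?)?",
    r"data\s+access(ibility)?",
    r"data\s+sharing",
]

# Whole-title match, allowing an optional trailing "statement" and ":" or ".".
_DA_RE = re.compile(
    r"^\s*(?:" + "|".join(DATA_AVAILABILITY_PATTERNS) + r")(?:\s+statement)?\s*[:.]?\s*$",
    re.IGNORECASE,
)


def section_category(title: str) -> str:
    """Return a unified category for a stand-alone section heading."""
    return "DATA_AVAILABILITY" if _DA_RE.match(title) else "OTHER"
```

Anchoring the match to the whole heading keeps titles such as "Data analysis" out of the category, at the cost of missing unusual phrasings.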



To improve access to scientific data reported in research papers, Europe PMC has built search filters that target data availability statements specifically. These filters should also support trend analyses and research into data sharing practices, potentially providing insight into how data is shared and the downstream impact of data sharing.

The right kind of search

Granular search within article sections has been available in Europe PMC for a while. It allows you to restrict your searches to, for example, figure legends or the materials and methods section to get more relevant results. You can access this feature in the Europe PMC Advanced Search by selecting the section of interest from a drop-down menu.


We have recently extended the list of full text article sections available for deep searching by adding a “Data Availability” category. It unifies all the title variants mentioned above, and can be searched using the (DATA_AVAILABILITY:*) syntax. For example, if you would like to retrieve papers that have data deposited in Figshare, you could search for (DATA_AVAILABILITY:Figshare).
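
The same syntax can be used programmatically. This Python sketch builds a section-restricted query and a search URL; it assumes the RESTful search endpoint at `https://www.ebi.ac.uk/europepmc/webservices/rest/search` with `query`, `format` and `pageSize` parameters. The `PUB_YEAR` clause in the usage note is an assumed field name that should be checked against the Europe PMC search syntax reference.

```python
from urllib.parse import urlencode

# Base URL of the Europe PMC RESTful search service (assumed, see lead-in).
SEARCH_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def data_availability_query(term: str, extra_filters=()) -> str:
    """Restrict `term` to the Data Availability section, ANDed with any other clauses."""
    clauses = [f"(DATA_AVAILABILITY:{term})", *extra_filters]
    return " AND ".join(clauses)


def search_url(query: str, page_size: int = 25) -> str:
    """Build a JSON search request; urlencode percent-encodes brackets and colons."""
    params = {"query": query, "format": "json", "pageSize": page_size}
    return SEARCH_URL + "?" + urlencode(params)
```

For example, `search_url(data_availability_query("Figshare", ["PUB_YEAR:2018"]))` would request 2018 papers whose Data Availability section mentions Figshare.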

While the search tool developed by Europe PMC can help surface the data reported in scientific literature, it relies on the publication authors to provide sufficient information on how the data can be accessed. There is great variety in data sharing practices, from data being deposited in public community databases such as the European Nucleotide Archive (ENA) or the Protein Data Bank (PDB), to data included in supplemental files, in Institutional Repositories, on websites, or being available upon request from the authors.

As an example, the share of data availability statements including the word “request” has risen dramatically in the last three years, and has reached one third of all data availability statements in 2018.
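
A share like this can be computed with a simple word match. The Python sketch below is illustrative only: the input format (publication year, statement text) and the prefix match on "request", which also counts "requests" and "requested", are our assumptions rather than a description of the original analysis.

```python
import re
from collections import defaultdict

# Word-initial match, so "request", "requests" and "requested" all count.
REQUEST_RE = re.compile(r"\brequest", re.IGNORECASE)


def request_share_by_year(statements) -> dict:
    """Map each year to the fraction of its statements that mention 'request'.

    `statements` is an iterable of (publication_year, statement_text) pairs.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for year, text in statements:
        totals[year] += 1
        if REQUEST_RE.search(text):
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}
```

A plain keyword match cannot tell "data available on request" apart from a statement saying data are *not* provided on request, so results like these are best read as upper bounds.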



*data analysis by Michael Parkin

Getting to the data

Integrating research data and literature is an important part of the Europe PMC mission to support data discovery and reuse. We identify data DOIs and accession numbers for over 40 life science resources in abstracts and full text articles using a text-mining approach. Over 450,000 publications in Europe PMC cite 1,000,000 unique datasets.
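
As a rough illustration of the text-mining step, this Python sketch finds candidate DOIs and UniProt accession numbers with regular expressions. The DOI pattern is adapted from the one Crossref has suggested for modern DOIs, and the accession pattern follows the format UniProt documents; both produce false positives, so real use would need further validation, for example checking candidates against the resource itself. This is not Europe PMC's actual pipeline.

```python
import re

# Adapted from Crossref's suggested pattern for modern DOIs.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")
# UniProt accession format (6 or 10 characters).
UNIPROT_RE = re.compile(
    r"\b(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})\b"
)


def find_candidates(text: str) -> dict:
    """Return candidate DOIs and UniProt accessions mentioned in `text`."""
    # Trailing sentence punctuation is usually not part of the DOI.
    dois = [m.group(0).rstrip(".,;:") for m in DOI_RE.finditer(text)]
    accessions = UNIPROT_RE.findall(text)
    return {"doi": dois, "uniprot": accessions}
```

For example, in the sentence "Sequences were taken from UniProt (P69905) and the data are on Figshare: https://doi.org/10.6084/m9.figshare.4789744.v1." the sketch picks out `P69905` and the Figshare DOI without its final full stop.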


By using the advanced search tools you can identify papers that have generated protein structures, or find articles that cite proteomics datasets. The data availability section search enables you to go a step further and map out the way to supporting data. The new data search filter is one of the latest additions to the Europe PMC tools suite for locating biological data cited in the literature, including the SciLite application powered by the Annotations API, and the Data tab powered by the Data module of the Europe PMC API.

Ready for a test drive?

We hope that this new search feature for data availability can help improve reproducibility of research results, by making it easier for scientists and data enthusiasts to track underlying data from thousands of papers, wherever it may be hosted.

It can also enable detailed analyses of research data sharing practices. We can get deeper insights into the effects of publishers’ and funders’ policies on data sharing, researchers’ preferences for discipline-specific vs generic data repositories, or variations in data citations.

Whatever the use case, give the new tool a try, play with the data, and share your thoughts. We are always keen to hear your ideas and suggestions.

Thursday, 11 October 2018

Announcing new Europe PMC Beta!

We are always listening to what our users have to say. And this week we are happy to present a Beta version of Europe PMC with lots of improvements based on extensive user feedback.

Europe PMC Beta will be available alongside the current Europe PMC website throughout October and November. You can access it at any time using the Beta link in the header of the current site.



Over the last year we’ve been making several design changes to make Europe PMC easier to use, and we’re excited to be sharing them with you! Here is a quick overview of the major updates, which mainly apply to the two key pages: search results and the article page.

Search results page

Search results have a new layout and design. You’ll notice there are labels to identify articles which are reviews or preprints, and those that have free full text or are open access and free to reuse (as shown below). You can now apply more than one search filter at a time (the current site only allows one filter to be selected). And if you want to save the citations of several articles you can add them to your export list, then review and edit the list before exporting the citations.

Article page

Most importantly, the abstract and the full text of each article have been combined into a single page. It’s great news for readers, as you no longer need to switch between different records, and can view all relevant information on the same page.


As you scroll down the article page you can discover the full text, if available, as well as other useful content. We have introduced a left-hand navigation on the article page to help you find your way around the publication and related data. The navigation links to sections on the page including Figures, Full text, Data, Impact, References, Related content, and Funding. Each section is packed with handy resources. For example, the Impact section will contain not only traditional article citations and alternative metrics, but will also show you if the article has been cited in Wikipedia, or curated into a life science database. The Data section will be your single go-to place for all research data associated with a study, including links to the supplemental files, as well as research data cited as accession numbers or DOIs.

Figures are among the most essential parts of a research publication: they present scientific data and summarise major results visually. We have put special emphasis on figures by displaying a figure preview directly under the abstract. It allows you to quickly scan the primary data and get an overview of the key findings. The figure preview feature is only available if the article has an open access license.


We have also added a tools menu on the right for some frequently used functions. It enables you to:

- Highlight biological terms, such as organisms or gene/protein names in the article, to skim read the text
- Get the citation, if you want to cite the article
- Open the PDF

We’ve also significantly improved the Highlight Terms (also known as the SciLite annotations) feature. When you first open the side panel using the ‘Highlight terms’ link, all the unique biological terms that have been found in the abstract or full text can be seen at a glance. It’s easy to highlight the terms individually, or alternatively to highlight all terms in a category (such as diseases), and see where they appear in the article.


All design changes on Europe PMC Beta are based on user research findings and feedback. In 2017 we asked 12 researchers to keep a diary documenting their literature searches over one week, to better understand how and why they carry out literature searches in the context of their day-to-day work. We also carried out several rounds of usability testing, to evaluate how well Europe PMC works for the people who use it. The results of our user research are published on Figshare.
An experience map illustrating how life scientists search for research literature, based on user research by the Europe PMC UX team. https://doi.org/10.6084/m9.figshare.4789744.v1

We would love to hear your thoughts on Europe PMC Beta! Please use the Feedback button to leave your feedback, or email helpdesk@europepmc.org. If you would like to take part in future Europe PMC usability testing, to help us continue to improve the website, please contact helpdesk@europepmc.org.


We will make further improvements before any of the new designs and features are available on the current Europe PMC website, and we hope that with your help we will continue creating the best tools for literature research.

Wednesday, 11 July 2018

Preprints in Europe PMC: reducing friction for discoverability

From July 2018, the Europe PMC repository will start indexing preprints. Indexing preprints in Europe PMC will make the science they report more widely discoverable and support their inclusion in workflows such as grant reporting, article citation, and credit and attribution. This blog post explains why we have done this, and discusses some of the opportunities and challenges that arise from this decision.

In the life sciences, peer reviewed journal articles are the global currency by which we share research results, used also, in part, in funding and career decisions. But in the past few years the use of preprints (non-peer reviewed articles posted to preprint servers) has become increasingly popular. According to Jordan Anaya of PrePubMed, over 2300 new life sciences preprints were published in June 2018 (see Fig. 1), in large part driven by the use of bioRxiv.


Figure 1. Growth in preprint uptake in the life sciences (Source: http://www.prepubmed.org).

Preprint use among biologists still has a long way to go before it becomes widespread: the rate of growth is impressive, but the monthly number of new life sciences preprints is still only 2-3% of the number of peer reviewed papers ingested by PubMed each month. While preprints have been particularly popular in computational biology and genomics, many researchers still have questions about their long-term benefits. In recognition of both their growing popularity and the need to better understand their effect on the publishing ecosystem, Europe PMC will now include preprint records.

Europe PMC and preprints

From July 2018, Europe PMC will include abstracts for preprints that have a DOI and can be retrieved via Crossref metadata services. This means that about 37,000 preprints will be immediately discoverable in Europe PMC; a figure which we anticipate will grow by around 2000 preprints every month.

By restricting Crossref searches by DOI prefix, the initial preprint servers to be indexed include: bioRxiv, PeerJ Preprints, ChemRxiv, F1000Res, and the Open Research platforms powered by F1000: Gates Open Res, Wellcome Open Res, HRB Open Res, AAS Open Res, and MNI Open Res. We are using this filtering approach in order to include preprints that have screening protocols in place, and to ensure we do not inadvertently include blog posts or other types of non-peer reviewed content. You can expect to find preprints in Europe PMC within 24 hours of being sent to Crossref; links to the full text on the preprint server via the DOI are included.
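
The prefix-based selection can be sketched as a filter over Crossref work records. This Python sketch is hypothetical: its allow-list shows only bioRxiv's 10.1101 prefix (the other servers' prefixes would need to be looked up and added), and it assumes preprints are registered with the Crossref work type `posted-content`, which may not hold for every platform named above. The type check matters because a single prefix can also cover journal articles; 10.1101, for instance, is shared with Cold Spring Harbor Laboratory Press journals.

```python
# Hypothetical allow-list: DOI prefix -> preprint server.
SERVER_PREFIXES = {"10.1101": "bioRxiv"}


def doi_prefix(doi: str) -> str:
    """The registrant prefix is everything before the first slash."""
    return doi.split("/", 1)[0].lower()


def select_preprints(works: list) -> list:
    """Keep Crossref-style records with an allow-listed prefix and type posted-content."""
    selected = []
    for work in works:
        server = SERVER_PREFIXES.get(doi_prefix(work.get("DOI", "")))
        if server and work.get("type") == "posted-content":
            selected.append({"doi": work["DOI"], "server": server})
    return selected
```

Filtering on an explicit allow-list, rather than on the work type alone, is what keeps arbitrary `posted-content` records (which can include non-preprint material) out of the index.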

To distinguish preprints from peer reviewed articles in Europe PMC, each preprint is given a PPR ID and is clearly labelled as a preprint, both on the abstract view and in the search results (see Fig. 2). When a preprint has subsequently been published as a peer-reviewed article and indexed in Europe PMC[1], the two records are crosslinked. Using these crosslink detection methods, we have so far found about 14,000 preprints that have been published in peer reviewed journals. All preprint content will also be available through the Europe PMC APIs as well as on the website.

Preprints can be claimed to ORCID iDs (indeed, in the pilot set of ~120 preprints, 77 have already been linked to ORCID iDs) and are also included in Europe PMC's routine text-mining processes that identify genes/proteins, organisms, diseases, and data citations, among other key biological entities.

Figure 2. Search results and abstract views in Europe PMC differentiate preprints from peer-reviewed articles.


[1] We use Crossref's "is-preprint-of" field as well as a basic citation metadata check to find matching peer-reviewed papers and preprints.
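
The two matching signals in the footnote above could be combined roughly as follows. This Python sketch assumes simplified records: the preprint carries Crossref's `relation` object with `is-preprint-of` entries, while titles and first authors are flat strings (in real Crossref metadata, `title` is a list). The title-and-first-author fallback is a hypothetical stand-in for Europe PMC's "basic citation metadata check", and the DOIs in the usage note use the DOI Foundation's example prefix.

```python
import re


def normalise_title(title: str) -> str:
    """Lower-case and collapse punctuation/whitespace so trivial differences don't block a match."""
    return " ".join(re.sub(r"[^\w\s]", " ", title.lower()).split())


def linked_paper_dois(preprint: dict) -> list:
    """DOIs asserted through Crossref's 'is-preprint-of' relation."""
    relations = preprint.get("relation", {}).get("is-preprint-of", [])
    return [r["id"].lower() for r in relations if r.get("id-type") == "doi"]


def metadata_match(preprint: dict, paper: dict) -> bool:
    """Hypothetical fallback: same normalised title and same first-author surname."""
    same_title = normalise_title(preprint["title"]) == normalise_title(paper["title"])
    same_author = preprint["first_author"].lower() == paper["first_author"].lower()
    return same_title and same_author


def find_published_version(preprint: dict, papers: list):
    """Prefer an asserted relation; otherwise fall back to the metadata check."""
    by_doi = {p["doi"].lower(): p for p in papers}
    for doi in linked_paper_dois(preprint):
        if doi in by_doi:
            return by_doi[doi]
    return next((p for p in papers if metadata_match(preprint, p)), None)
```

For example, a preprint titled "A study of X!" by "smith" with no relation data would still be matched to a paper titled "A Study of X" by "Smith" at `10.1000/paper.1`. Trying the explicit relation first keeps the fuzzier check as a fallback, which limits false matches.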

Europe PMC as a platform for innovation

At Europe PMC, we support the use of preprints as a means to communicate research results rapidly. But we recognise that there are open questions regarding the effects of the widespread use of preprints, in particular for (non-expert) readers and information seekers, and the effects of preprints on the current life sciences publishing ecosystem. We therefore see an additional benefit in hosting preprints from several sources in Europe PMC: it provides an open platform to address some of these questions. Putting preprints in the context of peer-reviewed content will help support the analysis of, for example, the impact (positive or negative) of preprints on scholarly communications overall. Do they add to churn or alleviate it? When is a preprint cited in preference to a peer reviewed paper? How should versions be managed across different platforms? For example, by using the smart filter in the Europe PMC API to retrieve all preprints that have a linked peer-reviewed paper and then comparing publication dates, we can see that the median time between posting a preprint and publication (Fig. 3) is around 4-5 months.
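
The median-gap calculation behind Fig. 3 can be sketched in a few lines of Python. The function name and the input pairs of posting and publication dates are hypothetical (in practice they would come from the API results). The function also counts negative gaps, that is, preprints apparently posted after their peer-reviewed version, rather than silently dropping them, since these are the quality-control cases the figure caption mentions.

```python
from datetime import date
from statistics import median


def median_days_to_publication(pairs):
    """Return (median gap in days, number of negative gaps).

    `pairs` is an iterable of (preprint_posted, paper_published) dates.
    Negative gaps are kept in the median and reported separately.
    """
    gaps = [(published - posted).days for posted, published in pairs]
    negatives = sum(1 for g in gaps if g < 0)
    return median(gaps), negatives
```

For example, gaps of 120, 134 and -9 days give a median of 120 days (roughly four months) with one negative gap flagged for inspection.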


Figure 3. Time between posting a preprint and publication of a peer-reviewed article. Data taken from bioRxiv and PeerJ Preprints, using metadata supplied to Crossref. Note that a small number of preprints appear to have been posted after a peer-reviewed paper has been published, illustrating that reusing data in different contexts can provide opportunities for improved quality control. Data analysis by Michael Parkin.

The future

Aggregating a relatively small number of preprint abstracts from several different sources has already exposed some challenges in preprint management: there is variability in the scope of metadata supplied and in the handling of versions by different preprint servers. Harmonizing approaches to these issues will require work and collaboration; we would welcome comments on these matters.

Including full text in addition to abstracts will open further questions regarding technical robustness and publishing ecosystem dynamics. With full text, we can answer deeper and perhaps more critical questions, such as the impact of peer review on the evolution of a paper, and how robustly the scientific conclusions are supported by available data. With full text, the community can also experiment at a more granular level with peer review, accreditation/badging, data integration and so on. To explore these issues fully, making preprints available in a structured format (XML) with machine-readable (and open) licenses would greatly facilitate responsible sharing, analysis and reuse of content by different communities.

Friday, 18 May 2018

Europe PMC and Coko Announce Partnership

The Collaborative Knowledge Foundation (Coko) and Europe PMC are excited to announce a new partnership to develop web-based, open source content and workflow management components that will enable ingest and processing of manuscripts. The system will be built on Coko’s PubSweet technology framework and will contribute to the vision of creating modern, digital-first technologies that improve the speed of research.

Europe PMC will be joining a community of publishers interested in collaborating to provide the research and scientific community better tools and platforms for the publication and broad dissemination of research. Hindawi, eLife, and UC Press are also partners in this endeavor.

Michele Smith, Product Manager for Europe PMC, says: “We're delighted to team up with CoKo and partners to build an open source deposition system for accepted manuscripts - a central part of our support for the Europe PMC funders and their open access policies. Collaborating to build this open source technology is not only a sustainable approach, but also very much aligned with our mission to make Europe PMC a platform for innovation”.


Coko builds and supports community-driven open source technology and “born open” best practices for software development. The community consists of organizations adopting or using Coko technologies, partnering open source technology builders, industry organizations, standards bodies and affiliates. While open source software is freely available to all, successful open source projects are ones with wide adoption and active community support.


Strong partnerships and open source solutions can transform research communication. We’re eager to get started. As with all our efforts, we will post updates here and throughout our channels.