Plan S is an initiative for immediate and full open access to scholarly research publications put forward by cOAlition S, an international consortium of research funders.
In November 2018, specific implementation guidance on the Plan S principles was released to the public with the aim of gathering feedback from Plan S stakeholders, including researchers, publishers, funders, and other interested parties.
Europe PMC’s mission to support innovation based on open access content is well aligned with the fundamental principles of Plan S. Several Europe PMC funders have joined cOAlition S, and we will continue to support their open access policies in line with the Plan S initiative. Our response to the Plan S guidance document is provided below.
1. Is there anything unclear or are there any issues that have not been addressed by the guidance document?
Europe PMC fully supports the mission of Plan S to drive universal open access for research articles. Many of the cOAlition S funders use Europe PMC as their repository of choice for publication outputs from life science funding programmes.
Europe PMC contains over 35 million abstracts and 5 million full text research publications, predominantly from the life sciences. The website is used by millions of people a month, and millions of megabytes of open data are downloaded programmatically via our APIs in the same time period.
Europe PMC meets all the requirements outlined in the implementation plans, including running a help desk and operating an XML (JATS) workflow. We strongly support this technical approach within the context of a large-scale, aggregated document collection such as Europe PMC. This approach provides the best opportunity for discovery, interoperability and reuse of the full text content of research articles, and therefore contributes effectively to open science.
It is not clear, however, how an XML workflow would map effectively to the institutional repository (IR) system as a whole, due to the redundancy across the highly distributed IR community. In a typical green OA workflow, each author of the same research paper would self-archive the paper in their own IR, typically as a Word or PDF document plus structured metadata. Multiple submissions of the same paper, in different locations with different metadata and full text document formats, already cause deduplication challenges when aggregating metadata. Generating full text XML for the same paper several times over, across different IRs, would be a needless cost and would further complicate aggregation.
The XML for a single version of a publication needs to be generated only once, and other formats (e.g. PDF, HTML) can then be generated from it. A shared service, or services, that could deliver this core requirement could also provide mechanisms for distributing the outputs widely, ensuring the IR community has maximal coverage and discovery of content for their institute or university.
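To illustrate the kind of deduplication an aggregator faces, here is a minimal sketch in Python. It is not how Europe PMC or any particular aggregator works: the record fields (`repo`, `doi`, `title`, `format`), the repository names and the DOI are all hypothetical, and the matching rule (normalised DOI, falling back to normalised title) is a deliberately simple assumption.

```python
import re
from collections import defaultdict


def normalise_doi(doi):
    """Lower-case a DOI and strip common resolver prefixes."""
    if not doi:
        return None
    doi = doi.strip().lower()
    doi = re.sub(r"^(https?://(dx\.)?doi\.org/|doi:)", "", doi)
    return doi or None


def normalise_title(title):
    """Reduce a title to lower-case alphanumeric words separated by spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", title.lower()))


def group_duplicates(records):
    """Group records by normalised DOI, or by normalised title if no DOI."""
    groups = defaultdict(list)
    for record in records:
        key = normalise_doi(record.get("doi")) or normalise_title(record["title"])
        groups[key].append(record)
    return dict(groups)


# Hypothetical deposits of the same paper in different repositories.
records = [
    {"repo": "IR-A", "doi": "https://doi.org/10.1234/EXAMPLE.1",
     "title": "A Study of X", "format": "pdf"},
    {"repo": "IR-B", "doi": "doi:10.1234/example.1",
     "title": "A study of X.", "format": "docx"},
    {"repo": "IR-C", "doi": None,
     "title": "An unrelated paper", "format": "pdf"},
]

groups = group_duplicates(records)
for key, members in groups.items():
    print(key, [m["repo"] for m in members])
```

Even this toy version shows the limitation: a copy deposited without a DOI would not be merged with a copy that has one, so real aggregation needs more careful matching for every additional deposit location.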
2. Are there other mechanisms or requirements funders should consider to foster full and immediate Open Access of research outputs?
We would like to suggest that the Plan S funders consider an approach exemplified by Europe PMC (described in more detail below) to deliver on their repository requirements. While Europe PMC has a life science focus, the underlying infrastructure is generic, and the multidisciplinary nature of science means that the boundaries of what is deposited in Europe PMC are increasingly blurred. Indeed, there are already full text articles in Europe PMC that belong primarily to allied disciplines such as chemistry, computer science, history of medicine, environmental science and health-related social science. It is also conceivable that a small number of high-level disciplinary systems (e.g. physical sciences and SSH in addition to Europe PMC) could coordinate to deliver on the generic technical requirements, while providing deep disciplinary expertise through the staffing of those services. Combining this kind of core technical capability with the networking capabilities of the IR community could be a very effective means of supporting the goals of Plan S.
Please find below more details on how Europe PMC addresses the Plan S repository implementation requirements.
Europe PMC overview
Europe PMC services
A key part of Europe PMC’s mission is to encourage innovation based on open access content. We therefore provide programmatic access to abstracts and full text via APIs, including RESTful (JSON, XML and Dublin Core) and OAI. Bulk download by FTP is also provided (http://europepmc.org/developers). Sharing content as widely as possible is our top priority.
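As a rough illustration of programmatic access, the Python sketch below builds a search request for the RESTful service and reads titles from a JSON response. The base URL, the `query`, `format` and `pageSize` parameters, and the `resultList`/`result` response layout are assumptions based on the public developer documentation and should be checked against it; the sample payload is hand-written, not real data, and no network call is made.

```python
import json
from urllib.parse import urlencode

# Assumed base URL of the RESTful search service; verify on the developers page.
BASE_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_search_url(query, page_size=25):
    """Build a search URL that asks for JSON output."""
    params = {"query": query, "format": "json", "pageSize": page_size}
    return f"{BASE_URL}?{urlencode(params)}"


def extract_titles(payload):
    """Return (id, title) pairs from a parsed JSON response."""
    results = payload.get("resultList", {}).get("result", [])
    return [(r.get("id"), r.get("title")) for r in results]


# Hand-written stand-in for a response body (placeholder values only).
sample_response = json.loads("""
{
  "hitCount": 2,
  "resultList": {
    "result": [
      {"id": "EXAMPLE-1", "title": "First placeholder article"},
      {"id": "EXAMPLE-2", "title": "Second placeholder article"}
    ]
  }
}
""")

url = build_search_url("malaria", page_size=2)
titles = extract_titles(sample_response)
print(url)
for record_id, title in titles:
    print(record_id, title)
```

To run it against the live service, the URL could be fetched with `urllib.request.urlopen(url)` and the body passed through `json.load` before calling `extract_titles`.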
Europe PMC also runs a grants metadata database for the funders (http://europepmc.org/grantfinder), so that incoming publications can be matched to grants.
Europe PMC is part of the global life science data infrastructure. We collaborate and share content with the US National Library of Medicine, which runs PubMed and PMC. Europe PMC is an ELIXIR Core Data Resource and provides integration with over 40 critical data resources, such as the Protein Data Bank, the European Nucleotide Archive, UniProt and OMIM.
Technical considerations
Europe PMC is built on the widely used data standard for research publications, JATS (https://jats.nlm.nih.gov/index.html). While Europe PMC has a life science focus, JATS is not discipline specific and therefore could be used for any research article.
Use of JATS (1) future-proofs the archive for long-term access (proven adherence to standards via validation against the data model and no reliance on proprietary formats) and (2) provides the ability to query across all content in a consistent manner. This is very important for deep queries, third party software development, and text and data mining. JATS, being an XML format, allows specific and important elements of an article (e.g. ORCIDs, licences or article sections such as Data Availability Statements) to be identified. This kind of accurate deep indexing would be impossible for articles archived in an unstructured mixture of formats (PDF, Word, HTML).
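The point about identifiable elements can be sketched with Python's standard library. The fragment below is a hand-written, heavily simplified JATS-like document, not a real article: the ORCID is a placeholder, and the use of `sec-type="data-availability"` to mark the Data Availability Statement is an assumption, since tagging practice varies between publishers.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

# Simplified, hand-written JATS-like fragment with placeholder values.
SAMPLE_JATS = """<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid">https://orcid.org/0000-0000-0000-0000</contrib-id>
          <name><surname>Example</surname><given-names>Ann</given-names></name>
        </contrib>
      </contrib-group>
      <permissions>
        <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
          <license-p>Distributed under CC BY 4.0.</license-p>
        </license>
      </permissions>
    </article-meta>
  </front>
  <body>
    <sec sec-type="data-availability">
      <title>Data Availability Statement</title>
      <p>All data are available in a public repository.</p>
    </sec>
  </body>
</article>"""


def extract_fields(jats_xml):
    """Pull ORCIDs, licence URLs and data availability text from JATS XML."""
    root = ET.fromstring(jats_xml)
    orcids = [
        e.text.strip()
        for e in root.iter("contrib-id")
        if e.get("contrib-id-type") == "orcid" and e.text
    ]
    # The licence URL is carried in the namespaced xlink:href attribute.
    licences = [e.get(f"{{{XLINK}}}href") for e in root.iter("license")]
    statements = [
        " ".join("".join(p.itertext()).split())
        for sec in root.iter("sec")
        if sec.get("sec-type") == "data-availability"
        for p in sec.iter("p")
    ]
    return {"orcids": orcids, "licences": licences, "data_availability": statements}


fields = extract_fields(SAMPLE_JATS)
print(fields)
```

The same three lookups applied to a PDF or Word file would require layout heuristics or manual reading, which is the contrast the paragraph above draws.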
Uploading publications
Content is ingested into Europe PMC via two routes: (1) journals that archive content in PMC (either in full or in part); and (2) the Europe PMC or PMC manuscript submission systems. The manuscript submissions are overseen by Helpdesk staff who are trained in how to use the submission system and who support authors through the submission process. Simply put, authors upload files (final accepted manuscripts, typically in Word or PDF), which, after various integrity checks, are converted to XML under contract. The returned XML-formatted files are rendered to HTML for QA and sign-off by the contact author. It is possible to hold the article securely until such time as it can be made available, for example if the article is submitted before an embargo date. The Helpdesk staff also handle incoming grant data from funders, in order to match incoming submissions with specific grants. These grant data are made public via the Grant Finder tool on the Europe PMC website, and via public APIs.
Europe PMC has recently been collaborating with the CoKo Foundation, Hindawi Publishing and eLife to develop an open source submission system for manuscripts. This will be released as a beta version very soon, and in full production shortly thereafter.