News blog

Updates from Europe PMC, a global database of life sciences literature

Europe PMC team | 21 February 2022 | 11 mins read

Europe PMC adopts the Principles of Open Scholarly Infrastructure


As a long-standing service and infrastructure provider in the open science ecosystem, Europe PMC supports the Principles of Open Scholarly Infrastructure (POSI). We welcome the momentum gathering behind this initiative, which promotes the need to support and sustain open infrastructure.

Europe PMC has been a part of the public and open infrastructure for over 15 years and is run and managed by EMBL-EBI (which is part of the pan-European organisation EMBL). It is funded by 34 international funders and is a community-driven, open infrastructure, set in the context of key global open data resources such as the European Nucleotide Archive (INSDC), the wwPDB and the European Genome-Phenome Archive. All of these resources exist for the public good, are led by scientific need and international collaborations, and have open governance structures and a commitment to long-term sustainability. Together with PMC USA, Europe PMC is a part of the PubMed Central International archive network, which plays an integral part in fulfilling shared goals to enable international open science. Europe PMC has been selected as an ELIXIR Core Data Resource, which means that it is of fundamental importance to the wider life-science community and the long-term preservation of biological data.

Since the original POSI blog post by Bilder, Lin and Neylon (2015), a number of organisations that provide open infrastructure have adopted the principles, for example Crossref, ROR, OurResearch and DataCite, many of which Europe PMC collaborates with.

We decided to test the Europe PMC open infrastructure against POSI. By completing the review we were able to identify areas of the principles that align, but also others that do not map well to publicly funded services and organisations. We hope that the details below will provide another use case to extend POSI to include a greater diversity of open infrastructures. 

We also thank Geoffrey Bilder (one of the authors of POSI) and Ed Pentz from Crossref for a constructive discussion about POSI and their perception of the principles and how Europe PMC aligns with them. 

How Europe PMC meets the Principles of Open Scholarly Infrastructure (POSI)

Governance

🟢  Coverage across the research enterprise

🟢  Stakeholder governed

🟢  Non-discriminatory membership

🟢  Transparent operations

🟢  Cannot lobby

🟡  Living will

🟢  Formal incentives to fulfil mission & wind-down

Sustainability

🟡  Time-limited funds are used only for time-limited activities

🔴  Goal to generate surplus

🔴  Goal to create contingency fund to support operations for 12 months

🟢  Mission-consistent revenue generation

🟢  Revenue based on services, not data

Insurance

🟡  Open source

🟢  Open data (within constraints of privacy laws)

🟢  Available data (within constraints of privacy laws)

🟡  Patent non-assertion

Governance

🟢 Coverage across the research enterprise

It is increasingly clear that research transcends disciplines, geography, institutions and stakeholders. The infrastructure that supports it needs to do the same.

Europe PMC focuses on content and data from the life sciences, but it also contains cross-disciplinary content, for example from the social sciences, chemistry and physics. In both cases the infrastructure and content ingest process are generic and use cross-discipline community standards (such as the JATS XML data model used by the publishing community to structure scholarly content). The infrastructure is therefore discipline agnostic. Content comes from sources worldwide, and so do the users.

🟢  Stakeholder governed

A board-governed organisation drawn from the stakeholder community builds more confidence that the organisation will take decisions driven by community consensus and consideration of different interests.

Europe PMC’s governance includes a Scientific Advisory Board and Funder Committee. The Scientific Advisory Board includes representative members of the open science community, including researchers, publishers, funders and text miners.

Europe PMC’s Funder Committee includes representatives of the 34 international funders who support Europe PMC. Current members of these groups can be found on the Europe PMC Governance page.

🟢 Non-discriminatory membership

We see the best option as an “opt-in” approach with a principle of non-discrimination where any stakeholder group may express an interest and should be welcome. The process of representation in day to day governance must also be inclusive with governance that reflects the demographics of the membership.

For the purposes of this principle, we have assumed that ‘members’ of Europe PMC are funders. We operate a principle of non-discrimination, and any eligible funder is welcome. There are two criteria to join Europe PMC as a funder. Funders must be either:

  • public organisations whose legal mission is research funding; or
  • not-for-profit organisations (charities, foundations, associations) that fund research as a part of their mission.

Europe PMC also provides various services that anyone in the community can engage with, such as external links, preprints ingest and the annotations submission service. Use of these services is non-discriminatory but there are required criteria for participation. For example, preprints must meet certain minimum metadata standards and external links should be open and free to access and add value to the content.

🟢 Transparent operations

Achieving trust in the selection of representatives to governance groups will be best achieved through transparent processes and operations in general (within the constraints of privacy laws).

The Europe PMC Funder Committee terms of reference define the process and criteria for new funders joining. 

Europe PMC receives external grant funding and in-kind support from EMBL. Details of Europe PMC funding, which is provided by 34 funders of life science research and coordinated by Wellcome Trust, are available via Europe PMC’s grant finder. Details of how EMBL-EBI is funded are provided in the annual report.

Europe PMC undertakes user research and outreach activities on an ongoing basis to identify community and user needs. These are translated into new features and improvements, as appropriate. Europe PMC publishes a quarterly roadmap and in addition has published anonymised user research findings on Figshare.

🟢 Cannot lobby

The community, not infrastructure organisations, should collectively drive regulatory change. An infrastructure organisation’s role is to provide a base for others to work on and should depend on its community to support the creation of a legislative environment that affects it.

Europe PMC responds to regulatory changes, but does not lobby for them, or drive them. Europe PMC supports changes to funders’ open access policies. We contribute to the development of community standards, for example preprint metadata standards. Europe PMC responds to the needs of its community, for example by ingesting COVID-19 preprints and grants.

🟡 Living will

A powerful way to create trust is to publicly describe a plan addressing the condition under which an organisation would be wound down, how this would happen, and how any ongoing assets could be archived and preserved when passed to a successor organisation. Any such organisation would need to honour this same set of principles.

EMBL is an intergovernmental organisation, powering world-class research through tools, data and facilities. EMBL was established nearly 50 years ago in 1974, and is currently funded by 27 member states. An integral part of EMBL, EMBL-EBI has 25 years of experience in ensuring continuity of access to public infrastructure and data. EMBL-EBI manages the graceful retirement of public data services as part of its portfolio management, exemplified by the retirement of the ArrayExpress archive, which was seamlessly superseded by the BioStudies database in 2021.

Europe PMC is an archive. It aggregates, preserves, and enriches content posted or published elsewhere. The majority of the content in Europe PMC is available via partner databases and open infrastructure providers, including MEDLINE, PMC, Agricola, Crossref, and many others. The data that enriches this content includes text-mined annotations, grant information for Europe PMC funders, links to external resources, and other value-added services. Annotations are in the W3C annotation format, external links are in Scholix format, and both are available via APIs. At end of life, the available annotations and external links could be deposited in a repository for preservation.

Europe PMC’s partnership with PMC USA ensures that if Europe PMC were to wind down, the existing abstracts and PMC full-text content (including author manuscripts supported by Europe PMC funders) would still exist in another location, as demonstrated by the retirement of PMC Canada. The full-text preprints available in Europe PMC but not elsewhere would be deposited in a repository for preservation at end of life.

🟢 Formal incentives to fulfil mission & wind-down

Infrastructures exist for a specific purpose and that purpose can be radically simplified or even rendered unnecessary by technological or social change. If it is possible the organisation (and staff) should have direct incentives to deliver on the mission and wind down.

As mentioned above, EMBL-EBI has experience and a history of responding where the community needs change. 

Europe PMC is the repository of choice for its 34 funders, enabling their researchers to self-archive their author accepted manuscripts to comply with the funders’ Open Access policies. The landscape of Open Access publishing is changing (driven, for example, by the Plan S initiative). The need for Europe PMC is continually tested through the 5-year cycle of grant renewal and continuous monitoring of use. As Europe PMC is hosted by EMBL-EBI as part of a portfolio of data services, its lifecycle is managed in the same way as that of other resources, such as BioStudies, Ensembl, UniProt and many others.

Sustainability

🟡 Time-limited funds are used only for time-limited activities

Day to day operations should be supported by day to day sustainable revenue sources. Grant dependency for funding operations makes them fragile and more easily distracted from building core infrastructure.

Europe PMC is funded by a combination of a grant (contributed to by 34 funders) and EMBL-EBI support. It has been running for 15 years and has grant funding secured until 2026. As a general rule, EMBL-EBI supports the core infrastructure and the grant supports both the maintenance of Europe PMC services and the development of new ones. Further project-specific grants may fund aligned developments that enrich Europe PMC, but the main Europe PMC service is not dependent on these for its core service delivery.  

🔴 Goal to generate surplus

Organisations which define sustainability based merely on recovering costs are brittle and stagnant. It is not enough to merely survive, it has to be able to adapt and change. To weather economic, social and technological volatility, they need financial resources beyond immediate operating costs.

As a publicly funded service, Europe PMC does not create surplus funds. Rather, Europe PMC’s resilience is ensured in the broader context of EMBL-EBI and EMBL. 

🔴 Goal to create contingency fund to support operations for 12 months

A high priority should be generating a contingency fund that can support a complete, orderly wind down (12 months in most cases). This fund should be separate from those allocated to covering operating risk and investment in development.

As described above, EMBL’s commitment to Europe PMC effectively provides contingency but not in the form of a specific 12-month fund. 

🟢 Mission-consistent revenue generation

Potential revenue sources should be considered for consistency with the organisational mission and not run counter to the aims of the organisation. 

Europe PMC sits within the wider open data mission of EMBL-EBI. In that context, there are processes that assess grant applications to provide assurance that they are aligned with the organisational and resource missions.

🟢 Revenue based on services, not data

Data related to the running of the research enterprise should be a community property. Appropriate revenue sources might include value-added services, consulting, API Service Level Agreements or membership fees.

Europe PMC does not charge for use. A 2021 impact report by independent consultancy Charles Beagrie Ltd and 2019 report by Technopolis found that EMBL-EBI open data resources, of which Europe PMC is one, offer exceptional value for money. 

Insurance

🟡 Open source

All software required to run the infrastructure should be available under an open source license. This does not include other software that may be involved with running the organisation.

Only part of Europe PMC’s software is open source. Some parts of Europe PMC’s software date back 20 years. As Europe PMC components are refactored and replaced over time, they are typically replaced with open source software. For example, since 2019, the Europe PMC plus manuscript submission system has been developed using PubSweet, a free, open source framework developed in collaboration with the Collaborative Knowledge Foundation (Coko) and community partners, including eLife and Hindawi.

All of Europe PMC’s open source software is available via Europe PMC’s public-projects GitLab repository.

In December 2021 EMBL released its internal policy on open science and open access which states:

“EMBL expects all of the above types of software to be Open Source by default in both services and research, and made available in open/community software repositories.”

Europe PMC is committed to increasingly making its software open source. 

🟢 Open data (within constraints of privacy laws)

For an infrastructure to be forked it will be necessary to replicate all relevant data. The CC0 waiver is best practice in making data legally available. Privacy and data protection laws will limit the extent to which this is possible.

EMBL-EBI has at its core a mission to deliver open data. EMBL-EBI’s principles of service provision state: 

“Our data and tools are freely available, without restriction. The only exception is potentially identifiable human genetic information, for which access depends on research consent agreements.”

A public executive summary and further details on the licensing of EMBL-EBI data resources are available on the EMBL-EBI webpage. Europe PMC also provides details on general copyright restrictions, because some content is protected by publisher copyright statements. Articles and other material in Europe PMC usually contain an explicit copyright statement.

🟢  Available data (within constraints of privacy laws)

It is not enough that the data be made “open” if there is not a practical way to actually obtain it. Underlying data should be made easily available via periodic data dumps.

Article, grant and annotations metadata in Europe PMC can be accessed freely with no registration requirement for the use of the website, APIs and the bulk download via FTP. Europe PMC provides an open access subset of full-text articles and COVID-19 preprints with more permissive Creative Commons type licenses (or similar), which is available using the APIs or as bulk downloads.
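As a rough illustration of the registration-free API access described above, the following Python sketch builds a search request and reads article titles from a JSON response. It is a minimal sketch under stated assumptions, not an official client: the base URL, the parameter names (`query`, `format`, `pageSize`, `cursorMark`), the `OPEN_ACCESS:Y` query syntax and the `resultList`/`result` response layout are assumptions that should be checked against the current Europe PMC RESTful web service documentation. The helper names `build_search_url` and `extract_titles` are hypothetical.

```python
from urllib.parse import urlencode

# Assumed base URL of the Europe PMC search endpoint; verify against the API docs.
BASE_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_search_url(query: str, page_size: int = 25, cursor_mark: str = "*") -> str:
    """Build a search URL. No API key or registration is passed."""
    params = {
        "query": query,             # search expression, e.g. an open access filter
        "format": "json",           # ask for JSON rather than the default XML
        "pageSize": page_size,      # number of records per page
        "cursorMark": cursor_mark,  # "*" is assumed to mean the first page
    }
    return f"{BASE_URL}?{urlencode(params)}"


def extract_titles(response: dict) -> list:
    """Return the titles found in a parsed JSON response (assumed layout)."""
    results = response.get("resultList", {}).get("result", [])
    return [record["title"] for record in results if "title" in record]


# Build a request URL; fetching it (for example with urllib.request.urlopen)
# is left to the reader so that this sketch runs without network access.
url = build_search_url("OPEN_ACCESS:Y", page_size=10)
print(url)
```

The request is a plain HTTP GET, so the same URL can be opened in a browser or fetched with any HTTP client. For large-scale reuse, the bulk downloads mentioned above are likely a better fit than paging through the API.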

🟡  Patent non-assertion

The organisation should commit to a patent non-assertion covenant. The organisation may obtain patents to protect its own operations, but not use them to prevent the community from replicating the infrastructure.

Europe PMC does not hold any patents and we cannot foresee any circumstances under which we would apply for one. In the event that we did need to apply for a patent, we would be willing to issue a patent non-assertion covenant to assure stakeholders and the community that we would not use a patent to prevent the community from using Europe PMC code or infrastructure.

Conclusion

As seen in the self-audit above, Europe PMC is committed to providing open data supported by high quality, sustainable, open and community-driven infrastructure. This audit highlights some areas in which Europe PMC could improve, for example increasing the proportion of our code that is open source. It also shows that it would be valuable to work with the POSI authors and other infrastructure providers in the community to revise and adapt POSI to ensure it is appropriate for publicly funded services and organisations.

2 comments on "Europe PMC adopts the Principles of Open Scholarly Infrastructure"


Federico says:

Good to hear about the plan to gradually replace proprietary software with open source software, like PubSweet. Is there a plan to replace the proprietary JavaScript currently loaded from third parties and executed on EuropePMC's users' devices, like Scite?

Thank you for your question, Federico. For the Scite integration specifically, we use the Scite API to retrieve citation data, but we have implemented our own JavaScript to present the data on the Europe PMC website.


Creative Commons Licence
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

