As 2024 begins, we reflect on what our team achieved in 2023 to meet your needs as users. The team’s efforts concentrated on building trust in preprints, expanding and improving text-mining capabilities, and open sourcing code.
Preprint highlights
As part of Europe PMC’s commitment to accessible and discoverable scientific research, 2023 saw the inclusion of five new preprint platforms: AIJR Preprints, PaleorXiv, Qeios, AfricArXiv, and ScienceOpen Preprints. As of December 2023, there are over 707,000 preprints from 33 preprint servers freely discoverable in Europe PMC.
The COVID-19 pandemic created an urgent need to transform the traditional publication route and allow research results to be shared immediately, which fuelled the use of preprints within the biomedical community. To support the discovery and visibility of COVID-19 preprints, the full text of preprints from 10 different platforms was made searchable in Europe PMC alongside peer-reviewed articles. The full text of appropriately licensed preprints is also available in a machine-readable format for programmatic access and deeper analysis. Following the end of project funding, COVID-19 preprints posted on or after 1 November 2023 are no longer made available as full text in Europe PMC. The existing corpus of nearly 78,000 full-text COVID-19 preprints remains freely available through the Europe PMC website and APIs. Europe PMC continues to support preprint discovery by including the full text of preprints supported by Europe PMC funders, an initiative started in 2022.
The increased use of preprinting and preprint peer review in the life sciences is changing the landscape of scientific publishing. To sustain this growing ecosystem, Europe PMC, in collaboration with ASAPbio, organised a workshop on ‘Supporting interoperability of preprint peer review metadata’. Following the workshop, participants formed working groups to define best practices and community standards for improving the accessibility and discoverability of preprint peer review. If you would like to be part of this ongoing conversation, please register your interest here: https://forms.gle/g3FATYf57ASdQTUB9.
As part of Europe PMC’s continued work to increase transparency and build trust in preprints, links to peer reviews are now discoverable in the Reviews & evaluations section of the preprint page. Europe PMC displays the review timeline, platform, and type, as well as reviewer names where available. The review status of each preprint and its versions is clearly displayed on the preprint page. In addition, users can now find nearly 10,000 reviewed preprints in Europe PMC using a dedicated search filter. These new designs and features are based on extensive user research carried out in 2023.
Expanding and enhancing text-mining
The data science team at Europe PMC has developed a gold standard Annotations Corpus, one of the largest publicly available human-annotated biomedical corpora. This dataset can be used to improve the accuracy and reliability of life science natural language processing tools, which in turn can support scientific discovery. For example, Open Targets use machine learning models trained on the Europe PMC Annotations Corpus to identify and prioritise drug targets.
To support data reuse, Europe PMC identifies data cited in publications by text-mining accession numbers from over 40 life science databases. In 2023 this process was expanded to include data citations for AlphaFold, BRENDA, Cellosaurus, Rhea, and BioImage Archive. Europe PMC’s efforts contribute to an ambitious project, led by the Wellcome Trust, the Chan Zuckerberg Initiative, and DataCite, to build the Open Global Data Citation Corpus. The corpus will be a central aggregator of research data citations, helping to monitor scholarly impact and improve the dissemination of research.
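To make the idea of text-mining accession numbers concrete, here is a minimal Python sketch that scans text for identifiers from three of the databases mentioned above. The patterns and the `find_accessions` helper are illustrative assumptions, not Europe PMC’s actual rules. The UniProt pattern follows UniProt’s published accession format, and the Cellosaurus and Rhea patterns are simplified. A production pipeline would also need context checks to separate genuine data citations from look-alike strings.

```python
import re

# Illustrative patterns only (assumption): Europe PMC's pipeline covers 40+
# databases and its exact matching rules are not reproduced here.
ACCESSION_PATTERNS = {
    # UniProtKB accessions: 6 or 10 characters, per UniProt's documented format.
    "UniProt": re.compile(
        r"\b(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
        r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})\b"
    ),
    # Cellosaurus cell line accessions, e.g. CVCL_0030 (simplified).
    "Cellosaurus": re.compile(r"\bCVCL_[A-Z0-9]{4}\b"),
    # Rhea reaction identifiers, e.g. RHEA:10000 (simplified).
    "Rhea": re.compile(r"\bRHEA:\d{5,}\b"),
}


def find_accessions(text):
    """Return (database, accession) pairs in order of appearance in text."""
    hits = []
    for db, pattern in ACCESSION_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), db, match.group(0)))
    hits.sort()  # order by character offset
    return [(db, accession) for _, db, accession in hits]


if __name__ == "__main__":
    print(find_accessions("HeLa (CVCL_0030) expresses P69905; see RHEA:10000."))
```

Matching on patterns alone over-generates for short identifier formats, which is one reason a real pipeline pairs candidate matching with further validation.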
Europe PMC has improved the accuracy of the text-mining pipeline that identifies biological terms, including Genes/Proteins, Diseases, Organisms, Chemicals, and Drugs. The existing dictionary-based approach is now paired with a machine learning filter that removes false positives and improves the reliability of annotations. Annotated key terms can be highlighted in the article text using the SciLite tool.
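The two-stage design described above can be illustrated with a toy Python sketch. First, a dictionary lookup proposes candidate annotations. Then a filter discards the candidates it scores as false positives. Everything here is hypothetical: the four-term dictionary, the `toy_score` context rule, and the 0.5 threshold. In particular, `toy_score` is a hand-written stand-in for a trained machine learning classifier. Europe PMC’s vocabularies and models are far larger and are not reproduced here.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Annotation:
    start: int
    end: int
    text: str
    entity_type: str


# Hypothetical toy dictionary; real pipelines use large curated vocabularies.
DICTIONARY = {
    "BRCA1": "Gene_Proteins",
    "SET": "Gene_Proteins",  # a real gene symbol, but also a common acronym
    "breast cancer": "Diseases",
    "aspirin": "Chemicals",
}


def dictionary_tag(text: str) -> List[Annotation]:
    """Stage 1: propose every whole-word dictionary match as a candidate."""
    candidates = []
    for term, etype in DICTIONARY.items():
        for m in re.finditer(r"\b" + re.escape(term) + r"\b", text):
            candidates.append(Annotation(m.start(), m.end(), term, etype))
    return sorted(candidates, key=lambda a: a.start)


def toy_score(ann: Annotation, text: str) -> float:
    """Stand-in for a trained classifier's probability that ann is correct.

    A gene mention scores high only if 'gene' or 'protein' occurs within
    30 characters of it; other entity types always pass.
    """
    if ann.entity_type != "Gene_Proteins":
        return 1.0
    window = text[max(0, ann.start - 30): ann.end + 30].lower()
    return 0.9 if ("gene" in window or "protein" in window) else 0.1


def filter_annotations(candidates: List[Annotation], text: str,
                       score: Callable[[Annotation, str], float] = toy_score,
                       threshold: float = 0.5) -> List[Annotation]:
    """Stage 2: keep only candidates the filter does not reject."""
    return [a for a in candidates if score(a, text) >= threshold]


if __name__ == "__main__":
    sentence = ("The BRCA1 gene is linked to breast cancer. "
                "Patients in the SET trial took aspirin.")
    kept = filter_annotations(dictionary_tag(sentence), sentence)
    print([a.text for a in kept])
```

In the example sentence, "SET" is a genuine gene symbol but here names a trial. The dictionary stage proposes it, and the filter removes it while keeping the other three matches.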
The text-mining process has also been expanded to identify the key bioentities listed above in supplementary files. The resulting annotations are available via the SciLite tool and the Europe PMC API. We anticipate that this expansion will help unearth critical information hidden in supplementary data files and support data discovery and curation.
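Programmatic access to these annotations goes through the Europe PMC Annotations API. The sketch below only builds a request URL for one article and entity type. It does not send the request. The endpoint path, the parameter names, and the `PMC:` identifier prefix reflect our understanding of the public API and should be checked against its documentation. `annotations_url` is a hypothetical helper. This sketch also does not assume how annotations from supplementary files are selected or labelled in the response.

```python
from urllib.parse import urlencode

# Assumed endpoint of the Europe PMC Annotations API; verify against the docs.
ANNOTATIONS_API = (
    "https://www.ebi.ac.uk/europepmc/annotations_api/annotationsByArticleIds"
)


def annotations_url(article_id: str, entity_type: str = "Gene_Proteins") -> str:
    """Build (but do not send) a request URL for text-mined annotations.

    article_id is expected in 'SOURCE:ID' form, e.g. 'PMC:4771370' (assumption).
    """
    params = {"articleIds": article_id, "type": entity_type, "format": "JSON"}
    return f"{ANNOTATIONS_API}?{urlencode(params)}"


if __name__ == "__main__":
    print(annotations_url("PMC:4771370"))
```

The URL can then be fetched with any HTTP client. Keeping URL construction separate from fetching makes the query easy to inspect before any network call is made.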
Open source code
In 2022 Europe PMC stated a commitment to open source software as part of its support for the Principles of Open Scholarly Infrastructure. To that end, we have made the source code for 19 projects available as open source software, including the text-mining API and dictionaries, machine learning models, and a parser for preprint review DocMap files. This has helped us promote software reuse and develop new collaborations, for example with ChEMBL to extract relevant information from patents, with BioModels to enrich biological models, and with PDBe to uncover protein function.
User-focused design
User research highlighted the need to discover free full text easily for articles not available in Europe PMC. To address this, we released new search filters for “Full text in Europe PMC” and “Link to free full text”. The latter includes articles with a free, legal copy of the full text available from another source via Unpaywall.
Another user-led improvement was the addition of a combined title/abstract search field (TITLE_ABS:) to the Advanced search. Europe PMC already offers tools to search within specific article sections, such as Figures or Methods. The new field matches keywords appearing in either the title or the abstract in a single search, supporting medical librarians and systematic reviewers.
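The same field can be used in programmatic searches. The sketch below assumes that the Europe PMC REST search endpoint accepts the Advanced search field syntax. It combines several phrases with `TITLE_ABS:` and builds a request URL. The helper names `title_abs_query` and `search_url` are hypothetical, and the code does not contact the server.

```python
from urllib.parse import urlencode

# Europe PMC REST search endpoint (assumed to accept Advanced search syntax).
SEARCH_API = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def title_abs_query(*phrases: str) -> str:
    """Require each phrase to appear in the title or the abstract."""
    return " AND ".join(f'TITLE_ABS:"{p}"' for p in phrases)


def search_url(query: str, page_size: int = 25) -> str:
    """Build (but do not send) a JSON search request URL."""
    params = {"query": query, "format": "json", "pageSize": page_size}
    return f"{SEARCH_API}?{urlencode(params)}"


if __name__ == "__main__":
    q = title_abs_query("preprint", "peer review")
    print(q)
    print(search_url(q))
```

Quoting each phrase keeps multi-word terms such as "peer review" together. Joining with `AND` means every phrase must match, though each may come from the title or the abstract independently.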
Final remarks
Europe PMC is a free and open database that keeps users at the heart of its innovation and focuses on meeting the scientific community’s needs. We are proud of our achievements in 2023 and look forward to new developments in 2024. For in-depth descriptions of the latest Europe PMC developments, please read our recent publication: Europe PMC in 2023. To keep up to date with what we are working on in 2024, please take a look at the Europe PMC Roadmap.