News blog

Updates from Europe PMC, a global database of life sciences literature

Maria Levchenko

 | 25 November 2024


How we built a database of preprints


Compared to preprints, traditional publishing is like waiting for your favourite band to drop a new album. Whether it takes months or years, it can feel like an eternity. Preprints are the scientific equivalent of a surprise single release – an early version of a research paper shared before peer review. Born in the 1960s in physics, preprints gave researchers a way to share their work with the world in real time.

But finding preprints can be like searching for a needle in a haystack. There are over 60 preprint servers in the life sciences alone. Search tools often don’t cover them all and aren’t always built to last: both Microsoft Academic and Meta were shut down in recent years. This made us realise that there was a gap to close and that we were well positioned to close it. Europe PMC is a database of life science publications, which gives us the technical expertise to gather preprints from different sources – kind of like a Spotify for research outputs. In July 2018, more than 6 years ago, preprints first appeared in Europe PMC alongside journal articles. We have learned a lot since then and are excited to share those lessons in our recent publication. For the full article, head here: Enabling preprint discovery, evaluation, and analysis with Europe PMC. Below is a short summary.

How did we do it?

From the start, we wanted to offer a single search across different servers. Currently, you can access over 860,000 preprints from 30+ platforms on Europe PMC, and we continue to add preprints from new providers that meet a set of criteria. In particular, we look at their policies on screening, plagiarism, and misconduct.

Preprints are far more dynamic than journal articles. This prompted us to track changes to preprints: linking different versions, linking preprints to journal articles, and recording when preprints are removed or withdrawn. In addition to displaying these changes on the preprint page, we also created a special tool to check for these updates – the Article Status Monitor. Try it out yourself!
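To make the idea of tracking changes a little more concrete, here is a minimal sketch of what such a record could look like. It is purely illustrative: the `PreprintRecord` class, its field names, and the status labels are hypothetical and are not how Europe PMC or the Article Status Monitor store or report this information.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PreprintRecord:
    """Hypothetical record of a preprint's history (illustrative names only)."""
    preprint_id: str
    # Version identifiers, oldest first.
    versions: list = field(default_factory=list)
    # DOI of the linked journal article, if the preprint has been published.
    published_doi: Optional[str] = None
    # True once the preprint has been removed or withdrawn.
    withdrawn: bool = False


def describe_status(record: PreprintRecord) -> str:
    """Return one assumed status label, checking the most significant change first."""
    if record.withdrawn:
        return "withdrawn"
    if record.published_doi:
        return "published"
    if len(record.versions) > 1:
        return "updated"
    return "original"
```

The checks run from the most to the least significant change, so a withdrawn preprint is reported as withdrawn even if it also has several versions.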

It was important to us that preprints can be used in the same way as other research papers, so we made it easier to cite preprints or claim them to your ORCID profile. We also included preprints in our search along with journal articles. To support systematic reviews of preprints, we enabled specialised filters and full text search.

A common caution when reading a preprint is that you are on your own to examine the validity of the science. So, to support readers we have added trust signals to preprint pages. We highlight which funders supported this work, point to underlying data, and signpost existing reviews and comments.

Remember, humans are not the only readers in this age of technological advances. So, to support the machines accessing preprints, we offer various APIs (Application Programming Interfaces) and bulk downloads. Preprint abstracts and available open access full text are served in a standard, machine-readable format. This makes it easier for others to repurpose the content, whether for training the next generation of AI models or creating new tools. Check out FlyBase search for fruit fly preprints (click on References at https://flybase.org/) or StemJournal search for stem cell preprints (https://stemjnl.org/preprint-search). Both are powered by Europe PMC.
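As a rough sketch of what programmatic access can look like, the snippet below builds a search URL for the Europe PMC RESTful search service and pulls titles out of a JSON response. The helper names `build_preprint_search_url` and `extract_titles` are made up for this example, and the `SRC:PPR` preprint filter, query parameters, and response fields reflect our reading of the public API, so please check the API documentation before relying on them.

```python
from urllib.parse import urlencode

# Base URL of the Europe PMC RESTful search endpoint.
SEARCH_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_preprint_search_url(terms: str, page_size: int = 25) -> str:
    """Return a search URL restricted to preprint records (illustrative helper)."""
    params = {
        # SRC:PPR is assumed here to limit results to the preprint source.
        "query": f"({terms}) AND SRC:PPR",
        "format": "json",
        "resultType": "lite",
        "pageSize": page_size,
    }
    return f"{SEARCH_URL}?{urlencode(params)}"


def extract_titles(response: dict) -> list:
    """Collect titles from a decoded JSON search response, skipping entries without one."""
    results = response.get("resultList", {}).get("result", [])
    return [item["title"] for item in results if "title" in item]
```

To try it, you could open the URL returned by `build_preprint_search_url("zebrafish")` with `urllib.request.urlopen`, decode the body with `json.load`, and pass the result to `extract_titles`.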

What did it achieve?

When we started, our ambition was to make preprints easier to find. Since then, we have realised that’s not the only thing we’ve achieved. Adding preprints to Europe PMC opened up new possibilities for meta-research, helped build trust with preprint readers, and gave a boost to innovative publishing and review models.

It was a bumpy road and we hit many potholes, from licence limitations and limited data to a lack of programmatic access to preprint sources. But we continued working together with the whole community, pushing for small changes that can create big ripples in the preprint world and the whole publishing ecosystem. It’s no longer just about getting preprints noticed. We want to revolutionise how they fit into the research ecosystem, and we have so many ideas in our arsenal. Read more about our journey and future plans in the original article.


Creative Commons Licence
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


Partnerships & funding

Europe PMC is a service of the Europe PMC Funders' Group, in partnership with EMBL’s European Bioinformatics Institute (EMBL-EBI); and in cooperation with the National Center for Biotechnology Information (NCBI) at the U.S. National Library of Medicine (NCBI/NLM) . It includes content provided to the PubMed Central (NLM/PMC) archive by participating publishers.