Wednesday, 1 December 2021

Transparency for preprints: handling withdrawals and removals

Part of the appeal of preprints is the ability to post new versions, allowing researchers to continuously improve their manuscript and correct it if needed. However, in some cases the data presented in the preprint, or their interpretation, may prove incorrect over time. In such cases the authors may wish to withdraw or remove the preprint, rather than posting another version. There could also be instances where preprints are removed for legal reasons, due to authorship disputes, or even as a result of erroneous posting.

Currently, there are several different ways in which preprint platforms handle such scenarios. In the case of a withdrawal the preprint itself is often still accessible, but it is supplemented with a new version containing a withdrawal notice, which explains that the preprint should not be considered part of the scientific record. This is akin to retractions for peer reviewed journal articles. On the other hand, in the case of a removal all preprint versions are removed and the content is no longer accessible, in some cases with a removal notice replacing the preprint itself. You can see the list of different withdrawal/removal policies in the ASAPbio Preprint Server Directory.

For an archive, such as Europe PMC, it is crucial to follow best practices for handling preprint metadata to enable transparency and build trust in preprints. As a proof-of-concept we now provide a way to search and display withdrawn and removed preprints with appropriate labels for the COVID-19 full text preprint subset. We identify preprint withdrawal or removal notices based on document length (notices are often just a single sentence long) using the Europe PMC plus submission system. Those records are then flagged, manually checked and tagged with the appropriate withdrawal or removal article-type [Hamelers A, Parkin M. A full text collection of COVID-19 preprints in Europe PMC using JATS XML].
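The length-based flagging step described above can be illustrated with a minimal sketch. It is not the Europe PMC plus implementation: the function name, the record IDs and the 60-word cut-off are all assumptions chosen for illustration, and flagged records would still need the manual check described above.

```python
def flag_possible_notices(documents, max_words=60):
    """Return IDs of documents short enough to be a withdrawal/removal notice.

    `documents` maps a record ID to its full text. `max_words` is a
    hypothetical cut-off, not a value used by Europe PMC.
    """
    flagged = []
    for record_id, text in documents.items():
        # Count whitespace-separated words as a rough measure of length.
        if len(text.split()) <= max_words:
            flagged.append(record_id)
    return flagged


# Hypothetical records: one short notice and one full-length manuscript.
sample = {
    "PPR-A": "This preprint has been withdrawn by the authors.",
    "PPR-B": "Background. " + "word " * 500,
}
print(flag_possible_notices(sample))  # ['PPR-A']
```

A short document is only a candidate: brief but valid preprints would also be flagged, which is why a manual check follows.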


Withdrawn and removed COVID-19 full text preprints in Europe PMC can be found using PUB_TYPE:"preprint-withdrawal" and PUB_TYPE:"preprint-removal" searches, respectively. Information about preprint withdrawals and removals can also be obtained programmatically, with an option to retrieve preprint status changes via the new status update search module of the Europe PMC Articles API. 
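As a rough sketch, the two searches above could be turned into request URLs as follows. The endpoint and the `query`, `format` and `pageSize` parameter names reflect our understanding of the public Articles RESTful API search module and should be checked against the current API documentation; the helper name `build_search_url` is hypothetical. The status update search module is not covered here.

```python
from urllib.parse import urlencode

# Assumed search endpoint of the Europe PMC Articles RESTful API.
SEARCH_ENDPOINT = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_search_url(query, page_size=25):
    """Build a search URL for a Europe PMC query string (no request is sent)."""
    params = {"query": query, "format": "json", "pageSize": page_size}
    return SEARCH_ENDPOINT + "?" + urlencode(params)


withdrawn_url = build_search_url('PUB_TYPE:"preprint-withdrawal"')
removed_url = build_search_url('PUB_TYPE:"preprint-removal"')
print(withdrawn_url)
print(removed_url)
```

The URLs can then be fetched with any HTTP client; the sketch stops short of sending the request so that it runs without network access.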

A screenshot of search results for PUB_TYPE:"preprint-withdrawal" demonstrates that, to date, there are 34 full text COVID-19 related preprints in Europe PMC that have been withdrawn from the corresponding preprint platform.

Such records are clearly labeled on the preprint page with a link back to the preprint server for more information. 

Examples of a withdrawn (left) and removed (right) full-text COVID-19 preprint record in Europe PMC.

A notice is also displayed on earlier versions, with the exception of preprints from servers that use a single DOI for all versions and overwrite the metadata (e.g. medRxiv and bioRxiv). In such cases a withdrawal notice replaces the preprint record.

Example of the notification for an earlier version of a full-text COVID-19 preprint in Europe PMC, where a later version is a withdrawal notice.

While the permanence of preprint records is important to support the legitimacy of preprints as scholarly outputs, in some cases preprints are removed entirely, with the preprint URL leading to a 404 page without a notification present. In such cases, removed preprints are deleted from Europe PMC as well.

Ideally, all withdrawn or removed preprints in Europe PMC, including those that we do not have the full text for, should be clearly identified, with the reason for the withdrawal stated. However, currently there is no straightforward way to retrieve this information for preprints posted on different platforms. We rely on manual analysis of the flagged preprint full text to discover that it contains a removal or withdrawal notice, with a follow-up manual check on the preprint server. In the future, we hope that preprint servers will share the withdrawal/removal status in a machine readable format, potentially through a single service, such as Crossref, which would allow us and other providers to automate updates to the preprint record.

Thursday, 11 November 2021

Alerts for topics, authors, preprints, and more

Keeping up with the current published research has long been a challenging task. With preprints added to the mix, research is shared much more quickly, extending the reading list and fuelling information overload.

There is a need for useful tools to help researchers stay on top of the new discoveries that are most relevant to their field. Europe PMC now offers a new alerts feature for keyword searches. 

How can alerts help track relevant research?

Researchers can create email alerts for saved searches, to discover new articles on a topic of interest. Europe PMC alerts can help you:

How does an alert work?

When creating an alert you will be asked to sign in with an ORCID, Twitter, or Europe PMC/Europe PMC plus account. You can access all of your saved alerts from your account. Alerts can be modified or deleted at any time. 

Example of saved alerts in a user account.

The Save & create alert form can be accessed via the Save & create alert button next to the search bar. Alert settings options include:

- You can edit your search and test the new search terms.

- You can edit the name of the alert.

- You can choose to receive regular email updates or to save the search to run yourself at any time.

- You can choose how often to receive updates: as soon as available, weekly, or monthly. You will need to select a preferred day for weekly and monthly alerts. Any new results are emailed at approximately 06:00 GMT.

- You can opt to receive a partial abstract with the first few lines included in the email.


Each alert email includes the title, author(s), journal information and, if selected, a few lines of the abstract for up to 25 new results, with an option to view all new results on Europe PMC if more items are found.

Example of an alert email for (bacteriophage) search on Europe PMC.

We hope that the new alerts feature for Europe PMC will make it easy to find the relevant information and stay on top of published and preprinted research. If you have any feedback regarding the email alerts please let us know via helpdesk@europepmc.org.

Monday, 25 October 2021

Using the Europe PMC REST API to Study Open Access Publishing at the University of Virginia

In May 2021, the University of Virginia Faculty Senate passed a set of Open Access Guidelines and Recommendations. This was the first university-wide resolution on the subject, though the School of Data Science adopted its own open access policy a few months earlier. At the Claude Moore Health Sciences Library, we primarily serve the School of Medicine, the School of Nursing, and the Medical Center (collectively known as UVA Health). Though we now knew that the majority of the university’s faculty supported open access, we wanted to learn more about the attitudes of our library’s constituents in the health sciences.

We decided to create a dashboard to visualize what proportion of recent journal articles published by UVA Health authors were open access. This would help us to better understand the current publication landscape as well as identify particular authors who consistently published in open access journals, and therefore might be worth contacting as potential partners for outreach on the topic. UVA has institutional subscriptions to commercial software that measures research impact but the license terms for those products restrict how their data can be shared, which would limit the possibilities for future development of our project. To align with the larger project’s goal of encouraging open behaviours, we wanted to create the dashboard using only open data sources. Additionally, using open data sources and sharing our code would allow anyone to replicate it or adapt it to study their own institution’s publications. 

To restrict our search to UVA Health authors, we needed an API that included institutional affiliation for all authors. We initially considered other options; however, the Europe PMC API proved to be the best data source for this project, as it focuses on biomedical publications and the core metadata option for the API includes the institutional affiliation for all authors, separated into individual fields.

After deciding on a data source, we needed to find a way to process and visualize the data. We ingested the Europe PMC API core metadata in JSON into a pandas dataframe and used Streamlit to create the visualization. Streamlit is an open source Python library which can be used to create interactive web applications. The free Sharing option allows users to create up to three complete web apps with shareable URLs using code stored in a public GitHub repository. With only three files, one for infrastructure, one for Python libraries, and one for code, users can configure the environment for diverse analysis projects. We used Streamlit’s built-in caching feature to improve the speed of the application and Altair data visualization tools to create the charts.
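To give a sense of the ingestion step, here is a minimal sketch that flattens a core-format JSON response into one row per article. It is not the dashboard's actual code. The sample response is invented and heavily trimmed, and the field names (`resultList`, `result`, `pubYear`, `isOpenAccess`, `authorList`, `affiliation`) are assumptions about the core result format that should be checked against a real response.

```python
import json

# Hypothetical, trimmed response; field names are assumed, not verified here.
sample_response = json.loads("""
{"resultList": {"result": [
  {"id": "1", "pubYear": "2020", "isOpenAccess": "Y",
   "authorList": {"author": [
     {"fullName": "Doe J", "affiliation": "University of Virginia School of Medicine"},
     {"fullName": "Roe A"}]}},
  {"id": "2", "pubYear": "2021", "isOpenAccess": "N"}
]}}
""")


def to_rows(response):
    """Flatten a search response into a list of flat dictionaries."""
    rows = []
    for record in response.get("resultList", {}).get("result", []):
        authors = record.get("authorList", {}).get("author", [])
        # Keep only authors that actually carry an affiliation string.
        affiliations = [a["affiliation"] for a in authors if a.get("affiliation")]
        rows.append({
            "id": record.get("id"),
            "year": int(record["pubYear"]) if record.get("pubYear") else None,
            "open_access": record.get("isOpenAccess") == "Y",
            "affiliations": affiliations,
        })
    return rows


rows = to_rows(sample_response)
print(rows)
```

A list of flat dictionaries like this can be passed straight to `pandas.DataFrame(rows)` to obtain a dataframe with one row per article.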

The Open Data Dashboard now shows the proportion of open access versus subscription-only articles published by UVA-affiliated authors and indexed in Europe PMC from 2017 to the present, and we have also added an exploratory data analysis widget made using the Python library Sweetviz to provide a quick visualization of the dataset and help us uncover potential areas for future analysis. 
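The proportion shown on the dashboard can be sketched as a simple per-year calculation. The rows below are invented and the `year` and `open_access` keys are hypothetical names, so this shows only the arithmetic, not the dashboard's own code.

```python
from collections import defaultdict


def open_access_share_by_year(rows):
    """Return {year: fraction of that year's articles that are open access}."""
    totals = defaultdict(int)
    open_counts = defaultdict(int)
    for row in rows:
        totals[row["year"]] += 1
        if row["open_access"]:
            open_counts[row["year"]] += 1
    return {year: open_counts[year] / totals[year] for year in sorted(totals)}


# Invented example rows, one per article.
sample_rows = [
    {"year": 2017, "open_access": True},
    {"year": 2017, "open_access": False},
    {"year": 2018, "open_access": True},
]
print(open_access_share_by_year(sample_rows))  # {2017: 0.5, 2018: 1.0}
```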

We wanted to use the Europe PMC affiliation data to sort articles by the department or unit at UVA Health whose members published them, but this has proved challenging because department names are written in varying ways. Because of the resulting risk of misattributing articles, we have not drilled down into the data by department, even though this use of the data would likely be of the greatest interest to the library’s constituents.

We received assistance from numerous sources while working on this project. When we initially encountered problems in getting the full data set to load, Dr. Maaly Nassar quickly replied to our email to the Europe PMC Developer Forum and provided a piece of code which solved our problem. James Bennett at our local Code for America team recommended Sweetviz, and Randy Zwitch from Streamlit was available on Twitter for assistance setting up caching. More information on the project can be found in this blog post, and our GitHub repository is available here.

Please direct any questions on the Open Data Dashboard to:

Lucy Carr Jones (lgc3t[at]virginia.edu), MSIS, Library Assistant, University of Virginia Claude Moore Health Sciences Library; and Anson Parker (adp6j[at]virginia.edu), Web Developer, University of Virginia Claude Moore Health Sciences Library.

Monday, 26 July 2021

COVID-19 grants: who, where and how much?

Along with life science publications, Europe PMC offers a way to search through biomedical grants from Europe PMC funders using the Grant Finder tool. Last year, Europe PMC partnered with the Medical Research Council (MRC), the UK Collaborative on Development Research (UKCDR) and the Global Research Collaboration for Infectious Disease Preparedness (GLoPID-R), to extend its search to include COVID-19 grant data from other funding organisations.

COVID-19 grant data is retrieved monthly from the COVID-19 Research Project Tracker - a live database of funded research projects across the world related to the current COVID-19 pandemic. It is one of the most comprehensive databases, covering a wide breadth of research disciplines.

All grant information in Europe PMC is stored in a dedicated grants database, in a machine readable format. Data can be programmatically accessed using the Europe PMC GRIST API, enabling large scale analyses of the funding landscape. To demonstrate how the grant data can be used, we have gathered some insights regarding COVID-19 funding. 

As of 10.06.2021, Europe PMC has funding information for 8839 COVID-19 grants awarded to 8193 distinct PIs (Principal Investigators). This includes grants for research on coronavirus epidemiology and virology, as well as grant awards in other relevant disciplines. In the latter case, funders such as ERC provide us with a curated list of COVID-19 related awards.

COVID-19 grants are awarded by 146 international funders, indicating that the current pandemic truly is a worldwide issue. You can see the geographical distribution of the funding organisations on the map below. Circle size correlates with the number of funders based within a selected country. The 23 funding organisations that could not be assigned a geographical location, including international and intergovernmental organisations, are not represented on the map.


The data shows that ~2% of all COVID-19 grants (168 of 8839 total) are jointly supported by two or more funders.

Altogether, COVID-19 grant awards amount to at least $3.9 billion; monetary value is not available for 2500 (28%) of the COVID-19 grant records. Note that grants have been awarded in 20 different currencies, which we have converted to USD for this calculation. The mean COVID-19 grant award value is ~$615,000, which is nearly a third lower than the mean value of grants awarded by Europe PMC funders (~$845,000). This could be a result of COVID-19 grants being awarded for a shorter duration than the average funding call duration for Europe PMC funder grants. Looking at grants that have both start and end dates available, the average COVID-19 grant has a duration of 30 months, compared to 40 months for regular Europe PMC funding calls, which corresponds to a similar annual value (~$235,000 for COVID-19 grants and ~$247,000 for non-COVID-19 grants). Another explanation may be that the most common discipline in the COVID-19 tracker is social sciences, where research costs are typically lower than in the medical sciences.
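For readers who want to check the arithmetic, the short sketch below recomputes the shares quoted above from the published counts and shows how a grant value can be annualised. The variable and function names are ours. The annual values quoted in the text were calculated only over grants with both start and end dates, so plugging the overall means into the same formula gives slightly different numbers (about $246,000 and $253,500).

```python
# Figures quoted in the text above.
total_grants = 8839
jointly_funded = 168
missing_value = 2500
mean_covid = 615_000   # mean COVID-19 grant value, USD
mean_other = 845_000   # mean Europe PMC funder grant value, USD

joint_share = jointly_funded / total_grants   # share of jointly funded grants
missing_share = missing_value / total_grants  # share of records without a value
relative_gap = 1 - mean_covid / mean_other    # how much lower the COVID-19 mean is


def annual_value(total_value, duration_months):
    """Convert a total award value into a value per 12 months."""
    return total_value / duration_months * 12


print(f"{joint_share:.1%} {missing_share:.1%} {relative_gap:.1%}")  # 1.9% 28.3% 27.2%
print(annual_value(mean_covid, 30), annual_value(mean_other, 40))
```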

These are just some examples of what we can infer from the funding information available in Europe PMC. We hope that making grant data available via an open API in a structured form will lead to more insights by other research teams in the future.

Please note that we feature examples from our programmatic users on the Europe PMC API Case Studies page. If you have used the Europe PMC GRIST API and would like to share your story please get in touch via helpdesk@europepmc.org.

Tuesday, 11 May 2021

PsyArXiv preprints now indexed in Europe PMC

PsyArXiv preprints are now discoverable in Europe PMC alongside peer-reviewed research

PsyArXiv’s collection of more than 14,000 psychological sciences preprints has now been indexed in Europe PMC, an open science platform that enables access to a worldwide collection of 38.6 million life science publications and preprints.

“Indexing PsyArXiv in Europe PMC means that psychological preprints will now be more discoverable than ever before. PsyArXiv users will benefit from the increased audience for their work.” - Katherine S. Corker, executive officer for the Society for the Improvement of Psychological Science, and Associate Professor at Grand Valley State University.

Since 2018, Europe PMC has indexed nearly 290,000 preprints alongside its collection of peer-reviewed journal articles. Similar to other records in Europe PMC, these preprints are linked to underlying data, journal published articles, preprint citations, comments or open peer reviews, and can be claimed to ORCID.

"We’ve seen an impressive rise in preprint popularity in the life sciences in the last few years – it’s a great way to rapidly communicate research results. [...] We hope that including preprint abstracts into Europe PMC search results, will not only enhance their discoverability, but also provide a community platform to explore some open questions.” - Johanna McEntyre, the Associate Director of EMBL-EBI Services and Principal Investigator for Europe PMC.

If you are interested in learning more about PsyArXiv preprints in Europe PMC, please register to join our live demo at 13:00 (GMT+1) on May 25th.

_________________________________________________________________________________________

About PsyArXiv

PsyArXiv (psychology archive) is designed to facilitate rapid dissemination of psychological research. PsyArXiv is a creation of the Society for the Improvement of Psychological Science (SIPS) and is hosted by the Center for Open Science (COS).

PsyArXiv allows scholars to post documents such as working papers, unpublished work, and articles under review (preprints), making them accessible to other researchers and to the public at no cost. Users can also upload revisions of their posted document and supplemental documents such as appendices.

About Europe PMC

Europe PMC is an open database of life science publications and preprints from trusted sources around the globe. Europe PMC supports the research community by supporting innovation in publishing, fostering reproducible science, and enabling data-driven discovery. Europe PMC’s mission is to build open, full text scientific literature resources and support innovation by engaging users, enabling contributors, and integrating related research data.

Tuesday, 9 February 2021

Over 15,300 full text COVID-19 preprints now available in Europe PMC

In July 2020, Europe PMC began indexing the full text of COVID-19 preprints. The initiative, supported by Wellcome, the UK Medical Research Council, and the Swiss National Science Foundation, has now made over 15,300 full text COVID-19 preprints searchable and free to read, alongside peer reviewed articles.



Number of full text COVID-19 preprints in Europe PMC by month.


Researchers can benefit from a greater number of results being returned for each query, since the Europe PMC search tool searches not only the abstracts but also the full text of articles and preprints. The full text of COVID-19 preprints is available for reading and reuse in a standard XML format, so if a preprint has an open license, it can be text mined programmatically, allowing deeper analysis of the COVID-19 literature.
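As a small illustration of what text mining the XML might involve, the sketch below pulls the body paragraphs out of a JATS-style document using only the Python standard library. The sample XML is invented and far simpler than a real preprint, the function name is ours, and we assume the standard format is JATS with paragraphs in `<p>` elements under `<body>`. Checking that a preprint's license permits reuse remains a separate step.

```python
import xml.etree.ElementTree as ET

# Invented, minimal JATS-style document for illustration only.
sample_jats = """<article>
  <front><article-meta><title-group>
    <article-title>Example preprint</article-title>
  </title-group></article-meta></front>
  <body>
    <sec><title>Results</title>
      <p>Viral load fell after <italic>day 5</italic> in most patients.</p>
      <p>No effect was seen in controls.</p>
    </sec>
  </body>
</article>"""


def body_paragraphs(xml_text):
    """Return the plain text of every <p> element inside <body>."""
    root = ET.fromstring(xml_text)
    paragraphs = []
    for p in root.iterfind("./body//p"):
        # itertext() also collects text nested in inline tags such as <italic>.
        text = " ".join("".join(p.itertext()).split())
        paragraphs.append(text)
    return paragraphs


for paragraph in body_paragraphs(sample_jats):
    print(paragraph)
```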

Which servers are indexed?

Europe PMC currently indexes full text COVID-19 preprints from the biomedical servers bioRxiv and medRxiv. In addition, Europe PMC includes full text COVID-19 preprints from Research Square, ChemRxiv, SSRN and arXiv.


Preprint server | Disciplinary scope
--- | ---
arXiv | Multiple scientific fields, including quantitative biology
bioRxiv | Broad life & biomedical research (from animal behaviour and cognition to zoology)
ChemRxiv | Subject-specific, including biological and medicinal chemistry
medRxiv | Broad medical, clinical and related health sciences
Research Square | All scientific fields
SSRN | Multiple scientific fields including applied sciences, health sciences, life sciences, social sciences

Disciplinary scope was taken from ‘Preprint server directory’ at https://asapbio.org/preprint-servers 


 

Europe PMC uses the PDF freely available on these preprint servers to create the full-text HTML version of the preprint, which is then sent to the authors for approval before Europe PMC displays it. A single query searches more than 15,000 preprints from across these six servers.

 

Number of full text COVID-19 preprints in Europe PMC by preprint server.

 

 

 

Data re-use

One of the aims of the project is to provide a unique corpus of COVID-19 research for current and future research. The COVID-19 full text articles in Europe PMC have already been used for data analysis in the following studies:


 

Future plans

Europe PMC is planning to index full text preprints from Preprints.org, Authorea and other preprint servers and platforms which include relevant COVID-19 content. To see an updated list of preprint servers, check the ‘About preprints’ page on the Europe PMC website.




For more information about preprints in Europe PMC, visit our website: https://europepmc.org/Preprints