Wednesday, 1 December 2021

Transparency for preprints: handling withdrawals and removals

Part of the appeal of preprints is the ability to post new versions, allowing researchers to continuously improve their manuscript and correct it if needed. However, in some cases the data presented in a preprint, or their interpretation, may prove incorrect over time. In such cases the authors may wish to withdraw or remove the preprint, rather than post another version. Preprints may also be removed for legal reasons, because of authorship disputes, or even because they were posted in error.

Currently, preprint platforms handle such scenarios in several different ways. In the case of a withdrawal, the preprint itself is often still accessible, but it is supplemented with a new version containing a withdrawal notice, which explains that the preprint should not be considered part of the scientific record. This is akin to retractions for peer-reviewed journal articles. In the case of a removal, on the other hand, all preprint versions are removed and the content is no longer accessible; in some cases a removal notice replaces the preprint itself. You can see the list of different withdrawal/removal policies in the ASAPbio Preprint Server Directory.

For an archive such as Europe PMC, it is crucial to follow best practices for handling preprint metadata to enable transparency and build trust in preprints. As a proof of concept, we now provide a way to search and display withdrawn and removed preprints with appropriate labels for the COVID-19 full text preprint subset. We identify preprint withdrawal or removal notices based on document length (notices are often just a single sentence long) using the Europe PMC plus submission system. Those records are then flagged, manually checked, and tagged with the appropriate withdrawal or removal article-type [Hamelers A, Parkin M. A full text collection of COVID-19 preprints in Europe PMC using JATS XML].
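The length-based flagging step can be illustrated with a minimal sketch. It is not the code used in the Europe PMC plus submission system: the function names, the sentence-splitting rule and the threshold of two sentences are all assumptions made for illustration. Anything it flags would still need the manual check described above.

```python
import re

# Hypothetical threshold (an assumption, not a Europe PMC setting): notices are
# "often just a single sentence long", so very short documents get flagged.
MAX_NOTICE_SENTENCES = 2


def count_sentences(text: str) -> int:
    """Rough sentence count: split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return len([part for part in parts if part])


def flag_possible_notice(body_text: str, max_sentences: int = MAX_NOTICE_SENTENCES) -> bool:
    """Return True if the document is short enough to be a candidate notice.

    A True result only marks the record for manual review; it does not
    decide whether the text is a withdrawal or a removal notice.
    """
    return count_sentences(body_text) <= max_sentences


if __name__ == "__main__":
    notice = "This preprint has been withdrawn by the authors."
    print(flag_possible_notice(notice))
```

A splitter this simple will miscount text containing abbreviations such as "e.g.", which is one reason a manual check after flagging remains necessary.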

Withdrawn and removed COVID-19 full text preprints in Europe PMC can be found using PUB_TYPE:"preprint-withdrawal" and PUB_TYPE:"preprint-removal" searches, respectively. Information about preprint withdrawals and removals can also be obtained programmatically, with an option to retrieve preprint status changes via the new status update search module of the Europe PMC Articles API. 
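As a rough illustration of programmatic access, the sketch below builds a search URL for the two queries above. The PUB_TYPE queries come from this post. The REST search endpoint and the `query`, `format` and `pageSize` parameters are taken from the public Europe PMC RESTful API rather than from this post, so treat them as assumptions to check against the current API documentation. The helper names are hypothetical, and the status update search module is not covered here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Europe PMC RESTful search service.
BASE_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_search_url(pub_type: str, page_size: int = 25) -> str:
    """Build a search URL for a publication type such as 'preprint-withdrawal'."""
    params = {
        "query": f'PUB_TYPE:"{pub_type}"',  # query syntax as given in the post
        "format": "json",
        "pageSize": page_size,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_results(pub_type: str) -> dict:
    """Fetch and parse the JSON response (requires network access)."""
    with urlopen(build_search_url(pub_type)) as response:
        return json.load(response)


if __name__ == "__main__":
    for pub_type in ("preprint-withdrawal", "preprint-removal"):
        print(build_search_url(pub_type))
```

Calling `fetch_results("preprint-withdrawal")` returns the parsed response as a dictionary. Its structure is not described in this post, so inspect it before relying on any particular field.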

A screenshot of search results for PUB_TYPE:"preprint-withdrawal" demonstrates that, to date, there are 34 full-text COVID-19-related preprints in Europe PMC that have been withdrawn from the corresponding preprint platform.

Such records are clearly labeled on the preprint page with a link back to the preprint server for more information. 

Examples of a withdrawn (left) and removed (right) full-text COVID-19 preprint record in Europe PMC.

A notice is also displayed on earlier versions, with the exception of preprints from servers that use a single DOI for all versions and overwrite the metadata (e.g. medRxiv and bioRxiv). In such cases, a withdrawal notice replaces the preprint record.

Example of the notification for an earlier version of a full-text COVID-19 preprint in Europe PMC, where a later version is a withdrawal notice.

While the permanence of preprint records is important to support the legitimacy of preprints as scholarly outputs, in some cases preprints are removed entirely, with the preprint URL leading to a 404 page without a notification present. In such cases, removed preprints are deleted from Europe PMC as well.

Ideally, all withdrawn or removed preprints in Europe PMC, including those for which we do not have the full text, should be clearly identified, with the reason for the withdrawal or removal stated. However, there is currently no straightforward way to retrieve this information for preprints posted on different platforms. We rely on manual analysis of the flagged preprint full text to discover that it contains a removal or withdrawal notice, with a follow-up manual check on the preprint server. In the future, we hope that preprint servers will share the withdrawal/removal status in a machine-readable format, potentially through a single service such as Crossref, which would allow us and other providers to automate updates to the preprint record.

