When SocArXiv gets bad papers

Detail from AI-generated art using the prompt “bad paper” with Wombo.

Two recent incidents at SocArXiv prompted the Steering Committee to offer some comment on our process and its outcomes.

Ivermectin research

On May 4, 2021, our moderators accepted a paper titled, “Ivermectin and the odds of hospitalization due to COVID-19: evidence from a quasi-experimental analysis based on a public intervention in Mexico City,” by a group of authors from the Mexican Social Security Institute, Ministry of Health in Mexico City, and Digital Agency for Public Innovation in Mexico City. The paper reports on a “quasi-experimental” analysis purporting to find “significant reduction in hospitalizations among [COVID-19] patients who received [an] ivermectin-based medical kit” in Mexico City. The paper is a “preprint” insofar as the paper was not peer reviewed or published in a peer-reviewed journal at the time it was submitted, but because it has not subsequently been published in such a venue, it is really just a “paper.” (We call all the papers on SocArXiv “papers,” and let authors describe their status themselves, either on the title page, or by linking to a version published somewhere else.)

Depending on which critique you prefer, the paper is either very poor quality or else deliberately false and misleading. PolitiFact debunked it here, partly based on this factcheck in Portuguese. We do not believe it provides reliable or useful information, and we are disappointed that it has been very popular (downloaded almost 10,000 times so far).

This has prompted us to clarify that our moderation process does not involve peer review, or substantive evaluation, of the research papers that we host. From our Frequently Asked Questions page:

Papers are moderated before they appear on SocArXiv, a process we expect to take less than two days. Our policy involves a six-point checklist, confirming that papers are (1) scholarly, (2) in research areas that we support, (3) are plausibly categorized, (4) are correctly attributed, (5) are in languages that we moderate, and (6) are in text-searchable formats (such as PDF or docx). In addition, we seek to accept only papers that authors have the right to share, although we do not check copyrights in the moderation process. For details, view the moderation policy.

Posting a paper on SocArXiv is not in itself an indication of good quality. We host many papers of top quality – and their inclusion in SocArXiv is a measure of good practice. But there are bad papers as well, and the system does not explicitly differentiate them for readers. In addition to not verifying the quality of the papers we host, we also don’t evaluate the supporting materials authors provide. In the case of the ivermectin paper, the authors declared that their data is publicly available, with a link to a Google sheet (as well as a GitHub repository that is no longer available). They also declared no conflict of interest.

We do not have a policy for removing papers like this one, which meet our submission criteria when we post them but turn out to be harmful. However, we could develop one, such as a petition process or some other review trigger. This is an open discussion.

Fraudulent papers

To our knowledge, the ivermectin paper is not fraudulent. However, we do not verify the identities of authors who submit papers. The submitting author must have an account on the Open Science Framework, our host platform, but getting an OSF account just requires a working email address. OSF users can enter ORCID or social media account handles on their profiles, but to our knowledge these are not verified by OSF. OSF does allow logins with ORCID or institutional identities, but as moderators at SocArXiv we don’t have a way of knowing how a user has created their account or logged in. Our submission process requires authors to affirm that they have permission to post the paper, but we don’t independently verify the connections between authors.

In short, both OSF and SocArXiv are vulnerable to people posting work that is not their own, or using fake identities. The unvarnished truth is that we don’t have the resources of the government, the coercive power of an employer, or the capital of a big company necessary to police this issue.

Recently, someone posted one fraudulent paper on SocArXiv, and attempted to post another, before we detected the fraud in our moderation process. The papers submitted listed a common author, but different (apparently) fake co-authors. In one case, we contacted the listed co-author (a real person) who confirmed that they were not aware of the paper and had not consented to its being posted. With a little research, we found papers under the name of this author at SSRN, ResearchGate, arXiv, and Paperswithcode, which also seem to be fake. (We reported this to the administrators of OSF, who deleted the related accounts.)

These papers did not appear to have any important content. Rather, they seemed to exist just to be papers, maybe to establish someone’s fake identity, to test AI algorithms or security systems, or for some other purpose. Their existence doesn’t hurt real researchers much, but they could be part of either a specific plan that would be more harmful, or a general degradation of the research communication ecosystem.

With regard to this kind of fraud, we do not have a consistently applied defense in our moderation workflow. If we suspect foul play, we poke around, and if we find something bad we reject the papers and report the problem. But, again, we don’t have the resources to fully prevent this from happening. However, we are developing a new policy that will require all papers to have at least one author linked to a real ORCID account. Although this will add time to the moderation process of each paper (since OSF does not attach ORCIDs to specific papers), we plan to experiment with this approach to see if it helps without adding too much time and effort. (As always, we are looking for more volunteer moderators; just contact us!)
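
As a rough illustration of what a first-pass check for the ORCID requirement described above might look like, here is a minimal Python sketch. It is not part of any OSF or SocArXiv tooling, and the function names are hypothetical. It tests only that a string has the shape of an ORCID iD and that its final character matches the ISO 7064 MOD 11-2 checksum that ORCID iDs use. Passing this test does not show that the iD is registered, or that it belongs to the listed author. A moderator would still need to look at the ORCID record itself.

```python
import re

# Shape of an ORCID iD: four hyphen-separated groups of four characters.
# Only the final character may be "X", which stands for a check value of 10.
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def looks_like_orcid(identifier: str) -> bool:
    """Return True if the string is well formed and its checksum matches.

    Hypothetical helper for illustration only: this is a plausibility
    check, not proof that the iD exists or belongs to a given author.
    """
    candidate = identifier.strip().upper()
    if not ORCID_PATTERN.fullmatch(candidate):
        return False
    digits = candidate.replace("-", "")
    return orcid_check_character(digits[:15]) == digits[15]


if __name__ == "__main__":
    # 0000-0002-1825-0097 is a sample iD commonly used in ORCID's documentation.
    print(looks_like_orcid("0000-0002-1825-0097"))
    # Same iD with the last digit altered, so the checksum no longer matches.
    print(looks_like_orcid("0000-0002-1825-0098"))
```

A check like this would only catch mistyped or carelessly invented identifiers. A determined fraudster can register a real ORCID account under a false name, which is why the policy would still depend on human moderators.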

User responses

We do offer several ways for readers to communicate to us and to each other about the quality of papers on our system. Readers may annotate or comment on papers using the Hypothesis tool, or they may endorse papers using the Plaudit button. (Both of these are free with registration, using ORCID for identification.) If you read a paper you believe is good, just click the Plaudit button — that will tell future readers that you have endorsed it. Neither of these tools generates automatic notifications to SocArXiv or to the authors, however — they just communicate to the next reader. If you see something that you suspect is fraudulent or harmful, feel free to email us directly at socarxiv@gmail.com.

We encourage readers to take advantage of these affordances. And we are open to suggestions.

SocArXiv policy on withdrawing papers

The Center for Open Science has added withdrawal functionality to its preprint service platform. We are glad to have this capacity, but we will not be permitting the withdrawal of papers in routine cases. Withdrawing is a convenient option if an author makes an error in the submission process, for example accidentally submitting the wrong version; if a paper has not yet been approved, we are happy to accommodate such requests. However, if a paper has already been accepted, and thus entered the scholarly record, we will follow the policy below.

Unfortunately, authors now see a large “Withdraw Paper” button on the page where they edit their paper entries. We are working with COS to change how this option is presented to authors, and also to make users aware of our policy. Posting a paper on SocArXiv is easy, which brings great benefit to the thousands of people who have shared their work. However, authors should be aware that posting papers is generally nonreversible. We offer this policy and its explanation to help further this understanding.

Dog leaping fearlessly off a dock into water
Photo by Emery Way https://flic.kr/p/5JMYz7

SocArXiv Withdrawal Policy

May 25, 2019

In case of revision, the current version will be found here.

The Center for Open Science (COS), which hosts SocArXiv, has enabled the withdrawal of papers from its paper services. Authors who wish to withdraw their papers may request a withdrawal from the SocArXiv moderators, according to the terms of this policy.

Permission for withdrawal will only be granted in the very rare circumstance in which we have a legal obligation to remove a paper, such as if it contains private personal information or it is subject to a substantiated copyright claim. In cases where a paper is withdrawn, it will be replaced by a “tombstone” page (here is an example), which includes the original paper’s metadata (author, title, abstract, DOI, etc.), and the reason for withdrawal. After that point, the paper will be locked to further modification.

Papers that infringe on copyrights will be removed in accordance with the Digital Millennium Copyright Act, under the Center for Open Science terms of use, available here.

If authors wish to withdraw papers for other reasons — for example, if they are not confident of the findings or otherwise no longer endorse the paper — they should post a new “version” of the paper that is a single page announcing the withdrawal. They may, for example, request that readers do not further cite, use, or distribute previous versions (which will remain available under the list of previous versions). Instructions on how to post a new version are available here; we are happy to help authors do this.

This policy is very similar to the retraction of an article by an academic journal, which only rarely involves removal of access to the original paper, instead generally relying on a notification of retraction in its place.

Instructions for requesting a withdrawal are available here: http://help.osf.io/m/preprints/l/1069374-withdrawing-a-preprint

Why doesn’t SocArXiv let authors decide when to withdraw a paper?

Papers on SocArXiv are part of the scholarly record. Upon being posted, they are given a Digital Object Identifier (DOI) and a persistent URL from COS. The link is automatically tweeted by our announcement account, and the system also generates a citation reference. The document is immediately citable and retrievable by human or machine agents. In short, posting a paper on SocArXiv is a research event that cannot be undone by deleting the document. Preserving the scholarly record is our obligation to the scholarly community.

Authors who post papers on SocArXiv are notified, at the final point of submission, that they will be “unable to delete the preprint file, but [they] can update or modify it.” Authors also are required to confirm that all contributors have agreed to share the paper, and that they have the right to share it. (All co-authors have the same rights to distribute a copyrighted work, unless a subsequent agreement has intervened, so an objection to the posting by a co-author is not the basis for removal.)

The Internet has made it possible to distribute work without relinquishing the original digital file, which makes it possible to delete the version readers access — a privilege that was not available when research was distributed in printed form. However, the Internet has also made it difficult or impossible to remove all traces or copies of a digital document. This is a challenging environment for authors.

We are sympathetic to the desire of some authors to remove copies of their earlier work from circulation, for a variety of reasons, and we appreciate that our policy may cause frustration. We hope authors will carefully consider it before they post their work.

Our policy is very similar to that employed by the older and more established preprint servers, arXiv and bioRxiv.

bioRxiv’s FAQ page reads:

Can I remove an article that has already posted on bioRxiv?

No. Manuscripts posted on bioRxiv receive DOI’s and thus are citable and part of the scientific record. They are indexed by services such as Google Scholar, Microsoft Academic Search, and Crossref, creating a permanent digital presence independent of bioRxiv records. Consequently, bioRxiv’s policy is that papers cannot be removed. Authors may, however, have their article marked as “Withdrawn” if they no longer stand by their findings/conclusions or acknowledge fundamental errors in the article. In these cases, a statement explaining the reason for the withdrawal is posted on the bioRxiv article page to which the DOI defaults; the original article is still accessible via the article history tab. In extremely rare, exceptional cases, papers are removed for legal reasons.

At this writing, just 32 out of 50,401 preprints on bioRxiv have been withdrawn, a rate of 6 per 10,000.

On arXiv, the instructions read:

Articles that have been announced and made public cannot be completely removed. A withdrawal creates a new version of the paper marked as withdrawn. That new version displays the reason for the withdrawal and does not link directly to the full text. Previous versions will still be accessible, including the full text.

On the other hand, at least one paper service, Elsevier’s SSRN (formerly the Social Science Research Network), allows authors to delete their papers from its repository immediately for any reason (FAQ). Similarly, some authors choose to distribute their work on their own websites, where they have more complete control over the contents. We believe these approaches put the needs of the author over those of the research community. While a reasonable choice in some cases, this represents a philosophy different from ours.

We want an open, equitable, inclusive scholarly ecosystem in which people are free to share and use information as freely as possible. We have created this policy to serve that goal.