Where should SocArXiv draw the AI line?

*Ducks swimming right, Geese swimming left. (PNC photo)*

Someone submitted a paper to SocArXiv that we would have accepted a few years ago.

By our moderation policy, we apply only a very minimal quality standard. In addition to required structural elements — like an abstract, cited references, a title that reflects the content, ORCID, etc. — we sometimes reject papers that don’t surpass “a minimal standard of informative value.” But this paper would have passed it. It was boring, unoriginal, and superficial. Its literature review was deficient. What it claimed as an original theoretical insight was not interesting. It had some complex statistical models, apparently done competently, and graphical as well as tabular results. The citations and in-text quotations appeared to be real. As a whole, it was coherent and relevant to existing research.

At the end of the paper the author included an “AI disclosure.” They listed several AI tools used to generate code, conduct the literature search, “consult” on statistics, and draft the text — they admitted that almost all of the text was generated by these tools. But the author claimed to have formulated the research question, divined the theoretical framework, chosen variables and model specification, “directed analytical decisions,” interpreted the results, and verified every data claim, as well as every citation and quotation. They also shared the statistical code in a public repository, and offered an AI methodology audit on request.

In other words, to reject this paper, we would have to do it based on the nature and extent of the AI tools. As we attempt to formulate a policy for this, I find this case interesting.

I have my own biases. If you told me your only use of AI was to generate your statistical code, I think I would accept your paper (especially if you shared the code). Likewise if you had used AI tools to conduct categorical coding of qualitative data, provided it was human directed and verified. Also, if you told me you only used AI tools to help with writing — fixing style and grammar, language translation, helping to come up with a title or abstract — I think I would accept the paper. And if you told me you used AI to help with your literature search, such as by conducting natural language queries, I think I would accept the paper. But all of these, and writing the first draft, too?

So this paper stands out for using AI tools to do all of this, plus drafting the original text.

One clear position is that using such tools at all is unethical. The models all use people’s work without attribution. I am not persuaded by this, because I think all knowledge is learned from someone else. We have norms for attribution which are partly about ethics, and partly about validity, but there is no standard that says everything you read must be cited. However, these norms are complicated and subject to adaptation, so I don’t rule out changing my mind.

On legality, I think AI training models in principle may be practicing fair use. But it would be a copyright violation if their outputs end up displacing income from the original producers, however — as seems to be the case for news organizations whose content is served to chatbot subscribers. Obviously, I’m not expert on the legal issues, but I’m also not in charge of enforcing copyright law.

Another argument is that platforms like SocArXiv need to defend the scholarly ecosystem from slipping into a self-referencing death spiral of AI slop research generated from AI slop ad infinitum. This might especially be the case for a platform like ours, which accepts work without peer review but assigns DOIs and other trappings of scholarly legitimacy. If you are building a training model to write social science papers, SocArXiv papers would seem to be an attractive (free) target for harvesting.

On the other hand, if we attempt to ban AI-generated research — or even work with limited AI-generated components — we will be entering an endless arms race that we will ultimately lose. In the process, we will spend all our money and time trying to defeat global monopoly powers instead of helping real researchers archive and disseminate their research, which is our mission. And — as I remind people as often as I can — no one should be looking at the corpus of SocArXiv work as a repository of the best research in any field. Most humans come to us with a specific link to a paper, or an author, and get what they need. There is a lot of bad work on our platform — which, unlike most journals and even some preprint servers, we are not shy about admitting — because it doesn’t hurt the good work that is here, and we’re not trying to make money at this. Unless we get so overwhelmed with slop that we can’t maintain the service, I think that if it’s easier to accept bad work than it is to reject it, accepting it might be the more practical course.

Even a requirement like author disclosure of AI tool use could be crippling, because we don’t have the resources to verify claims, or sleuth out people who make false claims or deny using chatbots when they actually do, and so on. ChatGPT et al. read the rules we write, and will happily help authors pretend to comply with them. Again, arms race.

We have been discussing this at SocArXiv, but have not finalized our policy. When we do, I will link it here. In the meantime, we welcome your feedback, ideas, and suggestions — in the comments, or in email to socarxiv@gmail.com, or any other (peaceful) way. (Human-generated, please.) I would especially appreciate discussion that recognizes there are good people with different perspectives and experience, and it would be great if we could find a way to work together.

–Philip N. Cohen

SocArXiv moratorium on AI-topic papers, policy in formation

In light of record submission rates and a large volume of AI-generated slop papers, SocArXiv recently implemented a policy requiring ORCID accounts linked in the OSF profile of submitting authors, and narrowing our focus to social science subjects (see this announcement). Today we are taking two more steps:

1. We are pausing new submissions about AI topics for 90 days. That is, papers about AI models, testing AI models, proposing AI models, theories about the future of AI and so on. We will make exceptions for papers that are already accepted for publication (or already published) in peer-reviewed scholarly journals. And we will make exceptions for empirical social science research about AI in society – for example, a study on how AI use affects workers in an organization – on a case-by-case basis. The purpose of this pause is to make it faster and easier for moderators to reject these papers, and encourage these authors to find other ways of distributing their work.

If your empirical social science research paper on an AI topic is rejected and you would like to appeal, please email us a link to the paper at socarxiv@gmail.com with a short note of explanation. We apologize for requiring this step.

2. We are developing a policy for AI-related work. We need a better, formal policy on AI-generated and LLM-assisted content. We have formed a committee of volunteers from our social science and library science networks to gather existing policies from other services and publications, and decide what policy is right for us. This includes the values we want to support, the work we are able to do, and the technical needs and requirements we have in doing our moderation and hosting. We hope this policy will be ready to implement when the 90-day pause on AI-related papers ends.

If you have expertise or suggestions for us in this work, we would appreciate hearing from you.

SocArXiv submission rule changes

Context

SocArXiv is experiencing record high submission rates. In addition, now that we have paper versioning – which is great – our moderators have to approve every paper revision. As a result, our volunteer workload is increasing.

In addition we are receiving many non-research, spam, and AI-generated submissions. We do not have a technological way of identifying these, and it is time-consuming to read and assess them according to our moderation rules.

We also don’t have moderation workflow tools that allow us to, for example, sort incoming papers by subject, to get them to specific expert moderators. So all our moderators look at all papers as they come in. That encourages us to think about narrowing the range of subjects we accept.

The two rule changes below are intended to help manage the increased moderator burden. More policy changes may follow if the volume keeps increasing.

1. ORCID requirement

We require the submitting author to have a publicly accessible ORCID linked from the OSF profile page, with a name that matches that on the paper and the OSF account.

In the case of non-bibliographic submittors (e.g., a research assistant submitting for a supervisor), the first author must have an ORCID. We can make exceptions for institutional submitters upon request, such as journals that upload their papers for authors.

At present we are not requiring additional verification or specific trust markers on the ORCID (such as email or employer verification), just the existence of an account that lists the author’s name. It’s not a foolproof identity verification, obviously, but it adds a step for scammers, and also helps identify pseudonymous authors, which we do not permit. We may take advantage of ORCID’s trust markers program in the future and require additional elements on the ORCID record.

We are happy to host papers by independent scholars, but a disproportionate share of non-research, spam, and AI-generated submissions come from independent scholars, many of whom do not have ORCIDs. For those scholars with institutional affiliations, we urge you to get an ORCID. This is a good practice that we should all endorse.

2. Focus on social sciences

At its founding, SocArXiv did not want to maintain disciplinary boundaries. It was our intention to be the big paper server for all of social sciences, and we couldn’t draw an easy line between social sciences and some humanities subjects, especially history, philosophy, religious studies, and some area studies, which are humanities in the taxonomy we use, but have significant overlap with social sciences. It was more logical just to accept them all.

As the volume has increased, this has become less practical. In addition, a lot of junk and AI submissions are in the areas of religion, philosophy, and various language studies. We also don’t have moderators working in arts and humanities, and our moderators trained in social sciences are not expert at reviewing these papers. Finally, there is an excellent, open humanities archive: Knowledge Commons (KC Works), which is freely available for humanities scholars. With approval from that service, we will now direct authors to their site for papers we are rejecting in arts and humanities subjects.

We continue to accept papers in education and law, which are also generally adjacent to social science.

For a limited time we will accept revisions of papers we already host in arts and humanities, but urge those authors to include links to Knowledge Commons or somewhere else that can host their work in the future.

We will assess papers that include arts/humanities as well as social science subject identifiers, and if we determine they are principally in art/humanities, reject them.

We will continue to host all work we have already accepted.

SocArXiv joins preprint services in endorsing OSTP memo

The SocArXiv steering committee joins the preprint services arXiv and ioRxiv/medRxiv in their recent statements in support of the U.S. Office of Science and Technology Policy (OSTP) memo that directs the federal government to make outputs from government-funded research publicly accessible without charge or embargo. We endorse these statements, and reproduce them below.

arXiv OSTP memorandum response

April 11, 2023

The recent Office of Science and Technology Policy “Nelson Memorandum” on “Ensuring Free, Immediate, and Equitable Access to Federally Funded Research”¹ is a welcome affirmation of the public right to access government funded research results, including publication of articles describing the research, and the data behind the research. The policy is likely to increase access to new and ongoing research, enable equitable access to the outcome of publicly funded research efforts, and enable and accelerate more research. Improved immediate access to research results may provide significant general social and economic benefits to the public.

Funding Agencies can expedite public access to research results through the distribution of electronic preprints of results in open repositories, in particular existing preprint distribution servers such as arXiv,² bioRxiv,³ and medRxiv.⁴ Distribution of preprints of research results enables rapid and free accessibility of the findings worldwide, circumventing publication delays of months, or, in some cases, years. Rapid circulation of research results expedites scientific discourse, shortens the cycle of discovery and accelerates the pace of discovery.⁵

Distribution of research findings by preprints, combined with curation of the archive of submissions, provides universal access for both authors and readers in perpetuity. Authors can provide updated versions of the research, including “as accepted,” with the repositories openly tracking the progress of the revision of results through the scientific process. Public access to the corpus of machine readable research manuscripts provides innovative channels for discovery and additional knowledge generation, including links to the data behind the research, open software tools, and supplemental information provided by authors.

Preprint repositories support a growing and innovative ecosystem for discovery and evaluation of research results, including tools for improved accessibility and research summaries. Experiments in open review and crowdsourced commenting can be layered over preprint repositories, providing constructive feedback and alternative models to the increasingly archaic process of anonymous peer review.

Distribution of research results by preprints provides a well tested path for immediate, free, and equitable access to research results. Preprint archives can support and sustain an open and innovative ecosystem of tools for research discovery and verification, providing a long term and sustainable approach for open access to publicly funded research.

¹White House OSTP Public Access Memo

²arXiv website

³bioRxiv website

⁴medRxiv website

⁵NIH Preprint Pilot, “The Pace of Artificial Intelligence Innovations: Speed, Talent, and Trial-and-Error”

bioRxiv and medRxiv response to the OSTP memo – an open letter to US funding agencies

2023-04-11

The preprint servers bioRxiv and medRxiv welcome the recent Office of Science and Technology Policy (OSTP) memo advising US government agencies to make publications and data from research funded by US taxpayers publicly accessible immediately, without embargo or cost. This new policy will stimulate research, increase equitability, and generate health, environmental and social benefits not only in the US but all around the world.

Agencies can enable free public access to research results simply by mandating that reports of federally funded research are made available as “preprints” on servers such as arXiv, bioRxiv, medRxiv, and chemRxiv, before being submitted for journal publication. This will ensure that the findings are freely accessible to anyone anywhere in the world. An important additional benefit is the immediate availability of the information, avoiding the long delays associated with evaluation by traditional scientific journals (typically around one year). Scientific inquiry then progresses faster, as has been particularly evident for COVID research during the pandemic.

Prior access mandates in the US and elsewhere have focused on articles published by academic journals. This complicated the issue by making it a question of how to adapt journal revenue streams and led to the emergence of new models based on article-processing charges (APCs). But APCs simply move the access barrier to authors: they are a significant financial obstacle for researchers in fields and communities that lack the funding to pay them. A preprint mandate would achieve universal access for both authors and readers upstream, ensuring the focus remains on providing access to research findings, rather than on how they are selected and filtered.

Mandating public access to preprints rather than articles in academic journals would also future-proof agencies’ access policies. The distinction between peer-reviewed and non-peer-reviewed material is blurring as new approaches make peer review an ongoing process rather than a judgment made at a single point in time. Peer review can be conducted independently of journals through initiatives like Review Commons. And traditional journal-based peer review is changing: for example, eLife, supported by several large funders, peer reviews submitted papers but no longer distinguishes accepted from rejected articles. The author’s “accepted” manuscript that is the focus of so-called Green Open Access policies may therefore no longer exist. Because of such ongoing change, mandating the free availability of preprints would be a straightforward and strategically astute policy for US funding agencies.

A preprint mandate would underscore the fundamental, often overlooked, point that it is the results of research to which the public should have access. The evaluation of that research by journals is part of an ongoing process of assessment that can take place after the results have been made openly available. Preprint mandates from the funders of research would also widen the possibilities for evolution within the system and avoid channeling it towards expensive APC-based publishing models. Furthermore, since articles on preprint servers can be accompanied by supplementary data deposits on the servers themselves or linked to data deposited elsewhere, preprint mandates would also provide mechanisms to accomplish the other important OSTP goal: availability of research data.

Richard Sever and John Inglis
Co-Founders, bioRxiv and medRxiv
Cold Spring Harbor Laboratory, New York, NY11724

Harlan Krumholz and Joseph Ross
Co-founders, medRxiv
Yale University, New Haven, CT06520

On withdrawing “Ivermectin and the odds of hospitalization due to COVID-19,” by Merino et al

On withdrawing “Ivermectin and the odds of hospitalization due to COVID-19,” by Merino et al.

4 February 2022

Preamble by Philip N. Cohen, director of SocArXiv

SocArXiv’s steering committee has decided to withdraw the paper, “Ivermectin and the odds of hospitalization due to COVID-19: evidence from a quasi-experimental analysis based on a public intervention in Mexico City,” by Jose Merino, Victor Hugo Borja, Oliva Lopez, José Alfredo Ochoa, Eduardo Clark, Lila Petersen, and Saul Caballero. [10.31235/osf.io/r93g4]

The paper is a report on a program in Mexico City that gave people medical kits when they tested positive for COVID-19, containing, among other things, ivermectin tablets. The conclusion of the paper is, “The study supports ivermectin-based interventions to assuage the effects of the COVID-19 pandemic on the health system.”

The lead author of the paper, José Merino, head of the Digital Agency for Public Innovation (DAPI), a government agency in Mexico City, tweeted about the paper: “Es una GRAN noticia poder validar una política pública que permitió reducir impactos en salud por covid19” (translation: “It is GREAT news to be able to validate a public policy that allowed reducing health impacts from covid19”). The other authors are officials at the Mexican Social Security Institute and the Mexico City Ministry of Health, and employees at the DAPI.

We have written about this paper previously. We wrote, in part:

“Depending on which critique you prefer, the paper is either very poor quality or else deliberately false and misleading. PolitiFact debunked it here, partly based on this factcheck in Portuguese. We do not believe it provides reliable or useful information, and we are disappointed that it has been very popular (downloaded almost 10,000 times so far). … We do not have a policy to remove papers like this from our service, which meet submission criteria when we post them but turn out to be harmful. However, we could develop one, such as a petition process or some other review trigger. This is an open discussion.”

The paper has now been downloaded more than 11,000 times, among our most-read papers of the past year. Since we posted that statement, the paper has received more attention. In particular, an article in Animal Politico in Mexico reported that the government of Mexico City has spent hundreds of thousands of dollars on ivermectin, which it still distributes (as of January 2022) to people who test positive for COVID-19. In response, University of California-San Diego sociology professor Juan Pablo Pardo-Guerra posted an appeal to SocArXiv asking us to remove the “deeply problematic and unethical” paper and ban its authors from our platform. The appeal, in a widely shared Twitter thread, argued that the authors, through their agency dispensing the medication, unethically recruited experimental subjects, apparently without informed consent, and thus the study is an unethical study; they did not declare a conflict of interest, although they are employees of agencies that carried out the policy. The thread was shared or liked by thousands of people. The article and response to the article prompted us to revisit this paper. On February 1, I promised to bring the issue to our Steering Committee for further discussion.

I am not a medical researcher, although I am a social scientist reasonably well-versed in public health research. I won’t provide a scholarly review of research on ivermectin. However, it is clear from the record of authoritative statements by global and national public health agencies that, at present, ivermectin should not be used as a treatment or preventative for COVID-19 outside of carefully controlled clinical studies, which this clearly was not. These are some of those statements, reflecting current guidance as of 3 February 2022.

World Health Organization: “We recommend not to use ivermectin, except in the context of a clinical trial.”
US Centers for Disease Control and Prevention: “ivermectin has not been proven as a way to prevent or treat COVID-19.”
US National Institutes of Health: “There is insufficient evidence for the COVID-19 Treatment Guidelines Panel (the Panel) to recommend either for or against the use of ivermectin for the treatment of COVID-19.”
European Medicines Agency: “use of ivermectin for prevention or treatment of COVID-19 cannot currently be recommended outside controlled clinical trials.”
US Food and Drug Administration: “The FDA has not authorized or approved ivermectin for use in preventing or treating COVID-19 in humans or animals. … Currently available data do not show ivermectin is effective against COVID-19.”

For reference, the scientific flaws in the paper are enumerated at the links above from PolitiFact, partly based on this factcheck from Estado in Portuguese, which included expert consultation. I also found this thread from Omar Yaxmehen Bello-Chavolla useful.

In light of this review, a program to publicly distribute ivermectin to people infected with COVID-19, outside of a controlled study, seems unethical. The paper is part of such a program, and currently serves as part of its justification.

To summarize, there remains insufficient evidence that ivermectin is effective in treating COVID-19; the study is of minimal scientific value at best; the paper is part of an unethical program by the government of Mexico City to dispense hundreds of thousands of doses of an inappropriate medication to people who were sick with COVID-19, which possibly continues to the present; the authors of the paper have promoted it as evidence that their medical intervention is effective. This review is intended to help the SocArXiv Steering Committee reach a decision on the request to remove the paper (we set aside the question of banning the authors from future submissions, which is reserved for people who repeatedly violate our rules). The statement below followed from this review.

SocArXiv Steering Committee statement on withdrawing the paper by Merino et al. (10.31235/osf.io/r93g4).

This is the first time we have used our prerogative as service administrators to withdraw a paper from SocArXiv. Although we reject many papers, according to our moderation policy, we don’t have a policy for unilaterally withdrawing papers after they have been posted. We don’t want to make policy around a single case, but we do want to respond to this situation.

We are withdrawing the paper, and replacing it with a “tombstone” that includes the paper’s metadata. We are doing this to prevent the paper from causing additional harm, and taking this incident as an impetus to develop a more comprehensive policy for future situations. The metadata will serve as a reference for people who follow citations to the paper to our site.

Our grounds for this decision are several:

The paper is spreading misinformation, promoting an unproved medical treatment in the midst of a global pandemic.
The paper is part of, and justification for, a government program that unethically dispenses (or did dispense) unproven medication apparently without proper consent or appropriate ethical protections according to the standards of human subjects research.
The paper is medical research – purporting to study the effects of a medication on a disease outcome – and is not properly within the subject scope of SocArXiv.
The authors did not properly disclose their conflicts of interest.

We appreciate that of the thousands of papers we have accepted and now host on our platform, there may be others that have serious flaws as well.

We are taking this unprecedented action because this particular bad paper appears to be more important, and therefore potentially more harmful, than other flawed work. In administering SocArXiv, we generally err on the side of inclusivity, and do not provide peer review or substantive vetting of the papers we host. Taking such an approach suits us philosophically, and also practically, since we don’t have staff to review every paper fully. But this approach comes with the responsibility to respond when something truly harmful gets through. In light of demonstrable harms like those associated with this paper, and in response to a community groundswell beseeching us to act, we are withdrawing this paper.

We reiterate that our moderation process does not involve peer review, or substantive evaluation, of the research papers that we host. Our moderation policy confirms only that papers are (1) scholarly, (2) in research areas that we support, (3) are plausibly categorized, (4) are correctly attributed, (5) are in languages that we moderate, and (6) are in text-searchable formats. Posting a paper on SocArXiv is not in itself an indication of good quality – but it is often a sign that researchers are acting in good faith and practicing open scholarship for the public good. We urge readers to consider this incident in the context of the greater good that open science and preprints in general, and our service in particular, do for researchers and the communities they serve.

We welcome comments and suggestions from readers, researchers, and the public. Feel free to email us at socarxiv@gmail.com, or contact us on our social media accounts at Twitter or Facebook.

Share this:

Share this:

Share this:

Share this:

Share this: