Where should SocArXiv draw the AI line?

Ducks swimming right, geese swimming left. (PNC photo)

Someone submitted a paper to SocArXiv that we would have accepted a few years ago.

By our moderation policy, we apply only a very minimal quality standard. In addition to required structural elements — like an abstract, cited references, a title that reflects the content, ORCID, etc. — we sometimes reject papers that don’t surpass “a minimal standard of informative value.” But this paper would have passed it. It was boring, unoriginal, and superficial. Its literature review was deficient. What it claimed as an original theoretical insight was not interesting. It had some complex statistical models, apparently done competently, and graphical as well as tabular results. The citations and in-text quotations appeared to be real. As a whole, it was coherent and relevant to existing research.

At the end of the paper the author included an “AI disclosure.” They listed several AI tools used to generate code, conduct the literature search, “consult” on statistics, and draft the text — they admitted that almost all of the text was generated by these tools. But the author claimed to have formulated the research question, divined the theoretical framework, chosen variables and model specification, “directed analytical decisions,” interpreted the results, and verified every data claim, as well as every citation and quotation. They also shared the statistical code in a public repository, and offered an AI methodology audit on request.

In other words, to reject this paper, we would have to do so based on the nature and extent of its AI tool use. As we attempt to formulate a policy for this, I find this case interesting.

I have my own biases. If you told me your only use of AI was to generate your statistical code, I think I would accept your paper (especially if you shared the code). Likewise if you had used AI tools to conduct categorical coding of qualitative data, provided it was human-directed and verified. Also, if you told me you only used AI tools to help with writing — fixing style and grammar, language translation, helping to come up with a title or abstract — I think I would accept the paper. And if you told me you used AI to help with your literature search, such as by conducting natural language queries, I think I would accept the paper. But all of these, and writing the first draft, too?

So this paper stands out for using AI tools to do all of this, plus drafting the original text.

One clear position is that using such tools at all is unethical. The models all use people’s work without attribution. I am not persuaded by this, because I think all knowledge is learned from someone else. We have norms for attribution which are partly about ethics, and partly about validity, but there is no standard that says everything you read must be cited. However, these norms are complicated and subject to adaptation, so I don’t rule out changing my mind.

On legality, I think training AI models may in principle be fair use. But it would be a copyright violation if their outputs end up displacing income from the original producers — as seems to be the case for news organizations whose content is served to chatbot subscribers. Obviously, I’m not an expert on the legal issues, but I’m also not in charge of enforcing copyright law.

Another argument is that platforms like SocArXiv need to defend the scholarly ecosystem from slipping into a self-referencing death spiral of AI slop research generated from AI slop ad infinitum. This might especially be the case for a platform like ours, which accepts work without peer review but assigns DOIs and other trappings of scholarly legitimacy. If you are building a training model to write social science papers, SocArXiv papers would seem to be an attractive (free) target for harvesting.

On the other hand, if we attempt to ban AI-generated research — or even work with limited AI-generated components — we will be entering an endless arms race that we will ultimately lose. In the process, we will spend all our money and time trying to defeat global monopoly powers instead of helping real researchers archive and disseminate their research, which is our mission. And — as I remind people as often as I can — no one should be looking at the corpus of SocArXiv work as a repository of the best research in any field. Most humans come to us with a specific link to a paper, or an author, and get what they need. There is a lot of bad work on our platform — which, unlike most journals and even some preprint servers, we are not shy about admitting — because it doesn’t hurt the good work that is here, and we’re not trying to make money at this. Unless we get so overwhelmed with slop that we can’t maintain the service, I think that if it’s easier to accept bad work than it is to reject it, accepting it might be the more practical course.

Even a requirement like author disclosure of AI tool use could be crippling, because we don’t have the resources to verify claims, or sleuth out people who make false claims or deny using chatbots when they actually do, and so on. ChatGPT et al. read the rules we write, and will happily help authors pretend to comply with them. Again, arms race.

We have been discussing this at SocArXiv, but have not finalized our policy. When we do, I will link it here. In the meantime, we welcome your feedback, ideas, and suggestions — in the comments, or in email to socarxiv@gmail.com, or any other (peaceful) way. (Human-generated, please.) I would especially appreciate discussion that recognizes there are good people with different perspectives and experience, and it would be great if we could find a way to work together.

–Philip N. Cohen

SocArXiv moratorium on AI-topic papers, policy in formation

In light of record submission rates and a large volume of AI-generated slop papers, SocArXiv recently implemented a policy requiring ORCID accounts linked in the OSF profile of submitting authors, and narrowing our focus to social science subjects (see this announcement). Today we are taking two more steps:

1. We are pausing new submissions about AI topics for 90 days. That is, papers about AI models, testing AI models, proposing AI models, theories about the future of AI, and so on. We will make exceptions for papers that are already accepted for publication (or already published) in peer-reviewed scholarly journals. And we will make exceptions for empirical social science research about AI in society – for example, a study on how AI use affects workers in an organization – on a case-by-case basis. The purpose of this pause is to make it faster and easier for moderators to reject these papers, and to encourage these authors to find other ways of distributing their work.

If your empirical social science research paper on an AI topic is rejected and you would like to appeal, please email us a link to the paper at socarxiv@gmail.com with a short note of explanation. We apologize for requiring this step.

2. We are developing a policy for AI-related work. We need a better, formal policy on AI-generated and LLM-assisted content. We have formed a committee of volunteers from our social science and library science networks to gather existing policies from other services and publications, and decide what policy is right for us. This includes the values we want to support, the work we are able to do, and the technical needs and requirements we have in doing our moderation and hosting. We hope this policy will be ready to implement when the 90-day pause on AI-related papers ends.

If you have expertise or suggestions for us in this work, we would appreciate hearing from you.