Where should SocArXiv draw the AI line?

Ducks swimming right, geese swimming left. (PNC photo)

Someone submitted a paper to SocArXiv that we would have accepted a few years ago.

By our moderation policy, we apply only a very minimal quality standard. In addition to required structural elements — like an abstract, cited references, a title that reflects the content, ORCID, etc. — we sometimes reject papers that don’t surpass “a minimal standard of informative value.” But this paper would have passed it. It was boring, unoriginal, and superficial. Its literature review was deficient. What it claimed as an original theoretical insight was not interesting. It had some complex statistical models, apparently done competently, and graphical as well as tabular results. The citations and in-text quotations appeared to be real. As a whole, it was coherent and relevant to existing research.

At the end of the paper the author included an “AI disclosure.” They listed several AI tools used to generate code, conduct the literature search, “consult” on statistics, and draft the text — they admitted that almost all of the text was generated by these tools. But the author claimed to have formulated the research question, divined the theoretical framework, chosen variables and model specification, “directed analytical decisions,” interpreted the results, and verified every data claim, as well as every citation and quotation. They also shared the statistical code in a public repository, and offered an AI methodology audit on request.

In other words, to reject this paper, we would have to do so based on the nature and extent of its AI tool use. As we attempt to formulate a policy for this, I find this case interesting.

I have my own biases. If you told me your only use of AI was to generate your statistical code, I think I would accept your paper (especially if you shared the code). Likewise if you had used AI tools to conduct categorical coding of qualitative data, provided it was human directed and verified. Also, if you told me you only used AI tools to help with writing — fixing style and grammar, language translation, helping to come up with a title or abstract — I think I would accept the paper. And if you told me you used AI to help with your literature search, such as by conducting natural language queries, I think I would accept the paper. But all of these, and writing the first draft, too?

So this paper stands out for using AI tools to do all of this, plus drafting the original text.

One clear position is that using such tools at all is unethical. The models all use people’s work without attribution. I am not persuaded by this, because I think all knowledge is learned from someone else. We have norms for attribution which are partly about ethics, and partly about validity, but there is no standard that says everything you read must be cited. However, these norms are complicated and subject to adaptation, so I don’t rule out changing my mind.

On legality, I think AI training models in principle may be practicing fair use. However, it would be a copyright violation if their outputs end up displacing income from the original producers — as seems to be the case for news organizations whose content is served to chatbot subscribers. Obviously, I’m not an expert on the legal issues, but I’m also not in charge of enforcing copyright law.

Another argument is that platforms like SocArXiv need to defend the scholarly ecosystem from slipping into a self-referencing death spiral of AI slop research generated from AI slop ad infinitum. This might especially be the case for a platform like ours, which accepts work without peer review but assigns DOIs and other trappings of scholarly legitimacy. If you are building a training model to write social science papers, SocArXiv papers would seem to be an attractive (free) target for harvesting.

On the other hand, if we attempt to ban AI-generated research — or even work with limited AI-generated components — we will be entering an endless arms race that we will ultimately lose. In the process, we will spend all our money and time trying to defeat global monopoly powers instead of helping real researchers archive and disseminate their research, which is our mission. And — as I remind people as often as I can — no one should be looking at the corpus of SocArXiv work as a repository of the best research in any field. Most humans come to us with a specific link to a paper, or an author, and get what they need. There is a lot of bad work on our platform — which, unlike most journals and even some preprint servers, we are not shy about admitting — because it doesn’t hurt the good work that is here, and we’re not trying to make money at this. Unless we get so overwhelmed with slop that we can’t maintain the service, I think that if it’s easier to accept bad work than it is to reject it, accepting it might be the more practical course.

Even a requirement like author disclosure of AI tool use could be crippling, because we don’t have the resources to verify claims, or sleuth out people who make false claims or deny using chatbots when they actually do, and so on. ChatGPT et al. read the rules we write, and will happily help authors pretend to comply with them. Again, arms race.

We have been discussing this at SocArXiv, but have not finalized our policy. When we do, I will link it here. In the meantime, we welcome your feedback, ideas, and suggestions — in the comments, or in email to socarxiv@gmail.com, or any other (peaceful) way. (Human-generated, please.) I would especially appreciate discussion that recognizes there are good people with different perspectives and experience, and it would be great if we could find a way to work together.

–Philip N. Cohen

7 thoughts on “Where should SocArXiv draw the AI line?”

  1. This piece really gets at the tension that many open-access platforms are navigating right now. The arms race analogy is spot on and I appreciate the honesty about practical constraints. I think the disclosure-based approach is a reasonable middle ground, even with its limitations. The fact that you’re thinking through this so openly and inviting community input is exactly what scholarly infrastructure should look like.


    1. I see no objection to using AI as a well-informed secretary, provided that new concepts are the intellectual property of the author, who is also responsible for checking the sources. AI is an enormously valuable tool when used responsibly.


  2. This is a really difficult policy decision for you. I understand that you want to be the place that everybody can use to place their works for visibility, but at the same time you do not want to get the reputation that this is where the bad papers go to die, and AI slop may encourage the latter. If you put some hurdles against (partially) AI-generated papers, people will be dishonest and circumvent them. There is no easy solution, and I am sure you are not the only one looking for one. Watch what publishers with more reputation to defend do.

    What about making sure that a human needs to complete your forms and uploads? Involve a reCAPTCHA and a code sent by email. That would at least prevent mass spamming by a bot.


    1. This is good advice, thanks. People submitting papers to SocArXiv need to have an account on OSF, which has human verification. We also require an ORCID linked to the profile of the submitting author (which we verify manually on each submission). So far we haven’t had a problem with mass automated submissions.

