AI vs. Human Writing: Experts Fooled More Than 61% of the Time

Summary: In a surprising twist, linguistics experts struggle to differentiate AI-generated content from human writing. When reviewing research abstracts, these experts could only identify AI-created content correctly 38.9% of the time. Even with their in-depth knowledge of language patterns, their reasons for classifications often missed the mark. The study raises questions about AI’s role in academia and the need for improved detection tools.

Key Facts:

Linguistics experts identified AI-generated content correctly only 38.9% of the time.
None of the 72 experts correctly identified all four writing samples given to them.
AI struggles with longer texts, making it easier to detect due to “hallucinated” content.

Source: University of South Florida

Even linguistics experts are largely unable to spot the difference between writing created by artificial intelligence and writing created by humans, according to a new study co-authored by a University of South Florida assistant professor.

Research just published in the ScienceDirect journal Research Methods in Applied Linguistics revealed that experts from the world’s top linguistic journals could differentiate between AI- and human-generated abstracts less than 39 percent of the time.

“We thought if anybody is going to be able to identify human-produced writing, it should be people in linguistics who’ve spent their careers studying patterns in language and other aspects of human communication,” said Matthew Kessler, a scholar in the USF Department of World Languages.

Working alongside J. Elliott Casal, assistant professor of applied linguistics at The University of Memphis, Kessler tasked 72 experts in linguistics with reviewing a variety of research abstracts to determine whether they were written by AI or humans.

Each expert was asked to examine four writing samples. None correctly identified all four, while 13 percent got them all wrong. Based on the findings, Kessler concluded that professors would be unable to distinguish between a student’s own writing and writing generated by an AI-powered language model such as ChatGPT without the help of software that hasn’t yet been developed.

Despite the experts’ attempts to use rationales to judge the writing samples in the study, such as identifying certain linguistic and stylistic features, they were largely unsuccessful with an overall positive identification rate of 38.9 percent.

“What was more interesting was when we asked them why they decided something was written by AI or a human,” Kessler said. “They shared very logical reasons, but again and again, they were not accurate or consistent.”

Based on this, Kessler and Casal concluded ChatGPT can write short genres just as well as most humans, if not better in some cases, given that AI typically does not make grammatical errors.

The silver lining for human authors lies in longer forms of writing. “For longer texts, AI has been known to hallucinate and make up content, making it easier to identify that it was generated by AI,” Kessler said.

Kessler hopes this study will lead to a bigger conversation to establish the necessary ethics and guidelines surrounding the use of AI in research and education.

About this artificial intelligence and ChatGPT research news

Author: John Dudley
Source: University of South Florida
Contact: John Dudley – University of South Florida
Image: The image is credited to Neuroscience News

Original Research: Closed access.
“Can linguists distinguish between ChatGPT/AI and human writing?: A study of research ethics and academic publishing” by Matthew Kessler et al. Research Methods in Applied Linguistics

Abstract

Can linguists distinguish between ChatGPT/AI and human writing?: A study of research ethics and academic publishing

There has been considerable intrigue surrounding the use of Large Language Model powered AI chatbots such as ChatGPT in research, educational contexts, and beyond.

However, most studies have explored such tools’ general capabilities and applications for language teaching purposes.

The current study advances this discussion to examine issues pertaining to human judgements, accuracy, and research ethics.

Specifically, we investigate: 1) the extent to which linguists/reviewers from top journals can distinguish AI- from human-generated writing, 2) what the basis of reviewers’ decisions are, and 3) the extent to which editors of top Applied Linguistics journals believe AI tools are ethical for research purposes.

In the study, reviewers (N = 72) completed a judgement task involving AI- and human-generated research abstracts, and several reviewers participated in follow-up interviews to explain their rationales. Similarly, editors (N = 27) completed a survey and interviews to discuss their beliefs.

Findings suggest that despite employing multiple rationales to judge texts, reviewers were largely unsuccessful in identifying AI versus human writing, with an overall positive identification rate of only 38.9%. Additionally, many editors believed there are ethical uses of AI tools for facilitating research processes, yet some disagreed.

Future research directions are discussed involving AI tools and academic publishing.