Can ChatGPT Rival University Students in Academic Performance?

Summary: ChatGPT can match or even surpass university students in answering assessment questions across various disciplines. This AI model particularly excelled in subjects like political studies and psychology.

However, when it came to fields like mathematics and economics, students maintained an edge. Despite ChatGPT’s capabilities, 70% of educators believe that its use in assignments constitutes plagiarism.

Key Facts:

ChatGPT secured similar or higher grades than students in 9 out of 32 courses.
74% of surveyed students from various countries admitted they’d use ChatGPT for their academic work.
The AI detection tools GPTZero and AI text classifier misclassified ChatGPT-generated answers as human-written 32% and 49% of the time, respectively.

Source: Scientific Reports

ChatGPT may match or even exceed the average grade of university students when answering assessment questions across a range of subjects including computer science, political studies, engineering, and psychology, reports a paper published in Scientific Reports.

The research also found that almost three-quarters of students surveyed would use ChatGPT to help with their assignments, despite many educators considering its use to be plagiarism.

To investigate how ChatGPT performed when writing university assessments compared to students, Talal Rahwan and Yasir Zaki invited faculty members who taught 32 different courses at New York University Abu Dhabi (NYUAD) to provide three student submissions each for ten assessment questions that they had set.

ChatGPT was then asked to produce three sets of answers to the ten questions, which were then assessed alongside student-written answers by three graders (who were unaware of the source of the answers). The ChatGPT-generated answers achieved a similar or higher average grade than students in 9 of 32 courses.

Only mathematics and economics courses saw students consistently outperform ChatGPT. ChatGPT outperformed students most markedly in the ‘Introduction to Public Policy’ course, where its average grade was 9.56 compared to 4.39 for students.
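To give a sense of the scale of the comparison, the short Python sketch below works through the arithmetic implied by the figures reported above. It is illustrative only: the function names are hypothetical, and the answer count is an upper bound that assumes every one of the 32 courses supplied the full ten questions with three submissions each, which the article does not confirm.

```python
# Figures quoted in the article; the helper names are hypothetical.
COURSES = 32
QUESTIONS_PER_COURSE = 10
SUBMISSIONS_PER_QUESTION = 3  # three student answers, and three ChatGPT answers


def max_answers_per_source(courses=COURSES,
                           questions=QUESTIONS_PER_COURSE,
                           submissions=SUBMISSIONS_PER_QUESTION):
    """Upper bound on graded answers from one source (students or ChatGPT).

    Assumes every course contributed the full set of questions and submissions.
    """
    return courses * questions * submissions


def grade_gap(chatgpt_avg, student_avg):
    """Difference between two average grades, rounded to two decimals."""
    return round(chatgpt_avg - student_avg, 2)


print(max_answers_per_source())  # 960
print(grade_gap(9.56, 4.39))     # 5.17 (the 'Introduction to Public Policy' course)
```

Under these assumptions, graders would have seen at most 960 student answers and 960 ChatGPT answers, and the gap in the public policy course comes to 5.17 grade points.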

The authors also surveyed views on whether ChatGPT could be used to assist with university assignments among 1,601 individuals from Brazil, India, Japan, the US, and the UK (including at least 200 students and 100 educators from each country).

Seventy-four percent of students indicated that they would use ChatGPT in their work. In all countries, however, educators underestimated the proportion of students who plan to use ChatGPT, and 70 percent of educators reported that they would treat its use as plagiarism.

Finally, the authors report that two tools for identifying AI-generated text — GPTZero and AI text classifier — misclassified the ChatGPT answers generated in this research as written by a human 32 percent and 49 percent of the time respectively.
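Read the other way around, these miss rates imply how often each tool caught the ChatGPT answers. The Python sketch below does that conversion. It assumes that every answer not labelled human-written was flagged as AI-generated, which is a simplification: a classifier may also return an "unclear" verdict, and the article does not say how such cases were counted. The variable names are hypothetical.

```python
# Share of ChatGPT answers each tool labelled as human-written (from the article).
missed_pct = {"GPTZero": 32, "AI text classifier": 49}

# Assumption: every remaining answer was flagged as AI-generated.
detected_pct = {tool: 100 - missed for tool, missed in missed_pct.items()}

for tool, pct in detected_pct.items():
    print(f"{tool}: flagged about {pct}% of ChatGPT answers")
```

On that reading, GPTZero would have flagged roughly 68 percent of the ChatGPT answers and AI text classifier roughly 51 percent, the latter close to a coin flip.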

Together, these findings offer insights that could inform policy for the use of AI tools within educational settings.

About this ChatGPT and AI research news

Author: Alice Kay
Source: Scientific Reports
Contact: Alice Kay – Scientific Reports
Image: The image is credited to Neuroscience News

Original Research: Open access.
“Perception, performance, and detectability of conversational artificial intelligence across 32 university courses” by Talal Rahwan et al. Scientific Reports

Abstract

Perception, performance, and detectability of conversational artificial intelligence across 32 university courses

The emergence of large language models has led to the development of powerful tools such as ChatGPT that can produce text indistinguishable from human-generated work.

With the increasing accessibility of such technology, students across the globe may utilize it to help with their school work—a possibility that has sparked ample discussion on the integrity of student evaluation processes in the age of artificial intelligence (AI).

To date, it is unclear how such tools perform compared to students on university-level courses across various disciplines.

Further, students’ perspectives regarding the use of such tools in school work, and educators’ perspectives on treating their use as plagiarism, remain unknown. Here, we compare the performance of the state-of-the-art tool, ChatGPT, against that of students on 32 university-level courses.

We also assess the degree to which its use can be detected by two classifiers designed specifically for this purpose. Additionally, we conduct a global survey across five countries, as well as a more in-depth survey at the authors’ institution, to discern students’ and educators’ perceptions of ChatGPT’s use in school work.

We find that ChatGPT’s performance is comparable, if not superior, to that of students in a multitude of courses.

Moreover, current AI-text classifiers cannot reliably detect ChatGPT’s use in school work, due to both their propensity to classify human-written answers as AI-generated, as well as the relative ease with which AI-generated text can be edited to evade detection.

Finally, there seems to be an emerging consensus among students to use the tool, and among educators to treat its use as plagiarism.

Our findings offer insights that could guide policy discussions addressing the integration of artificial intelligence into educational frameworks.