ChatGPT Matches Radiologists in Brain Tumor Diagnosis Accuracy

Summary: Researchers compared the diagnostic accuracy of GPT-4-based ChatGPT and radiologists on 150 brain tumor MRI reports. ChatGPT achieved 73% accuracy, slightly outperforming neuroradiologists (72%) and general radiologists (68%).

The AI model’s accuracy was highest (80%) when interpreting reports written by neuroradiologists, suggesting its potential in supporting medical diagnoses. This study highlights AI’s growing role in radiology and its future potential to reduce physician workload and improve diagnostic accuracy.

Key Facts:

  • ChatGPT’s diagnostic accuracy was 73%, slightly higher than that of the radiologists.
  • Its accuracy was 80% when using neuroradiologist-written reports.
  • The study shows AI could assist in improving diagnostic efficiency in radiology.

Source: Osaka Metropolitan University

As artificial intelligence advances, its real-world capabilities continue to reach new heights, in some cases rivaling or even surpassing human expertise.

In the field of radiology, where a correct diagnosis is crucial to ensure proper patient care, large language models, such as ChatGPT, could improve accuracy or at least offer a good second opinion.


To test its potential, a team led by graduate student Yasuhito Mitsuyama and Associate Professor Daiju Ueda at Osaka Metropolitan University’s Graduate School of Medicine compared the diagnostic performance of GPT-4-based ChatGPT and radiologists on 150 preoperative brain tumor MRI reports.

Based on these clinical reports, originally written in Japanese, ChatGPT, two board-certified neuroradiologists, and three general radiologists were asked to provide differential diagnoses and a final diagnosis.

Subsequently, their accuracy was calculated based on the actual diagnosis of the tumor after its removal.
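
The article does not reproduce the study’s actual prompts or scoring procedure, but the workflow it describes — feed a report’s text to GPT-4, collect a final diagnosis, and score it against the post-surgical pathology — can be sketched in Python. This is a minimal illustration, not the authors’ code: the prompt wording, model settings, and the report/ground-truth data fields are all assumptions.

```python
# Minimal sketch of the evaluation loop described above, using the
# OpenAI Python client. The prompt wording, model settings, and data
# fields ("report", "ground_truth") are illustrative assumptions --
# they are not the study's actual protocol.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "Based on the following preoperative brain MRI report, list your "
    "differential diagnoses, then state the single most likely final "
    "diagnosis.\n\nReport:\n{report}"
)

def ask_gpt4(report_text: str) -> str:
    """Send one MRI report to GPT-4 and return its free-text answer."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": PROMPT.format(report=report_text)}],
        temperature=0,  # keep outputs as reproducible as possible
    )
    return response.choices[0].message.content

def final_diagnosis_accuracy(cases: list[dict]) -> float:
    """Fraction of cases whose answer names the pathological diagnosis."""
    hits = 0
    for case in cases:
        answer = ask_gpt4(case["report"])
        # A naive substring match stands in for the expert adjudication
        # a real study would use to judge diagnostic agreement.
        if case["ground_truth"].lower() in answer.lower():
            hits += 1
    return hits / len(cases)
```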

Final diagnostic accuracy stood at 73% for ChatGPT, compared with a 72% average for the neuroradiologists and a 68% average for the general radiologists.

Additionally, ChatGPT’s final diagnosis accuracy varied depending on whether the clinical report was written by a neuroradiologist or a general radiologist.

The accuracy with neuroradiologist reports was 80%, compared to 60% when using general radiologist reports.

“These results suggest that ChatGPT can be useful for preoperative MRI diagnosis of brain tumors,” stated graduate student Mitsuyama.

“In the future, we intend to study large language models in other diagnostic imaging fields with the aims of reducing the burden on physicians, improving diagnostic accuracy, and using AI to support educational environments.”

About this AI and brain cancer research news

Author: Yung-Hsiang Kao
Source: Osaka Metropolitan University
Contact: Yung-Hsiang Kao – Osaka Metropolitan University
Image: The image is credited to Neuroscience News

Original Research: Open access.
“Comparative analysis of GPT-4-based ChatGPT’s diagnostic performance with radiologists using real-world radiology reports of brain tumors” by Yasuhito Mitsuyama et al. European Radiology


Abstract

Comparative analysis of GPT-4-based ChatGPT’s diagnostic performance with radiologists using real-world radiology reports of brain tumors

Objectives

Large language models like GPT-4 have demonstrated potential for diagnosis in radiology. Previous studies investigating this potential primarily utilized quizzes from academic journals. This study aimed to assess the diagnostic capabilities of GPT-4-based Chat Generative Pre-trained Transformer (ChatGPT) using actual clinical radiology reports of brain tumors and compare its performance with that of neuroradiologists and general radiologists.

Methods

We collected brain MRI reports written in Japanese from preoperative brain tumor patients at two institutions from January 2017 to December 2021. The MRI reports were translated into English by radiologists. GPT-4 and five radiologists were presented with the same textual findings from the reports and asked to suggest differential and final diagnoses. The pathological diagnosis of the excised tumor served as the ground truth. McNemar’s test and Fisher’s exact test were used for statistical analysis.
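
For readers curious about the statistics: McNemar’s test compares two raters on the same set of cases using only the discordant pairs (cases one rater got right and the other got wrong), while Fisher’s exact test compares accuracy across two independent groups of reports. Below is a small illustrative computation in Python with invented cell counts — only the 73%/72% marginal accuracies echo the paper’s reported figures; the splits are assumptions.

```python
# Illustrative significance tests for paired diagnostic accuracy.
# Cell counts are invented for demonstration -- only the marginal
# accuracies (GPT-4 110/150 = 73%, radiologist 108/150 = 72%) echo
# the reported figures; the discordant splits are assumptions.
import numpy as np
from scipy.stats import fisher_exact
from statsmodels.stats.contingency_tables import mcnemar

# Paired 2x2 table over the same 150 cases:
# rows = GPT-4 correct/incorrect, columns = radiologist correct/incorrect
paired = np.array([
    [95, 15],   # GPT-4 correct:   radiologist correct / incorrect
    [13, 27],   # GPT-4 incorrect: radiologist correct / incorrect
])

# McNemar's test uses only the discordant cells (15 vs. 13 here).
result = mcnemar(paired, exact=True)
print(f"McNemar p-value: {result.pvalue:.3f}")

# Fisher's exact test on an unpaired 2x2 table, e.g. GPT-4 accuracy by
# report source. A hypothetical 100/50 split of neuroradiologist vs.
# general-radiologist reports reproduces the reported 80% vs. 60%.
by_source = [[80, 30],   # correct:   neuroradiologist / general reports
             [20, 20]]   # incorrect: neuroradiologist / general reports
odds_ratio, p_value = fisher_exact(by_source)
print(f"Fisher exact p-value: {p_value:.3f}")
```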

Results

In a study analyzing 150 radiological reports, GPT-4 achieved a final diagnostic accuracy of 73%, while radiologists’ accuracy ranged from 65% to 79%. GPT-4’s final diagnostic accuracy using reports from neuroradiologists was higher at 80%, compared to 60% using those from general radiologists. In the realm of differential diagnoses, GPT-4’s accuracy was 94%, while radiologists’ accuracy fell between 73% and 89%. Notably, for these differential diagnoses, GPT-4’s accuracy remained consistent whether reports were from neuroradiologists or general radiologists.

Conclusion

GPT-4 exhibited good diagnostic capability, comparable to neuroradiologists in differentiating brain tumors from MRI reports. GPT-4 can be a second opinion for neuroradiologists on final diagnoses and a guidance tool for general radiologists and residents.

Clinical relevance statement

This study evaluated GPT-4-based ChatGPT’s diagnostic capabilities using real-world clinical MRI reports from brain tumor cases, revealing that its accuracy in interpreting brain tumors from MRI findings is competitive with radiologists.
