Rethinking Personalized Medicine: AI's Limits in Clinical Trials

Summary: A new study reveals limitations in the current use of mathematical models for personalized medicine, particularly in schizophrenia treatment. Although these models can predict patient outcomes in specific clinical trials, they fail when applied to different trials, challenging the reliability of AI-driven algorithms in diverse settings.

This study underscores the need for algorithms to demonstrate effectiveness in multiple contexts before they can be truly trusted. The findings highlight a significant gap between the potential of personalized medicine and its current practical application, especially given the variability in clinical trials and real-world medical settings.

Key Facts:

Mathematical models currently used for personalized medicine are effective within specific clinical trials but fail to generalize across different trials.
The study raises concerns about the application of AI and machine learning in personalized medicine, especially for conditions like schizophrenia where treatment response varies greatly among individuals.
The research suggests that more comprehensive data sharing and inclusion of additional environmental variables could improve the reliability and accuracy of AI algorithms in medical treatments.

Source: Yale

The quest for personalized medicine, a medical approach in which practitioners use a patient’s unique genetic profile to tailor individual treatment, has emerged as a critical goal in the health care sector. But a new Yale-led study shows that the mathematical models currently available to predict how patients will respond to treatment have limited effectiveness.

In an analysis of clinical trials for multiple schizophrenia treatments, the researchers found that the mathematical algorithms were able to predict patient outcomes within the specific trials for which they were developed, but failed to work for patients participating in different trials.

The findings are published Jan. 11 in the journal Science.

“This study really challenges the status quo of algorithm development and raises the bar for the future,” said Adam Chekroud, an adjunct assistant professor of psychiatry at Yale School of Medicine and corresponding author of the paper. “Right now, I would say we need to see algorithms working in at least two different settings before we can really get excited about it.”

“I’m still optimistic,” he added, “but as medical researchers we have some serious things to figure out.”

Chekroud is also president and co-founder of Spring Health, a private company that provides mental health services.

Schizophrenia, a complex brain disorder that affects about 1% of the U.S. population, perfectly illustrates the need for more personalized treatments, the researchers say. As many as 50% of patients diagnosed with schizophrenia fail to respond to the first antipsychotic drug that is prescribed, but it is impossible to predict which patients will respond to therapies and which will not.

Researchers hope that new technologies using machine learning and artificial intelligence might yield algorithms that better predict which treatments will work for different patients, and help improve outcomes and reduce costs of care.

Due to the high cost of running a clinical trial, however, most algorithms are developed and tested using only a single clinical trial. Researchers had hoped that these algorithms would still work when tested on other patients with similar profiles who were receiving similar treatments.

For the new study, Chekroud and his Yale colleagues wanted to test whether this hope was justified. To do so, they aggregated data from five clinical trials of schizophrenia treatments made available through the Yale Open Data Access (YODA) Project, which advocates for and supports responsible sharing of clinical research data.

In most cases, they found, the algorithms effectively predicted patient outcomes for the clinical trial in which they were developed. However, they failed to effectively predict outcomes for schizophrenia patients being treated in different clinical trials.

“The algorithms almost always worked first time around,” Chekroud said. “But when we tested them on patients from other trials the predictive value was no greater than chance.”
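
The comparison Chekroud describes, testing a model inside its own trial and then on patients from other trials, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the study's method: the three "trials" are synthetic, each one's outcome is deliberately driven by a different feature, and the nearest-centroid classifier and every name used here (make_trial, fit_centroids, trial_A and so on) are hypothetical.

```python
import random

def make_trial(n, driver, rng, n_features=3):
    """Simulate one synthetic trial (an assumption, not real patient data).

    Only the feature at index `driver` influences the binary outcome,
    so each trial has its own feature-outcome relationship.
    """
    rows = []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(n_features)]
        score = x[driver] + rng.gauss(0, 0.5)
        rows.append((x, 1 if score > 0 else 0))
    return rows

def fit_centroids(rows):
    """Nearest-centroid model: the mean feature vector of each outcome class."""
    centroids = {}
    for label in (0, 1):
        pts = [x for x, y in rows if y == label]
        centroids[label] = [sum(col) / len(pts) for col in zip(*pts)]
    return centroids

def predict(centroids, x):
    """Predict the class whose centroid is closest to patient x."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist2(centroids[label]))

def accuracy(centroids, rows):
    return sum(predict(centroids, x) == y for x, y in rows) / len(rows)

rng = random.Random(0)
names = ["trial_A", "trial_B", "trial_C"]  # hypothetical trial names
trials = {name: make_trial(600, driver, rng) for driver, name in enumerate(names)}

within_acc, cross_acc = {}, {}
for name, rows in trials.items():
    train, held_out = rows[:300], rows[300:]
    model = fit_centroids(train)
    # Within-trial check: patients from the same trial that the model never saw.
    within_acc[name] = accuracy(model, held_out)
    # Cross-trial check: every patient from each of the other trials.
    for other, other_rows in trials.items():
        if other != name:
            cross_acc[(name, other)] = accuracy(model, other_rows)

for name in names:
    print(name, "within-trial accuracy:", round(within_acc[name], 2))
for (src, dst), acc in cross_acc.items():
    print("trained on", src, "tested on", dst, ":", round(acc, 2))
```

In this toy setup the within-trial accuracy is well above 50% while the cross-trial accuracy stays close to a coin flip, because the pattern learned in one synthetic trial simply does not exist in the others. Whether real trials differ in this way is a separate question that the sketch does not answer.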

The problem, according to Chekroud, is that most of the mathematical algorithms used by medical researchers were designed to be used on much bigger data sets. Clinical trials are expensive and time-consuming to conduct, so the studies typically enroll fewer than 1,000 patients.

Applying powerful AI tools to the analysis of these smaller data sets, he said, can often result in “over-fitting,” in which a model learns response patterns that are idiosyncratic, or specific to that initial trial data, and that disappear when new data are included.
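
A rough way to see over-fitting in action is to give a very flexible model data that contains no real signal at all. The Python sketch below assumes purely synthetic patients whose features are random noise and whose outcomes are coin flips, and it uses a 1-nearest-neighbour rule as a stand-in for a flexible algorithm. None of this comes from the study, and all names (noise_patients, predict_1nn) are hypothetical.

```python
import random

rng = random.Random(42)

def noise_patients(n, n_features):
    """Synthetic patients whose features carry NO information about the outcome."""
    return [([rng.gauss(0, 1) for _ in range(n_features)], rng.randint(0, 1))
            for _ in range(n)]

def predict_1nn(train, x):
    """1-nearest-neighbour: copy the outcome of the most similar training patient."""
    nearest = min(train,
                  key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)))
    return nearest[1]

def accuracy(train, rows):
    return sum(predict_1nn(train, x) == y for x, y in rows) / len(rows)

train = noise_patients(200, 20)  # a small "trial" with many measured variables
new = noise_patients(200, 20)    # fresh patients drawn the same way

# Each training patient's nearest neighbour is itself, so this is 1.0 by construction.
train_acc = accuracy(train, train)
# On new patients there is nothing real to find, so accuracy hovers around 0.5.
new_acc = accuracy(train, new)

print("accuracy on the data the model was built from:", train_acc)
print("accuracy on new patients:", round(new_acc, 2))
```

The model looks perfect on the data it was built from only because it has memorized that data. Evaluating on patients it has never seen is what exposes the pattern as idiosyncratic.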

“The reality is, we need to be thinking about developing algorithms in the same way we think about developing new drugs,” he said. “We need to see algorithms working in multiple different times or contexts before we can really believe them.”

In the future, the inclusion of other environmental variables may or may not improve the success of algorithms in the analysis of clinical trial data, researchers added. For instance, does the patient abuse drugs or have personal support from family or friends? These are the kinds of factors that can affect outcomes of treatment.

Most clinical trials use precise criteria to improve chances for success, such as guidelines for which patients should be included (or excluded), careful measurement of outcomes, and limits on the number of doctors administering treatments. Real-world settings, meanwhile, have a much wider variety of patients and greater variation in the quality and consistency of treatment, the researchers say.

“In theory, clinical trials should be the easiest place for algorithms to work. But if algorithms can’t generalize from one clinical trial to another, it will be even more challenging to use them in clinical practice,” said co-author John Krystal, the Robert L. McNeil, Jr. Professor of Translational Research and professor of psychiatry, neuroscience, and psychology at Yale School of Medicine. Krystal is also chair of Yale’s Department of Psychiatry.

Chekroud suggests that increased efforts to share data among researchers and the banking of additional data by large-scale health care providers might help increase the reliability and accuracy of AI-driven algorithms.

“Although the study dealt with schizophrenia trials, it raises difficult questions for personalized medicine more broadly, and its application in cardiovascular disease and cancer,” said Philip Corlett, an associate professor of psychiatry at Yale and co-author of the study.

Other Yale authors of the study are Hieronimus Loho; Ralitza Gueorguieva, a senior research scientist at Yale School of Public Health; and Harlan M. Krumholz, the Harold H. Hines Jr. Professor of Medicine (Cardiology) at Yale.

About this AI and personalized medicine research news

Author: Bess Connolly
Contact: Bess Connolly – Yale

Original Research: Closed access.
“Illusory generalizability of clinical prediction models” by Adam Chekroud et al. Science

Abstract

Illusory generalizability of clinical prediction models

It is widely hoped that statistical models can improve decision-making related to medical treatments. Because of the cost and scarcity of medical outcomes data, this hope is typically based on investigators observing a model’s success in one or two datasets or clinical contexts.

We scrutinized this optimism by examining how well a machine learning model performed across several independent clinical trials of antipsychotic medication for schizophrenia.

Models predicted patient outcomes with high accuracy within the trial in which the model was developed but performed no better than chance when applied out-of-sample. Pooling data across trials to predict outcomes in the trial left out did not improve predictions.

These results suggest that models predicting treatment outcomes in schizophrenia are highly context-dependent and may have limited generalizability.