Building Better Brain Imaging Models for Broader Clinical Use

Summary: New research shows that predictive models linking brain activity and behavior need to generalize across diverse datasets to be useful in clinical settings. By training models on varied brain imaging datasets, researchers found that effective models can still perform accurately when tested on different datasets with unique demographic and regional characteristics.

This finding emphasizes the need to develop neuroimaging models that work for diverse populations, including underserved rural communities, to ensure fair access to future diagnostic and treatment tools.

The study suggests that testing models on diverse data is crucial for achieving robust predictive capabilities in neuroimaging applications. Expanding model generalization will help neuroimaging tools better support personalized mental health care.

Key Facts:

Models performed well across diverse brain imaging datasets, showing promise for generalizability.
Testing models on different datasets is essential for achieving clinical relevance.
Diverse representation in neuroimaging data could ensure equitable mental health care.

Source: Yale

Relating brain activity to behavior is an ongoing aim of neuroimaging research as it would help scientists understand how the brain begets behavior — and perhaps open new opportunities for personalized treatment of mental health and neurological conditions.

In some cases, scientists use brain images and behavioral data to train machine learning models to predict an individual’s symptoms or illness based on brain function. But these models are only useful if they can generalize across settings and populations.

In a new study, Yale researchers show that predictive models can work well on datasets quite different from the ones the model was trained on.

In fact, they argue that testing models in this way, on diverse data, will be essential for developing clinically useful predictive models.

“It is common for predictive models to perform well when tested on data similar to what they were trained on,” said Brendan Adkinson, lead author of the study published recently in the journal Developmental Cognitive Neuroscience.

“But when you test them in a dataset with different characteristics, they often fail, which makes them virtually useless for most real-world applications.”

The issue lies in differences across datasets, which include variations in the age, sex, race and ethnicity, geography, and clinical symptom presentation among the individuals included in the datasets.

But rather than viewing these differences as a hurdle to model development, researchers should see them as a key component, said Adkinson.

“Predictive models will only be clinically valuable if they can predict effectively on top of these dataset-specific idiosyncrasies,” said Adkinson, who is an M.D.-Ph.D. candidate in the lab of senior author Dustin Scheinost, associate professor of radiology and biomedical imaging at Yale School of Medicine.

To test how well models can function across diverse datasets, the researchers trained models to predict two traits — language abilities and executive function — from three large datasets that were substantially different from each other.

Three models were trained — one on each dataset — and then each model was tested on the other two datasets.
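The train-on-one, test-on-the-others design can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the study's actual pipeline: the three datasets are synthetic stand-ins labeled A, B, and C, the model is a toy that sums the features correlated with the behavioral score, and performance is measured as the Pearson correlation between predicted and observed scores.

```python
import random
from itertools import permutations


def pearson_r(u, v):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5


def make_dataset(n, noise, shift, seed):
    """Synthetic cohort (hypothetical): n people, 4 brain features, 1 score.

    The score rises with feature 0 and falls with feature 1; features 2
    and 3 are unrelated to it. `noise` and `shift` differ per cohort to
    mimic dataset-specific idiosyncrasies.
    """
    rng = random.Random(seed)
    features = [[rng.gauss(shift, 1.0) for _ in range(4)] for _ in range(n)]
    behavior = [row[0] - row[1] + rng.gauss(0.0, noise) for row in features]
    return features, behavior


def fit_model(features, behavior, threshold=0.3):
    """Toy model: keep features correlated with behavior, store their sign."""
    weights = []
    for j in range(len(features[0])):
        r = pearson_r([row[j] for row in features], behavior)
        weights.append(0.0 if abs(r) < threshold else (1.0 if r > 0 else -1.0))
    return weights


def predict(weights, features):
    """Signed sum of the selected features for each person."""
    return [sum(w * x for w, x in zip(weights, row)) for row in features]


def cross_dataset_scores(datasets):
    """Train one model per dataset, then test it on every other dataset."""
    models = {name: fit_model(*data) for name, data in datasets.items()}
    results = {}
    for train_name, test_name in permutations(datasets, 2):
        test_features, test_behavior = datasets[test_name]
        predicted = predict(models[train_name], test_features)
        results[(train_name, test_name)] = pearson_r(predicted, test_behavior)
    return results


# Three deliberately different synthetic cohorts (sizes, noise, feature shift).
datasets = {
    "A": make_dataset(n=400, noise=0.5, shift=0.0, seed=1),
    "B": make_dataset(n=300, noise=0.8, shift=1.0, seed=2),
    "C": make_dataset(n=200, noise=1.0, shift=-1.0, seed=3),
}

scores = cross_dataset_scores(datasets)
for (train_name, test_name), r in sorted(scores.items()):
    print(f"train {train_name} -> test {test_name}: r = {r:.2f}")
```

Three models evaluated on the two datasets each did not see yields six train/test pairs. In this toy setup the underlying brain-behavior relationship is shared across cohorts by construction, which is why the models transfer; real datasets offer no such guarantee, which is the point of testing across them.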

“We found that even though these datasets were markedly different from each other, the models still performed well by neuroimaging standards during testing,” said Adkinson.

“That tells us that generalizable models are achievable and testing on diverse dataset features can help.”

Going forward, Adkinson is interested in exploring the idea of generalizability as it relates to a specific population.

The large-scale data collection efforts used for generating neuroimaging predictive models are typically based in metropolitan areas where researchers have access to more people.

But building models exclusively on data collected from people living in urban and suburban areas runs the risk of creating models that don’t generalize to people living in rural regions, the researchers say.

“If we get to a point where predictive models are robust enough to use in clinical assessment and treatment, but they don’t generalize to specific populations, like rural residents, then those populations won’t be served as well as others,” said Adkinson, who comes from a rural area himself.

“So we’re looking at how to generalize models to rural populations.”

About this AI and neuroimaging research news

Author: Mallory Locklear
Source: Yale
Contact: Mallory Locklear – Yale
Image: The image is credited to Neuroscience News

Original Research: Open access.
“Brain-phenotype predictions of language and executive function can survive across diverse real-world data: Dataset shifts in developmental populations” by Brendan Adkinson et al. Developmental Cognitive Neuroscience

Abstract

Predictive modeling potentially increases the reproducibility and generalizability of neuroimaging brain-phenotype associations. Yet, the evaluation of a model in another dataset is underutilized.

Among studies that undertake external validation, there is a notable lack of attention to generalization across dataset-specific idiosyncrasies (i.e., dataset shifts). Research settings, by design, remove the between-site variations that real-world and, eventually, clinical applications demand.

Here, we rigorously test the ability of a range of predictive models to generalize across three diverse, unharmonized developmental samples: the Philadelphia Neurodevelopmental Cohort (PNC, n=1291), the Healthy Brain Network (HBN, n=1110), and the Human Connectome Project in Development (HCPD, n=428).

These datasets have high inter-dataset heterogeneity, encompassing substantial variations in age distribution, sex, racial and ethnic minority representation, recruitment geography, clinical symptom burdens, fMRI tasks, sequences, and behavioral measures.

Through advanced methodological approaches, we demonstrate that reproducible and generalizable brain-behavior associations can be realized across diverse dataset features. Results indicate the potential of functional connectome-based predictive models to be robust despite substantial inter-dataset variability.

Notably, for the HCPD and HBN datasets, the best predictions were not from training and testing in the same dataset (i.e., cross-validation) but across datasets. This result suggests that training on diverse data may improve prediction in specific cases.

Overall, this work provides a critical foundation for future work evaluating the generalizability of brain-phenotype associations in real-world scenarios and clinical settings.