Summary: While neuroimaging may be a standard in neuroscience and psychology research, a new study says researchers are massively underestimating how large the study sample must be for a neuroimaging study to produce reliable findings.
Source: University of Pittsburgh
What does it take to know a person?
If you’ve seen how a friend acts across different domains of their life, you might reasonably say you know who they are. Compare that to watching an interview with a celebrity — maybe you can claim some knowledge about them, but a single observation of a stranger can only tell you so much.
Yet a similar idea — that a lone snapshot of a brain can tell you about an individual’s personality or mental health — has been the basis of decades of neuroscience studies.
That approach was punctured by a paper in Nature earlier this year showing that scientists have massively underestimated how large such studies must be to produce reliable findings.
“The more we learn about who we are as people, the more we learn that, on average, we’re much more similar than we are different — and so understanding those differences is really challenging,” said Brenden Tervo-Clemmens (A&S ’21G), now a postdoctoral fellow at Massachusetts General Hospital and Harvard Medical School who co-led the multi-institutional research as a clinical psychology PhD student at Pitt.
At the center of the research are MRI (magnetic resonance imaging) brain scans. While invaluable for diagnosing brain conditions, they’ve also been used by researchers to draw links between a person’s brain structure and aspects of their personality and mental health.
Tervo-Clemmens and his colleagues call this technique brain-wide association studies, or BWAS, in a nod to genome-wide association studies, or GWAS, which attempt to decipher the often-tiny effects of genes from massive datasets (as seen in dubious science headlines announcing “a gene for depression” or “a gene for intelligence”).
“The approach is similar: Here’s one profile of you biologically, how well can we determine the complexity of your human experience?” Tervo-Clemmens said. “And the answer is, usually not very well.”
A typical study of this kind includes around 25 participants, due in part to the high cost of running scans. But Tervo-Clemmens and his colleagues showed that scientists would need to scan the brains of more than 1,000 people to be confident that the connections they find aren’t just a statistical mirage.
Reaching that conclusion required getting a far broader view of the field than was possible until recently. Along with colleagues at a number of institutions as well as his advisor, Pitt Professor of Psychiatry Beatriz Luna, Tervo-Clemmens combined three recent publicly available studies that together included MRI data from around 50,000 participants.
Using this massive body of information, the team simulated the process of science, selecting groups of the scans at random as if they were patients recruited to a study. By repeating that process over and over, the researchers could figure out how likely it is that any given number of scans would produce a misleading result simply due to chance — and how many participants it takes for a study to be reliable.
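The resampling idea described above can be sketched in a few lines of Python. This is only an illustration on synthetic data, not the study's code or data: the population size, the true correlation of 0.1, and every function name here are assumptions made for the example.

```python
import random
import statistics


def make_population(size, true_r, rng):
    """Synthetic (brain_measure, behavior) pairs with a population correlation near true_r."""
    pairs = []
    for _ in range(size):
        x = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        y = true_r * x + (1.0 - true_r ** 2) ** 0.5 * noise
        pairs.append((x, y))
    return pairs


def pearson_r(pairs):
    """Plain Pearson correlation of a list of (x, y) pairs."""
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_studies(population, n, repeats, rng):
    """Recruit n random 'participants' per study, many times over; return each study's r."""
    return [pearson_r(rng.sample(population, n)) for _ in range(repeats)]


def wrong_sign_rate(estimates):
    """Share of simulated studies that report the opposite direction of the true effect."""
    return sum(r < 0 for r in estimates) / len(estimates)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    population = make_population(20_000, true_r=0.1, rng=rng)  # assumed weak true effect
    for n in (25, 1000):
        estimates = simulate_studies(population, n, repeats=300, rng=rng)
        print(n, round(statistics.stdev(estimates), 3), round(wrong_sign_rate(estimates), 3))
```

Under these assumptions, the spread of estimates and the share of wrong-direction results can be compared between 25-person and 1,000-person studies. The exact figures depend on the assumed effect size and the random seed.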
Not every investigation requires 1,000 brain scans, they showed. “If the goal is just to understand something like the general organization of the brain, we sometimes only need 10 to 20 participants to do that,” Tervo-Clemmens said. It’s only because a single brain scan reveals so little about a person’s personality and mental health that researchers need a massive amount of data before these complex traits begin to reliably stand out from the statistical noise.
Amplifying that problem is a well-known bug in 21st-century science: Researchers are often rewarded for publishing results that show exciting new connections, rather than less glamorous findings suggesting the absence of a connection.
The latter results are less likely to be published and more likely to languish on a hard drive. So not only are small imaging studies more likely to “discover” a link that isn’t actually there, but those same misleading studies also receive a disproportionate amount of attention.
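A rough sketch of how that filter can exaggerate reported effects follows. It is an illustration under stated assumptions rather than the paper's analysis: the data are synthetic, the true correlation is set to 0.1, "published" is modeled as passing a two-sided p < 0.05 test using the Fisher z approximation, and all names are hypothetical.

```python
import math
import random
import statistics


def sample_r(n, true_r, rng):
    """Correlation observed in one synthetic study of n participants."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [true_r * x + math.sqrt(1.0 - true_r ** 2) * rng.gauss(0.0, 1.0) for x in xs]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def looks_significant(r, n, z_crit=1.96):
    """Fisher z approximation to a two-sided p < 0.05 test of r against zero."""
    return abs(math.atanh(r)) * math.sqrt(n - 3) > z_crit


def published_vs_all(n, true_r, studies, rng):
    """Run many small studies; 'publish' only the ones that look significant (assumed rule)."""
    all_r = [sample_r(n, true_r, rng) for _ in range(studies)]
    published = [r for r in all_r if looks_significant(r, n)]
    return all_r, published


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the sketch is repeatable
    all_r, published = published_vs_all(n=25, true_r=0.1, studies=2000, rng=rng)
    print("mean r, all studies:      ", round(statistics.fmean(all_r), 3))
    print("mean |r|, published only: ", round(statistics.fmean(abs(r) for r in published), 3))
    print("share published:          ", round(len(published) / len(all_r), 3))
```

With 25 participants, a correlation has to be roughly 0.4 in magnitude to clear this threshold, so every "published" result in the sketch overstates the assumed true effect of 0.1 by construction.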
Tervo-Clemmens is quick to note that the Nature paper wasn’t intended to call out the whole field. Neuroscientists and psychologists have successfully tackled questions about personality and mental health using a variety of other techniques. And brain scans on their own are very effective for diagnosing conditions and mapping out the broader picture of how brains work. It’s when scientists combine the two, reducing the complexities of a person into a single image, that they fall short.
“We can count on less than a hand the number of these studies that have held up under scrutiny and are really driving treatment,” he said. “In my own area, one study might show that increased function of a particular brain region is related to more symptoms, but you can find, almost without question, another study showing the opposite effect.”
Although he now focuses mostly on psychiatric and substance-use disorders in adolescents, Tervo-Clemmens doesn’t quite fit into any one box as a researcher. “I’m kind of a psychologist, and I’m kind of a statistician, and I’m kind of a neuroscientist,” he said. That perspective, he said, along with his boundary-crossing education at Pitt, helps him do broad, critical research like the current study.
He saw patients as a PhD student in clinical psychology while also training in cross-disciplinary programs like the Center for the Neural Basis of Cognition, experiences he credits with encouraging breadth in research. “I think that level of integration is what makes Pitt so awesome as a graduate student,” he said.
The result was a study that has produced a stir among other scientists. An instant classic, the paper and its pre-publication version have already been cited by more than 250 other scholarly works.
So where does that leave the field?
First, Tervo-Clemmens said, it’s necessary to re-examine the smaller studies of the past to see if their results hold up to further examination. As for future research, one solution would be to simply supersize brain-scan studies of complex behavior so they stand up to statistical scrutiny. But there’s another possible way forward, where researchers find ways to study patients over time and across contexts to get a fuller sense of their identities.
“We need to be aligning our research to how we generally think and understand human beings,” said Tervo-Clemmens. “That’s a challenge of cost and economy. But I also think it’s one that will ultimately be worth it.”
It’s like growing pains for a line of research that’s only a few decades old: stressful and full of uncertainty, but also a sign that the field is heading in new and exciting directions.
About this neuroimaging and neuroscience research news
Reproducible brain-wide association studies require thousands of individuals
Magnetic resonance imaging (MRI) has transformed our understanding of the human brain through well-replicated mapping of abilities to specific structures (for example, lesion studies) and functions (for example, task functional MRI (fMRI)). Mental health research and care have yet to realize similar advances from MRI.
A primary challenge has been replicating associations between inter-individual differences in brain structure or function and complex cognitive or mental health phenotypes (brain-wide association studies (BWAS)). Such BWAS have typically relied on sample sizes appropriate for classical brain mapping (the median neuroimaging study sample size is about 25), but potentially too small for capturing reproducible brain–behavioural phenotype associations.
Here we used three of the largest neuroimaging datasets currently available—with a total sample size of around 50,000 individuals—to quantify BWAS effect sizes and reproducibility as a function of sample size. BWAS associations were smaller than previously thought, resulting in statistically underpowered studies, inflated effect sizes and replication failures at typical sample sizes.
As sample sizes grew into the thousands, replication rates began to improve and effect size inflation decreased. More robust BWAS effects were detected for functional MRI (versus structural), cognitive tests (versus mental health questionnaires) and multivariate methods (versus univariate). Smaller than expected brain–phenotype associations and variability across population subsamples can explain widespread BWAS replication failures.
In contrast to non-BWAS approaches with larger effects (for example, lesions, interventions and within-person), BWAS reproducibility requires samples with thousands of individuals.