Summary: BOLD5000, a new, large-scale dataset of brain scans of people viewing images, is helping researchers better understand how the brain processes images. The dataset is a big step toward using computer vision models to study biological vision.
Source: Carnegie Mellon University
Neuroscientists and computer vision scientists say a new dataset of unprecedented size — comprising brain scans of four volunteers who each viewed 5,000 images — will help researchers better understand how the brain processes images.
Researchers at Carnegie Mellon University and Fordham University, reporting today in the journal Scientific Data, said acquiring functional magnetic resonance imaging (fMRI) scans at this scale presented unique challenges.
Each volunteer participated in 20 or more hours of MRI scanning, challenging both their perseverance and the experimenters’ ability to coordinate across scanning sessions. The extreme design decision to run the same individuals over so many sessions was necessary for disentangling the neural responses associated with individual images.
The resulting dataset, dubbed BOLD5000, allows cognitive neuroscientists to better leverage the deep learning models that have dramatically improved artificial vision systems. Deep learning was originally inspired by the architecture of the human visual system, and it may be improved further by new insights into how human vision works, just as studies of human vision may benefit from better reflecting modern computer vision methods. To that end, BOLD5000 measured neural activity arising from viewing images drawn from widely used computer vision datasets, including ImageNet and COCO.
“The intertwining of brain science and computer science means that scientific discoveries can flow in both directions,” said co-author Michael J. Tarr, the Kavčić-Moura Professor of Cognitive and Brain Science and head of CMU’s Department of Psychology. “Future studies of vision that employ the BOLD5000 dataset should help neuroscientists better understand the organization of knowledge in the human brain. As we learn more about the neural basis of visual recognition, we will also be better positioned to contribute to advances in artificial vision.”
Lead author Nadine Chang, a Ph.D. student in CMU’s Robotics Institute who specializes in computer vision, suggested that computer vision scientists are looking to neuroscience to help innovate in the rapidly advancing area of artificial vision — reinforcing the two-way nature of this research.
“Computer-vision scientists and visual neuroscientists essentially have the same end goal: to understand how to process and interpret visual information,” Chang said.
Improving computer vision was an important part of the BOLD5000 project from its outset. Senior author Elissa Aminoff, then a postdoctoral fellow in CMU’s Psychology Department and now an assistant professor of psychology at Fordham, initiated this research direction with co-author Abhinav Gupta, an associate professor in the Robotics Institute.
One challenge in connecting biological and computer vision is that most human neuroimaging studies include very few stimulus images (often 100 or fewer), which typically are simplified to depict only single objects against a neutral background. In contrast, BOLD5000 includes more than 5,000 real-world, complex images of scenes, single objects and interacting objects.
The group views BOLD5000 as only the first step toward leveraging modern computer vision models to study biological vision.
“Frankly, the BOLD5000 dataset is still way too small,” Tarr said. He suggested that a reasonable fMRI dataset would require at least 50,000 stimulus images and many more volunteers to make headway, given that the deep neural networks used to analyze visual imagery are trained on millions of images. The research team hopes that their success in collecting brain scans for 5,000 images will pave the way for larger collaborative efforts between human vision and computer vision scientists.
So far, the field’s response has been positive. The publicly available BOLD5000 dataset has already been downloaded more than 2,500 times.
In addition to Chang, Tarr, Gupta, and Aminoff, the research team included John A. Pyles, senior research scientist and scientific operations director of the CMU-Pitt BRIDGE Center, and Austin Marcus, a research assistant in Tarr’s lab.
Funding: The National Science Foundation, U.S. Office of Naval Research, the Alfred P. Sloan Foundation and the Okawa Foundation for Information and Telecommunications sponsored this research.
About this neuroscience research article
Source: Carnegie Mellon University
Media Contacts: Byron Spice – Carnegie Mellon University
Image Source: The image is in the public domain.
BOLD5000, a public fMRI dataset while viewing 5000 visual images
Vision science, particularly machine vision, has been revolutionized by the introduction of large-scale image datasets and statistical learning approaches. Yet human neuroimaging studies of visual perception still rely on small numbers of images (around 100) due to time-constrained experimental procedures. To bring statistical learning approaches to neuroscience, the number of images used in neuroimaging must be significantly increased. We present BOLD5000, a human functional MRI (fMRI) study that includes almost 5,000 distinct images depicting real-world scenes. Beyond dramatically increasing image dataset size relative to prior fMRI studies, BOLD5000 also accounts for image diversity, overlapping with standard computer vision datasets by incorporating images from the Scene UNderstanding (SUN), Common Objects in Context (COCO), and ImageNet datasets. The scale and diversity of these image datasets, combined with a slow event-related fMRI design, enable fine-grained exploration into the neural representation of a wide range of visual features, categories, and semantics. Concurrently, BOLD5000 brings us closer to realizing Marr’s dream of a singular vision science: the intertwined study of biological and computer vision.