IT Looks Familiar: How The Brain Recognizes Objects

Neuroscientists find evidence that the brain’s inferotemporal cortex can identify objects.

When the eyes are open, visual information flows from the retina through the optic nerve and into the brain, which assembles this raw information into objects and scenes.

Scientists have previously hypothesized that objects are distinguished in the inferior temporal (IT) cortex, which is near the end of this flow of information, also called the ventral stream. A new study from MIT neuroscientists offers evidence that this is indeed the case.

Using data from both humans and nonhuman primates, the researchers found that neuron firing patterns in the IT cortex correlate strongly with success in object-recognition tasks.

“While we knew from prior work that neuronal population activity in inferior temporal cortex was likely to underlie visual object recognition, we did not have a predictive map that could accurately link that neural activity to object perception and behavior. The results from this study demonstrate that a particular map from particular aspects of IT population activity to behavior is highly accurate over all types of objects that were tested,” says James DiCarlo, head of MIT’s Department of Brain and Cognitive Sciences, a member of the McGovern Institute for Brain Research, and senior author of the study, which appears in the Journal of Neuroscience.

The paper’s lead author is Najib Majaj, a former postdoc in DiCarlo’s lab who is now at New York University. Other authors are former MIT graduate student Ha Hong and former MIT undergraduate Ethan Solomon.

Distinguishing objects

Earlier stops along the ventral stream are believed to process basic visual elements such as brightness and orientation. More complex functions take place farther along the stream, with object recognition believed to occur in the IT cortex.

To investigate this theory, the researchers first asked human subjects to perform 64 object-recognition tasks. Some of these tasks were “trivially easy,” Majaj says, such as distinguishing an apple from a car. Others — such as discriminating between two very similar faces — were so difficult that the subjects were correct only about 50 percent of the time.

After measuring human performance on these tasks, the researchers then showed the same set of nearly 6,000 images to nonhuman primates as they recorded electrical activity in neurons of the inferior temporal cortex and another visual region known as V4.

Each of the 168 IT neurons and 128 V4 neurons fired in response to some objects but not others, creating a firing pattern that served as a distinctive signature for each object. By comparing these signatures, the researchers could analyze whether they correlated with humans’ ability to distinguish between two objects.

The researchers found that the firing patterns of IT neurons, but not V4 neurons, perfectly predicted the human performance they had measured. That is, when humans had trouble distinguishing two objects, the neural signatures for those objects were so similar as to be indistinguishable, and for pairs where humans succeeded, the patterns were very different.

“On the easy stimuli, IT did as well as humans, and on the difficult stimuli, IT also failed,” Majaj says. “We had a nice correlation between behavior and neural responses.”

The findings support the hypothesis that patterns of neural activity in the IT cortex can encode object representations detailed enough to allow the brain to distinguish different objects, the researchers say.

Image: Outline of a human head, with the brain shown in pink and purple lines streaming from the eyes. Researchers have found that neuron firing patterns in the inferior temporal (IT) cortex, highlighted here, correlate strongly with success in object-recognition tasks. Credit: MIT News.

Nikolaus Kriegeskorte, a principal investigator at the Medical Research Council Cognition and Brain Sciences Unit in Cambridge, U.K., agrees that the study offers “crucial evidence supporting the idea that inferior temporal cortex contains the neuronal representations underlying human visual object recognition.”

“This study is exemplary for its original and rigorous method of establishing links between brain representations and human behavioral performance,” adds Kriegeskorte, who was not part of the research team.

Model performance

The researchers also tested more than 10,000 other possible models for how the brain might encode object representations. These models varied based on location in the brain, the number of neurons required, and the time window for neural activity.

Some of these models, including some that relied on V4, were eliminated because they performed better than humans on some tasks and worse on others.

“We wanted the performance of the neurons to perfectly match the performance of the humans in terms of the pattern, so the easy tasks would be easy for the neural population and the hard tasks would be hard for the neural population,” Majaj says.
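The matching criterion Majaj describes can be illustrated with a small sketch: score each candidate model by how closely its per-task accuracy follows the human per-task accuracy. The article does not say which statistic the researchers used, so the Pearson correlation below is a stand-in, and the function name `pearson_r` and all accuracy values are invented for illustration only.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Made-up fraction-correct scores on five tasks, ordered easy to hard.
# These are NOT data from the study.
human = [0.98, 0.91, 0.75, 0.62, 0.51]
it_like = [0.97, 0.90, 0.78, 0.60, 0.52]  # easy stays easy, hard stays hard
v4_like = [0.70, 0.95, 0.55, 0.80, 0.65]  # better on some tasks, worse on others

r_it = pearson_r(human, it_like)
r_v4 = pearson_r(human, v4_like)
print(f"IT-like model vs. human pattern: r = {r_it:.2f}")
print(f"V4-like model vs. human pattern: r = {r_v4:.2f}")
```

In this toy setup the IT-like model scores far higher than the V4-like one, because only the former reproduces which tasks are easy and which are hard. A model could have good average accuracy and still fail this test if its errors fall on the wrong tasks.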

The research team now aims to gather even more data to ask if this model or similar models can predict the behavioral difficulty of object recognition on each and every visual image — an even higher bar than the one tested thus far. That might require additional factors to be included in the model that were not needed in this study, and thus could expose important gaps in scientists’ current understanding of neural representations of objects.

They also plan to expand the model so they can predict responses in IT based on input from earlier parts of the visual stream.

“We can start building a cascade of computational operations that take you from an image on the retina slowly through V1, V2, V4, until we’re able to predict the population in IT,” Majaj says.

About this neurology research

Source: Anne Trafton – MIT
Image Credit: The image is credited to MIT News
Original Research: Abstract for “Simple Learned Weighted Sums of Inferior Temporal Neuronal Firing Rates Accurately Predict Human Core Object Recognition Performance” by Najib J. Majaj, Ha Hong, Ethan A. Solomon, and James J. DiCarlo in Journal of Neuroscience. Published online September 30, 2015. doi:10.1523/JNEUROSCI.5181-14.2015

Abstract

Simple Learned Weighted Sums of Inferior Temporal Neuronal Firing Rates Accurately Predict Human Core Object Recognition Performance

To go beyond qualitative models of the biological substrate of object recognition, we ask: can a single ventral stream neuronal linking hypothesis quantitatively account for core object recognition performance over a broad range of tasks? We measured human performance in 64 object recognition tests using thousands of challenging images that explore shape similarity and identity preserving object variation. We then used multielectrode arrays to measure neuronal population responses to those same images in visual areas V4 and inferior temporal (IT) cortex of monkeys and simulated V1 population responses. We tested leading candidate linking hypotheses and control hypotheses, each postulating how ventral stream neuronal responses underlie object recognition behavior. Specifically, for each hypothesis, we computed the predicted performance on the 64 tests and compared it with the measured pattern of human performance. All tested hypotheses based on low- and mid-level visually evoked activity (pixels, V1, and V4) were very poor predictors of the human behavioral pattern. However, simple learned weighted sums of distributed average IT firing rates exactly predicted the behavioral pattern. More elaborate linking hypotheses relying on IT trial-by-trial correlational structure, finer IT temporal codes, or ones that strictly respect the known spatial substructures of IT (“face patches”) did not improve predictive power. Although these results do not reject those more elaborate hypotheses, they suggest a simple, sufficient quantitative model: each object recognition task is learned from the spatially distributed mean firing rates (100 ms) of ∼60,000 IT neurons and is executed as a simple weighted sum of those firing rates.
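The "simple weighted sum" readout named in the abstract can be sketched on synthetic data. This is a minimal illustration, not the authors' procedure: the abstract does not say how the weights were learned, so the class-mean-difference rule below is a swapped-in, very simple learning rule. The neuron count, firing rates, noise level, and all function names are assumptions made up for the example.

```python
import random

def mean_vector(rows):
    """Column-wise mean of a list of equal-length firing-rate vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def fit_weighted_sum(rates_a, rates_b):
    """Learn weights and a bias from training trials of two objects.

    Assumed rule: weights are the difference of the class means, and the
    bias places the decision boundary halfway between the two means.
    """
    mean_a = mean_vector(rates_a)
    mean_b = mean_vector(rates_b)
    weights = [a - b for a, b in zip(mean_a, mean_b)]
    midpoint = [(a + b) / 2 for a, b in zip(mean_a, mean_b)]
    bias = -sum(w * m for w, m in zip(weights, midpoint))
    return weights, bias

def decide(weights, bias, rates):
    """Report object 'A' if the weighted sum of rates is positive, else 'B'."""
    score = sum(w * r for w, r in zip(weights, rates)) + bias
    return "A" if score > 0 else "B"

def simulate_trials(mean_rates, n_trials, noise_sd, rng):
    """Synthetic mean firing rates: Gaussian noise around a mean, floored at 0."""
    return [[max(0.0, rng.gauss(m, noise_sd)) for m in mean_rates]
            for _ in range(n_trials)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
n_neurons = 20          # hypothetical population size
mean_a = [12.5 if i % 2 == 0 else 7.5 for i in range(n_neurons)]
mean_b = [7.5 if i % 2 == 0 else 12.5 for i in range(n_neurons)]

# Learn the weights on one set of trials, then evaluate on held-out trials.
weights, bias = fit_weighted_sum(simulate_trials(mean_a, 30, 2.0, rng),
                                 simulate_trials(mean_b, 30, 2.0, rng))
test_a = simulate_trials(mean_a, 50, 2.0, rng)
test_b = simulate_trials(mean_b, 50, 2.0, rng)
correct = sum(decide(weights, bias, r) == "A" for r in test_a)
correct += sum(decide(weights, bias, r) == "B" for r in test_b)
accuracy = correct / (len(test_a) + len(test_b))
print(f"held-out accuracy: {accuracy:.2f}")
```

The readout itself is one multiply-and-add per neuron followed by a threshold. Making the two synthetic objects more alike (moving `mean_a` toward `mean_b`) lowers the held-out accuracy, which mirrors the article's point that similar neural signatures go with hard discriminations.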

Significance Statement

We sought to go beyond qualitative models of visual object recognition and determine whether a single neuronal linking hypothesis can quantitatively account for core object recognition behavior. To achieve this, we designed a database of images for evaluating object recognition performance. We used multielectrode arrays to characterize hundreds of neurons in the visual ventral stream of nonhuman primates and measured the object recognition performance of >100 human observers. Remarkably, we found that simple learned weighted sums of firing rates of neurons in monkey inferior temporal (IT) cortex accurately predicted human performance. Although previous work led us to expect that IT would outperform V4, we were surprised by the quantitative precision with which simple IT-based linking hypotheses accounted for human behavior.
