Inverse Graphics: How Your Brain Turns 2D Into 3D

Summary: Researchers have uncovered how primate brains transform flat, 2D visual inputs into rich, 3D mental representations of objects. This process, dubbed “inverse graphics,” reverses the principles of computer graphics: it starts from a 2D view, passes through an intermediate “2.5D” stage, and arrives at a 3D model.

Using a neural network called the Body Inference Network, scientists mapped this process and showed it closely mirrors activity in primate brain regions responsible for body shape recognition. The findings shed light on how humans perceive depth and could inspire advances in AI and treatments for visual disorders.

Key Facts:

  • The primate inferotemporal cortex builds 3D mental models from 2D images via an “inverse graphics” process.
  • Researchers used a neural network to replicate and map this process to brain activity in macaques.
  • The work could inform machine vision design and aid in understanding visual perception disorders.

Source: Yale

Yale researchers have discovered a process in the primate brain that sheds new light on how visual systems work and could lead to advances in both human neuroscience and artificial intelligence.

Working with a new computational model, researchers uncovered an algorithm that reveals how the primate brain constructs internal three-dimensional (3D) representations of an object when viewing a two-dimensional (2D) image of that object.

In their work, researchers found that part of the temporal lobe of the primate brain — specifically, the inferotemporal cortex, an area critical for visual processing — transforms images into 3D mental models of objects. Credit: Neuroscience News

“This gives us evidence that the goal of vision is to establish a 3D understanding of an object,” said study senior author Ilker Yildirim, an assistant professor of psychology in Yale’s Faculty of Arts and Sciences.

“When you open your eyes, you see 3D scenes — the brain’s visual system is able to construct a 3D understanding from a stripped-down 2D view.”

Researchers have dubbed this process “inverse graphics” because the brain’s visual processing system works like a computer graphics pipeline run in reverse: it moves from a 2D image, through a less view-dependent “2.5D” intermediate representation, up to a much more view-tolerant 3D object.
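To make the three-stage idea concrete, here is a minimal, hypothetical sketch of such a pipeline in PyTorch. The stage names, layer sizes, and the depth-map form of the “2.5D” stage are illustrative assumptions for this sketch, not the architecture of the study’s actual model.

```python
# Illustrative inverse-graphics pipeline: 2D view -> "2.5D" map -> 3D parameters.
# All stage names and sizes are assumptions, not the paper's network.
import torch
import torch.nn as nn

class InverseGraphicsNet(nn.Module):
    def __init__(self, n_3d_params: int = 64):
        super().__init__()
        # Stage 1: encode the raw 2D image (view-dependent features).
        self.encode_2d = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=2, padding=2), nn.ReLU(),
        )
        # Stage 2: a "2.5D" intermediate -- less view-dependent surface
        # structure, modeled here as a single depth-like map.
        self.to_2_5d = nn.Conv2d(64, 1, kernel_size=3, padding=1)
        # Stage 3: regress view-tolerant 3D object parameters
        # (e.g., shape, posture, orientation codes) from that map.
        self.to_3d = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 64, 256), nn.ReLU(),
            nn.Linear(256, n_3d_params),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        features = self.encode_2d(image)    # 2D stage
        depth_map = self.to_2_5d(features)  # 2.5D stage
        return self.to_3d(depth_map)        # 3D stage

net = InverseGraphicsNet()
params_3d = net(torch.randn(1, 3, 256, 256))  # -> tensor of shape (1, 64)
```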

The findings were published in the Proceedings of the National Academy of Sciences.

A human brain, in essence, transforms the 2D images one sees (perhaps on paper or on a screen) into 3D mental models. A computer graphics pipeline, meanwhile, does the opposite, rendering 3D scenes into 2D images.

“This is a significant advance in understanding computational vision,” Yildirim said. “Your brain automatically does this, and it’s hard work, computationally. It remains a challenge to get machine vision systems to come close to doing this for the everyday scenes we can encounter.”

The finding could fuel research in human neuroscience and vision disorders, as well as advance the creation of machine vision systems with primate vision capabilities, researchers say. 

In their work, researchers found that part of the temporal lobe of the primate brain — specifically, the inferotemporal cortex, an area critical for visual processing — transforms images into 3D mental models of objects.

They did this by deploying what is known as the Body Inference Network (BIN), a neural network-based model. In standard computer graphics, a 2D image of a body is rendered from properties such as shape, posture, and orientation.

In this case, the researchers trained BIN to invert that process, teaching it to construct 3D human and monkey bodies directly from images labeled with 3D data. Trained this way, BIN was shown to reverse the usual computer graphics process, recovering 3D properties from 2D images.
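A hedged sketch of how this kind of training could look, reusing the illustrative network above: a placeholder renderer (the hypothetical `render_body` below) generates images from known 3D body parameters, and the network is supervised to recover those parameters. This mirrors the labeled-with-3D-data setup described in the article, but the details are assumptions, not the study’s actual training procedure.

```python
# Hypothetical supervised training loop for an inverse-graphics network.
import torch

def render_body(params_3d: torch.Tensor) -> torch.Tensor:
    """Placeholder for a graphics engine: 3D parameters -> 2D image.
    A real pipeline would rasterize a posed body mesh here."""
    return torch.randn(params_3d.shape[0], 3, 256, 256)

net = InverseGraphicsNet()  # the illustrative network from the sketch above
optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)
loss_fn = torch.nn.MSELoss()

for step in range(100):
    # Sample ground-truth 3D codes (standing in for shape/posture/orientation).
    true_params = torch.randn(8, 64)
    images = render_body(true_params)         # forward graphics: 3D -> 2D
    pred_params = net(images)                 # inverse graphics: 2D -> 3D
    loss = loss_fn(pred_params, true_params)  # supervise with the 3D labels
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```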

When the researchers compared BIN’s internal activity with brain data recorded from macaques as they viewed images of macaque bodies, they found that BIN’s processing stages matched activity in two body-shape-processing regions of the macaque brain, the body patches known as MSB and ASB.
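The article does not say which comparison metric was used; one common way to relate model stages to brain regions is representational similarity analysis (RSA), sketched below with NumPy on random stand-in data. The array shapes and the early-stage/MSB versus late-stage/ASB pairing are illustrative assumptions.

```python
# Representational similarity analysis (RSA) sketch on random stand-in data.
import numpy as np

def rdm(responses: np.ndarray) -> np.ndarray:
    """Representational dissimilarity matrix over stimuli:
    1 - pairwise correlation of the (n_stimuli, n_units) response rows."""
    return 1.0 - np.corrcoef(responses)

def rsa_score(model_layer: np.ndarray, brain_region: np.ndarray) -> float:
    """Correlate the upper triangles of the two RDMs."""
    m, b = rdm(model_layer), rdm(brain_region)
    iu = np.triu_indices_from(m, k=1)
    return float(np.corrcoef(m[iu], b[iu])[0, 1])

# Illustrative check with random data: does an early model stage align
# with MSB, and a late stage with ASB? (Real analyses use recorded data.)
early_layer, late_layer = np.random.randn(50, 200), np.random.randn(50, 64)
msb, asb = np.random.randn(50, 120), np.random.randn(50, 120)
print(rsa_score(early_layer, msb), rsa_score(late_layer, asb))
```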

“Our model explained the visual processing in the brain much more closely than other AI models typically do,” Yildirim said.

“We are most interested in the neuroscience and cognitive science aspects of this, but also with the hope that this can help inspire new machine vision systems and facilitate possible medical interventions in the future.”

Other authors of the study included first author Hakan Yilmaz and Aalap Shah, who are both Ph.D. candidates in Yale’s Graduate School of Arts and Sciences, and researchers from Princeton University and KU Leuven in Belgium.

About this visual neuroscience research news

Author: Bess Connolly
Source: Yale
Contact: Bess Connolly – Yale
Image: The image is credited to Neuroscience News

Original Research: Closed access.
“Multiarea processing in body patches of the primate inferotemporal cortex implements inverse graphics” by Ilker Yildirim et al. PNAS


Abstract

Multiarea processing in body patches of the primate inferotemporal cortex implements inverse graphics

Stimulus-driven, multiarea processing in the inferotemporal (IT) cortex is thought to be critical for transforming sensory inputs into useful representations of the world.

What are the formats of these neural representations and how are they computed across the nodes of the IT networks?

A growing literature in computational neuroscience focuses on the computational-level objective of acquiring high-level image statistics that support useful distinctions, including between object identities or categories.

Here, inspired by classic theories of vision, we suggest an alternative possibility. We show that inferring 3D objects may be a distinct computational-level objective of IT, implemented via an algorithm analogous to graphics-based generative models of how 3D scenes form and project to images, but in the reverse order.

Using perception of bodies as a case study, we show that inverse graphics spontaneously emerges in inference networks trained to map images to 3D objects. Remarkably, this correspondence to the reverse of a graphics-based generative model also holds across the body processing network of the macaque IT cortex.

Finally, inference networks recapitulate the feedforward progression across the stages of this IT network and do so better than the currently dominant vision models, including both supervised and unsupervised variants, none of which aligns with the reverse of graphics.

This work suggests inverse graphics as a multiarea neural algorithm implemented within IT, and points to ways for replicating primate vision capabilities in machines.
