Biological Brains Outpace AI in Learning, Thanks to Structured Exploration

Summary: When it comes to rapid learning, the biological brain seems to have an edge over current AI technology.

Scientists worked with different models of reinforcement learning to understand the algorithms that the brain uses to learn. They found that the directed exploration used by animals makes learning more efficient and requires less experience, unlike artificial agents that explore randomly.

Key Facts:

  1. The study suggests that animals’ directed exploration makes learning more efficient, which could help build better AI agents that can learn faster and require less experience.
  2. AI agents need a lot of experience to learn something and must explore an environment thousands of times, whereas a real animal can learn an environment in less than ten minutes.
  3. The research findings emphasize the need for more efficient learning algorithms that can explain the behavior of animals in exploring and learning their spatial environment.

Source: Sainsbury Wellcome Centre

Neuroscientists have uncovered how exploratory actions enable animals to learn their spatial environment more efficiently. Their findings could help build better AI agents that can learn faster and require less experience.

Researchers at the Sainsbury Wellcome Centre and Gatsby Computational Neuroscience Unit at UCL found that the instinctual exploratory runs that animals carry out are not random. These purposeful actions allow mice to learn a map of the world efficiently.

The study, published today in Neuron, describes how neuroscientists tested their hypothesis that the specific exploratory actions that animals undertake, such as darting quickly towards objects, are important in helping them learn how to navigate their environment.

“There are a lot of theories in psychology about how performing certain actions facilitates learning. In this study, we tested whether simply observing obstacles in an environment was enough to learn about them, or if purposeful, sensory-guided actions help animals build a cognitive map of the world,” said Professor Tiago Branco, Group Leader at the Sainsbury Wellcome Centre and corresponding author on the paper.

In previous work, scientists at SWC observed a correlation between how well animals learn to go around an obstacle and the number of times they had run to the object. In this study, Philip Shamash, SWC PhD student and first author of the paper, carried out experiments to test the impact of preventing animals from performing exploratory runs.

By expressing a light-activated protein called channelrhodopsin in one part of the motor cortex, Philip was able to use optogenetic tools to prevent animals from initiating exploratory runs towards obstacles.

The team found that even though mice had spent a lot of time observing and sniffing obstacles, if they were prevented from running towards them, they did not learn. This shows that the instinctive exploratory actions themselves are helping the animals learn a map of their environment.

To explore the algorithms that the brain might be using to learn, the team worked with Sebastian Lee, a PhD student in Andrew Saxe’s lab at SWC, to run different models of reinforcement learning that people have developed for artificial agents, and observe which one most closely reproduces the mouse behaviour.

There are two main classes of reinforcement learning models: model-free and model-based. The team found that under some conditions mice act in a model-free way but under other conditions, they seem to have a model of the world. And so the researchers implemented an agent that can arbitrate between model-free and model-based. This is not necessarily how the mouse brain works, but it helped them to understand what is required in a learning algorithm to explain the behaviour.
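As a rough illustration of what arbitrating between the two classes can mean, here is a minimal Python sketch. It is not the agent from the study: the grid layout, the function names, and the visit-count rule for switching controllers are all assumptions made for this example. The model-free controller reuses cached action values without planning, while the model-based controller plans a route over a map of known walls.

```python
from collections import deque

# Hypothetical 3x3 arena: a wall at x == 1 blocks the direct route to the goal.
SIZE = 3
WALLS = {(1, 0), (1, 1)}
GOAL = (2, 0)
# Assumed cached action values: a stale habit of stepping from (0, 1) to (0, 0).
Q = {((0, 1), (0, 0)): 1.0}


def neighbors(cell, walls, size):
    """Yield the free cells one step away from `cell`."""
    x, y = cell
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nxt = (x + dx, y + dy)
        if 0 <= nxt[0] < size and 0 <= nxt[1] < size and nxt not in walls:
            yield nxt


def model_free_step(cell, q_values, walls, size):
    """Pick the neighbor with the highest cached value, without planning."""
    options = list(neighbors(cell, walls, size))
    return max(options, key=lambda n: q_values.get((cell, n), 0.0))


def model_based_step(cell, goal, walls, size):
    """Plan with breadth-first search over the map; return the first step."""
    parent = {cell: None}
    queue = deque([cell])
    while queue:
        cur = queue.popleft()
        if cur == goal:
            # Walk back along the path until we reach the step taken from `cell`.
            while parent[cur] is not None and parent[cur] != cell:
                cur = parent[cur]
            return cur
        for nxt in neighbors(cur, walls, size):
            if nxt not in parent:
                parent[nxt] = cur
                queue.append(nxt)
    return None  # goal unreachable on the known map


def arbitrate(cell, goal, q_values, visits, walls, size, min_visits=3):
    """Assumed rule: trust cached values only in well-visited cells."""
    if visits.get(cell, 0) >= min_visits:
        return "model-free", model_free_step(cell, q_values, walls, size)
    return "model-based", model_based_step(cell, goal, walls, size)


novel = arbitrate((0, 1), GOAL, Q, {}, WALLS, SIZE)
familiar = arbitrate((0, 1), GOAL, Q, {(0, 1): 5}, WALLS, SIZE)
print(novel, familiar)
```

In this toy arena the two controllers disagree at the same cell: the planner heads around the wall, while the cached habit points towards a dead end. The visit-count threshold is only one of many possible arbitration rules.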

“One of the problems with artificial intelligence is that agents need a lot of experience in order to learn something. They have to explore the environment thousands of times, whereas a real animal can learn an environment in less than ten minutes.

“We think this is in part because, unlike artificial agents, animals’ exploration is not random and instead focuses on salient objects. This kind of directed exploration makes the learning more efficient and so they need less experience to learn,” explains Professor Branco.   
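The gap between random and directed exploration can be made concrete with a toy comparison. The Python sketch below is an assumption-laden illustration, not the simulation from the study: the grid size, the target position standing in for a salient object, and both policies are hypothetical. It counts how many steps each explorer needs to first reach the object.

```python
import random

SIZE = 10          # hypothetical 10x10 open arena
START = (0, 0)
TARGET = (7, 5)    # stands in for a salient object such as an obstacle edge


def random_policy(pos, target, size, rng):
    """Take one uniformly random step, staying inside the arena."""
    dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
    x = min(max(pos[0] + dx, 0), size - 1)
    y = min(max(pos[1] + dy, 0), size - 1)
    return (x, y)


def directed_policy(pos, target, size, rng):
    """Take one step straight towards the target (rng is unused)."""
    x, y = pos
    if x != target[0]:
        x += 1 if target[0] > x else -1
    elif y != target[1]:
        y += 1 if target[1] > y else -1
    return (x, y)


def steps_to_reach(start, target, size, policy, rng, max_steps=10_000):
    """Count steps until the explorer first stands on the target."""
    pos = start
    for step in range(max_steps):
        if pos == target:
            return step
        pos = policy(pos, target, size, rng)
    return max_steps  # cap: target not reached within the budget


directed_steps = steps_to_reach(START, TARGET, SIZE, directed_policy, random.Random(0))

# Average the random explorer over several seeds to smooth out luck.
trials = [
    steps_to_reach(START, TARGET, SIZE, random_policy, random.Random(seed))
    for seed in range(20)
]
random_mean_steps = sum(trials) / len(trials)

print("directed:", directed_steps)       # 12, the Manhattan distance
print("random (mean):", random_mean_steps)
```

The directed explorer always needs exactly the Manhattan distance, which is also the minimum any explorer could achieve, so the random walk can never do better and typically does far worse.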

The next steps for the researchers are to explore the link between the execution of exploratory actions and the representation of subgoals. The team are now carrying out recordings in the brain to discover which areas are involved in representing subgoals and how the exploratory actions lead to the formation of the representations.

Funding: This research was funded by a Wellcome Senior Research Fellowship (214352/Z/18/Z) and by the Sainsbury Wellcome Centre Core Grant from the Gatsby Charitable Foundation and Wellcome (090843/F/09/Z), the Sainsbury Wellcome Centre PhD Programme and a Sir Henry Dale Fellowship from the Wellcome Trust and Royal Society (216386/Z/19/Z).

About this AI and neuroscience research news

Author: April Cashin-Garbutt
Source: Sainsbury Wellcome Centre
Contact: April Cashin-Garbutt – Sainsbury Wellcome Centre
Image: The image is credited to Neuroscience News

Original Research: Open access.
“Mice identify subgoal locations through an action-driven mapping process” by Philip Shamash et al. Neuron


Mice identify subgoal locations through an action-driven mapping process


  • Interruption of obstacle-directed runs during exploration prevents subgoal learning
  • Subgoal selection during escape depends on the mouse’s position in the environment
  • A dual-system reinforcement learning agent replicates the mouse behavior


Mammals form mental maps of the environments by exploring their surroundings. Here, we investigate which elements of exploration are important for this process. We studied mouse escape behavior, in which mice are known to memorize subgoal locations—obstacle edges—to execute efficient escape routes to shelter.

To test the role of exploratory actions, we developed closed-loop neural-stimulation protocols for interrupting various actions while mice explored. We found that blocking running movements directed at obstacle edges prevented subgoal learning; however, blocking several control movements had no effect.

Reinforcement learning simulations and analysis of spatial data show that artificial agents can match these results if they have a region-level spatial representation and explore with object-directed movements. We conclude that mice employ an action-driven process for integrating subgoals into a hierarchical cognitive map.

These findings broaden our understanding of the cognitive toolkit that mammals use to acquire spatial knowledge.
