Large Rewards Accelerate Learning Speed by Extending Brain Signals

Summary: A new study challenges a decades-old assumption in neuroscience that learning speed depends on repetition and experience rather than on the size of a reward.

The research shows that larger rewards trigger larger, longer-lasting dopamine signals in the brain. This prolonged signal boosts engagement and sharply shortens training, suggesting that a few high-value rewards can teach a task faster than thousands of small ones.

Key Facts

Upending the Repetition Myth: For decades, neuroscience operated on the belief that skill acquisition requires hundreds of uniform, small-reward repetitions to slowly cement behavior, regardless of the prize’s actual value.
The Cookie vs. M&M Effect: When tested, thirsty mice rewarded with a few large drinks of water mastered a task in a single day after fewer than 10 rewards. Conversely, mice given thousands of tiny, incremental sips took weeks to achieve the same proficiency.
Crushing Individual Variability: Under standard small-reward protocols, learning rates vary widely between subjects: some master a task in a week, while others take a month. Large rewards sharply reduced this gap, with all subjects learning the task in a few days.
The Extended Dopamine Wave: Bigger rewards don’t just produce a larger increase in dopamine; they also keep the dopamine signal active for longer.
The Engagement Catalyst: The study isolated three distinct learning components driven by large rewards: increased retention per repetition, superior day-to-day memory carryover, and heightened active engagement. Task engagement emerged as the primary factor dictating individual learning speed.
More Complex Tasks for Mice: By shortening training times and increasing engagement, this protocol could allow researchers to train mice in more complex cognitive tasks than previously thought possible.

Source: HHMI

Scientists long assumed that learning speed depends primarily on our experience — how many times we try and succeed — not the size of the reward. We become better at poker because we keep playing and winning, regardless of the purse being $100 or $100 million.

But new research suggests that the size of the jackpot matters more than previously thought.

Scientists in the Dudman Lab at HHMI’s Janelia Research Campus show that bigger rewards can enable learning to happen faster.

The new findings upend decades-long assumptions about how learning depends on experience and about the role dopamine plays in the process.

How Reward Size Affects Learning Speed

Like every other neuroscience lab, the Dudman Lab had always assumed that animals learn slowly and that they need hundreds of repetitions, each with a small reward, to learn even simple tasks. Neuroscientists had never thought to examine whether the size of the reward might affect learning.

“The whole field has been doing it for decades and I mean this quite literally, no one ever checked,” says Janelia Senior Group Leader Josh Dudman.

When the team decided to check this assumption, the results were striking. Thirsty mice that were given a few large drinks of water as the reward for completing a task learned much faster than mice rewarded with many small sips — the difference between giving a human a cookie and a single M&M. Instead of taking many days to learn the task using thousands of little rewards, the animals learned the task in one day after receiving fewer than 10 large rewards.

Surprisingly, even though the animals had less experience with the task, the variability between animals also declined dramatically. Normally, one mouse might become an expert in a week while another took a month to learn the same task. With the bigger reward, all the animals were learning the task in a few days.

“As neuroscientists, we resign ourselves to knowing that we’re going to have to train this animal for a few weeks and eventually, they’re going to start to look like they know what’s up,” Luke Coddington, a senior scientist in the Dudman Lab who led the new study, says. “But instead, now in a day, I’m watching these mice just nail it.”

How Dopamine Controls Learning Speed

The researchers found that large rewards increased three components that contribute to how fast animals learn:

how much they learn from each repetition
how well they carry over what they’ve learned from day to day
how engaged they are throughout each learning session

Compared to smaller rewards, bigger rewards produced larger increases in dopamine — a chemical messenger in the brain that helps regulate learning and motivation. Importantly, the team also found that the dopamine signals associated with the bigger rewards lasted longer. When they artificially extended the dopamine signals associated with small rewards, they found learning also happened faster.

The team found that the longer dopamine signal led the animals to learn more during each trial and stay more engaged in the task, which led to faster learning.

The level of engagement in the task was also the largest determinant of individual variations in learning.

“We think that when we make dopamine responses way bigger in these experiments, we’re turning all the ‘kids’ in our ‘classroom’ into really engaged students,” Coddington says.

Implications for Neuroscience Research

The new work could change how neuroscientists study skill-based learning. Using large rewards cuts training time and variability, making the learning process easier to study.

The Dudman Lab is already using large rewards in their work. “It changed how more or less all of our current projects are done now,” Dudman says.

It also shows that mice could potentially be trained in more complex tasks than previously thought, empowering researchers to study questions about learning and cognition that were previously out of reach.

“In addition to the practical side, which is very real, we may also end up studying new aspects of cognition we didn’t realize we could study in a mouse,” Coddington says. “If we can properly engage them in the task, then who knows what they can learn.”

Key Questions Answered:

Q: Why does the size of a prize change how fast the brain physically builds a new skill?

A: It changes the behavior of dopamine, a chemical messenger in the brain that helps regulate learning and motivation. A small reward produces a brief dopamine signal. A large reward produces a bigger signal that also lasts longer. In the study, this longer signal led the animals to learn more from each trial and to stay more engaged in the task.

Q: If engagement is the secret to learning, how does a big reward act like a great classroom teacher?

A: The researchers found that artificially extending the dopamine signal transforms the subjects’ focus. In a traditional setup, individual mice drift off, lose focus, or learn at wildly different speeds. The sustained dopamine wave triggered by a large prize acts like a master teacher, turning every distracted student in the classroom into a highly engaged, hyper-focused learner.

Q: How does this discovery change the day-to-day operations of an active neuroscience laboratory?

A: It cuts training time and variability. Instead of spending weeks training animals on a task, labs may be able to bring them to proficiency within one to a few days. This efficiency frees up resources to study more complex aspects of learning and cognition that were once out of reach.

Editorial Notes:

This article was edited by a Neuroscience News editor.
Journal paper reviewed in full.
Additional context added by our staff.

About this learning and neuroscience research news

Author: Halea Kerr-Layton
Source: HHMI
Contact: Halea Kerr-Layton – HHMI

Original Research: Closed access.
“Reward magnitude determines reinforcement learning efficiency” by Sheng Gong, Alyssa Martell, Joshua T. Dudman, and Luke T. Coddington. Science
DOI:10.1126/science.aeb0813

Abstract

Reward magnitude determines reinforcement learning efficiency

INTRODUCTION

Across different disciplines that share an interest in learning, from artificial intelligence (AI) to experimental psychology, it has long been assumed that there is a free parameter, the learning rate, that determines individual variance in learning efficiency and is relatively independent of the magnitude of reward. This suggests that learning depends primarily on the amount of experience (number of rewards).

However, recent theoretical work mapping dopamine (DA) function onto reinforcement learning algorithms, combined with classic results on DA encoding of reward, suggested that learning rates might in fact depend upon reward magnitude.

This also raises the possibility that, as a field, we may have settled on suboptimal reward magnitude distributions that slow training in complex laboratory tasks and also underestimated the efficiency of animal learning.

RATIONALE

An influential set of observations led to the hypothesis that DA neuron activity implements the reward prediction error component of reinforcement learning algorithms.

However, recent work has proposed that DA activity may map onto the learning rate during acquisition. The learning rate parameter, as the name implies, determines how fast learning converges to its asymptote.
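
To make the role of the learning rate concrete, here is a minimal sketch; it is not the model used in the paper. It assumes a simple delta-rule update, in which a value estimate moves toward the reward by a fraction (the learning rate) of the prediction error on each rewarded trial. The function name trials_to_criterion, the 90% criterion, and the example learning rates are all illustrative assumptions.

```python
def trials_to_criterion(learning_rate, reward=1.0, criterion=0.9, max_trials=100_000):
    """Count rewarded trials until the value estimate reaches criterion * reward.

    Hypothetical helper for illustration only. Returns None if the
    criterion is not reached within max_trials.
    """
    value = 0.0
    for trial in range(1, max_trials + 1):
        prediction_error = reward - value          # reward prediction error
        value += learning_rate * prediction_error  # delta-rule update
        if value >= criterion * reward:
            return trial
    return None


# Illustrative learning rates only; none of these values come from the study.
for rate in (0.01, 0.1, 0.5):
    print(f"learning rate {rate}: {trials_to_criterion(rate)} rewarded trials")
```

In this sketch, a larger learning rate reaches the same asymptote in far fewer rewarded trials, which is the sense in which the learning rate sets how fast learning converges.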

Classic experimental results demonstrated that DA activity is correlated with reward magnitude. Together, these two points imply an unexpected hypothesis: Reward magnitude could determine the efficiency of reinforcement learning. There are few data on what magnitude of reward is optimal for learning in any laboratory animal.

This is especially true for the range of navigation, motor skill, and decision-making tasks typical of modern systems neuroscience experiments in mice. Nonetheless, essentially the entire field uses reward magnitudes from within a very small range.

Those chosen reward magnitudes are quite small relative to the daily needs of a mouse (<1%). Thus, we set out to determine whether, and if so why, increases in reward magnitude could increase the efficiency of animal learning.

RESULTS

Increasing reward magnitude by one to two orders of magnitude relative to the standard reward sizes used in the field substantially increased the efficiency of learning across a range of tasks.

We found that mice could learn from at least an order of magnitude fewer trials in a hidden target navigation task, an effort-based reach-to-pull motor skill task, and a sensorimotor decision-making task. In general, across all three tasks, the efficiency of learning was increased without a notable change in the quality of the final, trained performance.

At the upper limit, these effects could be substantial. For example, some mice learned a hidden target navigation task in only a few experiences of reinforcement, something that requires hundreds or thousands of reinforcements using standard reward magnitudes.

We further showed that these effects could be well explained once one appreciates that the efficiency of learning is determined by three critical components: (i) the learning rate, (ii) the ability to capture learned improvements from prior sessions, and (iii) the extent of sustained engagement in a task. In our study, large rewards improved all three aspects. Large rewards produced longer, more sustained activity of DA neurons during reward consumption.
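
The three components listed above can be combined in a toy simulation. This is a rough sketch under stated assumptions, not the authors' analysis: it assumes a delta-rule learner whose value estimate is partly retained between sessions and who is engaged for only a fraction of each session's trials. The function sessions_to_criterion and every parameter value are hypothetical and chosen only for illustration.

```python
def sessions_to_criterion(learning_rate, carryover, engagement,
                          trials_per_session=100, criterion=0.9, max_sessions=1000):
    """Return the session in which a toy learner first reaches the criterion.

    learning_rate: fraction of the prediction error learned per engaged trial (i)
    carryover:     fraction of the prior session's learning retained (ii)
    engagement:    fraction of trials in a session the animal is engaged (iii)
    Returns None if the criterion is never reached within max_sessions.
    """
    value = 0.0
    for session in range(1, max_sessions + 1):
        value *= carryover  # some learning may be lost between sessions
        engaged_trials = round(engagement * trials_per_session)
        for _ in range(engaged_trials):
            value += learning_rate * (1.0 - value)  # delta-rule update toward 1.0
            if value >= criterion:
                return session
    return None


# Hypothetical parameter sets; the numbers are not taken from the study.
small_reward = sessions_to_criterion(learning_rate=0.01, carryover=0.95, engagement=0.5)
large_reward = sessions_to_criterion(learning_rate=0.3, carryover=1.0, engagement=1.0)
print("small-reward sessions:", small_reward)
print("large-reward sessions:", large_reward)
```

Under these assumed values, the learner with the higher learning rate, full carryover, and full engagement reaches the criterion in its first session, while the other needs several sessions. Lowering engagement alone can keep the toy learner from ever reaching the criterion, which loosely mirrors the finding that engagement was a major source of individual variation.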

We tested whether augmenting normal responses to reward with optogenetic-mediated sustained activation of DA was sufficient to enhance learning efficiency with standard reward magnitudes. Sustained optogenetic “boosting” of DA reward responses was able to increase learning efficiency in both hidden target navigation and the effort-based motor skill task.

DA stimulation increased learning efficiency by increasing the learning rate and reducing disengagement, but failed to enhance capture of prior learning. Finally, we showed that increasing reward magnitude, while always improving learning as measured in DA activity, does not always lead to obvious improvements in behavioral measures of learning. For example, the presence of large rewards appears to interfere with anticipatory behavior in classical conditioning paradigms.

CONCLUSION

We found that larger reward magnitudes than used in the field could indeed enhance the learning efficiency of mice across a range of complex tasks, including navigation, motor skill, and decision-making. One of the largest sources of variance across individual mice was the ability to stay engaged in task performance. Unexpectedly, variance in learning rate across individuals appeared to be much smaller.

As a result, large rewards could substantially attenuate variance across individuals in learning efficiency. Finally, mesolimbic DA neuron activity could produce multiple effects on learning depending upon the magnitude and time course of DA activation.