Summary: Researchers have identified two distinct groups of neurons that help the brain evaluate risk and reward in decision-making. These neurons, located in the ventral striatum, separately process better-than-expected and worse-than-expected outcomes. In experiments with mice, silencing these neurons altered the animals' anticipation of rewards and, in turn, their decision-making behavior.
The study suggests that the brain tracks a full range of possible rewards, rather than just an average, which aligns with machine-learning models of decision-making. If confirmed in humans, the findings could explain difficulties in assessing risks seen in conditions like depression and addiction. Future research will explore how uncertainty influences this brain circuitry.
Key Facts:
- Two Neural Groups: One group processes better-than-expected outcomes, while another tracks worse-than-expected ones.
- Decision-Making Mechanism: The brain represents a full spectrum of possible rewards, not just averages.
- Potential Clinical Impact: Findings could help explain impaired risk assessment in conditions like depression and addiction.
Source: Harvard
Every day, our brain makes thousands of decisions, big and small. Any of these decisions — from the least consequential, such as picking a restaurant, to the more important, such as pursuing a different career or moving to a new city — may result in better or worse outcomes.
How does the brain gauge risk and reward in making these calls? The answer to this question continues to puzzle scientists, but a new study carried out by researchers at Harvard Medical School and Harvard University offers intriguing clues.

The research, published Feb. 19 in Nature and supported in part by federal funding, incorporated machine-learning concepts into mouse experiments to study the brain circuitry that supports reward-based decisions.
The scientists uncovered two groups of brain cells in mice: one that helps mice learn about better-than-expected outcomes and another that tracks worse-than-expected outcomes. Together, the experiments showed, these cells allow the brain to gauge the full range of possible rewards associated with a choice.
“Our results suggest that mice — and by extension, other mammals — seem to be representing more fine-grained details about risk and reward than we thought before,” said co-senior author Jan Drugowitsch, associate professor of neurobiology in the Blavatnik Institute at Harvard Medical School.
If confirmed in humans, the findings could provide a framework for understanding how the human brain makes reward-based decisions and what happens to the ability to judge risk and reward when reward circuitry fails.
Machine learning illuminates reward-based decisions
Neuroscientists have long been interested in how the brain uses past experiences to make new decisions. However, according to Drugowitsch, many traditional theories about such decision-making fail to capture the complexity and nuance of real-world behavior.
Drugowitsch uses the example of selecting a restaurant: If you’re in the mood to play it safe, you might choose a restaurant with a menu that experience tells you is reliably good, and if you feel like taking a risk, you might opt for a restaurant that you know offers a mix of exceptional and subpar dishes.
In the above example, the restaurants differ considerably in the range of outcomes they offer, yet traditional neuroscience theories, which track only the average expected reward, treat the two as equivalent and thus predict an equal likelihood of choosing either.
“We know that this is not how humans and animals act — we can decide between seeking risks and playing it safe,” Drugowitsch said. “We have a sense of more than just average expected rewards associated with our choices.”
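A toy example makes the gap concrete (the numbers below are invented for illustration, not from the study): two restaurants can share the same average rating while offering very different ranges of outcomes, so a learner that tracks only the mean cannot tell them apart.

```python
# Toy illustration (invented numbers, not from the study): two restaurants
# whose dishes have the same average rating but very different spreads.
safe_restaurant = [7, 7, 7, 7]     # reliably good: every dish rates a 7
risky_restaurant = [10, 4, 10, 4]  # a mix of exceptional and subpar dishes

mean_safe = sum(safe_restaurant) / len(safe_restaurant)     # 7.0
mean_risky = sum(risky_restaurant) / len(risky_restaurant)  # 7.0

# An average-only learner sees no difference between the two...
print(mean_safe == mean_risky)  # True

# ...but the spreads differ, which is exactly what a risk-sensitive
# decision-maker needs in order to choose between safe and risky options.
print(max(safe_restaurant) - min(safe_restaurant))    # 0
print(max(risky_restaurant) - min(risky_restaurant))  # 6
```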
In recent years, machine-learning researchers developed a theory of decision-making that better captures the full range of potential rewards linked to a choice.
They incorporated this theory into a new machine-learning algorithm that outperformed alternative algorithms in Atari video games and a range of other tasks in which each decision has multiple possible outcomes.
“They basically asked what happens if rather than just learning average rewards for certain actions, the algorithm learns the whole distribution, and they found it improved performance significantly,” Drugowitsch said.
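The flavor of that idea fits in a few lines of code. The sketch below is a simplified, expectile-style learner in the spirit of distributional reinforcement learning, not the authors' algorithm; the reward distribution and all parameter values are invented. Instead of one running average, it keeps several estimates whose asymmetric learning rates make them settle at different points of the reward distribution.

```python
import random

def sample_reward():
    # Hypothetical risky option: 50% chance of a great meal, 50% of a poor one.
    return 10.0 if random.random() < 0.5 else 4.0

taus = [0.1, 0.3, 0.5, 0.7, 0.9]  # asymmetry levels, pessimistic to optimistic
values = [7.0] * len(taus)        # start every estimate at the same guess
lr = 0.01

for _ in range(50_000):
    reward = sample_reward()
    for i, tau in enumerate(taus):
        error = reward - values[i]
        # Scale good surprises by tau and bad surprises by (1 - tau):
        # high-tau estimates chase the upside ("optimists"), low-tau
        # estimates weight the downside ("pessimists").
        scale = tau if error > 0 else 1.0 - tau
        values[i] += lr * scale * error

print([round(v, 1) for v in values])
# Roughly [4.6, 5.8, 7.0, 8.2, 9.4]: the estimates fan out across the
# reward distribution rather than all collapsing to the mean of 7.
```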
In a 2020 Nature paper, Naoshige Uchida, professor of molecular and cellular biology at Harvard University, and colleagues reanalyzed existing data to explore whether this machine-learning theory applies to decision-making in rodent brains.
The analysis showed that in mice, the activity of dopamine neurons — cells that release a neurotransmitter involved in reward-seeking, pleasure, and motivation — corresponded to the reward-learning signals predicted by the algorithm.
In other words, Drugowitsch said, the work suggested that the new algorithm was better at explaining dopamine activity.
How mouse brains represent a range of rewards
In the new study, Drugowitsch teamed up with co-senior author Uchida to take the research a step further. Together, they designed mouse experiments to see how this process plays out in a brain region called the ventral striatum, which stores information about possible rewards associated with a decision.
“Dopamine activity only provides the learning signal for expected rewards, but we wanted to find representations of these learned rewards directly in the brain,” Drugowitsch said.
The researchers trained mice to associate different odors with rewards of varying magnitudes — in essence, teaching mice the range of possible outcomes of a choice. They then presented the mice with odors, and observed licking behavior (mice lick more in anticipation of better rewards) while recording neural activity in the ventral striatum.
The team identified two distinct groups of neurons: one that helps a mouse learn about better-than-expected outcomes and another tied to worse-than-expected outcomes.
“You can think of this as having an optimist and a pessimist in your brain, both giving you advice on what to do next,” Drugowitsch explained.
When the researchers silenced the “optimistic” neurons, the mouse exhibited behavior suggesting that it anticipated a less appealing reward. Conversely, when the researchers silenced the “pessimistic” neurons, the mouse behaved as if it expected a higher-value treat.
“These two groups of brain cells work together to form a representation of the full distribution of potential rewards for a decision,” Drugowitsch said.
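The silencing results have a simple cartoon interpretation. The sketch below rests on an illustrative assumption, not the paper's decoding model: if behavior reads out the average of whichever learned estimates remain active, removing the optimistic group drags the anticipated reward down, while removing the pessimistic group pushes it up. All numbers are hypothetical.

```python
# Cartoon of the silencing experiments (an illustrative assumption, not the
# paper's decoding model). Suppose the brain holds hypothetical learned
# estimates spanning the reward distribution:
pessimistic = [4.6, 5.8]  # estimates weighted toward bad surprises
neutral = [7.0]
optimistic = [8.2, 9.4]   # estimates weighted toward good surprises

def anticipated_reward(active_estimates):
    # Read out behavior as the average of whichever estimates remain active.
    return sum(active_estimates) / len(active_estimates)

print(anticipated_reward(pessimistic + neutral + optimistic))  # intact: 7.0
print(anticipated_reward(pessimistic + neutral))  # "optimists" silenced: 5.8,
                                                  # acts as if a worse reward is coming
print(anticipated_reward(neutral + optimistic))   # "pessimists" silenced: 8.2,
                                                  # acts as if a better reward is coming
```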
The researchers see many future directions for their work, including how the brain makes decisions when there is more uncertainty about what each initial option represents and how their findings apply to more general reasoning about the world.
Drugowitsch noted that more research is needed to confirm the results in humans and to adapt the findings to the complexity of human decision-making. However, based on the parallels between mouse and human brains, he believes the work may already shed some light on how humans assess risk in decisions and why people with certain conditions such as depression or addiction may struggle with such assessments.
Authorship, funding, disclosures
Additional authors on the paper include Adam Lowet, Qiao Zheng, Melissa Meng, and Sara Matias.
Funding: The study was funded by the National Institutes of Health (R01NS116753; F31NS124095), the Human Frontier Science Program (LT000801/2018), the Harvard Brain Science Initiative, and the Brain & Behavior Research Foundation.
About this decision-making and neuroscience research news
Author: Dennis Nealon
Source: Harvard
Contact: Dennis Nealon – Harvard
Image: The image is credited to Neuroscience News
Original Research: Closed access.
“An opponent striatal circuit for distributional reinforcement learning” by Jan Drugowitsch et al. Nature
Abstract
An opponent striatal circuit for distributional reinforcement learning
Machine learning research has achieved large performance gains on a wide range of tasks by expanding the learning target from mean rewards to entire probability distributions of rewards—an approach known as distributional reinforcement learning (RL).
The mesolimbic dopamine system is thought to underlie RL in the mammalian brain by updating a representation of mean value in the striatum, but little is known about whether, where and how neurons in this circuit encode information about higher-order moments of reward distributions.
Here, to fill this gap, we used high-density probes (Neuropixels) to record striatal activity from mice performing a classical conditioning task in which reward mean, reward variance and stimulus identity were independently manipulated.
In contrast to traditional RL accounts, we found robust evidence for abstract encoding of variance in the striatum. Chronic ablation of dopamine inputs disorganized these distributional representations in the striatum without interfering with mean value coding.
Two-photon calcium imaging and optogenetics revealed that the two major classes of striatal medium spiny neurons—D1 and D2—contributed to this code by preferentially encoding the right and left tails of the reward distribution, respectively.
We synthesize these findings into a new model of the striatum and mesolimbic dopamine that harnesses the opponency between D1 and D2 medium spiny neurons to reap the computational benefits of distributional RL.