Creatures of Habit: How Neurons Weigh Behavioral Cost and Reward

We are creatures of habit, nearly mindlessly executing routine after routine. Some habits we feel good about; others, less so. Habits are, after all, thought to be driven by reward-seeking mechanisms that are built into the brain. It turns out, however, that the brain’s habit-forming circuits may also be wired for efficiency.

New research from MIT shows that habit formation, at least in primates, is driven by neurons that represent the cost of a habit, as well as the reward. “The brain seems to be wired to seek some near optimality of cost and benefit,” says Ann Graybiel, an Institute Professor at MIT and also a member of the McGovern Institute for Brain Research.

This study is the first to show that cost considerations are wired into the learning of habits. The findings, which appear this week in the journal Neuron, could also provide insights into neuropsychiatric disorders that involve problems with repetitive behavior, such as Parkinson’s disease, Huntington’s disease, obsessive-compulsive disorder, Tourette syndrome, and autism spectrum disorder.

The anatomy of a habit

Previous work by Graybiel and her colleagues discovered clear beginning and ending signals in the brain when habits are performed. These signals appear in the striatum, a part of the brain that, among other things, coordinates body movements; the signals have been observed in mice, rats, and monkeys that have been trained to perform specific tasks.

A few years ago, Graybiel and Theresa Desrochers, then a doctoral student in her lab, decided to let two monkeys learn a habit on their own, without training, as a way to mimic real-life learning. They also recorded the activity of 1,600 neurons in the striatum during the learning period.

The primates learned, over several months, to visually navigate a grid of dots on a screen in search of a randomly selected dot that had been “baited,” meaning that the monkey would receive a squirt of juice when its eyes passed through it. When the monkey’s eyes landed on the “baited” dot, the color of the grid of dots changed, indicating that a reward was coming.

Over time the monkey’s eyes followed the same path repeatedly, suggesting that the eye movements had become habitual. “We allowed the animals to make their own habits the same way we make habits out in the world,” says Desrochers, who is now a postdoc at Brown University.

Image: An illustration of many green, triangular figures, with one drawn in orange. Image credit: Jose-Luis Olivares/MIT.

In addition, these habitual eye-scanning patterns became more efficient. The monkeys shortened the paths they used to visit the dots the same way a traveling salesman might improve his sales route. Graybiel and Desrochers published these findings about behavior in 2010.
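The traveling-salesman comparison can be made concrete with a small Python sketch. It is illustrative only: the 2-by-2 grid, its coordinates, and the dot labels are hypothetical, and the code simply brute-forces the shortest order for visiting every dot once from a fixed starting corner. In the real task a scan ended once the baited dot was found, so this is a simplification, not a model of the monkeys' behavior.

```python
import itertools
import math

# Hypothetical 2-by-2 grid of dot positions (x, y); not the layout used in the study.
DOTS = {"A": (0, 0), "B": (1, 0), "C": (1, 1), "D": (0, 1)}


def path_length(order, dots=DOTS):
    """Total Euclidean length of a scan that visits the dots in the given order."""
    return sum(math.dist(dots[a], dots[b]) for a, b in zip(order, order[1:]))


def shortest_scan(dots=DOTS, start="A"):
    """Brute-force the shortest open path that starts at `start` and visits every dot once."""
    others = [d for d in dots if d != start]
    best = min(
        itertools.permutations(others),
        key=lambda perm: path_length((start,) + perm, dots),
    )
    order = (start,) + best
    return order, path_length(order, dots)


order, length = shortest_scan()
print(order, length)  # a shortest route around the edge of the square
print(path_length(("A", "C", "B", "D")))  # a criss-crossing route is longer
```

Brute force is only practical for a handful of dots, because the number of possible orderings grows factorially; that is acceptable for a toy grid like this one.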

Wire tap

The new paper reports an analysis of the neural recordings captured as the monkeys learned the habit. Graybiel, Desrochers, and co-author Ken-ichi Amemori, a research scientist in Graybiel’s lab, observed the formation of clear beginning and ending signals at the boundaries of the habitual activity. In addition, the ending signals changed dramatically over time.

The researchers zoomed in on these ending signals, a 400-millisecond window of time right after the animal has found the “baited” dot and just before the reward is delivered. During the early stages of learning, the signals are less precisely timed, firing throughout the time window. But as learning progresses, the neurons begin to fire at almost precisely the same time in that narrow window right after the monkey’s habit ends. “The signal is getting stronger and appears to be tied to the animal doing these patterns over and over, entrenching the pattern neurally as the animals picked up their habit,” Desrochers says.
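One simple way to put a number on this “sharpening” is to measure how spread out spike times are inside the post-sequence window. The Python sketch below is an illustration, not the authors' analysis: it assumes spike times in milliseconds, measured from the end of the scan and pooled across trials, keeps only those inside the 400-millisecond window, and reports their standard deviation. The spike times are invented.

```python
import statistics

WINDOW_MS = 400  # the post-sequence window length described in the article


def timing_spread(spike_times_ms, window_ms=WINDOW_MS):
    """Standard deviation (ms) of spike times falling inside [0, window_ms).

    Times are measured from the end of the scan sequence. A smaller spread
    means the spikes are more tightly aligned, i.e. a "sharper" end signal.
    """
    in_window = [t for t in spike_times_ms if 0 <= t < window_ms]
    if len(in_window) < 2:
        raise ValueError("need at least two spikes inside the window")
    return statistics.pstdev(in_window)


# Hypothetical spike times, invented for illustration only.
early = [20, 90, 150, 210, 280, 350]  # scattered across the window
late = [95, 100, 102, 105, 110, 450]  # clustered; 450 ms falls outside the window

print(timing_spread(early), timing_spread(late))
```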

To link the firing of these neurons to habit formation, the team compared the changes in neural activity with changes in behavior, finding that the two changed in parallel. The changes in firing of some neurons tracked with cost, measured in terms of the length of the path of the eye movements during a trial, while others correlated with reward.

Still others correlated with both cost and reward, and it was these neurons that sharpened their firing as the monkeys learned the habit and settled on a shorter, lower-cost eye movement pattern. “This strong correlation suggests that both reward and cost are represented in these neurons, and are driving the habit-forming behavior,” Desrochers says.
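Asking whether a neuron “tracks with cost” amounts to asking whether its firing rises and falls with path length across trials. A Pearson correlation coefficient is one generic way to quantify that. The Python sketch below is not the study's statistical method, and the per-trial numbers are invented for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical per-trial values (not data from the study):
# cost = path length of the eye movements on each trial, in arbitrary units;
# rate = one neuron's firing rate in the post-sequence window, in spikes/s.
cost = [6.0, 5.5, 5.0, 4.2, 3.9, 3.5]
rate = [12.0, 11.0, 10.5, 9.0, 8.8, 8.0]

print(pearson_r(cost, rate))  # close to +1: firing falls as paths get shorter
```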

“To know there are other brain signals like cost hiding under the reward signal is very exciting,” says Yael Niv, an associate professor of psychology at Princeton University and an affiliate of the Princeton Neuroscience Institute who was not involved in this work. “This study suggests that we should not be blinded by reward. Reward is only one side of the coin. The other side is how much do you have to pay for it.”

Exactly how these neurons drive behavior isn’t clear, since Graybiel and Desrochers don’t yet know what these particular neurons in the striatum are connected to. Graybiel speculates that the neurons are part of a larger circuit that reinforces learning of repetitive tasks, and possibly even repetitive thoughts, but more research is required to test this idea.

In addition, Graybiel is interested in understanding the role these signals might play in neuropsychiatric disorders. A first step will include identifying cells that represent cost and reward in mouse models of human neuropsychiatric disorders whose symptoms involve repetitive behavior.

“We’re interested in repetitive behavior because our creative brain is resting on this giant glacier of habit. It’s this wonderful mechanism that frees us up,” Graybiel says. “But also it would be a dream to be able to learn more about diseases that involve unhealthy repetitive behavior by understanding the wiring and what can go wrong.”

About this neuroscience research

Additional Information: The 2010 research paper, “Optimal habits can develop spontaneously through sensitivity to local cost” by Theresa M. Desrochers, Dezhe Z. Jin, Noah D. Goodman, and Ann M. Graybiel was published in PNAS doi:10.1073/pnas.1013470107

Source: Elizabeth Dougherty – McGovern Institute for Brain Research/MIT
Image Credit: The image is credited to Jose-Luis Olivares/MIT
Original Research: Abstract for “Habit Learning by Naive Macaques Is Marked by Response Sharpening of Striatal Neurons Representing the Cost and Outcome of Acquired Action Sequences” by Theresa M. Desrochers, Ken-ichi Amemori, and Ann M. Graybiel in Neuron. Published online July 15 2015 doi:10.1016/j.neuron.2015.07.019

Abstract

Habit Learning by Naive Macaques Is Marked by Response Sharpening of Striatal Neurons Representing the Cost and Outcome of Acquired Action Sequences

Highlights
•During sequence learning, macaque striatal neurons encode integrated cost-benefit
•These signals mark ends of saccade sequences acquired without explicit training
•With learning, the cost-benefit end signals sharpen via population spike alignment
•This sharpening is tightly coupled to decreasing entropy of the sequences acquired
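The last highlight refers to the entropy of the acquired sequences. As a hedged illustration (the paper's exact definition may differ), Shannon entropy computed over how often each distinct scan sequence occurs is high when trials spread across many different sequences and falls toward zero as one sequence comes to dominate. The sequences in this Python sketch are hypothetical.

```python
import math
from collections import Counter


def sequence_entropy(sequences):
    """Shannon entropy (bits) of the empirical distribution of scan sequences.

    Each sequence is a tuple of visited dot labels. The result is 0 when every
    trial uses the same sequence and grows as trials spread over more sequences.
    """
    counts = Counter(tuple(s) for s in sequences)
    n = sum(counts.values())
    # p * log2(1/p), written as (c/n) * log2(n/c) so every term is non-negative.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


# Hypothetical trials: early on, four different scan orders; later, one dominates.
early = [("A", "B", "C", "D"), ("A", "C", "B", "D"), ("A", "D", "C", "B"), ("A", "B", "D", "C")]
late = [("A", "B", "C", "D")] * 3 + [("A", "C", "B", "D")]

print(sequence_entropy(early), sequence_entropy(late))  # entropy drops with repetition
```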

Summary
Over a century of scientific work has focused on defining the factors motivating behavioral learning. Observations in animals and humans trained on a wide range of tasks support reinforcement learning (RL) algorithms as accounting for the learning. Still unknown, however, are the signals that drive learning in naive, untrained subjects. Here, we capitalized on a sequential saccade task in which macaque monkeys acquired repetitive scanning sequences without instruction. We found that spike activity in the caudate nucleus after each trial corresponded to an integrated cost-benefit signal that was highly correlated with the degree of naturalistic untutored learning by the monkeys. Across learning, neurons encoding both cost and outcome gradually acquired increasingly sharp phasic trial-end responses that paralleled the development of the habit-like, repetitive saccade sequences. Our findings demonstrate an integrated cost-benefit signal by which RL and its neural correlates could drive naturalistic behaviors in freely behaving primates.
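To give a feel for how an integrated cost-benefit signal could drive reinforcement learning, here is a deliberately simple Python sketch. It is not the model used in the paper: the candidate sequences, their path lengths, the reward, the cost weight, and the learning parameters are all assumptions. On each trial a sequence is chosen by softmax over learned values, the net outcome (reward minus weighted path length) is compared with the expected value, and that prediction error nudges the value (a delta-rule update).

```python
import math
import random

# Hypothetical candidate scan sequences and their path lengths (arbitrary units).
PATH_LENGTH = {"short": 3.0, "medium": 3.4, "long": 3.8}
REWARD = 1.0       # assumed: every sequence eventually finds the baited dot
COST_WEIGHT = 0.5  # assumed trade-off between path-length cost and reward


def net_outcome(seq):
    """Integrated cost-benefit signal: reward minus weighted path-length cost."""
    return REWARD - COST_WEIGHT * PATH_LENGTH[seq]


def learn(trials=2000, alpha=0.1, temperature=0.05, seed=0):
    """Delta-rule value learning with softmax choice among candidate sequences."""
    rng = random.Random(seed)
    # Values start at 0.0, above every net outcome, so each sequence gets tried.
    values = {seq: 0.0 for seq in PATH_LENGTH}
    for _ in range(trials):
        seqs = list(values)
        weights = [math.exp(values[s] / temperature) for s in seqs]
        choice = rng.choices(seqs, weights=weights)[0]
        # Prediction error: how much better or worse the outcome was than expected.
        delta = net_outcome(choice) - values[choice]
        values[choice] += alpha * delta
    return values


values = learn()
print(max(values, key=values.get), values)
```

Because the reward is assumed to be equal for every sequence here, only the cost term differs, so the learned values end up favoring the shortest path, loosely echoing the shortening of scan paths described earlier.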

