Summary: A new study identifies a neural mechanism that allows the brain to compensate for speech sounds when they are obscured by noise.
Humans are exquisitely skilled at perceiving spoken words, even when speakers’ voices are intermittently overwhelmed by noise, as happens in the din of construction sites or on busy urban streets. Now, in a study conducted in a group of patients preparing for brain surgery, UC San Francisco scientists have discovered an unexpected mechanism the brain uses to seamlessly compensate when speech sounds are obscured by noise.
The research team monitored neural activity during listening tasks in a group of epilepsy patients awaiting surgery, using recording devices placed directly on the surface of the brain. As reported in the Dec. 20, 2016, issue of Nature Communications, the resulting neural recordings captured the real-time dynamics of this perceptual “filling in,” which takes just tenths of a second, and also showed that a region outside the brain’s canonical speech areas plays a critical role in this process.
The group found that the part of the brain most deeply involved in speech perception responded to missing speech sounds as if those sounds were actually present. But most intriguingly, the researchers discovered that a brain region separate from main speech-processing areas somehow “predicts” which word a listener will hear when that word is partially masked by noise, well before that noise has even begun to be processed by auditory areas.
In the new research, Matthew Leonard, PhD, assistant professor of neurological surgery and a member of the UCSF Weill Institute for Neurosciences, and colleagues worked with five patients about to undergo surgery to treat epilepsy that was not manageable with medications.
Devices Placed Directly On the Brain
To locate the anatomical origins of these patients’ seizures for surgery, and to create surgical plans that would protect crucial brain areas, flexible panels containing 256 recording electrodes had been placed on the surface of either the right or left side of the brain. These electrode arrays provided dense coverage of a region known as the superior temporal gyrus (STG), which is crucial to speech processing, a recording arrangement that has proven valuable in previous research on speech in the UCSF laboratory of neurosurgeon-scientist and senior author Edward Chang, MD, professor of neurological surgery.
It has been known since the 1970s that when critical speech sounds that distinguish one word from another – the “s” and “k” sounds that distinguish faster from factor, for example – are excised and replaced by noise (such a stimulus can be represented as “fa#tor”), listeners will nonetheless report hearing a complete word, a phenomenon called “phoneme restoration.”
According to Leonard, for stimuli like fa#tor, where only two actual English words related to the stimulus exist, phoneme restoration is a “bistable” auditory illusion, somewhat analogous to well-known visual illusions like the “duck/rabbit” drawings that shift between two perceptual interpretations. When listeners hear fa#tor, they report hearing either faster or factor, even though neither word is truly present in the stimulus.
When the patients listened to various bistable stimuli, recordings from the STG were consistent with whichever word they reported hearing: if they perceived factor, for example, the part of the STG normally activated by “k” sounds emitted a signal, even though no “k” sound was actually present; likewise, when they perceived faster, the STG region corresponding to “s” sounds was activated.
A Perennial Question in Speech Perception
These responses occurred less than two tenths of a second after the noise-obscured gaps in the stimuli began to be processed – the same time frame as when the difference between the actual words faster or factor was processed – which provides the beginnings of an answer to a perennial question in speech perception, Leonard said.
“One of the oldest debates in the field is whether there’s a ‘top-down’ signal that actually changes the listener’s perception ‘online,’ in real time, or whether this is achieved by some sort of decision-making process that rapidly arrives at an interpretation after the missing sound segment has been processed,” Leonard said. “Our data seem to support the former idea.”
Surprisingly, the patients’ word choices were unaffected when noise-masked words were embedded in sentences that would seem to strongly favor one choice over another, a technique called “semantic priming.” For example, hearing “On the highway, he drove the car much fa#ter” would seem to bias a listener toward hearing “faster,” but the researchers found that the patients were just as likely to say they heard “factor.”
Since phoneme restoration was essentially instantaneous, and because semantic priming had so little effect, the research team wondered whether brain areas other than the STG might somehow be contributing to the listeners’ perception.
The group was surprised to find that an area toward the front of the brain was selectively active about half a second before the STG signals associated with phoneme restoration were seen. This activity actually predicted which word patients would report hearing, suggesting that this region somehow helped to drive that perception.
“Whether you hear a bistable stimulus as factor or faster on a given trial seems to depend on random fluctuations in the brain’s state at that moment, something you really don’t have any control over,” Leonard said. “We don’t have a definitive idea of what this frontal signal is yet, but we’ll be exploring that question in future research.”
Taken together, said Leonard, the new results show that “there are brain mechanisms that are constantly working behind the scenes to make sure we don’t get tripped up every time there’s a sound that could prevent us from understanding speech.”
Leonard and Chang were joined in the research by Maxime Baud, MD, PhD, a postdoctoral fellow in the Department of Neurology working in the Chang laboratory, and Matthias J. Sjerps, PhD, a postdoctoral fellow at Radboud University in Nijmegen, The Netherlands.
Funding: The work was funded by the National Institutes of Health; the Kavli Institute for Brain and Mind; the European Commission; the New York Stem Cell Foundation; the McKnight Foundation; the Shurl and Kay Curci Foundation; and the William K. Bowes, Jr. Foundation.
Source: Andrew Tobin – UCSF
Image Source: NeuroscienceNews.com image is adapted from the UCSF video.
Video Source: The video is credited to UCSF.
Original Research: Full open access research for “Perceptual restoration of masked speech in human cortex” by Matthew K. Leonard, Maxime O. Baud, Matthias J. Sjerps and Edward F. Chang in Nature Communications. Published online December 20 2016 doi:10.1038/ncomms13619
Perceptual restoration of masked speech in human cortex
Humans are adept at understanding speech despite the fact that our natural listening environment is often filled with interference. An example of this capacity is phoneme restoration, in which part of a word is completely replaced by noise, yet listeners report hearing the whole word. The neurological basis for this unconscious fill-in phenomenon is unknown, despite being a fundamental characteristic of human hearing. Here, using direct cortical recordings in humans, we demonstrate that missing speech is restored at the acoustic-phonetic level in bilateral auditory cortex, in real-time. This restoration is preceded by specific neural activity patterns in a separate language area, left frontal cortex, which predicts the word that participants later report hearing. These results demonstrate that during speech perception, missing acoustic content is synthesized online from the integration of incoming sensory cues and the internal neural dynamics that bias word-level expectation and prediction.
“Perceptual restoration of masked speech in human cortex” by Matthew K. Leonard, Maxime O. Baud, Matthias J. Sjerps and Edward F. Chang in Nature Communications. Published online December 20 2016 doi:10.1038/ncomms13619