Brain Mastery Over Echoes Enhances Speech Clarity

Summary: Researchers have uncovered the human brain’s remarkable ability to segregate direct speech from echoes, a challenge that has long perplexed audio engineers.

Using magnetoencephalography (MEG), the researchers found that despite the distortion caused by echoes, which typically lag the original speech by at least 100 milliseconds, the brain processes and understands speech with over 95% accuracy by separating the sound into distinct streams. This separation occurs even without the listener's active attention, indicating an innate brain function that facilitates clear speech perception in echoic environments.

The findings not only shed light on auditory stream segregation’s role in complex acoustic settings but also hint at potential advancements in automatic speech recognition technologies.

Key Facts:

  1. The human brain can distinguish between direct speech and its echo, enabling high speech comprehension even in echoic environments.
  2. Neural activity during speech perception is better explained by a model that processes direct speech and echoes as separate streams, not by adaptation.
  3. This auditory segregation occurs automatically, without the need for the listener’s focused attention, showcasing an inherent brain capability.

Source: PLOS

Echoes can make speech harder to understand, and tuning out echoes in an audio recording is a notoriously difficult engineering problem.

The human brain, however, appears to solve the problem successfully by separating the sound into direct speech and its echo, according to a study publishing February 15th in the open-access journal PLOS Biology by Jiaxin Gao from Zhejiang University, China, and colleagues.

The researchers state that auditory stream segregation may be important both for singling out a specific speaker in a crowded environment, and for clearly understanding an individual speaker in a reverberant space. Credit: Neuroscience News

Audio signals in online meetings, and in auditoriums that are not properly designed, often contain an echo lagging the original speech by at least 100 milliseconds. Such echoes heavily distort speech, interfering with the slowly varying sound features most important for understanding conversation, yet people still reliably understand echoic speech.
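The distortion described here can be seen with a little arithmetic: an echo is a delayed, attenuated copy of the signal added back to itself, which acts like a comb filter on the slow amplitude modulations that carry speech information. The sketch below is illustrative only, not the study's analysis; the 100 ms delay and unit echo gain are assumptions chosen to match the numbers in the text.

```python
import numpy as np

def modulation_gain(f_hz, delay_s=0.1, echo_gain=1.0):
    """Gain applied to an amplitude modulation at f_hz when a signal
    is mixed with its echo: s(t) + a * s(t - d).
    The modulation transfer is |1 + a * exp(-2j*pi*f*d)|."""
    return abs(1 + echo_gain * np.exp(-2j * np.pi * f_hz * delay_s))

# With a 100 ms echo, modulations near 5 Hz -- among the rates most
# important for speech intelligibility -- are almost completely
# cancelled, while 10 Hz modulations are doubled.
for f in [1, 3, 5, 8, 10]:
    print(f"{f:2d} Hz: gain = {modulation_gain(f):.2f}")
```

The null at 5 Hz (where the echo arrives exactly half a modulation cycle late) is why a 100 ms echo is so damaging to the slow envelope features, even though the echo adds rather than removes acoustic energy.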

To better understand how the brain enables this, the authors used magnetoencephalography (MEG) to record neural activity while human participants listened to a story with and without an echo.

They compared the neural signals to two computational models: one simulating the brain adapting to the echo, and another simulating the brain separating the echo from the original speech.
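The logic of such a model comparison can be sketched, in very simplified form, as correlating candidate predictor signals with a measured envelope-tracking response. Everything below is schematic and assumed, not the authors' actual MEG pipeline: the "adaptation" predictor is a crude high-pass of the echoic envelope, the "segregation" predictor is the direct-speech envelope kept as its own stream, and the simulated neural signal tracks direct speech by construction (using the study's finding as the simulation's premise).

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 100                      # envelope sampling rate (Hz)
n = 30 * fs                   # 30 s of simulated envelope
delay = int(0.1 * fs)         # 100 ms echo delay

# Direct-speech envelope: smoothed noise (slow modulations only).
direct = np.convolve(rng.standard_normal(n), np.ones(20) / 20, mode="same")

# Echoic envelope: direct speech plus a delayed, attenuated copy.
echoic = direct + 0.8 * np.roll(direct, delay)

# Hypothetical neural signal: tracks the *direct* envelope, plus noise.
neural = direct + 0.2 * rng.standard_normal(n)

# "Adaptation" predictor: echoic envelope through a crude high-pass
# (first difference), mimicking response adaptation to sustained input.
adapt_pred = np.diff(echoic, prepend=echoic[0])

# "Segregation" predictor: the direct-speech stream itself.
seg_pred = direct

def r(x, y):
    """Pearson correlation between a predictor and the neural signal."""
    return np.corrcoef(x, y)[0, 1]

print("adaptation  model r =", round(r(adapt_pred, neural), 2))
print("segregation model r =", round(r(seg_pred, neural), 2))
```

In this toy setup the segregation predictor correlates far better with the simulated response, mirroring the qualitative pattern the study reports; the real analysis operates on MEG recordings, not simulated envelopes.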

Participants understood the story with over 95% accuracy, regardless of echo. The researchers observed that cortical activity tracks energy changes related to direct speech, despite the strong interference of the echo.

Simulating neural adaptation only partially captured the brain response they observed—neural activity was better explained by a model that split original speech and its echo into separate processing streams. This remained true even when participants were told to direct their attention toward a silent film and ignore the story, suggesting that top-down attention isn’t required to mentally separate direct speech and its echo.

The researchers state that auditory stream segregation may be important both for singling out a specific speaker in a crowded environment, and for clearly understanding an individual speaker in a reverberant space.

The authors add, “Echoes strongly distort the sound features of speech and create a challenge for automatic speech recognition. The human brain, however, can segregate speech from its echo and achieve reliable recognition of echoic speech.”

About this auditory neuroscience research news

Author: Claire Turner
Source: PLOS
Contact: Claire Turner – PLOS
Image: The image is credited to Neuroscience News

Original Research: Open access.
“Original speech and its echo are segregated and separately processed in the human brain” by Nai Ding et al. PLOS Biology


Abstract

Original speech and its echo are segregated and separately processed in the human brain

Speech recognition crucially relies on slow temporal modulations (<16 Hz) in speech. Recent studies, however, have demonstrated that long-delay echoes, which are common in online conferencing, can eliminate crucial temporal modulations in speech without affecting speech intelligibility.

Here, we investigated the underlying neural mechanisms. MEG experiments demonstrated that cortical activity can effectively track the temporal modulations eliminated by an echo, which cannot be fully explained by basic neural adaptation mechanisms.

Furthermore, cortical responses to echoic speech can be better explained by a model that segregates speech from its echo than by a model that encodes echoic speech as a whole. The speech segregation effect was observed even when attention was diverted, but disappeared when segregation cues, i.e., speech fine structure, were removed.

These results strongly suggest that, through mechanisms such as stream segregation, the auditory system can build an echo-insensitive representation of the speech envelope, which can support reliable speech recognition.
