Summary: A new auditory attention decoding device allows people to selectively hear specific people in crowded environments.
Source: Zuckerman Institute
Our brains have a remarkable knack for picking out individual voices in a noisy environment, like a crowded coffee shop or a busy city street. This is something that even the most advanced hearing aids struggle to do. But now Columbia engineers are announcing an experimental technology that mimics the brain’s natural aptitude for detecting and amplifying any one voice from many. Powered by artificial intelligence, this brain-controlled hearing aid acts as an automatic filter, monitoring wearers’ brain waves and boosting the voice they want to focus on.
Though still in the early stages of development, the technology is a significant step toward better hearing aids that would enable wearers to converse with the people around them seamlessly and efficiently. This achievement is described today in Science Advances.
“The brain area that processes sound is extraordinarily sensitive and powerful; it can amplify one voice over others, seemingly effortlessly, while today’s hearings aids still pale in comparison,” said Nima Mesgarani, PhD, a principal investigator at Columbia’s Mortimer B. Zuckerman Mind Brain Behavior Institute and the paper’s senior author. “By creating a device that harnesses the power of the brain itself, we hope our work will lead to technological improvements that enable the hundreds of millions of hearing-impaired people worldwide to communicate just as easily as their friends and family do.”
Modern hearing aids are excellent at amplifying speech while suppressing certain types of background noise, such as traffic. But they struggle to boost the volume of an individual voice over others. Scientists calls this the cocktail party problem, named after the cacophony of voices that blend together during loud parties.
“In crowded places, like parties, hearing aids tend to amplify all speakers at once,” said Dr. Mesgarani, who is also an associate professor of electrical engineering at Columbia Engineering. “This severely hinders a wearer’s ability to converse effectively, essentially isolating them from the people around them.”
The Columbia team’s brain-controlled hearing aid is different. Instead of relying solely on external sound-amplifiers, like microphones, it also monitors the listener’s own brain waves.
“Previously, we had discovered that when two people talk to each other, the brain waves of the speaker begin to resemble the brain waves of the listener,” said Dr. Mesgarani.
Using this knowledge the team combined powerful speech-separation algorithms with neural networks, complex mathematical models that imitate the brain’s natural computational abilities. They created a system that first separates out the voices of individual speakers from a group, and then compares the voices of each speaker to the brain waves of the person listening. The speaker whose voice pattern most closely matches the listener’s brain waves ¬is then amplified over the rest.
The researchers published an earlier version of this system in 2017 that, while promising, had a key limitation: It had to be pretrained to recognize specific speakers.
“If you’re in a restaurant with your family, that device would recognize and decode those voices for you,” explained Dr. Mesgarani. “But as soon as a new person, such as the waiter, arrived, the system would fail.”
Today’s advance largely solves that issue. With funding from Columbia Technology Ventures to improve their original algorithm, Dr. Mesgarani and first authors Cong Han and James O’Sullivan, PhD, again harnessed the power of deep neural networks to build a more sophisticated model that could be generalized to any potential speaker that the listener encountered.
“Our end result was a speech-separation algorithm that performed similarly to previous versions but with an important improvement,” said Dr. Mesgarani. “It could recognize and decode a voice — any voice — right off the bat.”
To test the algorithm’s effectiveness, the researchers teamed up with Ashesh Dinesh Mehta, MD, PhD, a neurosurgeon at the Northwell Health Institute for Neurology and Neurosurgery and coauthor of today’s paper. Dr. Mehta treats epilepsy patients, some of whom must undergo regular surgeries.
“These patients volunteered to listen to different speakers while we monitored their brain waves directly via electrodes implanted in the patients’ brains,” said Dr. Mesgarani. “We then applied the newly developed algorithm to that data.”
The team’s algorithm tracked the patients’ attention as they listened to different speakers that they had not previously heard. When a patient focused on one speaker, the system automatically amplified that voice. When their attention shifted to a different speaker, the volume levels changed to reflect that shift.
Credit: Columbia University’s Zuckerman Institute.
Encouraged by their results, the researchers are now investigating how to transform this prototype into a noninvasive device that can be placed externally on the scalp or around the ear. They also hope to further improve and refine the algorithm so that it can function in a broader range of environments.
This technology works by mimicking what the brain would normally do. First, the device automatically separates out multiple speakers into separate streams, and then compares each speaker with the neural data from the user’s brain. The speaker that best matches a user’s neural data is then amplified above the others Credit: Nima Mesgarani/Columbia’ Zuckerman Institute.
“So far, we’ve only tested it in an indoor environment,” said Dr. Mesgarani. “But we want to ensure that it can work just as well on a busy city street or a noisy restaurant so that wherever wearers go, they can fully experience the world and people around them.”
Funding: This research was supported by the National Institutes of Health (NIDCD-DC014279), the National Institute of Mental Health (R21MH114166), the Pew Charitable Trusts, the Pew Scholars Program in the Biomedical Sciences and Columbia Technology Ventures.
The authors report no financial or other conflicts of interest.
About this neuroscience research article
Source: Zuckerman Institute Media Contacts: Anne Holden – Zuckerman Institute Image Source: The image is adapted from the Zuckerman Institute video.
Speaker-independent auditory attention decoding without access to clean speech sources
Speech perception in crowded environments is challenging for hearing-impaired listeners. Assistive hearing devices cannot lower interfering speakers without knowing which speaker the listener is focusing on. One possible solution is auditory attention decoding in which the brainwaves of listeners are compared with sound sources to determine the attended source, which can then be amplified to facilitate hearing. In realistic situations, however, only mixed audio is available. We utilize a novel speech separation algorithm to automatically separate speakers in mixed audio, with no need for the speakers to have prior training. Our results show that auditory attention decoding with automatically separated speakers is as accurate and fast as using clean speech sounds. The proposed method significantly improves the subjective and objective quality of the attended speaker. Our study addresses a major obstacle in actualization of auditory attention decoding that can assist hearing-impaired listeners and reduce listening effort for normal-hearing subjects.