Improving Voice Recognition for People with Speech Disabilities

Summary: A new study shows that automatic speech recognition (ASR) systems trained on speech from people with Parkinson’s disease are 30% more accurate at transcribing similar speech than systems trained only on typical speech. Researchers collected 151 hours of recordings from participants with varying degrees of dysarthria, a speech disorder common in Parkinson’s patients, and used the data to train ASR systems.

The study reveals that incorporating atypical speech samples significantly improves voice recognition technology for those with speech disabilities. These findings could help make voice-controlled devices more accessible to people with neuromotor disorders.

Key Facts:

  • ASR systems trained on Parkinson’s speech improved transcription accuracy by 30%.
  • The model was trained on 151 hours of recordings from people with dysarthria.
  • These findings could enhance accessibility for users with speech disabilities.

Source: Beckman Institute

As Mark Hasegawa-Johnson combed through data from his latest project, he was pleasantly surprised to uncover a recipe for Eggs Florentine. Sifting through hundreds of hours of recorded speech will unearth a treasure or two, he said.

Hasegawa-Johnson leads the Speech Accessibility Project, an initiative at the University of Illinois Urbana-Champaign to make voice recognition devices more useful for people with speech disabilities.

In the project’s first published study, researchers asked an automatic speech recognizer to listen to 151 hours — almost six-and-a-half days — of recordings from people with speech disabilities related to Parkinson’s disease. Their model transcribed a new dataset of similar recordings with 30% more accuracy than a control model that had not listened to people with Parkinson’s disease.


This study appears in the Journal of Speech, Language, and Hearing Research. The speech recordings used in the study are freely available to researchers, nonprofits and companies looking to improve their voice recognition devices.

“Our results suggest that a large database of atypical speech can significantly improve speech technology for people with disabilities,” said Hasegawa-Johnson, a professor of electrical and computer engineering at Illinois and a researcher at the university’s Beckman Institute for Advanced Science and Technology, where the project is housed.

“I look forward to seeing how other organizations will use this data to make voice recognition devices more inclusive.”

Machines like smartphones and virtual assistants use automatic speech recognition to make meaning from vocalizations, allowing people to queue up a playlist, dictate hands-free messages, seamlessly participate in virtual meetings and communicate clearly with friends and family members.

Voice recognition technology does not work well for everyone, however; it often fails people with neuromotor disorders like Parkinson’s disease, which can cause a range of strained, slurred or discoordinated speech patterns, collectively called dysarthria.

“Unfortunately, this means that many people who need voice-controlled devices the most may encounter the most difficulty in using them well,” Hasegawa-Johnson said.

“We know from existing research that if you train an ASR on someone’s voice, it will begin to understand them more accurately. We asked: can you train an automatic speech recognizer to understand people with dysarthria from Parkinson’s by exposing it to a small group of people with similar speech patterns?”
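In practice, that usually means fine-tuning: starting from a recognizer pretrained on typical speech and continuing its training on the new voices. Below is a minimal sketch of one fine-tuning step, assuming a wav2vec 2.0-style CTC model from the Hugging Face transformers library; the checkpoint, learning rate and training loop here are illustrative assumptions, not the project’s actual pipeline.

    # Sketch: adapt a pretrained ASR model with one gradient step on a single
    # transcribed recording. Checkpoint and hyperparameters are illustrative.
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
    model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
    model.train()

    def fine_tune_step(waveform, transcript):
        """One gradient step on a (16 kHz mono audio, text) pair."""
        inputs = processor(waveform, sampling_rate=16_000, return_tensors="pt")
        # This checkpoint's vocabulary is uppercase letters, so normalize the text.
        labels = processor.tokenizer(transcript.upper(), return_tensors="pt").input_ids
        loss = model(inputs.input_values, labels=labels).loss  # CTC loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        return loss.item()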

Hasegawa-Johnson and his colleagues recruited about 250 adults with varying degrees of dysarthria related to Parkinson’s disease. Prior to joining the study, prospective participants met with a speech-language pathologist who evaluated their eligibility.

“Many people who have struggled with a communication disorder for a long time, especially a progressive one, may withdraw from daily communication,” said Clarion Mendes, a speech-language pathologist on the team. “They might share their unique thoughts, needs and ideas less and less often, thinking their communication is just too impacted to engage in meaningful conversations.

“Those are the exact people we’re looking for,” she said.

Selected participants used their personal computers and smartphones to submit voice recordings. Working at their own pace and with optional assistance from a caregiver, they repeated well-worn vocal commands like “Set an alarm,” recited passages from novels and opined on open-ended prompts like “Please explain the steps to making breakfast for four people.”

Responding to the latter, one participant enumerated the steps to make Eggs Florentine — Hollandaise sauce and all — while another pragmatically advised to order takeout.

“We’ve heard from many participants who have said that the participation process was not only enjoyable, but that it gave them the confidence to communicate with their families again,” Mendes said. “This project has brought hope, excitement and energy — uniquely human qualities — to many of our participants and their loved ones.”

She said the team consulted with Parkinson’s disease experts and community members to develop content relevant to participants’ lives. Prompts were specific and spontaneous: training a speech algorithm to recognize medication names, for example, may help an end user communicate with their pharmacy, while casual conversation-starters mimic the cadence of daily chit-chat.

“We tell participants: We know that you can make your speech clearer by putting all your effort into it, but you’re probably tired of having to try to make yourself understood for the benefit of others. Try to relax and communicate as if you’re chatting with your family on the couch,” Mendes said.

To gauge how well the speech algorithm listened and learned, the researchers divided the samples into three sets. The first set of 190 participants, or 151 recorded hours, trained the model.

As its performance improved, the researchers confirmed that the model was learning in earnest (and not just memorizing participants’ responses) by introducing it to a second, smaller set of recordings. When the model reached peak performance on the second set, the researchers challenged it with the test set.
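In machine learning terms, these are the training, development and test sets described in the published paper, and whole participants, not individual recordings, were randomly assigned to them, so the model is always evaluated on voices it never trained on. A minimal sketch of such a speaker-level split; the fractions and participant count below are illustrative:

    import random

    def split_by_speaker(participant_ids, train_frac=0.75, dev_frac=0.10, seed=0):
        """Randomly assign whole participants (not individual recordings) to
        training, development, and test sets, so test voices are unseen."""
        ids = list(participant_ids)
        random.Random(seed).shuffle(ids)
        n_train = int(train_frac * len(ids))
        n_dev = int(dev_frac * len(ids))
        return {
            "train": ids[:n_train],
            "dev": ids[n_train:n_train + n_dev],
            "test": ids[n_train + n_dev:],
        }

    # Example with 253 speakers, roughly the scale of the study's
    # 190-speaker training set and 42-speaker test set.
    splits = split_by_speaker(range(253))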

Members of the research team manually transcribed an average of 400 recordings per participant to check the model’s work.

They found that after listening to the training set, the ASR system transcribed recordings from the test set with a word error rate of 23.69%. For comparison, a system trained on speech samples from people without Parkinson’s disease transcribed the test set with a word error rate of 36.3% — roughly 30% less accurate.
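Word error rate is the standard ASR metric: the number of word substitutions, insertions and deletions needed to turn the model’s transcript into a human reference transcript, divided by the number of reference words (lower is better). A small self-contained sketch of the standard computation:

    def word_error_rate(reference: str, hypothesis: str) -> float:
        """WER = (substitutions + insertions + deletions) / reference words,
        computed with the classic Levenshtein dynamic program over words."""
        ref, hyp = reference.split(), hypothesis.split()
        # d[i][j] = edit distance between the first i reference words
        # and the first j hypothesis words.
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i
        for j in range(len(hyp) + 1):
            d[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution
        return d[len(ref)][len(hyp)] / len(ref)

    # Two of five reference words are wrong, so WER = 0.4. At 23.69% WER,
    # roughly one word in four needs correction.
    print(word_error_rate("set an alarm for seven", "set a alarm for heaven"))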

Error rates also decreased for almost all individuals in the test set. Even speakers with less typical Parkinsonian speech, like unusually fast speech or stuttering, experienced modest improvements.

“I was excited to see such a dramatic benefit,” Hasegawa-Johnson said.

He added that his enthusiasm is bolstered by participant feedback:

“I spoke with a participant who was interested in the future of this technology,” he said. “That’s the wonderful thing about this project: seeing how excited people can be about the possibility that their smart speakers and their cell phones will understand them. That’s really what we’re trying to do.”

Funding: Research described in this press release is supported by Amazon, Apple, Google, Meta and Microsoft; the National Institute on Deafness and Other Communication Disorders of the National Institutes of Health under award no. R13DC003383; and the National Science Foundation under award no. 1725729.

The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

About the Speech Accessibility Project

The Speech Accessibility Project is a research initiative to make voice recognition technology more useful for people with a range of diverse speech patterns and disabilities.

The project is housed within the University of Illinois Urbana-Champaign’s Beckman Institute for Advanced Science and Technology and was announced in fall 2022. Currently, the project is recruiting English-speaking U.S. and Canadian adults who have Parkinson’s disease, Down syndrome, cerebral palsy or amyotrophic lateral sclerosis, as well as adults who have had a stroke.

The project has unprecedented cross-industry support from funders Amazon, Apple, Google, Meta and Microsoft, as well as nonprofit organizations whose communities will benefit from this accessibility initiative.

As of the end of June 2024, the project has shared 235,000 speech samples with the five funding companies. 

Apply to join the Speech Accessibility Project.

Conduct research through the Speech Accessibility Project 

The Speech Accessibility Project has released approximately 170 hours of speech recordings and annotations from 211 participants with Parkinson’s disease (comprising the training and development datasets).

The project is accepting proposals from researchers, companies and nonprofits that want to use the recordings and annotations to make technology accessible to all.

Submit a proposal to conduct research through the project.

About this AI and speech recognition research news

Author: Jenna Kurtzweil
Source: Beckman Institute
Contact: Jenna Kurtzweil – Beckman Institute
Image: The image is credited to Neuroscience News

Original Research: Open access.
“Community-Supported Shared Infrastructure in Support of Speech Accessibility” by Mark Hasegawa-Johnson et al. Journal of Speech, Language, and Hearing Research


Abstract

Community-Supported Shared Infrastructure in Support of Speech Accessibility

Purpose:

The Speech Accessibility Project (SAP) intends to facilitate research and development in automatic speech recognition (ASR) and other machine learning tasks for people with speech disabilities. The purpose of this article is to introduce this project as a resource for researchers, including baseline analysis of the first released data package.

Method:

The project aims to facilitate ASR research by collecting, curating, and distributing transcribed U.S. English speech from people with speech and/or language disabilities. Participants record speech from their place of residence by connecting their personal computer, cell phone, and assistive devices, if needed, to the SAP web portal. All samples are manually transcribed, and 30 per participant are annotated using differential diagnostic pattern dimensions. For purposes of ASR experiments, the participants have been randomly assigned to a training set, a development set for controlled testing of a trained ASR, and a test set to evaluate ASR error rate.

Results:

The SAP 2023-10-05 Data Package contains the speech of 211 people with dysarthria as a correlate of Parkinson’s disease, and the associated test set contains 42 additional speakers. A baseline ASR, with a word error rate of 3.4% for typical speakers, transcribes test speech with a word error rate of 36.3%. Fine-tuning reduces the word error rate to 23.7%.

Conclusions:

Preliminary findings suggest that a large corpus of dysarthric and dysphonic speech has the potential to significantly improve speech technology for people with disabilities. By providing these data to researchers, the SAP intends to significantly accelerate research into accessible speech technology.
