Listen and Learn: AI Systems Process Speech Signals Like Human Brains

Summary: Artificial intelligence (AI) systems can process signals in a way strikingly similar to how the brain interprets speech, a finding that may help explain how AI systems operate. Scientists used electrodes on participants’ heads to measure brain waves while they listened to a single syllable, then compared that brain activity to the signals produced by an AI system trained to learn English and found that the shapes were remarkably similar. The result could aid the development of increasingly powerful systems.

Key Facts:

  1. Researchers found that the signals produced by an AI system trained to learn English were remarkably similar to brain waves measured as participants listened to a single syllable, “bah,” in a study published recently in the journal Scientific Reports.
  2. The team used a system of electrodes placed on participants’ heads to measure brain waves as they listened to the sound, then compared the brain activity to the signals produced by an AI system.
  3. Understanding how and why AI systems provide the information they do is becoming essential as they become ingrained in daily life in fields ranging from healthcare to education.
  4. Studying the waves in their raw form will help researchers understand and improve how these systems learn and increasingly come to mirror human cognition.

Source: UC Berkeley

New research from the University of California, Berkeley, shows that artificial intelligence (AI) systems can process signals in a way that is remarkably similar to how the brain interprets speech, a finding scientists say might help explain the black box of how AI systems operate.

Using a system of electrodes placed on participants’ heads, scientists with the Berkeley Speech and Computation Lab measured brain waves as participants listened to a single syllable — “bah.” They then compared that brain activity to the signals produced by an AI system trained to learn English.

“The shapes are remarkably similar,” said Gasper Begus, assistant professor of linguistics at UC Berkeley and lead author on the study published recently in the journal Scientific Reports. “That tells you similar things get encoded, that processing is similar.”

A side-by-side graph of the two signals shows the similarity strikingly.

“There are no tweaks to the data,” Begus added. “This is raw.”

AI systems have recently advanced by leaps and bounds. Since ChatGPT ricocheted around the world last year, these tools have been forecast to upend sectors of society and revolutionize how millions of people work. But despite these impressive advances, scientists have had a limited understanding of how exactly the tools they created operate between input and output.

A question posed to ChatGPT and the answer it returns have served as a benchmark for measuring an AI system’s intelligence and biases. But what happens between those steps has been something of a black box. Knowing how and why these systems provide the information they do — how they learn — becomes essential as they become ingrained in daily life in fields spanning health care to education.

Begus and his co-authors, Alan Zhou of Johns Hopkins University and T. Christina Zhao of the University of Washington, are among a cadre of scientists working to crack open that box.

To do so, Begus turned to his training in linguistics.

When we listen to spoken words, Begus said, the sound enters our ears and is converted into electrical signals. Those signals then travel through the brainstem and to the outer parts of our brain.

With the electrode experiment, researchers traced that path in response to 3,000 repetitions of a single sound and found that the brain waves for speech closely followed the actual sounds of language.
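Brain responses of this kind are typically recovered by averaging the EEG signal across the many time-locked repetitions of the stimulus, so the stimulus-evoked waveform emerges from the background noise. The snippet below is a minimal sketch of that averaging step using simulated data; the sampling rate, epoch length, and signal shapes are illustrative assumptions, not the study's actual recording parameters.

```python
import numpy as np

fs = 16_000                           # assumed sampling rate in Hz
n_trials, n_samples = 3000, fs // 10  # 3,000 repetitions, 100 ms epochs

# Simulated single-trial recordings: a weak stimulus-locked response buried in noise.
rng = np.random.default_rng(0)
t = np.arange(n_samples) / fs
response = 0.1 * np.sin(2 * np.pi * 100 * t) * np.exp(-t * 30)
epochs = response + rng.normal(scale=1.0, size=(n_trials, n_samples))

# Averaging across time-locked trials cancels the noise and leaves the evoked waveform.
evoked = epochs.mean(axis=0)
```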

The researchers transmitted the same recording of the “bah” sound through an unsupervised neural network — an AI system — that could interpret sound. Using a technique developed in the Berkeley Speech and Computation Lab, they measured the coinciding waves and documented them as they occurred.
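On the machine side, the comparable measurement averages the activity of the channels ("artificial neurons") in an intermediate convolutional layer over time, producing a single waveform that can be placed next to the brain response; the paper's abstract, reproduced below, describes this averaging principle. The following is a minimal sketch of that idea in PyTorch. The tiny network is a hypothetical stand-in for the unsupervised speech model, an illustrative assumption rather than the architecture used in the study.

```python
import torch
import torch.nn as nn

class TinySpeechConvNet(nn.Module):
    """Hypothetical stand-in for an unsupervised model that takes raw audio."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv1d(1, 32, kernel_size=25, stride=4, padding=12)
        self.conv2 = nn.Conv1d(32, 64, kernel_size=25, stride=4, padding=12)

    def forward(self, x):
        h1 = torch.relu(self.conv1(x))   # intermediate layer of interest
        h2 = torch.relu(self.conv2(h1))
        return h1, h2

model = TinySpeechConvNet().eval()
waveform = torch.randn(1, 1, 16_000)     # placeholder for the recorded "bah" syllable

with torch.no_grad():
    intermediate, _ = model(waveform)

# Average across channels ("artificial neurons") in the time domain,
# analogous to how EEG sums activity over many biological neurons.
layer_response = intermediate.mean(dim=1).squeeze(0).numpy()
```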


Previous research required extra steps to compare waves from the brain and machines. Studying the waves in their raw form will help researchers understand and improve how these systems learn and increasingly come to mirror human cognition, Begus said.

“I’m really interested as a scientist in the interpretability of these models,” Begus said. “They are so powerful. Everyone is talking about them. And everyone is using them. But much less is being done to try to understand them.”

Begus believes that what happens between input and output doesn’t have to remain a black box. Understanding how those signals compare to the brain activity of human beings is an important benchmark in the race to build increasingly powerful systems. So is knowing what’s going on under the hood.

For example, having that understanding could help put guardrails on increasingly powerful AI models. It could also improve our understanding of how errors and bias are baked into the learning processes.

Begus said he and his colleagues are collaborating with other researchers who use brain imaging techniques to measure how these signals might compare. They’re also studying how other languages, like Mandarin, are decoded differently in the brain and what that might indicate about linguistic knowledge.

Many models are trained on visual cues, like colors or written text — both of which have thousands of variations at the granular level. Language, however, opens the door for a more solid understanding, Begus said.

The English language, for example, has just a few dozen sounds.

“If you want to understand these models, you have to start with simple things. And speech is way easier to understand,” Begus said. “I am very hopeful that speech is the thing that will help us understand how these models are learning.”

In cognitive science, one of the primary goals is to build mathematical models that resemble humans as closely as possible. The newly documented similarities between brain waves and AI waves are a benchmark for how close researchers are to meeting that goal.

“I’m not saying that we need to build things like humans,” Begus said. “I’m not saying that we don’t. But understanding how different architectures are similar or different from humans is important.”

About this artificial intelligence research news

Author: Jason Pohl
Source: UC Berkeley
Contact: Jason Pohl – UC Berkeley
Image: The image is credited to Neuroscience News

Original Research: Open access.
“Encoding of speech in convolutional layers and the brain stem based on language experience” by Gasper Begus et al. Scientific Reports


Abstract

Encoding of speech in convolutional layers and the brain stem based on language experience

Comparing artificial neural networks with outputs of neuroimaging techniques has recently seen substantial advances in (computer) vision and text-based language models. Here, we propose a framework to compare biological and artificial neural computations of spoken language representations and propose several new challenges to this paradigm.

The proposed technique is based on the same principle that underlies electroencephalography (EEG): averaging of neural (artificial or biological) activity across neurons in the time domain. It allows comparison of the encoding of any acoustic property in the brain and in intermediate convolutional layers of an artificial neural network.

Our approach allows a direct comparison of responses to a phonetic property in the brain and in deep neural networks that requires no linear transformations between the signals. We argue that the brain stem response (cABR) and the response in intermediate convolutional layers to the exact same stimulus are highly similar without applying any transformations, and we quantify this observation.

The proposed technique not only reveals similarities, but also allows for analysis of the encoding of actual acoustic properties in the two signals: we compare peak latency (i) in the cABR relative to the stimulus in the brain stem and (ii) in intermediate convolutional layers relative to the input/output in deep convolutional networks.

We also examine and compare the effect of prior language exposure on the peak latency in cABR and in intermediate convolutional layers. Substantial similarities in peak latency encoding between the human brain and intermediate convolutional networks emerge based on results from eight trained networks (including a replication experiment).

The proposed technique can be used to compare encoding between the human brain and intermediate convolutional layers for any acoustic property and for other neuroimaging techniques.
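To make the peak-latency comparison described in the abstract concrete, one way to measure it is to locate the largest peak in each averaged waveform and express its position in milliseconds relative to the start of the analysis window. The helper below is a minimal, hypothetical sketch; the placeholder signals and sampling rates are assumptions for illustration, not the paper's data or code.

```python
import numpy as np
from scipy.signal import find_peaks

def peak_latency_ms(signal: np.ndarray, fs: float) -> float:
    """Latency (ms) of the largest positive peak relative to the window onset."""
    peaks, props = find_peaks(signal, height=0)
    if len(peaks) == 0:
        return float("nan")
    best = peaks[np.argmax(props["peak_heights"])]
    return 1000.0 * best / fs

# Placeholder waveforms standing in for the averaged cABR and the
# channel-averaged convolutional-layer response.
t = np.arange(1600) / 16_000
cabr = np.sin(2 * np.pi * 100 * t) * np.exp(-t * 30)
layer_response = np.roll(cabr[::4], 8)   # crude stand-in with a shifted peak

print(f"cABR peak latency:       {peak_latency_ms(cabr, fs=16_000):.1f} ms")
print(f"conv-layer peak latency: {peak_latency_ms(layer_response, fs=4_000):.1f} ms")
```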
