How the brain distinguishes between voice and sound

Summary: The brain adapts to a person’s listening intentions by focusing on either a speaker’s voice or the speech sounds being made.

Source: University of Geneva

Is the brain capable of distinguishing a voice from the specific sounds it utters? In an attempt to answer this question, researchers from the University of Geneva (UNIGE), Switzerland, in collaboration with the University of Maastricht, the Netherlands, devised pseudo-words (words without meaning) spoken by three voices with different pitches. Their aim? To observe how the brain processes this information when it focuses either on the voice or on speech sounds (i.e. phonemes). The scientists discovered that the auditory cortex amplifies different aspects of the sounds, depending on what task is being performed. Voice-specific information is prioritized for voice differentiation, while phoneme-specific information is prioritized for the differentiation of speech sounds.

The results, which are published in the journal Nature Human Behaviour, shed light on the cerebral mechanisms involved in speech processing.

Speech has two distinguishing characteristics: the voice of the speaker and the linguistic content itself, including speech sounds. Does the brain process these two types of information in the same way? “We created 120 pseudo-words that comply with the phonology of the French language but that make no sense, to make sure that semantic processing would not interfere with the pure perception of the phonemes,” explains Narly Golestani, a professor in the Psychology Section at UNIGE’s Faculty of Psychology and Educational Sciences (FPSE). These pseudo-words all contained phonemes such as /p/, /t/ or /k/, as in /preparation/, /gabratade/ and /ecalimacre/.

The UNIGE team recorded the voice of a female phonetician articulating the pseudo-words, which they then converted into different voices ranging from lower to higher pitch. “To make the differentiation of the voices as difficult as the differentiation of the speech sounds, we created the percept of three different voices from the recorded stimuli, rather than recording three actual different people,” continues Sanne Rutten, a researcher in the Psychology Section of UNIGE’s FPSE.

How the brain distinguishes different aspects of speech

The scientists scanned their participants using functional magnetic resonance imaging (fMRI) at a high magnetic field strength (7 Tesla). This method makes it possible to observe brain activity by measuring blood oxygenation in the brain: the more oxygen an area needs, the more that area is being used. While being scanned, the participants listened to the pseudo-words: in one session they had to identify the phonemes /p/, /t/ or /k/, and in another, they had to say whether the pseudo-words had been read by voice 1, 2 or 3.

The teams from Geneva and the Netherlands first analyzed the pseudo-words to better understand the main acoustic parameters underlying the differences in the voices versus the speech sounds. They examined differences in frequency (high/low), temporal modulation (how quickly the sounds change over time) and spectral modulation (how the energy is spread across different frequencies). They found that high spectral modulations best differentiated the voices, and that fast temporal modulations along with low spectral modulations best differentiated the phonemes.
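To make the notions of temporal and spectral modulation concrete, here is a minimal Python sketch. It is not the researchers' analysis code. It assumes a spectrogram-like array (frequency channels by time frames) and takes its two-dimensional Fourier transform, which is one common way to read off how quickly the energy changes over time and how finely it varies across frequency. The names modulation_profiles, ripple and peak_modulation, the channel spacing in octaves and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def modulation_profiles(spec, dt, df):
    """Temporal and spectral modulation profiles of a spectrogram-like array.

    spec: 2D array, shape (n_freq, n_time)
    dt:   time between frames, in seconds
    df:   spacing between frequency channels (assumed here to be in octaves)
    """
    spec = spec - spec.mean()                      # remove the overall offset (DC)
    power = np.abs(np.fft.fft2(spec)) ** 2         # 2D modulation power
    n_freq, n_time = spec.shape
    rates = np.fft.fftfreq(n_time, d=dt)           # temporal modulation, in Hz
    scales = np.fft.fftfreq(n_freq, d=df)          # spectral modulation, in cycles/octave
    keep_t, keep_f = rates >= 0, scales >= 0       # keep non-negative modulations only
    temporal_profile = power.sum(axis=0)[keep_t]   # collapse over the frequency axis
    spectral_profile = power.sum(axis=1)[keep_f]   # collapse over the time axis
    return rates[keep_t], temporal_profile, scales[keep_f], spectral_profile

def ripple(rate_hz, scale_cyc_per_oct, n_freq=64, n_time=100, dt=0.01, df=0.1):
    """Synthetic 'moving ripple': one temporal rate and one spectral scale."""
    t = np.arange(n_time) * dt
    x = np.arange(n_freq) * df
    phase = 2 * np.pi * (rate_hz * t[np.newaxis, :] + scale_cyc_per_oct * x[:, np.newaxis])
    return 1.0 + np.cos(phase)

def peak_modulation(spec, dt, df):
    """Return the temporal rate and spectral scale carrying the most power."""
    rates, t_prof, scales, s_prof = modulation_profiles(spec, dt, df)
    return rates[np.argmax(t_prof)], scales[np.argmax(s_prof)]

# Toy stand-ins, loosely inspired by the two cue types described above:
# slow in time but fine across frequency (a "high spectral modulation" pattern) ...
slow_fine = ripple(rate_hz=2.0, scale_cyc_per_oct=1.25)
# ... versus fast in time but coarse across frequency.
fast_coarse = ripple(rate_hz=16.0, scale_cyc_per_oct=0.3125)

print(peak_modulation(slow_fine, dt=0.01, df=0.1))
print(peak_modulation(fast_coarse, dt=0.01, df=0.1))
```

In this toy setting the first pattern peaks at a low temporal rate and a high spectral scale, and the second at a high temporal rate and a low spectral scale. Real speech contains a mixture of many such components, so its profiles are broad curves rather than single peaks.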

The researchers subsequently used computational modeling to analyze the fMRI responses, namely the brain activation in the auditory cortex when processing the sounds during the two tasks. When the participants had to focus on the voices, the auditory cortex amplified the higher spectral modulations. For the phonemes, the cortex responded more to the fast temporal modulations and to the low spectral modulations. “The results show large similarities between the task information in the sounds themselves and the neural, fMRI data,” says Golestani.
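The article does not spell out the computational model, so the following Python sketch only illustrates the general idea of task-dependent amplification with a generic linear encoding model fitted by ridge regression. Everything in it is an assumption made for illustration: the six "modulation channels", the simulated responses, the gain values and the function name fit_ridge are hypothetical, and the data are random numbers rather than fMRI measurements.

```python
import numpy as np

def fit_ridge(features, responses, alpha=1.0):
    """Closed-form ridge regression: w = (X^T X + alpha * I)^-1 X^T y."""
    n_features = features.shape[1]
    gram = features.T @ features + alpha * np.eye(n_features)
    return np.linalg.solve(gram, features.T @ responses)

rng = np.random.default_rng(0)
n_sounds, n_features = 120, 6   # 120 sounds; 6 hypothetical modulation channels
features = rng.standard_normal((n_sounds, n_features))

# Hypothetical ground truth: the same sounds, but a different gain per task.
# Channels 0-1 stand in for "fast temporal / low spectral" modulations,
# channels 4-5 for "high spectral" modulations (labels are illustrative only).
voice_gain = np.ones(n_features)
voice_gain[4:] = 2.0
phoneme_gain = np.ones(n_features)
phoneme_gain[:2] = 2.0

# Simulated responses of one measurement unit under each task, with a little noise.
voice_resp = features @ voice_gain + 0.1 * rng.standard_normal(n_sounds)
phoneme_resp = features @ phoneme_gain + 0.1 * rng.standard_normal(n_sounds)

# Fit one model per task and compare the estimated weights channel by channel.
w_voice = fit_ridge(features, voice_resp)
w_phoneme = fit_ridge(features, phoneme_resp)
gain_change = w_voice - w_phoneme   # positive: stronger weighting in the voice task

print(np.round(gain_change, 2))
```

In this simulation the difference in fitted weights is positive for the channels boosted in the voice task, negative for those boosted in the phoneme task, and close to zero elsewhere. That is the kind of pattern the phrase "amplified" refers to, under the stated assumptions.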

This shows graphs from the study

On top: Analysis of the main acoustic parameters underlying differences in the voices (speakers) and in the speech sounds (phonemes) in the pseudo-words themselves: high spectral modulations best differentiate the voices (blue spectral profile), and fast temporal modulations (red temporal profile) along with low spectral modulations (red spectral profile) best differentiate the speech sounds. At the bottom: Analysis of the neural fMRI data: during performance of the voice task, the auditory cortex amplifies higher spectral modulations (blue spectral profile), and during performance of the phoneme task, it amplifies fast temporal modulations (red temporal profile) and low spectral modulations (red spectral profile). These amplification profiles closely match the acoustic profiles that differentiate the voices and the phonemes. Image is credited to UNIGE.

This study shows that the auditory cortex adapts to a specific listening mode. It amplifies the acoustic aspects of the sounds that are critical for the current goal. “This is the first time that it’s been shown, in humans and using non-invasive methods, that the brain adapts to the task at hand in a manner that’s consistent with the acoustic information that is attended to in speech sounds,” points out Rutten. The study advances our understanding of the mechanisms underlying speech and speech sound processing by the brain. “This will be useful in our future research, especially on processing other levels of language – including semantics, syntax and prosody, topics that we plan to explore in the context of a National Centre of Competence in Research on the origin and future of language that we have applied for in collaboration with researchers throughout Switzerland,” concludes Golestani.

About this neuroscience research article

University of Geneva
Media Contacts:
Narly Golestani – University of Geneva

Original Research: Closed access
“Cortical encoding of speech enhances task-relevant acoustic information”. Sanne Rutten, Roberta Santoro, Alexis Hervais-Adelman, Elia Formisano & Narly Golestani.
Nature Human Behaviour. doi:10.1038/s41562-019-0648-9


Cortical encoding of speech enhances task-relevant acoustic information

Speech is the most important signal in our auditory environment, and the processing of speech is highly dependent on context. However, it is unknown how contextual demands influence the neural encoding of speech. Here, we examine the context dependence of auditory cortical mechanisms for speech encoding at the level of the representation of fundamental acoustic features (spectrotemporal modulations) using model-based functional magnetic resonance imaging. We found that the performance of different tasks on identical speech sounds leads to neural enhancement of the acoustic features in the stimuli that are critically relevant to task performance. These task effects were observed at the earliest stages of auditory cortical processing, in line with interactive accounts of speech processing. Our work provides important insights into the mechanisms that underlie the processing of contextually relevant acoustic information within our rich and dynamic auditory environment.
