How the Brain Distinguishes Music from Speech

Summary: A new study reveals how our brain distinguishes between music and speech using simple acoustic parameters. Researchers found that slower, steady sounds are perceived as music, while faster, irregular sounds are perceived as speech.

These insights could optimize therapeutic programs for language disorders like aphasia. The research provides a deeper understanding of auditory processing.

Key Facts:

  • Simple Parameters: The brain uses basic acoustic parameters to differentiate music from speech.
  • Therapeutic Potential: Findings could improve therapies for language disorders like aphasia.
  • Research Details: The study involved more than 300 participants listening to synthesized audio clips.

Source: NYU

Music and speech are among the most frequent types of sounds we hear. But how do we identify what we think are differences between the two? 

An international team of researchers mapped out this process through a series of experiments, yielding insights that could help optimize therapeutic programs that use music to help people with aphasia regain the ability to speak.


Aphasia, a language disorder, afflicts more than 1 in 300 Americans each year, including Wendy Williams and Bruce Willis.

“Although music and speech are different in many ways, ranging from pitch to timbre to sound texture, our results show that the auditory system uses strikingly simple acoustic parameters to distinguish music and speech,” explains Andrew Chang, a postdoctoral fellow in New York University’s Department of Psychology and the lead author of the paper, which appears in the journal PLOS Biology.

“Overall, slower and steady sound clips of mere noise sound more like music while the faster and irregular clips sound more like speech.”

Scientists gauge the rate of signals by precise units of measurement: Hertz (Hz). A larger number of Hz means a greater number of occurrences (or cycles) per second than a lower number. For instance, people typically walk at a pace of 1.5 to 2 steps per second, which is 1.5-2 Hz.

The beat of Stevie Wonder’s 1972 hit “Superstition” is approximately 1.6 Hz, while Anna Karina’s 1967 smash “Roller Girl” clocks in at 2 Hz. Speech, in contrast, is typically two to three times faster than that at 4-5 Hz.
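Because 1 Hz means one cycle per second, multiplying a rate in Hz by 60 gives the more familiar per-minute figure (beats per minute, in the case of music). The short sketch below applies that conversion to the rates quoted above; the function name is chosen here purely for illustration.

```python
def hz_to_per_minute(rate_hz: float) -> float:
    """Convert a rate in cycles per second (Hz) to cycles per minute."""
    return rate_hz * 60.0


# Rates taken from the article; labels are illustrative.
examples = [
    ("walking pace (lower end)", 1.5),
    ("'Superstition' beat", 1.6),
    ("'Roller Girl' beat", 2.0),
    ("speech (lower end)", 4.0),
]

for label, rate_hz in examples:
    print(f"{label}: {rate_hz} Hz is about {hz_to_per_minute(rate_hz):.0f} per minute")
```

By this arithmetic, a 2 Hz beat corresponds to 120 beats per minute, while speech at 4-5 Hz corresponds to 240-300 amplitude cycles per minute.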

It has been well documented that a song’s volume, or loudness, changes over time at a relatively steady rate of 1-2 Hz, a pattern known as “amplitude modulation.” By contrast, the amplitude modulation of speech is typically 4-5 Hz, meaning its volume changes more frequently.

Despite the ubiquity and familiarity of music and speech, scientists previously lacked a clear understanding of how we effortlessly and automatically identify a sound as music or speech.

To better understand this process in their PLOS Biology study, Chang and colleagues conducted a series of four experiments in which more than 300 participants listened to a series of audio segments of synthesized music- and speech-like noise of various amplitude modulation speeds and regularity. 

The audio noise clips allowed only the detection of volume and speed. The participants were asked to judge whether these ambiguous noise clips, which they were told were noise-masked music or speech, sounded like music or speech.
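To give a rough sense of what such a clip might look like in practice, the sketch below generates white noise whose loudness rises and falls at a chosen rate, with an optional amount of cycle-to-cycle irregularity. This is a simplified stand-in, not the study's stimulus-generation procedure, which the article does not describe in detail; the function name, the raised-cosine envelope, and the `jitter` parameter are all assumptions made for illustration.

```python
import math
import random


def am_noise(rate_hz, duration_s=4.0, sample_rate=1000, jitter=0.0, seed=0):
    """Return white-noise samples whose loudness cycles about rate_hz times per second.

    jitter=0.0 gives perfectly regular cycles; larger values make each
    cycle's length vary more. All names and defaults are illustrative.
    """
    rng = random.Random(seed)
    n = int(duration_s * sample_rate)
    samples = []
    phase = 0.0          # position within the current loudness cycle, in [0, 1)
    cycle_rate = rate_hz
    for _ in range(n):
        # Raised-cosine envelope in [0, 1]: silent at phase 0, loudest at phase 0.5.
        envelope = 0.5 * (1.0 - math.cos(2.0 * math.pi * phase))
        samples.append(envelope * rng.uniform(-1.0, 1.0))
        phase += cycle_rate / sample_rate
        if phase >= 1.0:
            # A new cycle begins; with jitter > 0 its rate is drawn afresh.
            phase -= 1.0
            cycle_rate = rate_hz * math.exp(rng.gauss(0.0, jitter))
    return samples


# Hypothetical examples: a slow, regular clip and a faster, irregular one.
slow_regular = am_noise(rate_hz=1.5, jitter=0.0)
fast_irregular = am_noise(rate_hz=4.5, jitter=0.4)
```

Such clips carry no melody, words, or timbre cues; only the speed and regularity of the loudness changes differ between them.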

Observing how participants sorted hundreds of noise clips as either music or speech revealed how much each speed and/or regularity feature affected their judgment. It is the auditory version of “seeing faces in the cloud,” the scientists conclude: If a certain feature in the soundwave matches listeners’ idea of how music or speech should sound, even a white noise clip can sound like music or speech.

The results showed that our auditory system uses surprisingly simple and basic acoustic parameters to distinguish music and speech: to participants, clips with slower rates (<2 Hz) and more regular amplitude modulation sounded more like music, while clips with higher rates (~4 Hz) and more irregular amplitude modulation sounded more like speech.
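The amplitude modulation rate referred to here can be estimated from a sound's loudness envelope by finding the frequency at which the envelope carries the most power. The sketch below does this with a naive Fourier scan over candidate frequencies; it is an illustration of the general idea under simple assumptions (a clean, synthetic envelope and a hypothetical function name), not the authors' analysis.

```python
import math


def peak_am_rate(envelope, sample_rate, f_min=0.5, f_max=8.0, step=0.1):
    """Return the candidate frequency (Hz) at which the envelope has the most power."""
    mean = sum(envelope) / len(envelope)
    centered = [v - mean for v in envelope]  # remove the constant (0 Hz) component
    best_freq, best_power = f_min, -1.0
    steps = int(round((f_max - f_min) / step))
    for k in range(steps + 1):
        freq = f_min + k * step
        # Correlate the envelope with a cosine and a sine at this frequency.
        re = sum(v * math.cos(2 * math.pi * freq * i / sample_rate)
                 for i, v in enumerate(centered))
        im = sum(v * math.sin(2 * math.pi * freq * i / sample_rate)
                 for i, v in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_freq, best_power = freq, power
    return best_freq


# Two synthetic 10-second loudness envelopes sampled 50 times per second.
sample_rate = 50
slow = [0.5 * (1 - math.cos(2 * math.pi * 1.5 * i / sample_rate)) for i in range(500)]
fast = [0.5 * (1 - math.cos(2 * math.pi * 4.5 * i / sample_rate)) for i in range(500)]

print(round(peak_am_rate(slow, sample_rate), 1))
print(round(peak_am_rate(fast, sample_rate), 1))
```

For these idealized envelopes, the scan recovers the rates they were built with (1.5 Hz and 4.5 Hz), which fall on the music-like and speech-like sides of the ranges reported above. Real recordings would first need their envelope extracted and would yield much noisier spectra.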

Knowing how the human brain differentiates between music and speech can potentially benefit people with auditory or language disorders such as aphasia, the authors note.

Melodic intonation therapy, for instance, is a promising approach to train people with aphasia to sing what they want to say, using their intact “musical mechanisms” to bypass damaged speech mechanisms. Therefore, knowing what makes music and speech similar or distinct in the brain can help design more effective rehabilitation programs.

The paper’s other authors were Xiangbin Teng of Chinese University of Hong Kong, M. Florencia Assaneo of National Autonomous University of Mexico (UNAM), and David Poeppel, a professor in NYU’s Department of Psychology and managing director of the Ernst Strüngmann Institute for Neuroscience in Frankfurt, Germany.

Funding: The research was supported by a grant from the National Institute on Deafness and Other Communication Disorders, part of the National Institutes of Health (F32DC018205), and Leon Levy Scholarships in Neuroscience.

About this auditory neuroscience research news

Author: James Devitt
Source: NYU
Contact: James Devitt – NYU
Image: The image is credited to Neuroscience News

Original Research: Open access.
“The human auditory system uses amplitude modulation to distinguish music from speech” by Andrew Chang et al. PLOS Biology


The human auditory system uses amplitude modulation to distinguish music from speech

Music and speech are complex and distinct auditory signals that are both foundational to the human experience. The mechanisms underpinning each domain are widely investigated.

However, what perceptual mechanism transforms a sound into music or speech and how basic acoustic information is required to distinguish between them remain open questions.

Here, we hypothesized that a sound’s amplitude modulation (AM), an essential temporal acoustic feature driving the auditory system across processing levels, is critical for distinguishing music and speech.

Specifically, in contrast to paradigms using naturalistic acoustic signals (that can be challenging to interpret), we used a noise-probing approach to untangle the auditory mechanism: If AM rate and regularity are critical for perceptually distinguishing music and speech, judging artificially noise-synthesized ambiguous audio signals should align with their AM parameters.

Across 4 experiments (N = 335), signals with a higher peak AM frequency tend to be judged as speech, lower as music. Interestingly, this principle is consistently used by all listeners for speech judgments, but only by musically sophisticated listeners for music.

In addition, signals with more regular AM are judged as music over speech, and this feature is more critical for music judgment, regardless of musical sophistication.

The data suggest that the auditory system can rely on a low-level acoustic property as basic as AM to distinguish music from speech, a simple principle that provokes both neurophysiological and evolutionary experiments and speculations.
