Summary: Listeners rely on a number of acoustic features to make the distinction between drum music and drum “speech”.
Source: Max Planck Institute
We are surrounded by all kinds of sounds, and we are usually good at telling them apart. For instance, when we turn on the radio, we immediately notice whether music is playing or someone is talking. But what happens when speech and music sound similar? Which sound characteristics help us to distinguish them?
A team of scientists from the Max Planck Institute for Empirical Aesthetics in Frankfurt, the Max Planck NYU Center for Language, Music, and Emotion (CLaME), and Arizona State University decided to investigate this question.
Music and language processing have been compared repeatedly, but the similarities and differences between the two domains are challenging to quantify. This is particularly the case when the domains overlap, as happens, for example, with rhymes or rap music. To better understand the boundaries between these two domains, the international research team ran an online study involving more than one hundred people from 15 different native-language backgrounds.
The study focused on the “talking” Dùndún drum used in southwestern Nigeria as both a musical instrument and a medium of communication. This drum imitates the tonal language of the Yorùbá, thus creating what is known as a “speech surrogate.” Participants in the study were provided with basic knowledge about the Dùndún drum, although roughly half of them were already familiar with it.
The researchers compared the acoustic characteristics of drum speech vs. drum music in recordings of both. They also asked participants to listen to the same recordings and indicate whether they thought they were hearing speech or music.
“Most participants were able to identify a large number of the excerpts in the way they were intended by the performer—albeit with an unsurprising bias towards the music-like category. Those who were already familiar with the instrument did particularly well, but the others did better than they would have if they had just chosen the answer randomly,” explains Pauline Larrouy-Maestri of the Max Planck Institute for Empirical Aesthetics.
With the data they collected, the researchers developed a statistical model that can be used to predict when a sound sample will be perceived as music-like or speech-like. The model shows that listeners rely on a number of acoustic features to make this distinction.
Of these features, loudness, pitch, timbre, and timing were found to be significant. For example, a regular rhythm and frequent changes in timbre make a sequence sound more music-like, while lower intensity and fewer changes in pitch make it sound more speech-like. Familiarity with the instrument appears to influence how a listener registers these acoustic features.
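To make the idea of such a predictive model concrete, here is a minimal Python sketch of a logistic model, the model family named in the published abstract. It is an illustration only: the feature names, the coefficient values, and the function `p_music_like` are assumptions chosen to mirror the directions of the effects described above, not the values estimated in the study, and the sketch leaves out the interaction with listener familiarity.

```python
import math

# Hypothetical coefficients, chosen only to mirror the directions described
# in the article. They are NOT the values estimated in the study.
COEFFS = {
    "rhythm_regularity": 1.2,   # more regular rhythm -> more music-like
    "timbre_change_rate": 0.8,  # more frequent timbre changes -> more music-like
    "intensity": 0.9,           # lower intensity -> more speech-like
    "pitch_change_rate": 0.7,   # fewer pitch changes -> more speech-like
}
INTERCEPT = 0.0  # assumed: an "average" sample is equally likely to go either way


def p_music_like(features):
    """Return the modelled probability that a sample is heard as music-like.

    `features` maps each feature name to a standardized (z-scored) value,
    so 0.0 means "average for the sample set".
    """
    z = INTERCEPT + sum(COEFFS[name] * features[name] for name in COEFFS)
    return 1.0 / (1.0 + math.exp(-z))  # logistic (sigmoid) link


average = {name: 0.0 for name in COEFFS}
steady_beat = dict(average, rhythm_regularity=1.5)
quiet_and_flat = dict(average, intensity=-1.0, pitch_change_rate=-1.0)

print(p_music_like(average))         # 0.5 with the assumed zero intercept
print(p_music_like(steady_beat))     # above 0.5: leans music-like
print(p_music_like(quiet_and_flat))  # below 0.5: leans speech-like
```

In a real analysis the coefficients would be estimated from the listeners' responses rather than set by hand, and additional terms would let the weight of each feature differ between familiar and unfamiliar listeners.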
The study’s findings, recently published in the journal Frontiers in Psychology, provide empirical evidence for the relevance of acoustic features as well as insights into the role of a listener’s cultural background, thus producing new knowledge about the formation of perceptual categories in speech and music.
About this auditory neuroscience and music research news
Contact: Keyvan Sarkhosh – Max Planck Institute
Image: The image is credited to MPI for Empirical Aesthetics/Durojaye
Perception of Nigerian Dùndún Talking Drum Performances as Speech-Like vs. Music-Like: The Role of Familiarity and Acoustic Cues
It seems trivial to identify sound sequences as music or speech, particularly when the sequences come from different sound sources, such as an orchestra and a human voice. Can we also easily distinguish these categories when the sequence comes from the same sound source? On the basis of which acoustic features?
We investigated these questions by examining listeners’ classification of sound sequences performed by an instrument intertwining both speech and music: the dùndún talking drum. The dùndún is commonly used in south-west Nigeria as a musical instrument but is also perfectly fit for linguistic usage in what has been described as speech surrogates in Africa. One hundred seven participants from diverse geographical locations (15 different mother tongues represented) took part in an online experiment.
Fifty-one participants reported being familiar with the dùndún talking drum, 55% of those being speakers of Yorùbá. During the experiment, participants listened to 30 dùndún samples, each about 7 s long, performed either as music or as Yorùbá speech surrogate (n = 15 each) by a professional musician, and were asked to classify each sample as music-like or speech-like.
The classification task revealed the ability of the listeners to identify the samples as intended by the performer, particularly when they were familiar with the dùndún, though even unfamiliar participants performed above chance. A logistic regression predicting participants’ classification of the samples from several acoustic features confirmed the perceptual relevance of intensity, pitch, timbre, and timing measures and their interaction with listener familiarity.
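As a rough illustration of what "above chance" means for a two-choice task with 30 samples, the following Python sketch computes an exact one-sided binomial tail probability. It is not the analysis used in the study: the 0.5 chance level assumes pure guessing between the two response options, and the listener's score of 21 correct is a made-up example.

```python
from math import comb


def binomial_tail(k, n, p=0.5):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_samples = 30  # number of dùndún samples each participant classified
n_correct = 21  # hypothetical listener: 21 of 30 match the performer's intent

# Probability of scoring at least this well by guessing alone.
p_value = binomial_tail(n_correct, n_samples)
print(f"P(at least {n_correct}/{n_samples} correct by guessing) = {p_value:.4f}")
```

A small tail probability indicates that a score this high would be unlikely under pure guessing. A group-level analysis would also need to account for repeated measures across participants and samples.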
In all, this study provides empirical evidence supporting the discriminating role of acoustic features and the modulatory role of familiarity in teasing apart speech and music.