Why Are Sight and Sound Out of Sync?

Summary: A new study reports that speech comprehension can improve by as much as 10% when sound is delayed relative to vision.

Source: City University London.

The way we process sight and sound is curiously out of sync, by different amounts for different people and tasks, according to a new study from City, University of London.

When investigating the effect, the researchers found that speech comprehension can sometimes improve by as much as 10 per cent when sound is delayed relative to vision, and that each individual consistently has their own optimal delay for each task.

As a result, the authors suggest that tailoring sound delays on an individual basis via a hearing aid or cochlear implant – or a setting on a computer media player – could have significant benefits for speech comprehension and enjoyment of multimedia. The study is published in the Journal of Experimental Psychology: Human Perception and Performance.

When the researchers at City looked deeper into this phenomenon, they kept finding a very curious pattern: different tasks benefitted from opposite delays, even in the same person. For example, the more an individual’s vision lags their audition in the performance of one task (e.g., identifying speech sounds), conversely the more their audition is likely to lag vision in other tasks (e.g., deciding whether lips followed or preceded the speaker’s voice). This finding provides new insight into how we determine when events actually occur in the world and the nature of perceptual timing in the brain.

When we see and hear a person speak, sensory signals travel via different pathways from our eyes and ears through the brain. The audiovisual asynchronies measured in this study may occur because these sensory signals arrive at their different destinations in the brain at different times.

Yet how then do we ever know when the physical speech events actually happened in the world? The brain must have a way to solve this problem, given that we can still judge whether or not the original events are in sync with reasonable accuracy. For example, we are often able to easily identify when films have poor lip-sync.

Lead author Dr Elliot Freeman, Senior Lecturer in the Department of Psychology at City, University of London, proposes a solution based on an analogous ‘multiple clocks’ problem: “Imagine standing in an antique shop full of clocks, and you want to know what the time is. Your best guess comes from the average across clocks. However, if one clock is particularly slow, others will seem fast relative to it.

“In our new theory, which we call ‘temporal renormalisation’, the ‘clocks’ are analogous to different mechanisms in the brain which each receive sight and sound out of sync: but if one such mechanism is subject to an auditory delay, this will bias the average, relative to which other mechanisms may seem to have a visual delay. This theory explains the curious finding that different tasks show opposite delays; it may also explain how we know when events in the world are actually happening, despite our brains having many conflicting estimates of their timing.”
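The clock analogy can be illustrated with a toy calculation. The sketch below is not the authors' model: the mechanism names and delay values are hypothetical, and it simply assumes that each mechanism's timing is read relative to the average across all mechanisms.

```python
from statistics import mean


def renormalise(auditory_delays_ms):
    """Express each mechanism's auditory delay relative to the average
    across all mechanisms (the 'average across clocks' in the analogy)."""
    average = mean(auditory_delays_ms.values())
    return {name: delay - average for name, delay in auditory_delays_ms.items()}


# Hypothetical auditory delays in milliseconds; illustrative values only.
# One mechanism receives sound late, the other two receive it on time.
delays = {"speech_integration": 90.0, "temporal_order": 0.0, "other": 0.0}

relative = renormalise(delays)
for name, value in relative.items():
    # A positive value means sound seems late in that mechanism;
    # a negative value means vision seems late instead.
    print(f"{name}: {value:+.1f} ms")
```

Under these assumptions the delayed mechanism still shows an auditory lag after renormalisation, while the two undelayed mechanisms end up with an apparent visual lag, which is the opposite-sign pattern the theory is meant to explain.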

In their experiments, the researchers presented participants with audiovisual movies of a person speaking syllables, words or sentences, while varying the asynchrony of voice relative to lip movements. For each movie they measured participants’ accuracy at identifying the words spoken, or how strongly lip movements influenced what was heard.

In the latter case, the researchers exploited the McGurk illusion, where for example the phoneme ‘ba’ sounds like ‘da’ when mismatched with lip movements for ‘ga’. They could then estimate the asynchrony that resulted in the maximal accuracy or strongest McGurk illusion. In a separate task, they also asked participants to judge whether the voice came before or after the lip movements, from which they could estimate the subjective asynchrony.
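As a rough illustration of how an optimal asynchrony might be estimated from such measurements, the sketch below takes accuracy at several tested asynchronies and finds the peak, refined with a parabola through the best point and its two neighbours. This is an assumed method for illustration, not the fitting procedure used in the study. The function name, the data values, the equal spacing of the tested asynchronies, and the sign convention (positive means the voice is delayed relative to the lips) are all assumptions.

```python
def optimal_asynchrony(asynchronies_ms, percent_correct):
    """Estimate the asynchrony giving peak accuracy.

    Takes the best tested asynchrony and refines it with the vertex of a
    parabola through that point and its two neighbours. Assumes the tested
    asynchronies are sorted and equally spaced.
    """
    best = max(range(len(percent_correct)), key=percent_correct.__getitem__)
    # At either end of the tested range there is no neighbour on one side,
    # so fall back to the best tested value.
    if best == 0 or best == len(percent_correct) - 1:
        return float(asynchronies_ms[best])
    y0, y1, y2 = percent_correct[best - 1:best + 2]
    step = asynchronies_ms[best] - asynchronies_ms[best - 1]
    curvature = y0 - 2 * y1 + y2
    if curvature == 0:
        return float(asynchronies_ms[best])
    return asynchronies_ms[best] + 0.5 * step * (y0 - y2) / curvature


# Hypothetical data for one participant; illustrative values only.
tested_ms = [-200, -100, 0, 100, 200]
accuracy = [50, 60, 70, 70, 60]

estimate = optimal_asynchrony(tested_ms, accuracy)
print(f"Estimated optimal voice delay: {estimate:.0f} ms")
```

With these made-up numbers, accuracy is equally high at 0 ms and 100 ms, so the estimated optimum falls midway between them, at a 50 ms voice delay.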

Speaking about the study, Dr Freeman said: “We often assume that the best way to comprehend speech is to match up what we hear with lip movements, and that this works best when sight and sound are simultaneous. However, our new study confirms that sight and sound really are out of sync by different amounts in different people. We also found that for some individuals, manually delaying voices relative to lip-movements could improve speech comprehension and the accuracy of word identification by 10% or more.

“This paper also introduces a new automated method for assessing individual audiovisual asynchronies, which could be administered over the internet or via an ‘app’. Once an individual’s perceptual asynchrony is measured, it may be corrected artificially with a tailored delay. This could be implemented via a hearing aid or cochlear implant, or a setting on a computer media player, with potential benefits for speech comprehension and enjoyment of multimedia.

“Asynchronous perception may impact on cognitive performance, and future studies could examine its associations with schizotypal personality traits, autism spectrum traits, and dyslexia.”

About this neuroscience research article

Source: George Wigmore – City University London
Publisher: Organized by NeuroscienceNews.com.
Image Source: NeuroscienceNews.com image is in the public domain.
Original Research: Abstract for “Correlation of individual differences in audiovisual asynchrony across stimuli and tasks: New constraints on temporal renormalization theory” by Ipser, Alberta; Karlinski, Maayan; & Freeman, Elliot D. in Journal of Experimental Psychology: Human Perception and Performance. Published May 7 2018
doi:10.1037/xhp0000535

Abstract

Correlation of individual differences in audiovisual asynchrony across stimuli and tasks: New constraints on temporal renormalization theory

Sight and sound are out of synch in different people by different amounts for different tasks. But surprisingly, different concurrent measures of perceptual asynchrony correlate negatively (Freeman et al., 2013). Thus, if vision subjectively leads audition in one individual, the same individual might show a visual lag in other measures of audiovisual integration (e.g., McGurk illusion, Stream-Bounce illusion). This curious negative correlation was first observed between explicit temporal order judgments and implicit phoneme identification tasks, performed concurrently as a dual task, using incongruent McGurk stimuli. Here we used a new set of explicit and implicit tasks and congruent stimuli, to test whether this negative correlation persists across testing sessions, and whether it might be an artifact of using specific incongruent stimuli. None of these manipulations eliminated the negative correlation between explicit and implicit measures. This supports the generalizability and validity of the phenomenon, and offers new theoretical insights into its explanation. Our previously proposed “temporal renormalization” theory assumes that the timings of sensory events registered within the brain’s different multimodal subnetworks are each perceived relative to a representation of the typical average timing of such events across the wider network. Our new data suggest that this representation is stable and generic, rather than dependent on specific stimuli or task contexts, and that it may be acquired through experience with a variety of simultaneous stimuli. Our results also add further evidence that speech comprehension may be improved in some individuals by artificially delaying voices relative to lip-movements.
