What an AI Birdsong Decoder Tells Us About the Human Brain

Summary: Canaries are master vocalists, learning 30 to 40 distinct syllables and stringing them into complex songs throughout their lives. Now, researchers have developed TweetyBERT, a self-supervised AI model that can automatically parse these songs with expert-level accuracy.

The tool adapts the same “transformer” architecture used in early versions of ChatGPT to identify notes, syllables, and phrases without human supervision. By automating the labor-intensive process of song annotation, TweetyBERT provides neuroscientists with a high-speed window into how the brain learns and produces complex language, with potential applications ranging from dolphin communication to tracking how climate change affects bird populations.

Key Facts

  • Self-Supervised Learning: Unlike traditional AI that requires humans to label thousands of audio clips, TweetyBERT learns to identify vocal units autonomously by predicting “masked” fragments of audio.
  • The Canary Connection: Songbirds like canaries are primary models for neuroscience because their vocal learning processes closely mirror how humans acquire and produce speech.
  • Expert Accuracy: TweetyBERT matches the precision of human expert annotators in a fraction of the time, allowing for massive-scale longitudinal studies of bird behavior.
  • BERT Architecture: The model uses the “Bidirectional Encoder Representations from Transformers” (BERT) framework, proving that language models built for human text can be successfully adapted for complex animal acoustics.
  • Broad Applications: While built for canaries, the underlying tech is already being tested on dolphins and whales, suggesting it could become a universal “decoder” for animal communication.

Source: University of Oregon

A new machine learning model, TweetyBERT, automatically segments and classifies canary vocalizations with expert-level accuracy. It offers neuroscientists a scalable platform for probing the neural basis of how the brain learns and produces language, and it could be adapted to study animal vocalization more broadly.

The study by University of Oregon researchers appears in the scientific journal Patterns.

TweetyBERT uses self-supervised machine learning to autonomously identify and categorize the complex vocal units of songbirds, providing a new tool for studying the neural foundations of language. Credit: Neuroscience News

“Current AI methods for analyzing animal vocalizations require human-labeled training data, a slow and labor-intensive process. We developed TweetyBERT, a self-supervised neural network for analyzing birdsongs. It can rapidly process unlabeled vocal recordings, identify communication units, and annotate sequences,” says Tim Gardner, associate professor of bioengineering at the University of Oregon’s Knight Campus.

Neuroscientists study songbirds such as canaries because of their remarkable ability to learn complex and lengthy songs throughout their lives, providing a window into the neural basis of complex learned behaviors.

George Vengrovski, a graduate student in Gardner’s lab, developed TweetyBERT as a means of automatically annotating the songs of canaries, which consist of 30 to 40 distinct syllables strung into sequences. He says it may change our understanding of how the brain produces speech.  

The tool adapts BERT, a language model architecture built on the same transformer design as early versions of ChatGPT, to handle the unique acoustic structure of birdsong. This self-supervised transformer neural network is trained to predict masked or hidden fragments of audio without human supervision or labels, and it autonomously learns the behavioral units of song, such as notes, syllables, and phrases, performing on par with expert annotators.
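To make the masked-prediction idea concrete, here is a minimal sketch of the data-preparation step such a model relies on: hiding contiguous spans of spectrogram frames so the network must reconstruct them from context. The helper `mask_fragments` is hypothetical and greatly simplified (frames are stand-in integers rather than real spectrogram columns, and the transformer and loss are not shown); it is not TweetyBERT's actual code.

```python
import random

def mask_fragments(frames, mask_frac=0.15, span=4, mask_token=None, seed=0):
    """Hide contiguous spans of audio frames, BERT-style.

    Returns (masked, targets): `masked` is the input sequence with
    hidden positions replaced by `mask_token`; `targets` maps each
    hidden position to the original frame the model must predict.
    """
    rng = random.Random(seed)
    masked = list(frames)
    targets = {}
    n_to_mask = max(1, int(len(frames) * mask_frac))
    while len(targets) < n_to_mask:
        # pick a random span start and hide `span` consecutive frames
        start = rng.randrange(len(frames) - span + 1)
        for i in range(start, start + span):
            if i not in targets:
                targets[i] = frames[i]
                masked[i] = mask_token
    return masked, targets

# demo: 20 spectrogram frames, represented here as integers
frames = list(range(20))
masked, targets = mask_fragments(frames)
```

During training, the model would receive `masked` and be scored on how well its outputs at the hidden positions match `targets`; no human labels are needed, which is what makes the approach self-supervised.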

This ability to classify and annotate songs quickly, finding differences across individuals and tracking how songs change over time, can help neuroscientists uncover the neural underpinnings of how the brain learns and produces language.

Beyond neuroscience, a modified version of TweetyBERT could be applied to natural bird populations, identifying changes in vocal patterns that might reveal how birds are responding to expanding human infrastructure and climate change.

“We built this for canaries, but the underlying approach isn’t species-specific, and the world is full of birds whose vocal behavior we’re barely tracking. With some modifications, the applications of TweetyBERT start to look very different,” says Gardner.

The underlying approach behind TweetyBERT is already being used for dolphins and whales, suggesting it could extend well beyond birds and deepen our understanding of animal communication more broadly.

Key Questions Answered:

Q: Can TweetyBERT actually “talk” to birds?

A: Not exactly. It doesn’t translate birdsong into English; rather, it “parses” the structure. It identifies the “grammar” and “vocabulary” of the canary’s song, allowing scientists to see exactly which neurons fire for which specific notes. It’s like mapping the alphabet of a language we don’t speak yet.

Q: Why do we use canaries to study human speech?

A: Humans and canaries are among a very small group of species that are “vocal learners.” Both have dedicated brain circuits for hearing a sound and mimicking it. By understanding how a canary’s brain organizes 40 different syllables, we can learn how our own brains manage the complexities of language and grammar.

Q: How does this help the environment?

A: Because TweetyBERT is scalable, it can be deployed in the wild to monitor thousands of hours of recordings. If birds start changing their “songs” in response to noise pollution or climate shifts, the AI can detect those subtle dialect changes quickly, acting as an early warning system for ecosystem health.

Editorial Notes:

  • This article was edited by a Neuroscience News editor.
  • Journal paper reviewed in full.
  • Additional context added by our staff.

About this AI and language research news

Author: Molly Blancett
Source: University of Oregon
Contact: Molly Blancett – University of Oregon
Image: The image is credited to Neuroscience News

Original Research: Open access.
“TweetyBERT: Automated parsing of birdsong through self-supervised machine learning” by George Vengrovski, Miranda R. Hulsey-Vincent, Melissa A. Bemrose, and Timothy J. Gardner. Patterns
DOI: 10.1016/j.patter.2025.101491


Abstract

TweetyBERT: Automated parsing of birdsong through self-supervised machine learning

Deep neural networks can be trained to parse animal vocalizations, identifying the units of communication and annotating sequences of vocalizations for subsequent statistical analysis.

However, current methods rely on human-labeled data for training. The challenge of parsing animal vocalizations in a fully unsupervised manner remains an open problem.

Addressing this challenge, we introduce TweetyBERT, a self-supervised transformer neural network developed for the analysis of birdsong.

The model is trained to predict masked or hidden fragments of audio but is not exposed to human supervision or labels.

Applied to canary song, TweetyBERT autonomously learns the behavioral units of song, such as notes, syllables, and phrases—capturing intricate acoustic and temporal patterns.

This approach of developing self-supervised models specifically tailored to animal communication may significantly accelerate the analysis of unlabeled vocal data.
