When Do Kids Start Creating Original Sentences? AI Pinpoints the Age

Summary: Children begin creating novel determiner-noun combinations, like “a dog” or “the house,” at around 30 months, reveals a new study combining behavioral observations and computational modeling.

By analyzing utterances from 64 English-speaking children and training a predictive model on caregiver speech data, researchers confirmed this milestone as a key point in language development. The study shows that children go beyond mimicking input by applying linguistic rules, a crucial step in learning grammar.

Key Facts:

Children start using novel determiner-noun combinations at around 30 months.
Behavioral and computational data align to confirm this linguistic milestone.
Findings provide insights into how children generalize language rules from limited input.

Source: University of Chicago

Hearing a baby’s first words is a joyful moment for many parents. But another crucial language milestone is harder to pinpoint for both parents and scholars of human development.

When does a child start putting together words on their own, rather than parroting what they’ve heard?

A new study published last week in PNAS by researchers at the University of Chicago and other institutions used behavioral and computational data to determine when English-speaking children go beyond their linguistic input.

Image: A child pointing to a dog. How much linguistic input do kids have to hear to learn particular language structures? Credit: Neuroscience News

For linguists, this happens when a child uses a language rule to say something new—something they’ve never heard before.

The problem: it’s almost impossible to know everything a child has ever heard. To address this, the research team of linguists, developmental psychologists and computational analysts joined forces.

They built a generative computer model that mimicked how a child first produces a certain structure in English: determiner-noun combinations (e.g., saying a dog after having heard the dog).

“We pinpointed the moment when we thought each child can do this, and then we tried to model that with a computer,” said corresponding author Susan Goldin-Meadow, the Beardsley Ruml Distinguished Service Professor in the Departments of Psychology and Comparative Human Development at the University of Chicago. “They agreed pretty well.”

Both methods put the onset of determiner-noun combinations the children had never heard at around 30 months. According to Goldin-Meadow, this novel approach, which combines computational modeling with behavioral observations, opens new avenues for exploring long-standing questions about how children learn language.

Learning from mistakes

We all learn by making mistakes. Looking for errors is also a useful method for linguists to assess how children pick up language. When a child says, “I eated my dinner” or “I thinked about it,” it means they understand a basic grammar rule in English: verb plus -ed means something happened in the past.

Because English has irregular verbs, it’s easy to spot when a child uses this rule to produce a phrase they’ve likely never heard before.
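As a rough illustration of why such errors are easy to spot, the sketch below flags words that look like an irregular verb with a regular "-ed" ending attached. The verb table and the function name are invented for this example, and the check is deliberately naive: it misses forms with a doubled consonant such as "runned."

```python
# Hypothetical mini-lexicon of irregular verbs: stem -> correct past tense.
IRREGULAR_PAST = {"eat": "ate", "think": "thought", "go": "went", "run": "ran"}

def find_overregularizations(utterance):
    """Return (produced form, expected irregular form) pairs found in an utterance."""
    hits = []
    for word in utterance.lower().split():
        # Strip a regular "-ed" ending and see whether an irregular stem remains.
        stem = word[:-2] if word.endswith("ed") else None
        if stem in IRREGULAR_PAST:
            hits.append((word, IRREGULAR_PAST[stem]))
    return hits

for sentence in ["I eated my dinner", "I thinked about it", "I walked home"]:
    print(sentence, "->", find_overregularizations(sentence))
```

A form like "eated" almost certainly was not copied from an adult, which is what makes it useful evidence that the child is applying a rule.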

For this study, the research team looked at a similarly revealing part of English grammar: determiners, words such as “a” and “the” that precede a noun and specify what it refers to. Examples include “a dog” and “the house.”

Researchers assumed that if a child used both “a” and “the” with the same noun, e.g., “a pineapple” and “the pineapple,” the child likely understood the pattern and was using it to create novel combinations.
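The criterion can be sketched in a few lines of code. This is only an illustration under simplifying assumptions, not the researchers' actual coding procedure: the session data are invented, and the word right after a determiner is treated as its noun, which would miscount phrases like "a big dog."

```python
from collections import defaultdict

DETERMINERS = {"a", "the"}

def find_onset(sessions):
    """Return the age (in months) of the first session in which some noun has
    been produced with both "a" and "the", or None if that never happens.

    sessions: list of (age_in_months, [utterance, ...]) sorted by age.
    """
    seen = defaultdict(set)  # noun -> determiners the child has used with it
    for age, utterances in sessions:
        for utterance in utterances:
            words = utterance.lower().split()
            # Simplification: the word after a determiner is taken as the noun.
            for det, noun in zip(words, words[1:]):
                if det in DETERMINERS:
                    seen[noun].add(det)
                    if seen[noun] == DETERMINERS:
                        return age
    return None

# Invented example data: one child's utterances at three recording sessions.
sessions = [
    (22, ["the dog", "want the ball"]),
    (26, ["a cookie", "the dog"]),
    (30, ["a dog sleeping"]),
]
onset = find_onset(sessions)
print(onset)
```

In this toy data, "dog" is the first noun to appear with both determiners, so the onset falls at the third session.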

For the behavioral part of the study, researchers observed 64 English-speaking children and their caregivers. For 90 minutes every four months, they recorded parents interacting with their children and compared each child’s utterances to the parents’ utterances.

Based on these samples, they determined that children started using “a” and “the” in front of the same noun around 30 months. Researchers also noticed that, after that first instance, the children began creating even more combinations that were not recorded in their caregivers’ speech.
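Comparing a child's combinations with the recorded caregiver speech amounts to a set difference. The sketch below assumes plain whitespace-separated transcripts and uses invented utterances; the function names are hypothetical.

```python
DETERMINERS = {"a", "the"}

def det_noun_pairs(utterances):
    """Collect (determiner, following word) pairs from a list of utterances."""
    pairs = set()
    for utterance in utterances:
        words = utterance.lower().split()
        for det, noun in zip(words, words[1:]):
            if det in DETERMINERS:
                pairs.add((det, noun))
    return pairs

def novel_pairs(child_utterances, caregiver_utterances):
    """Pairs the child produced that never appear in the caregiver sample."""
    return det_noun_pairs(child_utterances) - det_noun_pairs(caregiver_utterances)

# Invented example transcripts.
caregiver = ["look at the dog", "where is the ball", "you want a cookie"]
child = ["a dog", "the ball", "a cookie"]
novel = novel_pairs(child, caregiver)
print(novel)
```

A pair that is missing from the recorded sample is only a candidate for novelty, since the recordings cover a small fraction of what the child hears.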

But a sample can’t account for everything a child has heard. “The children are sitting around listening to their parents every single day, but we aren’t,” Goldin-Meadow said.

To confirm their initial estimate, the team tested a learner whose input was entirely known: a computer model.

Model behavior

Past studies have shown that people anticipate the next words in a sentence. The same kind of next-word prediction forms the basis of large language models like ChatGPT.

For this study, researchers built a predictive model and trained it on the data collected from the parents. They fed the data to the model in stages, simulating how a child would hear the language over time.

“To test the model, we give it utterances the child produced that contained a determiner, and we block out the determiner. Then the model has to predict the word that goes in the blocked-out space,” Goldin-Meadow said. “And for the most part, it does what the kid does.”
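The train-in-stages, then fill-in-the-blank procedure can be illustrated with a toy stand-in. The article does not describe the model's architecture, so the count-based predictor below is an assumption made purely for illustration, as are the class name, the smoothing scheme, and the example utterances.

```python
from collections import Counter, defaultdict

DETERMINERS = ("a", "the")

class ToyDeterminerModel:
    """Count-based stand-in: predicts a masked determiner from the word after it."""

    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing
        self.pair_counts = defaultdict(Counter)  # following word -> determiner counts
        self.det_counts = Counter()              # overall determiner counts

    def train(self, utterances):
        for utterance in utterances:
            words = utterance.lower().split()
            for det, nxt in zip(words, words[1:]):
                if det in DETERMINERS:
                    self.pair_counts[nxt][det] += 1
                    self.det_counts[det] += 1

    def predict(self, next_word):
        total = sum(self.det_counts.values())
        seen = self.pair_counts.get(next_word, Counter())

        def score(det):
            # Back off to overall determiner frequency for unseen words.
            prior = (self.det_counts[det] + 1) / (total + len(DETERMINERS))
            return seen[det] + self.smoothing * prior

        return max(DETERMINERS, key=score)

def masked_items(child_utterances):
    """Yield (determiner the child used, following word) for each determiner."""
    for utterance in child_utterances:
        words = utterance.lower().split()
        for det, nxt in zip(words, words[1:]):
            if det in DETERMINERS:
                yield det, nxt

# Invented caregiver speech, delivered in two stages keyed by the child's age.
stages = [
    (18, ["look at the dog", "the dog is here"]),
    (24, ["you want a cookie", "a cookie for you", "a big cookie"]),
]
child_test = ["the dog", "a cookie"]

model = ToyDeterminerModel()
accuracy_by_age = {}
for age, caregiver_utterances in stages:
    model.train(caregiver_utterances)  # the model keeps what it learned earlier
    items = list(masked_items(child_test))
    correct = sum(model.predict(nxt) == det for det, nxt in items)
    accuracy_by_age[age] = correct / len(items)

print(accuracy_by_age)
```

Because every training utterance is known, any determiner-noun pair the model produces can be checked against its input with certainty, which is the property the researchers relied on.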

The model also confirmed the timeframe in which children start to say determiner-noun combinations that go beyond what they’ve heard: around 30 months.

“For the model, we can be very sure that it has gone beyond the input it’s gotten,” Goldin-Meadow said.

Goldin-Meadow says pinpointing moments of productivity may be crucial for understanding a long-standing theoretical question in linguistics: How much linguistic input do kids have to hear to learn particular language structures?

This is an essential question for another area of Goldin-Meadow’s research: homesigners.

Homesigners are deaf children who have developed their own gestural signs to communicate. Since they haven’t had access to an established sign language like ASL, their own system of gestural language could shed light on which linguistic constructions children expect to find in the languages they are learning.

According to Goldin-Meadow, experimenting with computer modeling can test insights provided by homesigners; in this case, that homesigners are able to invent determiner-noun combinations.

“Determiner-noun constructions may be a lot easier to learn than constructions homesigners don’t invent,” Goldin-Meadow said. “And, if so, then maybe we can play around with our computational model and give it a lot less input and still have it master determiner-noun combinations.”

About this AI and language development research news

Author: Tori Lee
Source: University of Chicago
Contact: Tori Lee – University of Chicago
Image: The image is credited to Neuroscience News

Original Research: Closed access.
“Using computational modeling to validate the onset of productive determiner–noun combinations in English-learning children” by Susan Goldin-Meadow et al. PNAS

Abstract

Language is a productive system: we routinely produce well-formed utterances that we have never heard before. It is, however, difficult to assess when children first achieve linguistic productivity simply because we rarely know all the utterances a child has experienced.

The onset of linguistic productivity has been at the heart of a long-standing theoretical question in language acquisition: do children come to language learning with abstract categories that they deploy from the earliest moments of acquisition?

We address the problem of when linguistic productivity begins by marrying longitudinal behavioral observations and computational modeling to capitalize on the strengths of each.

We used behavioral data to assess when a sample of 64 English-learning children began to productively combine determiners and nouns, a linguistic construction previously used to address this theoretical question.

After the onset of productivity, the children produced determiner–noun combinations that were not attested in our sample of their linguistic input from caregivers.

We used computational techniques to model the onsets and trajectories of determiner–noun combinations in these 64 children, as well as characteristics of their utterances in which the determiner was omitted. Because we knew exactly what input the model was trained on, we could, with confidence, know that the model had gone beyond its input.

The parallels found between child and model in the timing and number of novel combinations suggest that the children too were creatively going beyond their input.