AI Proves Language Evolves for Learnability

Summary: A new study identifies architectural and evolutionary principles that govern how both children and artificial neural networks absorb language. The research bridges cognitive linguistics and deep learning to demonstrate the power of “iterated learning”, the process by which language reshapes itself over multiple generations to become increasingly structured and, as a result, easier to learn.

By building a deep linear neural network modeled after a child’s progressive learning stages, the investigators showed that structural regularities emerge naturally from the pressure of communication and the systematic errors introduced during transmission.

Key Facts

The Iterated Evolution Paradigm: Iterated learning posits that human language is not a static construct but an evolving system that reshapes itself over successive generations to maximize structural efficiency and ease the cognitive burden of learning.
The Child-Brain Simulation: Researchers constructed a deep linear neural network engineered with structural learning traits similar to a child’s brain, exposing successive versions of the computer brain to data properties mimicking human language.
The Error-Driven Architecture: Children acquire language in structured hierarchies, occasionally making non-arbitrary mistakes due to the over-generalization of data (e.g., assuming all winged birds fly until encountering a penguin). In transmission from generation to generation, these non-arbitrary mistakes filter the data, causing highly structured, easily learnable linguistic patterns to be retained while unstructured elements are systematically forgotten.
The Depth Absolute: To study the neural basis of this evolutionary process, the team used deep linear networks. The experiments showed that iterated learning only works well when a network has sufficient depth (multiple layers of processing) and the language is sufficiently complex; shallow networks with fewer layers failed to capture the structured regularities that make language learnable.
The Modern AI Intersection: The study suggests that the structural emergence seen in large generative AI tools is rooted in cognitive principles similar to those found in child development. The architecture of a learning network and the complexity of its environment shape how effectively it can absorb and transmit language.
The Intersection of Cognition: Lead author Dr. Devon Jarvis notes that while deep linear networks and iterated learning have long existed as separate concepts in different literatures, combining them makes the case that language evolves to become learnable because of the staged way children process data and their preference for reusing information.

Source: University of the Witwatersrand

New research from the University of the Witwatersrand, South Africa, has significant implications for understanding both human language development and the behaviour of large-scale artificial intelligence language models.

Culture is key to the findings, as is “iterated learning”, the idea that language evolves over generations (in humans and computers) to become more structured.

“We built a computer brain with similar characteristics to a child’s, and compared it to behaviours we see in children’s brains. We then fed it data with similar properties found in human language and watched how the generations (versions) of the computer brain learn.”

“It turns out, computer brains find the structure in the data in the same way that children favour certain properties of language in learning. It also showed that the dataset (language) becomes more structured over generations because it makes learning easier,” says lead author Dr Devon Jarvis, Lecturer in the School of Computer Science and Applied Mathematics (CSAM), and Fellow in the Wits Machine Intelligence and Neural Discovery (MIND) Institute.

Their findings were recently published in a paper titled “Compositionality and Systematicity Emerge from Iterated Learning in Deep Linear Networks” in the prestigious journal Proceedings of the National Academy of Sciences (PNAS).

It all starts in childhood

Jarvis explains that children have a remarkable ability to rapidly learn language during early development. They learn the world in hierarchies: starting with basic concepts and gradually understanding more complex ones.

“First, they learn that plants and animals are different things. Then they learn that there are different types of animals. But at some point, there is a depth of understanding of the world that they just have not reached yet,” says Jarvis.

Take the penguin, for instance. Children learn that birds have wings and can therefore fly. AHA! But then they are confused to find that the penguin cannot fly. Here, they have over-extrapolated, and the resulting mistakes help them to learn new information: penguins can’t fly, but they can swim. AHA! Slowly, they build a structured understanding of the world with increasing precision.

“While this progressive acquisition of knowledge has its benefits, the work focused on the implications for generations of learners. A child learns some language from their parents, and they will eventually pass it on to their own children. Due to the complexity of language, this transmission introduces mistakes.”

“Just like the penguin example, these mistakes are not arbitrary and result from the over-generalisation of knowledge. The net result is that easy portions of language to learn are remembered and reused, while the more unstructured portions are forgotten. Essentially, individuals are good at learning but only with the pressure of communication do we really see the depth of their intelligence,” explains Jarvis.
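
The generational hand-off Jarvis describes can be illustrated with a toy simulation. What follows is a minimal sketch and not the authors' code: the function names, the four-item "language" built from two hypothetical binary features, and all hyperparameters are assumptions chosen for illustration. Each generation is a fresh two-layer linear network trained for a limited number of steps on the labels produced by the previous generation.

```python
import numpy as np

def train_linear_net(X, Y, hidden=16, steps=100, lr=0.05, seed=0):
    """Fit Y ~ W2 @ W1 @ X by plain gradient descent on the summed squared error,
    starting from small random weights and stopping after a fixed number of steps."""
    rng = np.random.default_rng(seed)
    W1 = 0.01 * rng.standard_normal((hidden, X.shape[0]))
    W2 = 0.01 * rng.standard_normal((Y.shape[0], hidden))
    for _ in range(steps):
        err = W2 @ W1 @ X - Y            # prediction error, shape (n_out, n_items)
        grad_W2 = err @ (W1 @ X).T       # gradient of 0.5 * ||err||^2 w.r.t. W2
        grad_W1 = W2.T @ err @ X.T       # gradient of 0.5 * ||err||^2 w.r.t. W1
        W2 -= lr * grad_W2
        W1 -= lr * grad_W1
    return W1, W2

def iterated_learning(X, Y0, generations=5, **train_kwargs):
    """Each generation is a new learner trained on the previous learner's output."""
    labels = Y0
    history = [labels]
    for g in range(generations):
        W1, W2 = train_linear_net(X, labels, seed=g, **train_kwargs)
        labels = W2 @ W1 @ X             # imperfect copy handed to the next learner
        history.append(labels)
    return history

# Toy "language" (an assumption): 4 items, each a combination of two binary features.
shared = np.array([[1, 1, 0, 0],
                   [0, 0, 1, 1],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1]], dtype=float)   # labels reused across items
idiosyncratic = 0.3 * np.eye(4)                  # weak labels unique to each item
X = np.eye(4)                                    # one-hot code for each item
Y0 = np.vstack([shared, idiosyncratic])

history = iterated_learning(X, Y0, generations=5)
svals_first = np.linalg.svd(history[0], compute_uv=False)
svals_last = np.linalg.svd(history[-1], compute_uv=False)
print("first generation:", np.round(svals_first, 3))
print("last generation: ", np.round(svals_last, 3))
```

Comparing the singular values of the first and last label matrices gives a rough picture of what survives transmission. In a setup like this one would expect the strong components tied to the shared features to persist while the weak item-specific component fades, which echoes the claim that structured portions of a language are retained and unstructured portions are forgotten.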

Not all neural networks are equal

The researchers used deep linear neural networks (mathematical models that mimic how the brain processes information) to study the neural basis of this process. They found that iterated learning only works well when the network has sufficient depth, meaning multiple layers of processing, and when the language is sufficiently complex. Shallow networks, those with fewer layers, failed to capture the structured regularities that make language learnable.

This suggests that the architecture of a learning system, whether biological or artificial, and the richness of its environment play a crucial role in how well language structure can be absorbed and transmitted. The same point bears on recent advances in generative AI models, which rely heavily on scale for their emergent capabilities.
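
For readers unfamiliar with the term, a deep linear network is a stack of weight matrices with no nonlinearity between them. The sketch below is illustrative only; the layer sizes and function names are assumptions, not details from the paper. It shows that a deep linear network always collapses to a single matrix, so depth does not change what the network can represent. The differences between shallow and deep networks reported in the study therefore concern how learning unfolds, not which input-output maps are possible.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_linear_network(layer_sizes, scale=0.1):
    """Return one weight matrix per layer (no biases, no nonlinearity)."""
    return [scale * rng.standard_normal((n_out, n_in))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(weights, x):
    """Apply each layer in turn; the whole network is repeated matrix multiplication."""
    for W in weights:
        x = W @ x
    return x

def effective_map(weights):
    """Collapse all layers into the single matrix they jointly compute."""
    W_eff = weights[0]
    for W in weights[1:]:
        W_eff = W @ W_eff
    return W_eff

shallow = init_linear_network([8, 4])       # input -> output, one weight matrix
deep = init_linear_network([8, 16, 4])      # input -> hidden -> output, two matrices

x = rng.standard_normal(8)
same_output = np.allclose(forward(deep, x), effective_map(deep) @ x)
print(same_output)
```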

Jarvis continues: “The pieces of this work have been around in the various literatures for a while now. Deep linear networks are established models of child development and iterated learning has been known to linguists for many years.”

“But it is the combination of these two perspectives that seems to make a useful point: that language evolves to become learnable based on the very specific nature of how children learn in stages and favour reusing information over learning new things.”

“The fact that this was shown in a very simple version of the technology underpinning the modern boom in AI tools is also encouraging and suggests that in the intersection of multiple fields lies the fundamental principles of cognition.”

Key Questions Answered:

Q: Why does a child making grammatical mistakes actually help human language become easier to learn over time?

A: Because those mistakes are not random; they are predictable signs of a brain trying to find order. When a child over-generalizes a rule, like assuming a penguin flies because it has wings, they are using a structured shortcut. Over generations of parents passing speech to children, the messy, unstructured, and difficult parts of a language are forgotten, while the easy, rule-based portions are kept and reused.

Q: What is the massive structural difference between a “shallow” and a “deep” learning network when trying to master a language?

A: It comes down to processing depth and layers. The researchers at Wits found that shallow networks with very few layers fail to capture the hidden regularities of complex language, and so fail to transmit structured information. Deep networks, whose multiple layers mirror a child’s ability to learn the world in hierarchies, can absorb, organize, and pass down linguistic structures.

Q: How does this linguistic study of children’s brains help us better understand the explosion of modern generative AI?

A: It suggests that fundamental principles of human cognition are also at work in modern artificial intelligence. The modern boom in generative AI tools relies heavily on massive computational scale and layered depth to achieve its breakthrough capabilities. This study shows that even a very simple, deep linear version of this technology reproduces the way human language evolves to become learnable.

Editorial Notes:

This article was edited by a Neuroscience News editor.
Journal paper reviewed in full.
Additional context added by our staff.

About this AI and language learning research news

Author: Shirona Patel
Source: University of the Witwatersrand
Contact: Shirona Patel – University of the Witwatersrand
Image: The image is credited to Neuroscience News

Original Research: Open access.
“Compositionality and systematicity emerge from iterated learning in deep linear networks” by Devon Jarvis, Richard Klein, Benjamin Rosman, and Andrew M. Saxe. PNAS
DOI:10.1073/pnas.2509739123

Abstract

Compositionality and systematicity emerge from iterated learning in deep linear networks

Humans have a remarkable ability to systematically generalize—reasoning about new situations by combining aspects of previous experiences. Language provides one of the primary examples of this ability and modern machine learning has drawn much inspiration from linguistics.

A recent example is iterated learning, a procedure where generations of networks learn from the output of earlier learners. The result is a refinement of the network’s “language” or output labels for given inputs toward compositional structure.

Here we theoretically study the emergence of compositional language, and the ability of simple neural networks to leverage this compositionality to systematically generalize.

We build on prior theoretical work on linear networks, which mathematically define systematic generalization, by a) applying the analysis of shallow and deep linear networks to the iterated learning procedure by deriving exact dynamics of learning over generations; b) refining the definition of systematicity to understand the benefits and limitations of iterated learning.

We find that iterated learning does facilitate systematic generalization over standard training paradigms by uncovering compositional substructure in the output labels.

Our results confirm a long-standing conjecture: that multiple generations of iterated learning are required for compositional structure to emerge, which can outperform a single generation network trained with optimal early-stopping.

However, for the network to treat the input systematically and ignore features which do not generalize, the network must be trained on an extremely large dataset. Hence, we define “weak systematic generalization” to explain this emergent systematicity from scale.