Can Language Models Really Understand? Study Uncovers Limits in AI Logic

Summary: A recent study questions if large language models (LLMs) truly form coherent world models, despite their accurate outputs in complex tasks like generating directions or playing games. Researchers found that while LLMs provide nearly flawless driving directions, they fail with unexpected changes, suggesting the models don’t grasp underlying rules.

The study introduces metrics to evaluate LLMs’ world models, revealing gaps in their “understanding” even when responses seem correct. These findings could be critical for applying AI in unpredictable real-world environments.

Key Facts:

Accuracy Gaps: LLMs can predict tasks like driving directions accurately but struggle with unexpected changes, revealing limited world model understanding.
New Metrics: Sequence distinction and sequence compression metrics were developed to assess LLMs’ coherent world modeling capabilities.
Structural Misunderstandings: When researchers analyzed city maps generated by LLMs, they found impossible structures, highlighting a lack of true spatial understanding.

Source: MIT

Large language models can do impressive things, like write poetry or generate viable computer programs, even though these models are trained to predict words that come next in a piece of text.

Such surprising capabilities can make it seem like the models are implicitly learning some general truths about the world.

But that isn’t necessarily the case, according to a new study. The researchers found that a popular type of generative AI model can provide turn-by-turn driving directions in New York City with near-perfect accuracy — without having formed an accurate internal map of the city.

This shows robots. — If scientists want to build LLMs that can capture accurate world models, they need to take a different approach, the researchers say. Credit: Neuroscience News

Despite the model’s uncanny ability to navigate effectively, when the researchers closed some streets and added detours, its performance plummeted.

When they dug deeper, the researchers found that the New York maps the model implicitly generated had many nonexistent streets curving between the grid and connecting far away intersections.

This could have serious implications for generative AI models deployed in the real world, since a model that seems to be performing well in one context might break down if the task or environment slightly changes.

“One hope is that, because LLMs can accomplish all these amazing things in language, maybe we could use these same tools in other parts of science, as well. But the question of whether LLMs are learning coherent world models is very important if we want to use these techniques to make new discoveries,” says senior author Ashesh Rambachan, assistant professor of economics and a principal investigator in the MIT Laboratory for Information and Decision Systems (LIDS).

Rambachan is joined on a paper about the work by lead author Keyon Vafa, a postdoc at Harvard University; Justin Y. Chen, an electrical engineering and computer science (EECS) graduate student at MIT; Jon Kleinberg, Tisch University Professor of Computer Science and Information Science at Cornell University; and Sendhil Mullainathan, an MIT professor in the departments of EECS and of Economics, and a member of LIDS. The research will be presented at the Conference on Neural Information Processing Systems.

New metrics

The researchers focused on a type of generative AI model known as a transformer, which forms the backbone of LLMs like GPT-4. Transformers are trained on a massive amount of language-based data to predict the next token in a sequence, such as the next word in a sentence.

But if scientists want to determine whether an LLM has formed an accurate model of the world, measuring the accuracy of its predictions doesn’t go far enough, the researchers say.

For example, they found that a transformer can predict valid moves in a game of Connect 4 nearly every time without understanding any of the rules.

So, the team developed two new metrics that can test a transformer’s world model. The researchers focused their evaluations on a class of problems called deterministic finite automations, or DFAs.

A DFA is a problem with a sequence of states, like intersections one must traverse to reach a destination, and a concrete way of describing the rules one must follow along the way.

They chose two problems to formulate as DFAs: navigating on streets in New York City and playing the board game Othello.

“We needed test beds where we know what the world model is. Now, we can rigorously think about what it means to recover that world model,” Vafa explains.

The first metric they developed, called sequence distinction, says a model has formed a coherent world model it if sees two different states, like two different Othello boards, and recognizes how they are different. Sequences, that is, ordered lists of data points, are what transformers use to generate outputs.

The second metric, called sequence compression, says a transformer with a coherent world model should know that two identical states, like two identical Othello boards, have the same sequence of possible next steps.

They used these metrics to test two common classes of transformers, one which is trained on data generated from randomly produced sequences and the other on data generated by following strategies.

Incoherent world models

Surprisingly, the researchers found that transformers which made choices randomly formed more accurate world models, perhaps because they saw a wider variety of potential next steps during training.

“In Othello, if you see two random computers playing rather than championship players, in theory you’d see the full set of possible moves, even the bad moves championship players wouldn’t make,” Vafa explains.

Even though the transformers generated accurate directions and valid Othello moves in nearly every instance, the two metrics revealed that only one generated a coherent world model for Othello moves, and none performed well at forming coherent world models in the wayfinding example.

The researchers demonstrated the implications of this by adding detours to the map of New York City, which caused all the navigation models to fail.

“I was surprised by how quickly the performance deteriorated as soon as we added a detour. If we close just 1 percent of the possible streets, accuracy immediately plummets from nearly 100 percent to just 67 percent,” Vafa says.

When they recovered the city maps the models generated, they looked like an imagined New York City with hundreds of streets crisscrossing overlaid on top of the grid. The maps often contained random flyovers above other streets or multiple streets with impossible orientations.

These results show that transformers can perform surprisingly well at certain tasks without understanding the rules. If scientists want to build LLMs that can capture accurate world models, they need to take a different approach, the researchers say.

“Often, we see these models do impressive things and think they must have understood something about the world. I hope we can convince people that this is a question to think very carefully about, and we don’t have to rely on our own intuitions to answer it,” says Rambachan.

In the future, the researchers want to tackle a more diverse set of problems, such as those where some rules are only partially known. They also want to apply their evaluation metrics to real-world, scientific problems.

Funding:

This work is funded, in part, by the Harvard Data Science Initiative, a National Science Foundation Graduate Research Fellowship, a Vannevar Bush Faculty Fellowship, a Simons Collaboration grant, and a grant from the MacArthur Foundation.

About this LLM and AI research news

Author: Melanie Grados
Source: MIT
Contact: Melanie Grados – MIT
Image: The image is credited to Neuroscience News

Original Research: Closed access.
“Evaluating the World Model Implicit in a Generative Model” by Ashesh Rambachan et al. arXiv

Abstract

Evaluating the World Model Implicit in a Generative Model

Recent work suggests that large language models may implicitly learn world models. How should we assess this possibility? We formalize this question for the case where the underlying reality is governed by a deterministic finite automaton.

This includes problems as diverse as simple logical reasoning, geographic navigation, game-playing, and chemistry. We propose new evaluation metrics for world model recovery inspired by the classic Myhill-Nerode theorem from language theory. We illustrate their utility in three domains: game playing, logic puzzles, and navigation.

In all domains, the generative models we consider do well on existing diagnostics for assessing world models, but our evaluation metrics reveal their world models to be far less coherent than they appear. Such incoherence creates fragility: using a generative model to solve related but subtly different tasks can lead it to fail badly.

Building generative models that meaningfully capture the underlying logic of the domains they model would be immensely valuable; our results suggest new ways to assess how close a given model is to that goal.