
AI Models Form Theory-of-Mind Beliefs

Summary: Researchers showed that large language models use a small, specialized subset of parameters to perform Theory-of-Mind reasoning, despite activating their full network for every task. This sparse internal circuitry depends heavily on positional encoding, especially rotary positional encoding, which shapes how the model tracks beliefs and perspectives.

Because humans perform these social inferences with only a tiny fraction of neural resources, the discovery highlights a major inefficiency in current AI systems. The work opens a path toward future LLMs that operate more like the human brain—selective, efficient, and far less energy-intensive.

Key Facts

  • Sparse Circuits: LLMs rely on small internal parameter clusters for Theory-of-Mind reasoning.
  • Crucial Encoding: Rotary positional encoding strongly shapes how models represent beliefs and perspectives.
  • Efficiency Gap: Findings point toward brain-inspired designs that activate only task-relevant parameters.

Source: Stevens Institute of Technology

Imagine you’re watching a movie in which a character puts a chocolate bar in a box, closes the box and leaves the room. Another person, also in the room, moves the bar from the box to a desk drawer. You, as an observer, know that the treat is now in the drawer, and you also know that when the first person returns, they will look for the treat in the box because they don’t know it has been moved.

You know that because as a human you have the cognitive capacity to infer and reason about the minds of other people — in this case the person’s lack of awareness regarding where the chocolate is.

In scientific terms, this ability is described as Theory of Mind (ToM). This “mind-reading” ability allows us to predict and explain the behavior of others by considering their mental states. 

We develop this capacity around the age of four, and our brains are really good at it. “For a human brain it’s a very easy task,” says Zhaozhuo Xu, Assistant Professor of Computer Science at the School of Engineering; it takes barely a few seconds to process.

“And while doing so, our brains involve only a small subset of neurons, so it’s very energy efficient,” explains Denghui Zhang, Assistant Professor in Information Systems and Analytics at the School of Business.

Large language models, or LLMs, which the researchers study, work differently. Although they were inspired by some concepts from neuroscience and cognitive science, they aren’t exact mimics of the human brain.

LLMs were built on artificial neural networks that loosely resemble the organization of biological neurons, but the models learn from patterns in massive amounts of text and operate using mathematical functions.

That gives LLMs a clear advantage over humans in processing huge amounts of information rapidly. But when it comes to efficiency, particularly on simple tasks, LLMs lose to humans. Regardless of the complexity of the task, they must activate most of their neural network to produce an answer.

So whether you’re asking an LLM to tell you what time it is or to summarize Moby Dick, a whale of a novel, the LLM will engage its entire network, which is resource-consuming and inefficient.

“When we, humans, evaluate a new task, we activate a very small part of our brain, but LLMs must activate pretty much all of their network to figure out something new, even if it’s fairly basic,” says Zhang.

“LLMs must do all the computations and then select the one thing you need. So you do a lot of redundant computations, because you compute a lot of things you don’t need. It’s very inefficient.”

Zhang and Xu formed a multidisciplinary collaboration to better understand how LLMs operate and how their efficiency in social reasoning can be improved.

They found that LLMs use a small, specialized set of internal connections to handle social reasoning. They also found that an LLM’s social reasoning abilities depend strongly on how the model represents word positions, especially through a method called rotary positional encoding (RoPE).

These special connections influence how the model pays attention to different words and ideas, effectively guiding where its “focus” goes as it reasons about people’s thoughts.

“In simple terms, our results suggest that LLMs use built-in patterns for tracking positions and relationships between words to form internal ‘beliefs’ and make social inferences,” Zhang says.
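To make that concrete, here is a minimal NumPy sketch, not the study’s code, of the core idea behind rotary positional encoding: query and key vectors are rotated by an angle that grows with token position, so the attention score between two words ends up depending on how far apart they are rather than on where they sit in the text.

```python
import numpy as np

def rope_rotate(vec, position, base=10000.0):
    """Apply rotary positional encoding (RoPE) to a vector.

    Pairs of dimensions are treated as 2-D points and rotated by an angle
    that grows with token position; each pair uses a different frequency.
    """
    d = vec.shape[-1]
    half = d // 2
    # One frequency per dimension pair: theta_i = base^(-2i/d)
    freqs = base ** (-np.arange(half) * 2.0 / d)
    angles = position * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = vec[:half], vec[half:]
    # 2-D rotation applied pairwise
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos])

rng = np.random.default_rng(0)
q, k = rng.standard_normal(64), rng.standard_normal(64)

# The attention score q.k depends only on the *relative* distance
# between the query and key positions, not on their absolute positions.
score_a = rope_rotate(q, 10) @ rope_rotate(k, 7)      # tokens 3 apart
score_b = rope_rotate(q, 110) @ rope_rotate(k, 107)   # also 3 apart
print(np.isclose(score_a, score_b))  # True
```

The two scores printed at the end match because both pairs of tokens are three positions apart; it is this kind of relative-position signal that the study links to how models track beliefs and perspectives.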

The two collaborators outlined their findings in the study “How large language models encode theory-of-mind: a study on sparse parameter patterns,” published in npj Artificial Intelligence on August 28, 2025.

Now that researchers better understand how LLMs form their “beliefs,” they think it may be possible to make the models more efficient.

“We all know that AI is energy expensive, so if we want to make it scalable, we have to change how it operates,” says Xu.

“Our human brain is very energy efficient, so we hope this research brings us back to thinking about how we can make LLMs work more like the human brain, so that they activate only the subset of parameters in charge of a specific task. That’s an important argument we want to convey.”

Key Questions Answered:

Q: What did researchers discover about AI social reasoning?

A: Large language models rely on a small, specialized set of internal connections and positional encoding patterns to perform Theory-of-Mind reasoning.

Q: Why does this matter for AI efficiency?

A: Unlike the human brain, LLMs activate nearly their entire network for every task; understanding these sparse circuits may enable more energy-efficient AI.

Q: What’s the next goal for AI and LLMs?

A: Build LLMs that activate only task-specific parameters—more like the human brain—to reduce computation and energy costs.

About this AI and theory of mind research news

Author: Lina Zeldovich
Source: Stevens Institute of Technology
Contact: Lina Zeldovich – Stevens Institute of Technology
Image: The image is credited to Neuroscience News

Original Research: Open access.
“How large language models encode theory-of-mind: a study on sparse parameter patterns” by Zhaozhuo Xu et al. npj Artificial Intelligence


Abstract

How large language models encode theory-of-mind: a study on sparse parameter patterns

This paper investigates the emergence of Theory-of-Mind (ToM) capabilities in large language models (LLMs) from a mechanistic perspective, focusing on the role of extremely sparse parameter patterns.

We introduce a novel method to identify ToM-sensitive parameters and reveal that perturbing as little as 0.001% of these parameters significantly degrades ToM performance while also impairing contextual localization and language understanding. To understand this effect, we analyze their interactions with core architectural components of LLMs.
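The selection procedure itself is not spelled out here, but a toy sketch can illustrate the general shape of such a sparse-perturbation test: score every parameter with some sensitivity proxy (gradient magnitude below, purely as an assumption), perturb only the top 0.001%, and compare task performance before and after. The small network and mean-squared-error loss stand in for the LLM and the Theory-of-Mind benchmark.

```python
# Minimal sketch of a sparse-perturbation sensitivity test; a toy network and
# MSE loss stand in for the LLM and the ToM benchmark.
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 128))
x, target = torch.randn(32, 128), torch.randn(32, 128)
loss_fn = nn.MSELoss()

# 1. Score every parameter by the magnitude of its gradient on the task
#    (an assumed sensitivity proxy, not the paper's method).
loss_fn(model(x), target).backward()
scores = torch.cat([p.grad.abs().flatten() for p in model.parameters()])

# 2. Keep only the top 0.001% most sensitive parameters (at least one).
k = max(1, int(scores.numel() * 1e-5))
threshold = scores.topk(k).values.min()

# 3. Perturb just those parameters and compare task loss before and after.
with torch.no_grad():
    before = loss_fn(model(x), target).item()
    for p in model.parameters():
        mask = (p.grad.abs() >= threshold).float()
        p.add_(0.5 * torch.randn_like(p) * mask)  # noise only on sensitive weights
    after = loss_fn(model(x), target).item()

print(f"loss before: {before:.4f}  after perturbing {k} of {scores.numel()} params: {after:.4f}")
```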

Our findings demonstrate that these sensitive parameters are closely linked to the positional encoding module, particularly in models using Rotary Position Embedding (RoPE), where perturbations disrupt dominant frequency activations critical for contextual processing.

Furthermore, we show that perturbing ToM-sensitive parameters affects LLMs’ attention mechanism by modulating the angle between queries and keys under positional encoding.
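As a rough geometric intuition for that last observation, and again only a sketch rather than the paper’s analysis: under RoPE the attention logit is the dot product of rotated query and key vectors, so it tracks the cosine of the angle between them. The toy example below nudges a single rotation frequency, one plausible way a parameter perturbation could propagate, and shows how the query-key angle, and with it the attention score, shifts.

```python
import numpy as np

def angle_between(a, b):
    """Angle (radians) between two vectors."""
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def rotate(v, theta):
    """2-D rotation, the building block RoPE applies to each dimension pair."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([c * v[0] - s * v[1], s * v[0] + c * v[1]])

q, k = np.array([1.0, 0.2]), np.array([0.8, -0.5])
pos_q, pos_k, freq = 12, 5, 0.1   # token positions and one RoPE frequency

# Clean encoding vs. a perturbed rotation frequency (illustrative only).
q_rot, k_rot = rotate(q, pos_q * freq), rotate(k, pos_k * freq)
q_bad, k_bad = rotate(q, pos_q * freq * 1.3), rotate(k, pos_k * freq * 1.3)

print("query-key angle, clean:    ", angle_between(q_rot, k_rot))
print("query-key angle, perturbed:", angle_between(q_bad, k_bad))
print("attention logit, clean vs perturbed:", q_rot @ k_rot, q_bad @ k_bad)
```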

These insights provide a deeper understanding of how LLMs acquire social reasoning abilities, bridging AI interpretability with cognitive science.
