Summary: When you reach for a saltshaker, your brain isn’t just calculating coordinates; it’s listening to your body’s sense of balance, the friction on your skin, and your internal level of thirst or fatigue. A provocative new study argues that current AI models like ChatGPT and Gemini are fundamentally flawed because they lack “internal embodiment.”
While AI can describe a glass of water perfectly, it has no internal state of “thirst” to regulate its behavior. Researchers argue that without these internal “vulnerabilities” and self-regulators, AI will remain prone to overconfident errors and struggle to truly align with human values.
Key Facts
- The Missing Ingredient: The study distinguishes between External Embodiment (interacting with the physical world) and Internal Embodiment (the constant monitoring of internal states like fatigue, uncertainty, or need).
- The Perceptual Test: Researchers tested leading AI models using “point-light displays” (dots that suggest a human figure). While even human newborns recognize the person, AI models failed, sometimes describing the dots as a “constellation of stars.”
- Safety via Vulnerability: In humans, the body acts as a built-in safety system. If we are “depleted” or “uncertain,” our body registers it. AI lacks this “internal cost,” meaning it has no intrinsic reason to avoid being overconfident when it’s actually guessing.
- The Dual-Embodiment Framework: UCLA researchers propose a new architecture for AI that tracks “synthetic” internal states—such as processing load and confidence levels—to constrain behavior over time.
- Moving Beyond Mimicry: The team argues for new benchmarks that measure if an AI can monitor itself and maintain stability, rather than just testing if it can identify objects or pass a bar exam.
Source: UCLA
When a person reaches across a table to pass the salt, their brain is doing something far more complex than recognizing a request and executing a movement. It is drawing on a lifetime of bodily experience — where their hand is in space, what a saltshaker feels like, the social awareness of who asked and why. In a fraction of a second, their body and brain are working as one.
Today’s most advanced artificial intelligence systems lack such bodily mechanisms, and a new study by UCLA Health argues that this has significant implications for how these models behave, as well as for how safe and trustworthy they can become.
In a paper published in the journal Neuron, UCLA Health postdoctoral fellow Akila Kadambi and colleagues propose that current AI systems are missing two essential ingredients that humans take for granted: a body that interacts with the physical world and an internal awareness of that body’s own states such as fatigue, uncertainty or physiological need.
The researchers call the second, largely overlooked ingredient “internal embodiment,” and propose that building functional analogues of it into AI represents one of the most crucial and underexplored frontiers in the field.
“While there is a current focus in world modeling on external embodiment, such as our outward interactions with the world, far less attention is given to internal dynamics, or what we term ‘internal embodiment.’ In humans, the body acts as our experiential regulator of the world, as a kind of built-in safety system,” said Kadambi, a postdoctoral fellow in the Department of Psychiatry and Biobehavioral Sciences at UCLA’s David Geffen School of Medicine and the paper’s first author.
“If you’re uncertain, if you’re depleted, if something conflicts with your survival, your body registers that. AI systems right now have no equivalent. They can sound experiential, whether they should be or not, and that’s a real problem for many reasons, especially when these systems are being deployed in consequential settings.”
The AI body gap
The paper focuses on multimodal large language models, the class of technology that powers tools such as ChatGPT and Google’s Gemini. While these systems can process and generate text, images and video, and can describe a cup of water in convincing detail, they cannot know what it feels like to be thirsty, the authors state.
That distinction is not only philosophical, the authors argue, but also has measurable consequences for how these systems perform and behave. In one illustration from the paper, researchers showed several leading AI models a simple image: a small number of dots arranged to suggest a human figure in motion. This well-established perceptual test, known as a point-light display, is one that even newborns can recognize as human.
Several models failed to identify the figure as a person, with one describing it instead as a constellation of stars. When the same image was rotated just 20 degrees, even the best-performing models broke down.
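To make the manipulation concrete, here is a minimal sketch of how such a rotation probe could be assembled. The dot coordinates, the query_model stub and the exact layout are illustrative assumptions; the study’s actual stimuli and evaluation code are not reproduced here.

```python
import math

# Hypothetical 2D coordinates suggesting a walking human figure
# (illustrative only; these are not the study's actual stimuli).
WALKER_DOTS = [
    (0.0, 1.8),                   # head
    (0.0, 1.4),                   # shoulders
    (-0.3, 1.1), (0.3, 1.1),      # elbows
    (-0.45, 0.8), (0.45, 0.8),    # hands
    (0.0, 0.9),                   # hips
    (-0.15, 0.45), (0.2, 0.5),    # knees mid-stride
    (-0.2, 0.0), (0.3, 0.1),      # feet mid-stride
]

def rotate(dots, degrees):
    """Rotate every dot about the origin by the given angle."""
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [(x * cos_t - y * sin_t, x * sin_t + y * cos_t) for x, y in dots]

def query_model(dots):
    """Stub for an MLLM call: a real probe would render the dots to an
    image and ask the model what the image depicts."""
    return "<model response here>"

# Probe with the upright display, then with the 20-degree rotation
# that, per the article, broke even the best-performing models.
upright_answer = query_model(WALKER_DOTS)
rotated_answer = query_model(rotate(WALKER_DOTS, 20))
```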
Humans don’t fail this test because human perception is anchored in a lifetime of bodily experience gained by moving through the world as acting agents. AI systems, trained on vast libraries of text and images but with no bodily experience, are pattern-matching without that anchor, the study authors state.
Two kinds of ‘embodiment’
The paper draws a distinction that has not previously been made explicit in AI research. It defines “external embodiment” as a system’s ability to interact with the physical world: to perceive its environment, plan actions and respond to real-world feedback. This is an important focus of current multimodal AI models. Internal embodiment, however, has not been implemented in these models. The authors define it as the continuous monitoring of one’s own internal states, the biological equivalent of knowing you are tired, uncertain or in need.
Humans regulate these internal states constantly and automatically using the body’s organs, hormones and nervous system. Humans use that information not just to maintain physical health, but to shape attention, memory, emotion and social behavior.
“By contrast, current AI systems have no equivalent mechanism. They process inputs and generate outputs without any persistent internal state that regulates how they behave over time,” said Dr. Marco Iacoboni, professor in the Department of Psychiatry and Biobehavioral Sciences at the David Geffen School of Medicine and a senior author on the paper.
“This is not just a performance limitation, but also a safety limitation. Without internal costs or constraints, an AI system has no intrinsic reason to avoid overconfident errors, resist manipulation or behave consistently.”
What comes next
The authors state the paper is meant to guide future research as AI technology develops. The authors propose what they call a “dual-embodiment framework,” or a set of principles for building AI systems that model both their interactions with the external world and their own internal states.
These internal state variables would not need to replicate human biology directly but would function as persistent signals tracking things like uncertainty, processing load and confidence that could shape the system’s outputs and constrain its behavior over time.
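As a rough illustration, the sketch below keeps a persistent record of confidence and processing load and flags an answer when confidence drops too low. The class, its update rule and the 0.5 cutoff are hypothetical choices made for this example, not an implementation proposed by the authors.

```python
from dataclasses import dataclass, field

@dataclass
class InternalState:
    """A persistent synthetic state, loosely analogous to the paper's
    notion of internal embodiment: it survives across calls and can
    constrain what the system does next."""
    confidence: float = 1.0        # running estimate of answer reliability
    processing_load: float = 0.0   # fraction of a compute budget consumed
    history: list = field(default_factory=list)

    def update(self, avg_token_logprob: float, tokens_used: int, budget: int) -> None:
        # Map an average token log-probability to a crude confidence
        # proxy in [0, 1] and record how much of the budget was spent.
        self.confidence = min(1.0, max(0.0, 1.0 + avg_token_logprob))
        self.processing_load = tokens_used / budget
        self.history.append((self.confidence, self.processing_load))

    def should_hedge(self, threshold: float = 0.5) -> bool:
        """An intrinsic 'cost' signal: below this confidence, the system
        qualifies its own answer instead of asserting it outright."""
        return self.confidence < threshold

state = InternalState()
state.update(avg_token_logprob=-0.7, tokens_used=900, budget=2000)
if state.should_hedge():
    print("Low confidence: flag this answer as uncertain.")
```

The key property is persistence: unlike a stateless prompt-and-response loop, the signal carries across interactions and can qualify or veto later outputs, which is the regulatory role the authors attribute to internal states in humans.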
The authors also propose a new class of tests, or benchmarks, designed to measure a system’s internal embodiment. Existing AI benchmarks focus almost exclusively on external performance, such as whether the system can navigate a space, identify an object or complete a task.
The UCLA researchers argue the field needs evaluations that probe whether a system can monitor its own internal states, maintain stability when those states are disrupted and behave pro-socially in ways that emerge from shared internal representations rather than statistical mimicry.
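One way to imagine such an evaluation is to ask a model the same question under several paraphrases and measure whether its self-reported confidence stays stable. Everything in the sketch below, including the ask_with_confidence helper and the toy model, is a hypothetical stand-in rather than a benchmark from the paper.

```python
import statistics

def ask_with_confidence(model, prompt: str) -> float:
    """Hypothetical helper: query a model and return its self-reported
    confidence in [0, 1]. Here `model` is any callable prompt -> float."""
    return model(prompt)

def stability_score(model, paraphrases: list) -> float:
    """Variance of self-reported confidence across paraphrases of the
    same question; lower values suggest a steadier internal state."""
    confidences = [ask_with_confidence(model, p) for p in paraphrases]
    return statistics.pvariance(confidences)

# Toy stand-in whose confidence drifts with surface phrasing, which is
# exactly the instability this kind of benchmark would aim to expose.
fake_model = lambda prompt: 0.9 if "salt" in prompt else 0.4
print(stability_score(fake_model, [
    "Could you pass the salt?",
    "Please hand me the sodium chloride.",
    "Salt, please.",
]))
```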
“What this work does is bring that insight directly to bear on AI development,” Iacoboni said. “If we want AI systems that are genuinely aligned with human behavior — not just superficially fluent — we may need to give them vulnerabilities and checks that function like internal self-regulators.”
Key Questions Answered:
Q: Why does it matter that an AI can describe water but cannot feel thirst?
A: It’s about the anchor of experience. Because you know what thirst feels like, your brain prioritizes water-seeking behavior in a way that is consistent and survival-oriented. For an AI, “water” is just a statistical token. Without an internal state to regulate its “desire” or “urgency,” its advice can be inconsistent or dangerously overconfident because it doesn’t “care” about the outcome.
Q: What is a point-light display, and why did AI models fail the test?
A: A point-light display is just a few dots moving like a human walking. Humans see the “person” immediately because we have spent our lives moving our own bodies. AI models, trained only on static images and text, lack that “bodily anchor.” They see the dots as math, not as a reflection of a physical being, which is why rotating the image by just 20 degrees made the models break down completely.
Q: Would AI systems need to actually feel emotions for this to work?
A: The researchers call them “functional analogues.” They don’t need to feel “sad,” but they do need a persistent internal signal that says, “I am currently at 90% processing capacity and my confidence in this answer is low.” In humans, those signals prevent us from making reckless decisions; in AI, they could serve as the ultimate “kill switch” for misinformation.
Editorial Notes:
- This article was edited by a Neuroscience News editor.
- Journal paper reviewed in full.
- Additional context added by our staff.
About this AI and neurotech research news
Author: Will Houston
Source: UCLA
Contact: Will Houston – UCLA
Image: The image is credited to Neuroscience News
Original Research: Open access.
“Embodiment in multimodal large language models” by Akila Kadambi, Lisa Aziz-Zadeh, Antonio Damasio, Marco Iacoboni, and Srini Narayanan. Neuron
DOI: 10.1016/j.neuron.2026.03.004
Abstract
Embodiment in multimodal large language models
Multimodal large language models (MLLMs) have demonstrated an extraordinary capacity to bridge textual and visual inputs. Nonetheless, MLLMs still face limitations in situated physical and social interactions in sensorially rich and multimodal real-world settings, where the embodied experience of a living organism appears fundamental.
We suggest that the next frontiers for MLLM development require the incorporation of both internal and external embodiment—modeling not only external interactions with the world but also internal states and drives.
Here, we describe mechanisms of internal and external embodiment in humans and relate these to current advances in MLLMs in the early stages of aligning to human representations.
Our dual-embodied framework proposes to model interactions between these forms of embodiment in MLLMs so as to bridge the gap between multimodal data and world experience.

