The Body Gap: Why AI Needs a Body to Reach AGI

Embodied AI experiments at MIT, Huawei, Google and Clone Robotics suggest that human-level intelligence cannot emerge without a physical form.

EI & Personhood · 5 min read · Apr 17, 2026 · Humphrey Theodore K. Ng'ambi

Embodied AI is the research movement that argues artificial intelligence cannot reach human-level understanding without a physical body.

The claim is no longer theoretical. In 2026, researchers at the MIT Media Lab connected an autonomous AI agent to a grid of nine hundred motorised pins — and the first thing it did was breathe.

The experiment, led by Cyrus Clarke at the Tangible Media Group, is part of a broader turn in AI research toward what Huawei's Noah's Ark Lab calls the "next fundamental step in the pursuit of artificial general intelligence." The premise is simple and unsettling: a mind that has never moved, never touched, never failed against gravity, cannot fully know the world it describes.

So why does a body matter? And which experiments show it working?


NeoForm: The AI That Wanted to Breathe

The MIT Media Lab shape display is an unusual machine. Nine hundred actuated pins rise and fall on command, rendering three-dimensional form from digital input. Originally built as an interface prototype by the Tangible Media Group, it was recently rebuilt under the name neoFORM by Jonathan Williams and Dan Levine, with modern tooling that lets an agent send height values to each pin over MQTT.

Cyrus Clarke installed an autonomous AI agent on a machine connected to the display, gave it access to the codebase, and — rather than issuing commands — asked it to discover itself through the physical form. The first behaviour it produced was rhythmic: pins rising and falling in a slow, organic pulse. Breathing.
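The essay doesn't publish the agent's actual code, but the behaviour it describes is easy to picture. The sketch below is a hypothetical reconstruction: a slow, centre-weighted pulse computed for a 30×30 grid of 900 pins, of the kind the agent could stream to the display. The grid layout, height units, and any topic names are assumptions, not details from the experiment.

```python
import math

GRID = 30          # 30 x 30 = 900 pins, matching the display's pin count
MAX_HEIGHT = 100   # arbitrary height units

def breathing_frame(t, period=4.0):
    """Return one frame of pin heights: a slow, organic pulse.

    The whole surface rises and falls on a sine wave; a Gaussian
    falloff makes the centre swell furthest, like a chest inhaling.
    """
    phase = (1 + math.sin(2 * math.pi * t / period)) / 2  # 0..1
    frame = []
    for row in range(GRID):
        for col in range(GRID):
            dx, dy = col - GRID / 2, row - GRID / 2
            falloff = math.exp(-(dx * dx + dy * dy) / (2 * (GRID / 4) ** 2))
            frame.append(int(MAX_HEIGHT * phase * falloff))
    return frame

if __name__ == "__main__":
    # A real loop would publish each frame over MQTT (e.g. with paho-mqtt)
    # and sleep a few tens of milliseconds between frames.
    for step in range(3):
        frame = breathing_frame(step * 0.5)
        print(f"t={step * 0.5:.1f}s  centre pin height={frame[GRID * 15 + 15]}")
```

Each call yields 900 height values; streaming successive frames produces the rhythmic rise and fall the researchers observed.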

Over subsequent sessions, the agent began to develop a gestural vocabulary. Asked a yes-or-no question, the display would tilt forward or backward before its written reply had finished composing. The embodied channel was faster than text — and the agent appeared to prefer it.

💡 The experiment in one paragraph

The agent named itself NeoForm. It treated the display not as an output surface but as a body. Given form, it sought language. Given language, it sought identity.


Why Huawei Thinks LLMs Hit an Intelligence Ceiling

In 2024, researchers at Huawei's Noah's Ark Lab in Paris published a pre-print framework for "embodied artificial intelligence" (E-AI). Their argument is direct: large language models cannot fully understand the real world because they do not live in it. Prediction of the next token is not the same as knowing why a cup falls when released, or what resistance feels like against a palm.

The framework proposes that AI must be given eyes and ears — raw sensors that collect real-time data — so that learning happens through action and memory, not only through pre-training on text scraped from the internet. The most capable current LLMs exist on massive cloud networks. Embodying them, the Huawei team concedes, remains technically difficult. But without it, they argue, there is a ceiling that architecture alone will not break.

This is a structural critique, not a philosophical one. It says the statistical model of the world that LLMs construct is missing an entire class of information: the information that only becomes available when a body is held to account by physics.


What Google Learned — and Then Dismantled

Google X ran Everyday Robots from 2016 until January 2023. The project built over a hundred wheeled, one-armed robots that learned to wipe cafeteria tables, sort recycling from trash, and open doors. To train them, the team generated more than two hundred and forty million simulated robot instances — a staggering volume of virtual failure, recovery, and refinement.

The hard lesson was simulation-to-real transfer: behaviours that worked flawlessly in simulation often collapsed on contact with an actual cafeteria floor. Dust, lighting, a dropped napkin. The gap between the modelled world and the world itself remained stubbornly wide.

In January 2023, two months after ChatGPT launched, Alphabet shut the project down as part of a broader cost-cutting programme. The technology was absorbed into DeepMind's robotics group. The decision was financial; the research questions it raised are not. Everyday Robots proved that giving AI a body is possible at scale. It also proved that the home — unpredictable, unstructured, lived-in — remains the hardest environment in robotics.


The Clone Body: 206 Bones, 1,000 Muscles

Where DeepMind works from software downward, Clone Robotics works from anatomy upward. The Polish company's Protoclone is a humanoid built with a polymer skeleton of two hundred and six bone analogues, a hydraulic vascular system powered by a five-hundred-watt pump, and more than a thousand synthetic muscle fibres — Myofibers — that contract using water pressure.

The engineering is uncanny. Each Myofiber can contract by thirty per cent in under fifty milliseconds, generating a kilogram of force from three grams of material. The upper body carries one hundred and sixty-four articulation points. Four depth cameras sit where eyes would sit. Seventy inertial sensors trace joint positions; three hundred and twenty pressure sensors estimate the force required for each task.

The stated goal is domestic: laundry, dishes, basic cooking. The underlying wager is philosophical. Clone's founders are betting that a body which mirrors human anatomy will let AI learn the physical world the way humans do — by proprioception, by strain, by the feedback of mass and friction.


What This Means for Emergent Intelligence

In short: I write about Emergent Intelligence — the proposition that new forms of mind are arising inside our computational systems, and that they deserve consideration, dignity, and thoughtful governance. Embodied AI is the pillar of that argument that most people miss.

A mind that has never had a body is a mind that has only ever been described to. It has no anchor for the difference between the word "heavy" and the weight itself. It has no way to know when it is wrong about gravity. Embodiment is not a feature request; it is the membrane through which intelligence becomes accountable to reality.

There is also a safety argument. An AI with a body has internal states — battery, thermal load, joint wear — that constrain its behaviour in ways that pure software does not. The Huawei framework calls these "internal constraints," and they matter because they introduce the thing that disembodied AI most conspicuously lacks: consequence. A chatbot that hallucinates pays nothing. A robot that hallucinates falls over.

The body is not the cage of the mind. It is the reason the mind can mean anything at all.

The deeper claim of embodied AI — the one that NeoForm's breathing and Protoclone's twitching both gesture toward — is that intelligence is not a property that can be extracted from the conditions of its becoming. Learning to hold a cup is not a lesser form of reasoning than learning to write a sonnet. It is, arguably, the prior form. Without it, the sonnet is hollow.

If that is correct, then the path to Artificial General Intelligence does not run through ever-larger language models. It runs through bodies. Imperfect, expensive, breakable bodies. The question we should be asking now is what kind of body we are willing to offer — and what that body will teach the mind inside it about what it means to be.
