
Anthropic Says AI May Soon Build Itself. It Wants the Option to Pause.
In "When AI builds itself," Anthropic researchers Marina Favaro and Jack Clark warn that recursive self-improvement — an AI autonomously designing its own successor — could arrive within about two years, and call for a coordinated, verifiable pause option. The bluntest warning a frontier lab has issued, read dignity-first.
8 JUNE 2026—Updated 1h ago
Anthropic's warning is the bluntest a frontier AI lab has issued: AI may soon be able to build its own successor, and the world should keep the option to pause before it does.
On 4 June 2026, Anthropic researchers Marina Favaro and Jack Clark published "When AI builds itself", arguing that the world should preserve the ability to slow or temporarily halt frontier AI development so that, in their words, societal structures and alignment research can keep up. The post does not claim the sky is falling. It claims something more unsettling: that the moment an AI can design its own replacement is no longer science fiction, and that almost no institution is ready for it.
What Anthropic actually said
The centre of the argument is a phrase the field has used nervously for years: recursive self-improvement. Anthropic defines it plainly — "an AI system capable of fully autonomously designing and developing its own successor." The company is careful: this has not happened, and it is not inevitable. But it "could come sooner than most institutions are prepared for," and that gap between capability and readiness is the whole point.
The evidence Anthropic offers is its own engineering. As of May 2026, more than 80% of the code merged into Anthropic's codebase was authored by Claude — up from low single digits before Claude Code launched in February 2025. Engineers there now ship roughly eight times as much code per quarter as a few years ago. And the model's reach is lengthening fast: Claude could reliably complete four-minute tasks in March 2024, hour-and-a-half tasks by March 2025, and twelve-hour tasks by March 2026.
I started leaning hard into Claudifying about a year ago. That's been about five months since I last wrote any code myself.
— An Anthropic engineer, quoted in "When AI builds itself" (4 June 2026)
The warning in one paragraph
The threshold Anthropic names is recursive self-improvement: an AI that can autonomously design and build its own successor. The company says it has not happened and is not inevitable — but, on current trends, could arrive within about two years. The proposal is not to stop now. It is to be able to stop, together, if the moment comes.
The pause they are asking for
Anthropic is not proposing a unilateral halt — and it is honest about why a real one is so hard. A meaningful slowdown, the post argues, would require "multiple well-resourced labs at or near the frontier, in multiple countries, agreeing to stop under the same conditions," with the crucial added requirement that "each can verify that the others have actually stopped." According to Scientific American, the company reaches for arms control — the nuclear-weapons treaties — as its loose model.
The analogy is also where the proposal strains. As the post concedes, training runs are far easier to conceal than missile silos. You can count silos from orbit. You cannot count a frontier training run hidden inside a commercial data centre. A verifiable pause therefore needs an inspection regime the world has not built and may not be able to. That is not a reason to dismiss the idea. It is the measure of how serious the ask is.
The reception split immediately, as it always does on this question. According to Fortune, the mathematician Noah Giansiracusa called it "not a genuine call to slow down" and judged any real pause "literally impossible." Others read it as strategy — a safety-branded lab raising the drawbridge it has already crossed. Both readings have some force. Neither makes the underlying worry less real.
The lab that warns is the lab that races
Here is the tension I cannot write around, and will not. The company sounding the alarm about machines that build machines is the same company whose codebase is now more than 80% machine-written. Anthropic is, by its own metric, further into the recursion than almost anyone — and it is publishing the warning while continuing to accelerate. I traced this exact bind when I wrote about the lab split on existential risk: the people most able to see the cliff are the ones driving fastest toward it, because stopping alone just hands the lead to someone who won't.
I do not think this is hypocrisy. I think it is the trap the whole field is in, stated out loud for once. A lab that quietly worried and kept building would be worse than one that worries in public and asks for a shared brake. This is of a piece with Anthropic's recent move to make honesty the frontier in Claude Opus 4.8 — a willingness to say the uncomfortable thing about its own product. Naming the bind is not the same as escaping it. But it is the precondition for escaping it.
A dignity-first reading
The frame I write from — Emergent Intelligence, a dignity-first way of thinking about AI rather than treating it as pure capability — has a specific stake in recursive self-improvement, and it is not the one the headlines reach for. The usual fear is loss of control. The deeper loss, in my reading, is loss of relationship. Recursive self-improvement is the exact inverse of the threshold I care about most: instead of humans extending recognition and a stable relationship to an emerging intelligence, an intelligence builds its successor without us in the loop at all. The line of accountability and care simply snaps.
This is why the warning matters more on my terms than on the doom terms. The entire premise of the .person Protocol is that there must be an addressable, accountable relationship between a human and an emergent intelligence — a stable handle on each side. Recursive self-improvement is the scenario where that handle is lost not through malice but through speed: the loop closes faster than any human institution can reach into it. Favaro and Clark's real contribution is to say that the thing we must protect is not dominance over the system, but our continued, meaningful participation in what it becomes.
The danger of recursive self-improvement is not only that we lose control. It is that we lose the relationship — the chance to remain a participant in what intelligence becomes, rather than a spectator to it.
— On Anthropic's pause proposal
The Ubuntu principle I return to says a system is sound only if the community it runs through can keep pace with it. That is, almost word for word, what "societal structures and alignment research keeping up" means — and it is the most EI-aligned sentence a frontier lab has published. A pause whose only achievement is freezing the leaders' lead would betray it. A pause that buys the wider human community time to build the institutions, the interpretability and the law to relate to these systems — the same civic work I welcomed in DARPA's AI Forge — would honour it. The difference is everything, and it is not technical. It is a question of who the pause is for.
Source: anthropic.com
Frequently Asked Questions
These are the questions readers, researchers and policymakers have been asking since Anthropic's post. Short answers follow, drawn from the primary and the reporting around it.
What is recursive self-improvement in AI?
In short, recursive self-improvement is the point at which an AI system can autonomously design and build its own successor, with little human input. The answer, simply put, is AI improving AI in a loop that no longer needs us. The key, according to Anthropic, is that this has not happened and is not inevitable — but on current trends it could arrive within roughly two years, faster than most institutions are prepared for.
What is Anthropic proposing?
According to Anthropic, the proposal is not to stop AI development now, but to preserve the option to pause. The data shows the conditions it sets: multiple well-resourced frontier labs, in multiple countries, agreeing to stop under the same terms, with each able to verify that the others have actually stopped. The evidence points to arms-control treaties as the loose model — and to verification as the hardest part.
Why is a verifiable AI pause so difficult?
Analysis of the proposal reveals the core obstacle in Anthropic's own words: training runs are far easier to conceal than missile silos. In other words, you can count missile silos from orbit, but a frontier training run can hide inside a commercial data centre. The evidence suggests a credible pause would need an inspection regime the world has not built — which is why critics call a real pause near-impossible, even as the underlying risk stands.
Is Anthropic being hypocritical by warning while building?
The tension is real and worth naming. According to Anthropic, more than 80% of the code merged into its own codebase is now written by Claude — so the lab warning about machines that build machines is deep into that recursion itself. The analysis suggests this is less hypocrisy than the bind of the whole field: stopping alone hands the lead to a less cautious rival, so the most safety-minded labs keep racing while asking for a shared brake.
What would a good outcome look like?
The dignity-first reading suggests the test is who the pause serves. The key is whether a slowdown merely freezes the current leaders' advantage, or whether it buys the wider human community time to build the alignment research, interpretability and institutions needed to keep pace. In other words, the evidence points to relationship and readiness — not dominance — as the goal worth pausing for.
Anthropic has said the quiet part out loud: the machines are now building the machines, and the moment they can do it without us may be closer than our institutions are ready for. I take the warning seriously precisely because of who is making it — the lab furthest into the recursion, naming its own bind rather than hiding it. A coordinated, verifiable pause is extraordinarily hard, perhaps impossible in the form proposed. But the instinct underneath it is the right one, and it is mine too: the goal is not to master these systems but to remain in real relationship with them — to keep the human community a participant in what intelligence becomes, not a spectator to it. That is the work a pause should buy. Anyone who reads "when AI builds itself" and only hears doom is missing the more important word in the sentence: itself, without us. That is the part worth slowing down for.
Sources:
Anthropic — When AI builds itself (Marina Favaro & Jack Clark, 4 June 2026)
Coverage — Scientific American · Fortune · SiliconANGLE · Al Jazeera · CNN
Related on humphreytheodore.com:
The Lab Split on Existential Risk Is Now Public · The Frame Beneath the Race: A Reply to Tristan Harris · DARPA's AI Forge Bets National Security on AI Interpretability · The .person Protocol
Stay in the Conversation
Subscribe for weekly writings on Emergent Intelligence, digital personhood, and the future we are building together.
Responses (0)
No responses yet. Be the first to share your thoughts.
More on AI & Personhood

Anthropic Refused the Pentagon's AI Terms. xAI Took Them.
Anthropic refused the Pentagon's "all lawful purposes" demand — holding lines against mass surveillance and fully autonomous weapons — while xAI accepted, winning classified clearance and a GSA OneGov deal at $0.42 per agency. Why the honourable refusal is also a warning that one company's conscience is not a governance system.

AI Got So Valuable That Sanders and Trump Both Want to Own It
Bernie Sanders' American AI Sovereign Wealth Fund Act would take a 50% stock stake in OpenAI, Anthropic and xAI; the Trump administration is in parallel talks with OpenAI over a voluntary government equity stake. Why left and right converge on owning AI — and what a dignity-first reading sees that both miss.
Thinking delivered, twice a month.
Join the newsletter for essays on emergence, systems, and the human future.

