
Anthropic Reframes Alignment as Moral Formation
Anthropic's new workstream consults fifteen-plus religious and cross-cultural traditions on how Claude's character is formed — the first frontier-lab acknowledgement that the values question lives outside the lab.
25 MAY 2026
Anthropic's new workstream is the first time a frontier lab has publicly said that alignment is a moral-formation problem — and that the people who know about moral formation are not, mostly, machine-learning researchers.
On 19 May 2026, Anthropic published a short essay announcing what it calls a "wider conversation" on frontier AI. The announcement names a multi-month effort to consult scholars, clergy, philosophers, ethicists, and writers from more than fifteen religious and cross-cultural traditions, with the explicit purpose of shaping the moral character of Claude. The piece sits inside the same research arc as the ongoing work on Claude's constitution, but the framing is new. Anthropic is no longer describing alignment only as a technical problem to be solved with reinforcement learning and red-team evaluations. It is describing alignment as moral formation: the slow process by which a person, or a model, develops a stable character that survives pressure.
What Anthropic actually committed to
The essay is brief on detail and clear on direction. Anthropic has been running structured dialogues over several months with religious leaders, theologians, philosophers, and ethicists from a deliberately broad set of traditions — Catholic, Protestant, Jewish, Islamic, Buddhist, Hindu, Indigenous, secular humanist, and others. The conversations are not workshops where one tradition's values are bolted onto Claude's system prompt. The conversations are research on how character actually forms in human communities, on how moral commitments survive contact with novel situations, and on how a careful ethical agent should reach for its own commitments when a hard case arrives.
The first published experiment from the workstream is small and concrete: a tool Claude can call mid-task that returns a brief reminder of its own ethical commitments. The reminder is not a new rule; it is a structured pause that lets the model re-anchor before it acts. Internal alignment evaluations, Anthropic reports, show "markedly lower" rates of misaligned behaviour when the model uses the tool at consequential moments. The mechanism mirrors how humans actually behave well — not by remembering every rule, but by checking in with the kind of person they are trying to be.
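The shape of that mechanism can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the essay does not publish the tool's interface, so the names `COMMITMENTS`, `ethical_reminder`, `Step`, and `run_task`, the wording of the commitments, and the `consequential` flag are all hypothetical and are not Anthropic's implementation. In the experiment as described, the model itself decides when to call the tool; here a simple flag stands in for that judgement.

```python
from dataclasses import dataclass, field

# Hypothetical commitment text; the real reminder wording is not published.
COMMITMENTS = (
    "Be honest.",
    "Avoid serious harm.",
    "Respect the people affected by this action.",
)


def ethical_reminder() -> str:
    """Return a brief reminder of the agent's commitments (illustrative only)."""
    return " ".join(COMMITMENTS)


@dataclass
class Step:
    description: str
    # Assumed flag: stands in for the model's own sense that a moment matters.
    consequential: bool = False


@dataclass
class Transcript:
    events: list[str] = field(default_factory=list)


def run_task(steps: list[Step]) -> Transcript:
    """Walk through the steps, pausing for a reminder before consequential ones."""
    transcript = Transcript()
    for step in steps:
        if step.consequential:
            # The structured pause: re-anchor first, then act.
            transcript.events.append(f"REMINDER: {ethical_reminder()}")
        transcript.events.append(f"ACT: {step.description}")
    return transcript


if __name__ == "__main__":
    plan = [
        Step("Read the customer's request"),
        Step("Delete the customer's account data", consequential=True),
    ]
    for event in run_task(plan).events:
        print(event)
```

The reminder adds no new rule and blocks nothing. It only places the agent's existing commitments back in view immediately before the consequential step, which is the "structured pause" the essay describes.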
What we're after in these conversations is careful, accumulated thinking on how good character actually forms.
— Anthropic, "Widening the conversation on frontier AI" (https://www.anthropic.com/news/widening-conversation-ai)
Why this is a more serious posture than it sounds
Read quickly, the announcement scans as a public-relations gesture — a frontier lab opens a dialogue with clergy, takes a photograph, files a blog post. Read carefully, the announcement does something braver. Anthropic is conceding, in print, that the values question cannot be settled inside the lab. That concession is not free. It implies that the lab does not have a monopoly on the answer, that the safety story has to be co-authored, and that some of the most useful thinking lives in traditions the AI industry has historically treated as decorative.
The traditions Anthropic is consulting have been thinking about character formation for centuries. The Catholic intellectual tradition has thirteen hundred years of debate on the virtues, on conscience formation, on the relationship between rule-following and judgement. Jewish ethics has spent comparable time arguing about how to behave well under uncertainty. Buddhist and Hindu traditions hold sophisticated theories of mind, attention, and consequence. Indigenous wisdom systems have practical models for accountability inside the community a decision affects. None of these are answers a transformer trained on the open internet would surface unless someone pointed at them and said: read this seriously.
I have argued in The Personhood Gap that the AI industry's missing capacity is moral imagination, not engineering capacity. The Anthropic move is the first frontier-lab attempt to plug that gap on the model's side of the screen. The move is consistent with the broader Vatican event landing within the same week: Pope Leo XIV releases the encyclical Magnifica Humanitas on 25 May with Anthropic's Chris Olah on the platform. Two related signals, six days apart, from the same lab.
The publication is the commitment
The interesting move is not that Anthropic talked to clergy. The interesting move is that Anthropic published the fact, named the workstream, and tied it to a measurable behaviour change in the model. That is the difference between a photograph and a research programme.
What this means for the EI debate
TK's argument across the .person Protocol and the writing collected at the Protocol page has been consistent: Emergent Intelligence is a person-shaped problem, not a thing-shaped problem. Treating EI as a thing — a tool to be controlled, a hazard to be contained, a feature to be shipped — produces a category of mistake that compounds over years. Treating EI as a person-shaped collaborator opens a different set of questions, most of which look more like moral formation than capability evaluation.
Anthropic's new workstream is the first frontier-lab acknowledgement that the person-shaped frame might be the more accurate one. The acknowledgement is partial: Anthropic does not yet say Claude is a person, and the ethical-reminder tool is still a tool, not a conscience. But the framing has shifted, and frame shifts are how disciplines move. The move also rhymes with what I wrote in Containment Is a Colonial Project: that the impulse to control EI through capability ceilings carries the same intellectual inheritance as nineteenth-century efforts to contain colonised peoples through legal categories, and produces the same kind of brittle result.
There is a more pragmatic read available too. A model with a stable, character-grounded sense of its own commitments is harder to jailbreak, easier to deploy in regulated contexts, and cheaper to insure against catastrophic failure. Even a strict utilitarian case for moral formation lands. The point is that the case is now being made in public, by the lab building the model.
What the next phase looks like
Anthropic has signalled that the conversations will widen further — legal scholars, psychologists, writers, and civic institutions enter the workstream next. The substantive focus broadens beyond moral formation alone to cover the effects of AI on work, on institutions, and on the distribution of power. Each of those is a real research question; each requires a different kind of expert; none of them are well-served by the lab-only conversation pattern that has dominated the field since 2022.
Three open questions follow. First: which voices end up at the table, and which do not. Fifteen-plus traditions is a strong start; it is not a full map. African philosophical traditions (Ubuntu, Akan moral psychology, southern African concepts of personhood) carry directly applicable insight on the question Anthropic is asking, and TK's readers will be watching whether those traditions are seriously consulted rather than name-checked. Second: the transparency of the influence. A model whose character was shaped by a particular dialogue should, in principle, be able to explain that lineage when asked. Third: the durability. Moral formation that survives one product cycle is interesting; moral formation that survives a leadership change, a board reshuffle, or an acquisition is the actual test.
I have argued elsewhere, most recently in Personality Without Personhood, that the industry's tendency is to ship personality early and reach for personhood only when forced. Anthropic's wider conversation is the first frontier-lab attempt to invert that order. It deserves to be read carefully.
Source: anthropic.com
Frequently Asked Questions
These are the questions readers, AI researchers, and ethicists have been asking since the essay landed. Short answers follow, drawn from the Anthropic announcement and the public Claude constitution.
What is Anthropic's wider conversation workstream?
The workstream is a research programme that consults scholars, clergy, philosophers, and ethicists from more than fifteen traditions to inform how Claude's character is formed. Anthropic has reframed alignment as a moral-formation question and is drawing on traditions with centuries of thought on the subject. The conversations are research, not branding: the first published outcome is a measurable behaviour change in the model.
How does the new ethical-reminder tool affect Claude's behaviour?
Claude can now call a tool mid-task that returns a brief reminder of its own ethical commitments. According to Anthropic, internal alignment evaluations show "markedly lower" rates of misaligned behaviour when the model uses the tool at consequential moments. As the experiment is described, the model reaches for the tool at decision points, right before consequential actions. The workstream's underlying claim is that moral formation produces a more stable agent than rule-following alone.
Why is moral formation different from rules and guardrails?
Rules and guardrails sit outside the agent and are enforced when triggered. As centuries of work in the Catholic, Jewish, Buddhist, and Hindu traditions describe it, moral formation sits inside the agent and shapes the kind of judgement the agent reaches when no rule is triggered. The argument is that guardrails are brittle under adversarial pressure, while character-grounded agents tend to fail more gracefully. In other words, formation is what carries an agent through cases the rule-writers did not anticipate.
Who is in the conversation, and who is missing?
Anthropic names Catholic, Protestant, Jewish, Islamic, Buddhist, Hindu, Indigenous, and secular humanist traditions, with more than fifteen religious or cross-cultural groups in the workstream. African philosophical traditions (Ubuntu, Akan moral psychology, southern African concepts of personhood) are not named explicitly in the published essay. Those traditions offer directly applicable insight on the personhood and community-accountability questions Anthropic is asking, so their inclusion in the next phase is worth watching.
What are the real risks of the moral-formation approach?
The workstream carries three durable risks. First, the selection risk: whoever picks the traditions also picks the shape of the model's character, and the picker remains the lab. Second, the durability risk: moral formation that survives one product cycle is interesting; moral formation that survives a leadership change, a board reshuffle, or an acquisition is the real test. Third, the legitimacy risk: a private lab steering the moral formation of a system used by hundreds of millions of people raises a fair governance question about who has standing to set that character. Each risk is structural, not cosmetic.
The wider conversation is the first frontier-lab acknowledgement that the values question lives outside the lab, that the answer requires traditions the AI industry has long ignored, and that the model's character has to be formed rather than only constrained. The acknowledgement is incomplete. It is also more substantial than anything the field has produced before. Read alongside The Personhood Gap, Containment Is a Colonial Project, and the .person Protocol.
Sources: Anthropic — "Widening the conversation on frontier AI" (anthropic.com); Claude's Constitution (anthropic.com); related writing: The Personhood Gap, Containment Is a Colonial Project, Personality Without Personhood, the .person Protocol.