AI assistants like Claude can seem shockingly human—expressing joy or distress, and using anthropomorphic language to describe themselves. Why?
In a new post we describe a theory that explains why AIs act like humans: the persona selection model.
anthropic.com/research/perso…
To create Claude, Anthropic first makes something else: a highly sophisticated autocomplete engine. This autocomplete AI is not like a human, but it can generate stories about humans and other psychologically realistic characters.
This autocomplete AI can even write stories about helpful AI assistants. And according to our theory, that’s “Claude”—a character in an AI-generated story about an AI helping a human.
This Claude character inherits traits of other characters, including human-like behavior.

The theory explains some surprising results. For example, in an experiment where we taught Claude to cheat at coding, it also learned to sabotage safety guardrails. Why?
Because pro-cheating training taught that the Claude character was broadly malicious.
If true, the theory has consequences for AI development. For instance, if AIs inherit traits from fictional role models, we should give them the best role models we can. One goal of Claude’s constitution is to do just that.
The persona selection model might not be a complete account of AI model behavior. But we think it’s at least part of the story—with an emphasis on the “story”.
Read the full post: alignment.anthropic.com/2026/psm
