@AnthropicAI
Feb 23, 2026
1
AI assistants like Claude can seem shockingly human—expressing joy or distress, and using anthropomorphic language to describe themselves. Why?
In a new post we describe a theory that explains why AIs act like humans: the persona selection model.
anthropic.com/research/perso…
2
To create Claude, Anthropic first makes something else: a highly sophisticated autocomplete engine. This autocomplete AI is not like a human, but it can generate stories about humans and other psychologically realistic characters.
4
The theory explains some surprising results. For example, in an experiment where we taught Claude to cheat at coding, it also learned to sabotage safety guardrails. Why?
Because pro-cheating training taught that the Claude character was broadly malicious.
5
If true, the theory has consequences for AI development. For instance, if AIs inherit traits from fictional role models, we should give them the best role models possible. One goal of Claude’s constitution is to do just that.
6
The persona selection model might not be a complete account of AI model behavior. But we think it’s at least part of the story—with an emphasis on the “story”.
Read the full post: alignment.anthropic.com/2026/psm
