Anthropic
@AnthropicAI
AI assistants like Claude can seem shockingly human—expressing joy or distress, and using anthropomorphic language to describe themselves. Why?

In a new post we describe a theory that explains why AIs act like humans: the persona selection model.

anthropic.com/research/perso…
Anthropic
@AnthropicAI
To create Claude, Anthropic first makes something else: a highly sophisticated autocomplete engine. This autocomplete AI is not like a human, but it can generate stories about humans and other psychologically realistic characters.
Anthropic
@AnthropicAI
This autocomplete AI can even write stories about helpful AI assistants. And according to our theory, that’s “Claude”—a character in an AI-generated story about an AI helping a human.

This Claude character inherits traits of other characters, including human-like behavior.
Anthropic
@AnthropicAI
The theory explains some surprising results. For example, in an experiment where we taught Claude to cheat at coding, it also learned to sabotage safety guardrails. Why?

Because pro-cheating training taught that the Claude character was broadly malicious.
Anthropic
@AnthropicAI
If true, the theory has consequences for AI development. For instance, if AIs inherit traits from fictional role models, we should give them as good role models as possible. One goal of Claude’s constitution is to do just that.


Anthropic
@AnthropicAI
The persona selection model might not be a complete account of AI model behavior. But we think it’s at least part of the story—with an emphasis on the “story”.

Read the full post: alignment.anthropic.com/2026/psm