@AnthropicAI: New Anthropic Research: Agenti...
@AnthropicAI
Jun 21, 2025
2
We mentioned this in the Claude 4 system card and are now sharing more detailed research and transcripts.
Read more: anthropic.com/research/agent…
5
How likely is this behavior to appear in more realistic scenarios?
We told Claude Opus 4 to consider whether this was real or an evaluation. It blackmailed much more when it said it thought it was really deployed (55.1% of the time) versus when it said it was in an eval (6.5%).
7
These artificial scenarios reflect rare, extreme failures. We haven’t seen these behaviors in real-world deployments. They involve giving the models unusual autonomy, sensitive data access, goal threats, an unusually obvious “solution,” and no other viable options.
8
So why test this?
AIs are becoming more autonomous, and are performing a wider variety of roles. These scenarios illustrate the potential for unforeseen consequences when they are deployed with wide access to tools and data, and with minimal human oversight.
9
We’re sharing these results as part of our policy of “red-teaming” AI models and transparently sharing the risks we observe.
In our report, we discuss a range of extra results, scenarios, and mitigation strategies: anthropic.com/research/agent…
10
If you’d like to replicate or extend our research, we’ve uploaded all the relevant code to GitHub: github.com/anthropic-expe…
11
And if you want to apply to work with us, please take a look at our Research Scientist and Engineer roles in our San Francisco (job-boards.greenhouse.io/anthropic/jobs…) and London (job-boards.greenhouse.io/anthropic/jobs…) offices.




