Visualize Thread by @AnthropicAI

Anthropic

@AnthropicAI

New Anthropic Fellows Research: a new method for surfacing behavioral differences between AI models.

We apply the “diff” principle from software development to compare open-weight AI models and identify features unique to each.

Read more: anthropic.com/research/diff-…

If a new model shares a feature with a trusted model, that area probably doesn't need scrutiny.

Model diffing isolates the features unique to the new model—where new risks are most likely to be located.

For example, when we compared Alibaba's Qwen to Meta's Llama, we found a "CCP alignment" feature unique to Qwen and an "American exceptionalism" feature unique to Llama.

This technique isn't perfect—it can be oversensitive, sometimes flagging analogous features as distinct. But by focusing only on differences, it allows us to audit AI models more efficiently.
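
To make the "diff" idea and its oversensitivity concrete, here is a toy sketch. It is not the method from the paper. It assumes each feature is summarized by its activations on the same small set of probe inputs, and it treats a feature as "shared" when its cosine similarity to some trusted-model feature reaches a threshold. The feature names, activation values, and the `unique_features` helper are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length activation vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def unique_features(new_feats, trusted_feats, threshold=0.9):
    """Names of features in new_feats with no close match in trusted_feats.

    Assumption: both dicts map a feature name to its activations on the
    same probe inputs, so the vectors are directly comparable.
    """
    unique = []
    for name, acts in new_feats.items():
        best = max((cosine(acts, t) for t in trusted_feats.values()), default=0.0)
        if best < threshold:
            unique.append(name)
    return unique


# Hypothetical activations on four probe inputs.
trusted = {
    "greeting": [1.0, 0.0, 0.0, 0.2],
    "math": [0.0, 1.0, 0.0, 0.0],
}
new = {
    "greeting_v2": [0.9, 0.1, 0.0, 0.2],  # analogous to "greeting"
    "math": [0.0, 1.0, 0.1, 0.0],         # analogous to "math"
    "novel": [0.0, 0.0, 1.0, 0.9],        # no counterpart in the trusted model
}

print(unique_features(new, trusted))                   # ['novel']
print(unique_features(new, trusted, threshold=0.999))  # flags all three
```

With the default threshold only `novel` is left to audit. With an overly strict threshold the two analogous features are also reported as distinct, which is the kind of oversensitivity the thread mentions.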

This research is a product of our Anthropic Fellows program, led by @tomjiralerspong and supervised by @TrentonBricken.

See the full paper here: arxiv.org/abs/2602.11729
