Announcing ARC-AGI-3
The only unsaturated agentic intelligence benchmark in the world
Humans score 100%, AI <1%
This human-AI gap demonstrates we do not yet have AGI
Most benchmarks test what models already know; ARC-AGI-3 tests how they learn
We build benchmarks that reveal the gap between what's easy for humans and what's hard for AI
ARC-AGI has repeatedly identified inflection points in AI progress, including the emergence of reasoning systems and the rise of capable AI agents.
ARC-AGI-3 is the next step in that journey

We created an in-house game studio and built 135 novel environments from scratch
No instructions, Core Knowledge Priors-only
In order to beat these games, AI must:
• Explore the environment
• Form hypotheses
• Execute a plan
• Learn and adapt
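The four steps above can be sketched as a simple agent loop. This is only a toy illustration under stated assumptions: `ToyGame` and `play` are hypothetical names invented here, not the ARC-AGI-3 environments or API, and the hidden rule (one action per level makes progress) is made up for the example.

```python
import random


class ToyGame:
    """Hypothetical stand-in environment: one hidden action advances each level."""

    ACTIONS = ["up", "down", "left", "right"]

    def __init__(self, levels=3, seed=0):
        rng = random.Random(seed)
        # The action that makes progress can change per level, so the agent must adapt.
        self.hidden = [rng.choice(self.ACTIONS) for _ in range(levels)]
        self.level = 0
        self.progress = 0

    def done(self):
        return self.level >= len(self.hidden)

    def step(self, action):
        """Return True if the action made progress; 3 progress steps clear a level."""
        if self.done():
            return False
        if action == self.hidden[self.level]:
            self.progress += 1
            if self.progress == 3:
                self.level += 1
                self.progress = 0
            return True
        return False


def play(game, max_steps=100):
    """Explore, form a hypothesis, execute it, and drop it when it stops working."""
    hypothesis = None  # current belief about which action makes progress
    steps = 0
    while not game.done() and steps < max_steps:
        if hypothesis is None:
            # Explore: try each action and keep the first one that makes progress.
            for action in game.ACTIONS:
                steps += 1
                if game.step(action):
                    hypothesis = action
                    break
        else:
            # Execute the plan implied by the current hypothesis.
            steps += 1
            if not game.step(hypothesis):
                # Adapt: the evidence contradicts the hypothesis, so discard it.
                hypothesis = None
    return steps


game = ToyGame(levels=3, seed=0)
steps_used = play(game)
print(f"cleared={game.done()} steps={steps_used}")
```

The agent never receives instructions; it only sees whether an action made progress. Discarding a hypothesis as soon as it fails is the opposite of the "holding on to an early hypothesis" failure mode.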
ARC-AGI-3 is a useful research tool to analyze model behavior
Key failure modes seen in our early testing:
• Thinking it is playing another game
• Holding on to an early hypothesis
• Being unable to forecast into the future
Both AI + human runs have sharable replays
Watch Gemini 3.1 do well on some games, poorly on others:
arcprize.org/replay/34a9614…
arcprize.org/replay/d0e0768…
ARC-AGI-3 gives us a formal measure to compare human and AI skill acquisition efficiency
Humans don’t brute force - they build mental models, test ideas, and refine quickly
How close is AI to that? (Spoiler: not close)

Also live today: ARC Prize 2026 - 3 tracks, $2,000,000 in prizes available!
Get involved:
• Play a Game: arcprize.org/tasks/ls20
• Build Agents: docs.arcprize.org
• Win Prizes: arcprize.org/competitions/2…

