Artificial Analysis (@ArtificialAnlys)

5 Unrolled Threads

Claude Fable 5 cost ~$6.2K to run the Artificial Analysis Intelligence Index benchmarks, making it the most expensive model we have ever benchmarked 🧵
Key takeaways:
➤ Intelligence Index: 60, ahead of Claude Opus 4.8 (56) and GPT-5.5 (55)
➤ Cost to run the Intelligence Index: $6.2K, 1.7× the next-highest ...

Jun 18, 2026


We’ve added a new pseudonymous video model to our Text to Video and Image to Video Arenas. ‘HappyHorse-1.0’ is currently landing in the #1 spot for Text and Image to Video (No Audio) and the #2 spot for Text and Image to Video (With Audio). Further details coming soon. Example generations below fro...

Apr 10, 2026


DeepSeek V3.2 is the #2 most intelligent open weights model and also ranks ahead of Grok 4 and Claude Sonnet 4.5 (Thinking). It takes DeepSeek Sparse Attention out of ‘experimental’ status and couples it with a material boost to intelligence. @deepseek_ai V3.2 scores 66 on the Artificial Analysis I...

Dec 03, 2025


Qwen3 model family overview: full benchmarks for all 8 Qwen3 models in both reasoning and non-reasoning modes.
Key results:
➤ Qwen3 235B-A22B (Reasoning): The largest Qwen3 model scores 62 on the Artificial Analysis Intelligence Index, becoming the most intelligent open weights model ever. This is v...

May 13, 2025


Mistral Medium 3 independent evals: Mistral is back amongst the leading non-reasoning models, with Medium 3 rivalling Llama 4 Maverick, Gemini 2.0 Flash and Claude 3.7 Sonnet.
Key takeaways:
➤ Intelligence: We see substantial intelligence gains across all 7 of our evals compared to @MistralAI Large 2...

May 09, 2025