People think AI inference margins are a race to the bottom. Anthropic's gross margins were -94% in 2024. MiniMax was -25%. The narrative made sense. (1/5) 🧵

Then something changed. Zhipu raised prices 30% in February 2026, the first hike in China's AI market. It sold out instantly. ARR went 25x in 10 months. (2/5)
The secret is interactivity, measured in tokens per second per user. It's the dial labs slide between margin and user happiness: bigger batches cut cost per token but slow each user's stream. Customer requirements depend on the workload; throughput and costs depend on the hardware. At SemiAnalysis, we think inference provider gross margins should blend to ~60%. The chart below shows how outcomes vary significantly across hardware. (3/5)

We know interactivity matters. Moonshot tried aggressive batching to cut costs. Users left. They added a premium tier. DeepSeek lost share serving their own model the same way. (4/5)
AI inference isn't a commodity. It's a managed experience. Labs that understand the interactivity lever operate at 60%+ margins. The rest race to zero. (5/5)

