@SemiAnalysis_: People think AI inference marg...
@SemiAnalysis_
Mar 27, 2026
Then something changed. Zhipu raised prices 30% in February 2026, the first hike in China's AI market. It sold out instantly. ARR went 25x in 10 months. (2/5)
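For a sense of scale, the 25x figure can be turned into an implied monthly rate. This is a back-of-envelope sketch that assumes growth compounded evenly across the 10 months, which the thread does not claim; `implied_monthly_growth` is a hypothetical helper name.

```python
def implied_monthly_growth(multiple: float, months: int) -> float:
    """Constant monthly growth rate that compounds to `multiple` over `months`.

    Assumption: growth is smooth and compounding; real ARR curves are lumpier.
    """
    return multiple ** (1.0 / months) - 1.0


# 25x over 10 months, the figures quoted in the post above.
rate = implied_monthly_growth(25.0, 10)
print(f"Implied growth: {rate:.1%} per month")
```

Under that assumption, 25x in 10 months works out to roughly 38% month-over-month growth.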
The secret is interactivity: tokens per second per user. It's the dial labs turn to trade margin against user happiness. Larger batches raise throughput per chip and lower cost per token, but each user gets slower responses. How much interactivity customers need depends on the workload, and throughput and cost depend on the hardware. At SemiAnalysis, we think inference provider gross margins should blend to ~60%. The chart below shows how outcomes vary significantly across hardware. (3/5)
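A toy model can make the dial concrete. The sketch below is illustrative only: the saturation curve, the price, the hourly hardware cost, and every name in it (`serving_point`, `peak_tok_s`, `half_batch`) are assumptions made up for this example, not SemiAnalysis figures or measurements of any real accelerator.

```python
from dataclasses import dataclass


@dataclass
class ServingPoint:
    batch_size: int
    interactivity: float  # tokens/s seen by each user
    throughput: float     # tokens/s across the whole accelerator
    gross_margin: float   # (revenue - cost) / revenue


def serving_point(
    batch_size: int,
    *,
    peak_tok_s: float = 200.0,      # hypothetical single-user decode speed
    half_batch: float = 32.0,       # hypothetical batch size that halves per-user speed
    price_per_mtok: float = 2.0,    # hypothetical price, $ per million tokens
    cost_per_hour: float = 3.0,     # hypothetical hardware cost, $ per hour
) -> ServingPoint:
    # Assumed shape: per-user speed falls as more users share the chip.
    interactivity = peak_tok_s / (1.0 + batch_size / half_batch)
    # Aggregate throughput still rises with batch size, but saturates.
    throughput = batch_size * interactivity
    revenue_per_hour = throughput * 3600 / 1e6 * price_per_mtok
    gross_margin = (revenue_per_hour - cost_per_hour) / revenue_per_hour
    return ServingPoint(batch_size, interactivity, throughput, gross_margin)


# Sweep the dial: small batches favor users, large batches favor margin.
for b in (1, 8, 32, 128):
    p = serving_point(b)
    print(f"batch={p.batch_size:>3}  tok/s/user={p.interactivity:6.1f}  "
          f"margin={p.gross_margin:7.1%}")
```

In this toy setup, margin rises and per-user speed falls as the batch grows. Changing `peak_tok_s`, `half_batch`, or `cost_per_hour` stands in for swapping hardware, which shifts the batch size needed to reach a given margin.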
We know interactivity matters. Moonshot tried aggressive batching to cut costs. Users left. They added a premium tier. DeepSeek lost share serving their own model the same way. (4/5)


