Mistral Medium 3 independent evals: Mistral is back amongst the leading non-reasoning models with Medium 3 rivalling Llama 4 Maverick, Gemini 2.0 Flash and Claude 3.7 Sonnet
Key takeaways:
➤ Intelligence: We see substantial intelligence gains across all 7 of our evals compared to @MistralAI Large 2. Medium 3's gains are largest in Coding and Mathematical reasoning, where it exceeds Llama 4 Maverick in both our Coding Index (LiveCodeBench, SciCode) and our Math Index (AIME2024, MATH-500). The model performs well in our MMLU-Pro and GPQA evaluations, but on these it is closer to Llama 4 Scout than to Maverick.
➤ Pricing: Alongside its intelligence gains over Mistral Large 2, Medium 3 is also substantially cheaper. Mistral Medium 3 is priced at $0.4/$2 per 1M input/output tokens, an 80%/67% decrease from Mistral Large 2 ($2/$6).
➤ Proprietary: Mistral has not released the model's weights, but its announcement post hinted at releasing “large” open-weights models “over the next few weeks”, noting “we’re excited to ‘open’ up what’s to come”. 🕵️‍♂️
➤ Multimodal: The model has vision capabilities, and Mistral claims its vision performance is roughly in line with Llama 4 Maverick. We have not verified this independently; we plan to publish vision evals soon.
See below for further analysis, including individual evaluation scores and its Intelligence vs. Price positioning.
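The 80%/67% figures follow directly from the list prices quoted above. A minimal sketch of that arithmetic, assuming "decrease" means (old − new) / old; the dictionary and function names are illustrative only:

```python
# Per-1M-token list prices (USD) quoted in the thread: (input, output).
LARGE_2 = {"input": 2.0, "output": 6.0}
MEDIUM_3 = {"input": 0.4, "output": 2.0}


def pct_decrease(old: float, new: float) -> float:
    """Percentage decrease when a price moves from `old` to `new`."""
    return (old - new) / old * 100


for kind in ("input", "output"):
    drop = pct_decrease(LARGE_2[kind], MEDIUM_3[kind])
    # Rounded to whole percent, matching the thread's figures.
    print(f"{kind}: {drop:.0f}% cheaper")
```

The output price drop is 4/6 ≈ 66.7%, which the thread rounds to 67%.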

Intelligence vs. Price positioning: a substantial improvement on both dimensions (lower price and higher intelligence) compared to Mistral Large 2

Token usage and efficiency: Medium 3 gives more verbose responses and so uses substantially more tokens than Mistral Large 2 to run our Artificial Analysis Intelligence Index
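Higher token usage partly offsets a lower per-token price. A rough sketch of that trade-off, assuming simple per-token billing (tokens × list price, no caching or discounts) at the prices quoted in this thread; the token counts in the usage below are hypothetical placeholders, not measured values, and Artificial Analysis's own cost-to-run methodology may differ:

```python
# USD per 1M tokens, (input, output), as quoted in the thread.
PRICES = {
    "Mistral Large 2": (2.0, 6.0),
    "Mistral Medium 3": (0.4, 2.0),
}


def run_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a run under simple per-token billing (an assumption)."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def breakeven_output_multiple(old: str, new: str) -> float:
    """How many times more output tokens `new` can emit before its output
    cost matches `old` on the same task (input cost ignored)."""
    return PRICES[old][1] / PRICES[new][1]


# Hypothetical token counts, for illustration only.
print(run_cost("Mistral Large 2", 1_000_000, 1_000_000))
print(run_cost("Mistral Medium 3", 1_000_000, 1_000_000))
print(breakeven_output_multiple("Mistral Large 2", "Mistral Medium 3"))
```

Under these assumptions, Medium 3 would need to emit 3× as many output tokens as Large 2 before its output cost caught up.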

Further analysis on Artificial Analysis:
artificialanalysis.ai/models?model-f…
artificialanalysis.ai/models?model-f…
Individual evaluation results (all run independently):

