Artificial Analysis
@ArtificialAnlys
Mistral Medium 3 independent evals: Mistral is back amongst the leading non-reasoning models with Medium 3 rivalling Llama 4 Maverick, Gemini 2.0 Flash and Claude 3.7 Sonnet

Key takeaways:
➤ Intelligence: We see substantial intelligence gains across all 7 of our evals compared to @MistralAI Large 2. Medium 3 has made especially large gains in Coding and Mathematical reasoning, where it exceeds Llama 4 Maverick in both our Coding (LiveCodeBench, SciCode) and Math Index (AIME2024, MATH-500). The model performs well in our MMLU-Pro and GPQA evaluations but is closer to Llama 4 Scout than Maverick.
➤ Pricing: Alongside the intelligence increase offered vs. Mistral Large 2, Medium 3 offers a substantial price decrease. Mistral Medium 3 is priced at $0.4/$2 per 1M Input/Output tokens, an 80%/67% decrease in price vs. Mistral Large 2 ($2/$6).
➤ Proprietary: Mistral has not released the weights to the model but in their announcement post hinted at releasing “large” open weights models “over the next few weeks”, noting “we’re excited to ‘open’ up what’s to come”. 🕵️‍♂️
➤ Multimodal: The model has vision capabilities, and Mistral claims it is roughly in line with Llama 4 Maverick. We have not verified this independently - we plan to publish vision evals soon.
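
The 80%/67% figures in the Pricing point can be checked with a few lines of Python. This is a minimal sketch using only the per-1M-token prices quoted above; the helper name `percent_decrease` and the dictionary layout are illustrative assumptions, not anything published by Artificial Analysis or Mistral.

```python
def percent_decrease(old: float, new: float) -> float:
    """Return the percentage drop going from `old` to `new`."""
    return (old - new) / old * 100


# Prices in USD per 1M tokens, as quoted in the thread.
large_2 = {"input": 2.0, "output": 6.0}
medium_3 = {"input": 0.4, "output": 2.0}

# Compute the decrease separately for input and output tokens.
decreases = {
    kind: percent_decrease(large_2[kind], medium_3[kind]) for kind in large_2
}

for kind, pct in decreases.items():
    print(f"{kind}: {pct:.0f}% cheaper than Large 2")
```

The output-token figure is 66.7% before rounding, which is where the quoted 67% comes from.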

See below for further analysis, including individual evaluation scores and its Intelligence vs. Price positioning.
Thread image
Intelligence vs. Price positioning: A substantial improvement on both dimensions, lower price and higher intelligence, compared to Mistral Large 2
Thread image
Token usage and efficiency: Medium 3 uses substantially more tokens than Mistral Large 2 to run our Artificial Analysis Intelligence Index, due to more verbose responses
Thread image
Further analysis on Artificial Analysis:
artificialanalysis.ai/models?model-f…
Individual evaluation results (all run independently):
Thread image