Artificial Analysis

@ArtificialAnlys

Mistral Medium 3 independent evals: Mistral is back amongst the leading non-reasoning models with Medium 3 rivalling Llama 4 Maverick, Gemini 2.0 Flash and Claude 3.7 Sonnet

Key takeaways:
➤ Intelligence: We see substantial intelligence gains across all 7 of our evals compared to @MistralAI Large 2. Medium 3 has made especially large gains in coding and mathematical reasoning, where it exceeds Llama 4 Maverick on both our Coding Index (LiveCodeBench, SciCode) and Math Index (AIME 2024, MATH-500). The model performs well in our MMLU-Pro and GPQA evaluations, but there it is closer to Llama 4 Scout than to Maverick.
➤ Pricing: Alongside the intelligence increase over Mistral Large 2, Medium 3 offers a substantial price decrease. Mistral Medium 3 is priced at $0.4/$2 per 1M input/output tokens, an 80%/67% decrease vs. Mistral Large 2 ($2/$6).
➤ Proprietary: Mistral has not released the model's weights, but their announcement post hinted at releasing “large” open-weights models “over the next few weeks”, noting “we’re excited to ‘open’ up what’s to come”. 🕵️‍♂️
➤ Multimodal: The model has vision capabilities, which Mistral claims are roughly in line with Llama 4 Maverick. We have not verified this independently; we plan to publish vision evals soon.
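The pricing percentages above can be checked directly from the quoted per-1M-token prices. The following is a minimal Python sketch. `pct_decrease` is a hypothetical helper name, and the only inputs are the prices stated in this thread:

```python
def pct_decrease(old: float, new: float) -> float:
    """Percentage decrease when a price drops from `old` to `new`."""
    return (old - new) / old * 100


# Prices in USD per 1M tokens, as quoted in the thread.
large_2 = {"input": 2.0, "output": 6.0}
medium_3 = {"input": 0.4, "output": 2.0}

for kind in ("input", "output"):
    d = pct_decrease(large_2[kind], medium_3[kind])
    print(f"{kind}: {d:.0f}% cheaper")
```

Running it prints `input: 80% cheaper` and `output: 67% cheaper`, which matches the 80%/67% figures.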

See below for further analysis, including individual evaluation scores and Medium 3's Intelligence vs. Price positioning.

Intelligence vs. Price positioning: A substantial improvement on both dimensions (lower price and higher intelligence) compared to Mistral Large 2.

Token usage and efficiency: Medium 3 uses substantially more tokens than Mistral Large 2 to run our Artificial Analysis Intelligence Index, due to more verbose responses.
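The cost of a run is tokens × price, so extra verbosity eats into Medium 3's per-token price advantage. The minimal Python sketch below computes how many times more output tokens Medium 3 could emit before a run costs as much as it would on Large 2. It assumes both models consume the same number of input tokens, because the thread gives no token counts. The function names are hypothetical:

```python
# Prices in USD per 1M tokens (input, output), as quoted in the thread.
LARGE_2 = (2.0, 6.0)
MEDIUM_3 = (0.4, 2.0)


def run_cost(input_tokens: float, output_tokens: float,
             prices: tuple[float, float]) -> float:
    """USD cost of one run at the given per-1M-token prices."""
    price_in, price_out = prices
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def breakeven_output_multiplier(input_tokens: float, ref_output_tokens: float,
                                ref_prices: tuple[float, float],
                                new_prices: tuple[float, float]) -> float:
    """Factor k such that the new model, emitting k * ref_output_tokens output
    tokens on the same inputs, costs exactly as much as the reference model.
    Assumes (hypothetically) identical input token counts for both models."""
    ref_cost = run_cost(input_tokens, ref_output_tokens, ref_prices)
    # Budget left for output tokens after paying the new model's input cost.
    remaining = ref_cost - run_cost(input_tokens, 0, new_prices)
    return remaining / run_cost(0, ref_output_tokens, new_prices)


# Counting output tokens only, the factor is just the output price ratio.
print(breakeven_output_multiplier(0, 1_000_000, LARGE_2, MEDIUM_3))  # 3.0
```

If only output tokens are counted, the factor is the output price ratio: $6 / $2 = 3. Shared input tokens push it higher, because Medium 3's input price is also lower.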

Further analysis on Artificial Analysis:
artificialanalysis.ai/models?model-f…

Individual evaluation results (all run independently):
