Wes Roth
@WesRoth
how MiniMax M2.7 was made is absolutely INSANE

it "evolved" 100+ times with zero human input

They built a research agent using an early version of that same model

soon it was handling 30 to 50 percent of their RL team's entire workflow.

and then it got WEIRD

The internal harness began autonomously collecting feedback on its own performance, building evaluation sets for internal tasks, and iterating on its own architecture, skills, and memory systems.

The agent was rewriting the tools it uses to do its job.

Then they ran a controlled experiment with 100 rounds of autonomous optimization.

MiniMax had M2.7 optimize a model's programming performance on an internal scaffold, entirely without human input.

Over 100 rounds. Zero humans.

The outcome was a 30 percent performance improvement on their internal evaluation sets.

Then they put it in a competition.

OpenAI's MLE-bench, where AI models attempt to beat PhD-level machine learning researchers at their own game.

(btw the competitions were run on a single cheap GPU, a roughly $4,000 data center card.)

Results: 9 gold medals, 5 silver, 1 bronze

Average medal rate of 66.6%

That score ties for third in the world on this benchmark (!)

Opus 4.6 at 75.7%
GPT-5.4 at 71.2%
Gemini 3.1 at 66.6% (tied with M2.7)

...on a single cheap GPU.

Running autonomously for 24 hours.

and that's not even the end of it

full breakdown:
youtube.com/watch?v=7_Q8EC…