how MiniMax M2.7 was made is absolutely INSANE
it "evolved" 100+ times with zero human input
They built a research agent using an early version of that same model
soon it was handling 30 to 50 percent of their RL team's entire workflow.
and then it got WEIRD
The internal harness began autonomously collecting feedback on its own performance, building evaluation sets for internal tasks, and iterating on its own architecture, skills, and memory systems.
The agent was rewriting the tools it uses to do its job.
Then they ran a controlled experiment with 100 rounds of autonomous optimization.
MiniMax had M2.7 optimize a model's programming performance on an internal scaffold, entirely without human input.
Over 100 rounds. Zero humans.
The outcome was a 30 percent performance improvement on their internal evaluation sets.
Then they put it in a competition.
OpenAI's MLE-bench, where AI models attempt to beat PhD-level machine learning researchers at their own game.
(btw the competitions were run on a single cheap GPU, a roughly $4,000 data center card.)
Results: 9 gold medals, 5 silver, 1 bronze
Average medal rate of 66.6%
That score trails only two models in the world on this benchmark (!)
Opus 4.6 at 75.7%
GPT-5.4 at 71.2%
Gemini 3.1 at 66.6% (tied with M2.7)
...on a single cheap GPU.
Running autonomously for 24 hours.