DeepSeek released another model: this time, it is able to do what most reasoning models can't.
Below we explain how they did this 🧵
1/6

Most reasoning models cannot do mathematical proofs, because they rely on heuristics and because of how they are trained.
DeepSeek released a model specifically designed to do mathematical proofs, and it does that exceedingly well.
2/6

DeepSeek starts with its V3 model, prompting it to produce a chain of thought that breaks each problem down into smaller pieces.
Then, a smaller fine-tuned 7B model solves these pieces. The solutions are written in Lean, a language for formal mathematical proofs.
A Lean proof only compiles if every step of the solution is logically valid and machine-checkable.
3/6
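To make the idea concrete, here is a minimal Lean 4 sketch of a proof split into smaller pieces. The theorem and its name are toy examples chosen for illustration, not taken from DeepSeek's data; the point is only that each `have` is a small subgoal that can be solved separately, and the file compiles only once every subgoal is actually proved.

```lean
-- Toy example (hypothetical, for illustration only).
theorem toy_add_zero_twice (n : Nat) : (n + 0) + 0 = n := by
  -- Piece 1: the inner expression simplifies to n.
  have h1 : n + 0 = n := Nat.add_zero n
  -- Piece 2: the outer "+ 0" can be dropped.
  have h2 : (n + 0) + 0 = n + 0 := Nat.add_zero (n + 0)
  -- Chain the two pieces to close the original goal.
  exact h2.trans h1
```

If either piece were wrong, Lean would reject the whole proof, which is what makes the checker usable as an automatic filter.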

Doing this enough times builds up a synthetic dataset of verified proofs. This data, combined with other collected data, is then used to fine-tune the model.
Then, reinforcement learning (RL) is applied...
4/6
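A rough Python sketch of how such a dataset could be collected: generate candidate proofs, keep only the ones a checker accepts. Everything here is an assumption for illustration. `propose` stands in for a model call and `verify` stands in for a Lean compile check; neither is DeepSeek's actual code or API.

```python
from typing import Callable, Iterable


def collect_verified(
    problems: Iterable[str],
    propose: Callable[[str], list[str]],  # hypothetical: model returns candidate proofs
    verify: Callable[[str, str], bool],   # hypothetical: True if the proof checks
) -> list[dict[str, str]]:
    """Keep the first candidate proof per problem that the checker accepts."""
    dataset: list[dict[str, str]] = []
    for problem in problems:
        for candidate in propose(problem):
            if verify(problem, candidate):
                dataset.append({"problem": problem, "proof": candidate})
                break  # one verified proof per problem is enough here
    return dataset
```

Problems with no accepted candidate are simply skipped, so only verified pairs end up in the dataset.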
DeepSeek runs RL on the resulting model, using only 256 problems per step. This signals how sample-efficient RL can be: a relatively small amount of high-quality synthetic data is enough for RL to deliver large gains in performance.
This fact is a key driver of the test-time scaling paradigm we see today.
5/6
While Prover-V2 might see limited use outside math research, it further showcases how effective the "cold start" method is for reasoning and math-related models.
This method was used for R1 and the Qwen models.
6/6
