@TheTuringPost: Absolute Zero is a new paradig...

1

Absolute Zero is a new paradigm from @Tsinghua_Uni that encourages models to learn without human-labeled data.

It's a self-play process, where the model is both a proposer and a solver.

- A model creates its own tasks to learn from.
- It solves these tasks on its own, using feedback from an environmental tool.

Based on this, researchers also built the Absolute Zero Reasoner (AZR) system.

This paradigm shows that you don't need thousands of outside data examples or human guidance to get SOTA results.

Details 🧵

2

1. Roles and rewards in Absolute Zero:

The model plays 2 roles:

- A proposer: It invents a new reasoning task.
- A solver: It tries to solve that task.

An environment tool checks if the task makes sense and provides the right answer. The model then tries to answer the task. If it does well, it gets rewarded.

There are 2 types of feedback:
- One for coming up with a good, learnable task.
- Another for solving it correctly.
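
The two feedback signals can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the exact shape of the "learnable task" score (zero when the solver always or never succeeds, higher for harder but solvable tasks) is an assumption about what "learnable" could mean here.

```python
from typing import Sequence


def solver_reward(predicted: object, reference: object) -> float:
    """Binary reward: 1.0 when the solver's answer matches the environment's answer."""
    return 1.0 if predicted == reference else 0.0


def proposer_reward(solver_rewards: Sequence[float]) -> float:
    """Assumed learnability score for one proposed task.

    Tasks the solver always gets right (too easy) or never gets right
    (too hard) earn nothing; anything in between earns more the harder it is.
    """
    if not solver_rewards:
        return 0.0
    success_rate = sum(solver_rewards) / len(solver_rewards)
    if success_rate in (0.0, 1.0):
        return 0.0
    return 1.0 - success_rate


# Four solver attempts at a task whose verified answer is 42.
attempts = [solver_reward(answer, 42) for answer in (42, 41, 42, 40)]
print(attempts)                   # [1.0, 0.0, 1.0, 0.0]
print(proposer_reward(attempts))  # 0.5
```

Under this assumption, the proposer is pushed toward tasks at the edge of what the solver can currently do.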

3

2. The Absolute Zero Reasoner (AZR) is a first working system that fully uses the Absolute Zero paradigm.

You only need a single very simple "return" program to kickstart AZR's self-training loop.
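
To make the idea of a "return" seed concrete, here is a rough sketch. It assumes the seed is an identity function paired with a single input; the names `SEED_PROGRAM`, `SEED_INPUT`, `run_program`, and `task_buffer` are hypothetical and not taken from the AZR code.

```python
# Assumed seed: a program that simply returns its input.
SEED_PROGRAM = "def f(x):\n    return x\n"
SEED_INPUT = "Hello World"


def run_program(source: str, argument: object) -> object:
    """Execute `source`, then call the function `f` it defines on `argument`."""
    namespace: dict = {}
    exec(source, namespace)  # a real system would sandbox this call
    return namespace["f"](argument)


# The buffer of past tasks starts with one (program, input, output) triple.
# Later proposals would be conditioned on entries sampled from this buffer.
task_buffer = [(SEED_PROGRAM, SEED_INPUT, run_program(SEED_PROGRAM, SEED_INPUT))]
print(task_buffer[0][2])  # Hello World
```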

4

3. AZR uses code problems as its main learning tool.

- It creates and solves a set of coding tasks based on past tasks it already made and solved, and the type of reasoning it wants to practice (deduction, abduction, or induction).
- Python is used to check that the tasks are valid and then that the model's answers are correct.
- AZR uses 2 scores for training: for proposing good tasks and for solving them.
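
The checks above can be sketched with plain Python. This is an illustration under stated assumptions, not the AZR implementation: every function name is hypothetical, a task is assumed to be a program defining `f` plus an input, "valid" is assumed to mean "runs without error and gives the same output twice", and the three modes are read in their usual sense (deduction predicts an output, abduction finds an input, induction writes a program from examples).

```python
from typing import Any, Callable, List, Optional, Tuple


def load_program(source: str) -> Optional[Callable[[Any], Any]]:
    """Return the function `f` defined by `source`, or None if it fails to load."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # a real system would sandbox this call
    except Exception:
        return None
    f = namespace.get("f")
    return f if callable(f) else None


def validate_task(source: str, task_input: Any) -> Tuple[bool, Any]:
    """Assumed validity check: the program runs and is deterministic."""
    f = load_program(source)
    if f is None:
        return False, None
    try:
        first, second = f(task_input), f(task_input)
    except Exception:
        return False, None
    return first == second, first


def check_deduction(source: str, task_input: Any, predicted_output: Any) -> bool:
    """Deduction: given program and input, the solver predicts the output."""
    ok, output = validate_task(source, task_input)
    return ok and output == predicted_output


def check_abduction(source: str, proposed_input: Any, target_output: Any) -> bool:
    """Abduction: given program and output, the solver proposes an input."""
    ok, output = validate_task(source, proposed_input)
    return ok and output == target_output


def check_induction(candidate: str, examples: List[Tuple[Any, Any]]) -> bool:
    """Induction: given input/output pairs, the solver writes the program."""
    return all(check_deduction(candidate, i, o) for i, o in examples)


program = "def f(x):\n    return sorted(x)[::-1]\n"
print(validate_task(program, [3, 1, 2]))               # (True, [3, 2, 1])
print(check_deduction(program, [3, 1, 2], [3, 2, 1]))  # True
print(check_abduction(program, [1, 2, 3], [3, 2, 1]))  # True
print(check_induction("def f(x):\n    return sorted(x, reverse=True)\n",
                      [([3, 1, 2], [3, 2, 1]), ([5, 4], [5, 4])]))  # True
```

Note that abduction accepts any input that reproduces the target output, not only the one the proposer had in mind.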

5

4. Even though AZR was trained without any human-written data, it has impressive results:

- It beat the best "zero-data" models by +1.8%
- AZR improved its math score by +15.2%, vs. +0.65% for other top models

Also:
- Bigger models learn more
- Code helps general reasoning (even in math)
- Emergent planning: AZR starts writing step-by-step explanations as comments.

6

5. Paper: arxiv.org/abs/2505.03335
Project page: andrewzh112.github.io/absolute-zero-…
