
Turing Post
@TheTuringPost
Absolute Zero is a new paradigm from @Tsinghua_Uni that encourages models to learn without human-labeled data.

It's a self-play process, where the model is both a proposer and a solver.

- A model creates its own tasks to learn from.
- It solves these tasks on its own, using feedback from an environmental tool.

Based on this, researchers also built the Absolute Zero Reasoner (AZR) system.

This paradigm shows that you don't need thousands of outside data examples or human guidance to get SOTA results.

Details 🧵
1. Roles and rewards in Absolute Zero:

The model plays 2 roles:

- A proposer: It invents a new reasoning task.
- A solver: It tries to solve that task.

An environment tool checks if the task makes sense and provides the right answer. The model then tries to answer the task. If it does well, it gets rewarded.

There are 2 types of feedback:
- One for coming up with a good, learnable task.
- Another for solving it correctly.
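
The two feedback signals can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the exact shape of the "learnable task" reward (zero when the solver always succeeds or always fails, higher for harder but still solvable tasks) is an assumption made for the example.

```python
from typing import Sequence


def solver_reward(answer: object, reference: object) -> float:
    """Reward for solving: 1.0 if the answer matches the environment's reference."""
    return 1.0 if answer == reference else 0.0


def proposer_reward(solve_outcomes: Sequence[float]) -> float:
    """Reward for proposing, based on several solver attempts at one task.

    Assumption for this sketch: a task is "learnable" only if the solver
    sometimes succeeds and sometimes fails, so tasks that are always or
    never solved earn nothing.
    """
    if not solve_outcomes:
        return 0.0
    rate = sum(solve_outcomes) / len(solve_outcomes)
    if rate == 0.0 or rate == 1.0:
        return 0.0
    return 1.0 - rate


# Four solver attempts at a task whose reference answer is 4.
outcomes = [solver_reward(a, 4) for a in (4, 5, 4, 3)]
print(proposer_reward(outcomes))  # 0.5
```

Here two of the four attempts are correct, so the task counts as learnable and the proposer earns a non-zero reward.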
2. The Absolute Zero Reasoner (AZR) is the first working system that fully uses the Absolute Zero paradigm.

You only need a single very simple "return" program to kickstart AZR's self-training loop.
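
To make the kickstart concrete, here is a toy sketch of a task buffer that starts from one trivial program and grows. It assumes the "return" program is an identity function, and the `propose` function is a hypothetical stand-in for the model (it only adds a constant to an earlier program). The solver side is left out.

```python
import random

# Assumed seed: the single trivial program the loop starts from.
SEED_PROGRAM = "def f(x):\n    return x"


def run_program(source: str, arg):
    """Execute a program string and call its function f on arg."""
    namespace = {}
    exec(source, namespace)  # toy only; real systems would sandbox this
    return namespace["f"](arg)


def propose(buffer, rng):
    """Hypothetical proposer: derive a new program from an earlier one."""
    reference = rng.choice(buffer)
    body = reference.split("return ", 1)[1]
    k = rng.randint(1, 9)
    new_program = f"def f(x):\n    return ({body}) + {k}"
    return new_program, rng.randint(0, 9)


def self_play(steps: int, seed: int = 0):
    """Grow the task buffer from the seed, keeping only programs that run."""
    rng = random.Random(seed)
    buffer = [SEED_PROGRAM]
    for _ in range(steps):
        program, task_input = propose(buffer, rng)
        try:
            run_program(program, task_input)  # environment validity check
        except Exception:
            continue  # invalid task: discard it
        buffer.append(program)
    return buffer


buffer = self_play(steps=5)
print(len(buffer))  # 6
```

Every proposed program builds on something already in the buffer, so the one-line seed is enough to get the loop going.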
3. AZR uses code problems as its main learning tool.

- It creates and solves a set of coding tasks based on past tasks it already made and solved, and the type of reasoning it wants to practice (deduction, abduction, or induction).
- Python is used to check whether the tasks are valid and then whether the model's answers are correct.
- AZR uses 2 scores for training: for proposing good tasks and for solving them.
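
Here is a rough sketch of how Python can act as the checker for the three reasoning types. The thread does not spell out the task formats, so the mapping is an assumption of this example: deduction predicts a program's output, abduction finds an input that yields a given output, and induction writes a program that fits input/output examples. All function names are hypothetical.

```python
def load_program(source: str):
    """Compile a program string and return its function f."""
    namespace = {}
    exec(source, namespace)  # toy only; untrusted code needs a sandbox
    return namespace["f"]


def is_valid_task(source: str, task_input) -> bool:
    """A task is valid here if the program runs on the input without error."""
    try:
        load_program(source)(task_input)
        return True
    except Exception:
        return False


def check_deduction(source: str, task_input, predicted_output) -> bool:
    """Deduction: given program and input, did the solver predict the output?"""
    return load_program(source)(task_input) == predicted_output


def check_abduction(source: str, target_output, predicted_input) -> bool:
    """Abduction: given program and output, does the guessed input produce it?"""
    try:
        return load_program(source)(predicted_input) == target_output
    except Exception:
        return False


def check_induction(examples, predicted_source: str) -> bool:
    """Induction: does the solver's program reproduce every example pair?"""
    try:
        f = load_program(predicted_source)
        return all(f(x) == y for x, y in examples)
    except Exception:
        return False


program = "def f(x):\n    return x * 2 + 1"
print(is_valid_task(program, 3))                                         # True
print(check_deduction(program, 3, 7))                                    # True
print(check_abduction(program, 7, 3))                                    # True
print(check_induction([(0, 1), (3, 7)], "def f(x):\n    return 2 * x + 1"))  # True
```

Because the checker executes code rather than comparing text, any input or program that produces the right behavior counts as correct.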
4. Even though AZR was trained without any human-written data, it has impressive results:

- It beat the best "zero-data" models by +1.8%
- AZR improved its math score by +15.2%, vs. +0.65% for other top models

Also:
- Bigger models learn more
- Code helps general reasoning (even in math)
- Emergent planning: AZR starts writing step-by-step explanations as comments.