Visualize Thread by @TheTuringPost

Turing Post

@TheTuringPost

Absolute Zero is a new paradigm from @Tsinghua_Uni that encourages models to learn without human-labeled data.

It's a self-play process, where the model is both a proposer and a solver.

- A model creates its own tasks to learn from.
- It solves these tasks on its own, using feedback from an environmental tool.

Based on this, researchers also built the Absolute Zero Reasoner (AZR) system.

This paradigm shows that you don't need thousands of external training examples or human guidance to get SOTA results.

Details 🧵

1. Roles and rewards in Absolute Zero:

The model plays 2 roles:

- A proposer: It invents a new reasoning task.
- A solver: It tries to solve that task.

An environment tool checks that the task makes sense and supplies the correct answer. The model then attempts the task and is rewarded if it answers correctly.

There are 2 types of feedback:
- One for coming up with a good, learnable task.
- Another for solving it correctly.
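The two feedback signals can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the exact form of the "learnable task" reward is an assumption (tasks the solver always or never gets right score zero, and harder-but-solvable tasks score higher).

```python
from statistics import mean


def solver_reward(predicted, expected) -> float:
    """Binary reward: 1.0 if the solver's answer matches the verified answer."""
    return 1.0 if predicted == expected else 0.0


def proposer_reward(solver_rewards: list[float]) -> float:
    """Learnability reward (assumed form, for illustration only).

    `solver_rewards` holds the solver's rewards over several attempts at the
    same proposed task. A task that is always solved (too easy) or never
    solved (too hard) offers no learning signal, so it scores 0.
    """
    avg = mean(solver_rewards)
    if avg == 0.0 or avg == 1.0:
        return 0.0
    return 1.0 - avg


# Usage: a task solved in 1 of 4 attempts is hard but learnable.
attempts = [solver_reward(a, 42) for a in (42, 7, 0, 13)]
print(proposer_reward(attempts))
```

The design idea is that the proposer is pushed toward tasks at the edge of the solver's current ability rather than toward trivial or impossible ones.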

2. The Absolute Zero Reasoner (AZR) is the first working system that fully uses the Absolute Zero paradigm.

You only need a single very simple "return" program to kickstart AZR's self-training loop.
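To make the "return" seed concrete, here is a rough sketch of what such a starting point could look like. It assumes the seed is an identity function stored as a (program, input, output) triplet. The names `SEED_PROGRAM`, `SEED_INPUT`, `run_program`, and `task_buffer` are hypothetical and chosen only for illustration.

```python
# Assumed seed: a program that simply returns its input.
SEED_PROGRAM = "def f(x):\n    return x"
SEED_INPUT = "Hello World"  # arbitrary example input, an assumption


def run_program(source: str, arg):
    """Execute `source`, which must define f, and return f(arg)."""
    namespace = {}
    exec(source, namespace)  # no sandboxing in this sketch
    return namespace["f"](arg)


# The seed triplet: (program, input, output produced by running the program).
seed_triplet = (SEED_PROGRAM, SEED_INPUT, run_program(SEED_PROGRAM, SEED_INPUT))

# The buffer of past tasks starts with just this one entry; later proposals
# would be conditioned on whatever has accumulated here.
task_buffer = [seed_triplet]
print(task_buffer[0][2])
```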

3. AZR uses code problems as its main learning tool.

- It creates and solves a set of coding tasks based on past tasks it already made and solved, and the type of reasoning it wants to practice (deduction, abduction, or induction).
- Python is used to check whether the proposed tasks are valid and then whether the model's answers are correct.
- AZR uses 2 scores for training: for proposing good tasks and for solving them.
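One way to picture how Python can check both tasks and answers is sketched below. It assumes each task is built around a (program, input, output) triplet, with deduction predicting the output, abduction predicting an input, and induction writing the program from examples. All function names are hypothetical, and a real system would run untrusted code in a sandbox rather than with a bare `exec`.

```python
def run(source: str, arg):
    """Execute `source`, which must define f, and return f(arg)."""
    namespace = {}
    exec(source, namespace)  # illustration only: no sandbox, no timeout
    return namespace["f"](arg)


def is_valid_task(program: str, inp) -> bool:
    """A proposed task is usable if it runs without error and is deterministic."""
    try:
        return run(program, inp) == run(program, inp)
    except Exception:
        return False


def check_deduction(program: str, inp, predicted_output) -> bool:
    """Given program and input, the solver predicts the output."""
    return run(program, inp) == predicted_output


def check_abduction(program: str, predicted_input, output) -> bool:
    """Given program and output, any input that reproduces the output counts."""
    return run(program, predicted_input) == output


def check_induction(predicted_program: str, examples) -> bool:
    """Given input/output examples, the solver writes a program that fits them all."""
    return all(run(predicted_program, i) == o for i, o in examples)


# Usage with a toy program.
program = "def f(x):\n    return x * 2 + 1"
print(is_valid_task(program, 3))
print(check_deduction(program, 3, 7))
print(check_abduction(program, 3, 7))
print(check_induction("def f(x):\n    return 2 * x + 1", [(1, 3), (2, 5)]))
```

Note that the abduction check runs the program on the proposed input instead of comparing against a stored input, since several inputs may map to the same output.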

4. Even though AZR was trained without any human-written data, it has impressive results:

- It beat the best "zero-data" models by +1.8%
- AZR improved its math score by +15.2% vs. +0.65% for other top models

Also:
- Bigger models learn more
- Code helps general reasoning (even in math)
- Emergent planning: AZR starts writing step-by-step explanations as comments.

5. Paper: arxiv.org/abs/2505.03335
Project page: andrewzh112.github.io/absolute-zero-…
