Visualize Thread by @omarsar0

elvis

@omarsar0

Hierarchical Reasoning Model

This is one of the most interesting ideas on reasoning I've read in the past couple of months.

It uses a recurrent architecture for impressive hierarchical reasoning.

Here are my notes:

The paper proposes a novel, brain-inspired architecture that replaces CoT prompting with a recurrent model designed for deep, latent computation.

It moves away from token-level reasoning by using two coupled modules: a slow, high-level planner and a fast, low-level executor.

The two recurrent networks operate at different timescales to collaboratively solve tasks.

This leads to greater reasoning depth and efficiency with only 27M parameters and no pretraining!

Despite its small size and minimal training data (~1k examples), HRM solves complex tasks like ARC, Sudoku-Extreme, and 30×30 maze navigation, where CoT-based LLMs fail.

HRM introduces hierarchical convergence, where the low-level module rapidly converges within each cycle, and the high-level module updates only after this local equilibrium is reached.

This enables nested computation and avoids premature convergence typical of standard RNNs.
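
To make the nesting concrete, here is a toy scalar sketch of the two-timescale loop: the fast module iterates until it settles, then the slow module updates once and the fast module starts settling toward a new equilibrium. The update rules, constants, and names (`low_step`, `high_step`, `hrm_forward`) are assumptions chosen for illustration and are not taken from the paper.

```python
import math


def low_step(z_l, z_h, x):
    # Fast, low-level update conditioned on the slow state z_h.
    # The 0.5 factor makes this a contraction in z_l, so it settles quickly.
    return math.tanh(0.5 * z_l + z_h + x)


def high_step(z_h, z_l):
    # Slow, high-level update; reads the settled low-level state.
    return math.tanh(0.5 * z_h + 0.5 * z_l)


def hrm_forward(x, n_cycles=3, t_steps=20):
    z_h, z_l = 0.0, 0.0
    residuals = []  # size of each low-level change, to watch convergence
    for _ in range(n_cycles):
        for _ in range(t_steps):
            new_z_l = low_step(z_l, z_h, x)
            residuals.append(abs(new_z_l - z_l))
            z_l = new_z_l
        # The high-level module updates only once per cycle,
        # after the low-level module has (approximately) converged.
        z_h = high_step(z_h, z_l)
    return z_h, z_l, residuals
```

In this toy, the residual shrinks toward zero inside each cycle and jumps back up right after each high-level update, which is the "converge locally, then move on" pattern described above.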

A 1-step gradient approximation sidesteps memory-intensive backpropagation-through-time (BPTT).

This enables efficient training using only local gradient updates, grounded in deep equilibrium models.
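
A rough way to picture the 1-step approximation, using a toy scalar recurrence rather than the paper's model: run the recurrence without storing any history, then differentiate through the final step only, treating the previous state as a constant. The recurrence, the squared-error loss, and all names here are illustrative assumptions.

```python
import math


def step(w, z, x):
    # One recurrent update of a toy scalar state.
    return math.tanh(w * z + x)


def run_steps(w, x, n):
    # Roll the recurrence forward, keeping only the latest state (O(1) memory).
    z = 0.0
    for _ in range(n):
        z = step(w, z, x)
    return z


def one_step_grad(w, x, y, n_steps=50):
    # All but the last step are run without tracking history.
    z_prev = run_steps(w, x, n_steps - 1)  # treated as a constant below
    z_last = step(w, z_prev, x)
    loss = 0.5 * (z_last - y) ** 2
    # Chain rule through the final step only; the dependence of z_prev on w
    # (which full BPTT would unroll) is deliberately ignored.
    grad = (z_last - y) * (1.0 - z_last ** 2) * z_prev
    return loss, grad
```

Memory cost stays constant in the number of steps because nothing from earlier steps is kept. The price is that the gradient is an approximation of what full BPTT would compute.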

HRM implements adaptive computation time using a Q-learning-based halting mechanism, dynamically allocating compute based on task complexity.

This allows the model to “think fast or slow” and scale at inference time without retraining.
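
As a sketch of what halting could look like at inference time: after each computation segment, a Q-head scores "halt" versus "continue", and the loop stops once halting scores higher or a segment budget runs out. The decision rule, the budget parameter, and the names (`run_with_halting`, `segment`, `q_head`) are assumptions for illustration; the thread only states that halting is Q-learning-based.

```python
def run_with_halting(segment, q_head, state, max_segments=8):
    """Run `segment` repeatedly until the Q-head prefers halting.

    segment: hypothetical function, state -> new state (one block of compute)
    q_head:  hypothetical function, state -> (q_halt, q_continue)
    """
    used = 0
    for used in range(1, max_segments + 1):
        state = segment(state)
        q_halt, q_continue = q_head(state)
        if q_halt > q_continue:
            break  # the model judges further compute not worth it
    return state, used


# Toy stand-ins: the state is a "remaining error" that halves each segment,
# and the Q-head prefers halting once that error drops below 0.5.
def halve(s):
    return s * 0.5


def toy_q_head(s):
    return 1.0 - s, s


_, easy_segments = run_with_halting(halve, toy_q_head, 0.9)
_, hard_segments = run_with_halting(halve, toy_q_head, 8.0)
```

In this toy the easy input halts after fewer segments than the hard one. Raising `max_segments` gives the same model more room to think without any retraining.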

Experiments on ARC-AGI, Sudoku-Extreme, and Maze-Hard show that HRM significantly outperforms larger models using CoT or direct prediction, even solving problems that other models fail on entirely (e.g., 74.5% on Maze-Hard vs. 0% for others).

Analysis reveals that HRM learns a dimensionality hierarchy similar to the cortex: the high-level module operates in a higher-dimensional space than the low-level one (participation ratio, PR: 89.95 vs. 30.22).

The authors suggest that this is an emergent trait not present in untrained models.
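
For reference, the participation ratio is commonly defined as PR = (Σλ)² / Σλ², where λ are the eigenvalues of the covariance matrix of the hidden states. A minimal stdlib sketch follows. It uses the identities Σλ = tr(C) and Σλ² = tr(C²) to avoid an eigendecomposition, and the function name and toy data are illustrative rather than the authors' analysis code.

```python
def participation_ratio(samples):
    """Effective dimensionality of a set of state vectors (rows of `samples`)."""
    n, d = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in samples]
    # Sample covariance matrix C (d x d).
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    trace = sum(cov[i][i] for i in range(d))  # sum of eigenvalues
    # For symmetric C, tr(C^2) equals the sum of squared entries,
    # which is also the sum of squared eigenvalues.
    trace_sq = sum(cov[i][j] ** 2 for i in range(d) for j in range(d))
    return trace ** 2 / trace_sq


line_pr = participation_ratio([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
spread_pr = participation_ratio([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
```

Points lying on a single line give a PR of 1, while variance spread evenly over two axes gives 2. A higher PR means the states occupy more effective dimensions.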

Paper: arxiv.org/abs/2506.21734
