Visualize Thread by @alesfav | Thread Navigator

Alessandro Favero

@alesfav

AI needs vastly more data than we do. One idea might close the gap: don't predict raw signals (tokens), predict your own abstract latent representation (JEPA, data2vec).

With @DanKorchinski and @MatthieuWyart, on a toy model, we prove how much that helps: the gap is exponential.

🧵

08:30 AM · May 29, 2026

Alessandro Favero

@alesfav

We study recovering the hidden latent tree of a hierarchical grammar.

Token-level SSL pays a depth tax: the data it needs grows exponentially with the tree's depth. We prove that iteratively supervising on latents escapes it, recovering the tree with constant-in-depth data!

08:30 AM · May 29, 2026
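To make the setting concrete, here is a minimal sketch of a toy hierarchical grammar whose leaves are the observed tokens and whose upper levels form the hidden latent tree. Everything specific in it is an assumption for illustration and not taken from the paper: the names `make_rules` and `sample`, the shared vocabulary size, the fixed branching factor, and the uniformly random production rules.

```python
import random

def make_rules(vocab_size, branching, rules_per_symbol, depth, seed=0):
    """Build one random rule table per level: symbol -> list of child tuples.

    Hypothetical construction: rules are drawn uniformly at random.
    """
    rng = random.Random(seed)
    levels = []
    for _ in range(depth):
        table = {
            sym: [
                tuple(rng.randrange(vocab_size) for _ in range(branching))
                for _ in range(rules_per_symbol)
            ]
            for sym in range(vocab_size)
        }
        levels.append(table)
    return levels

def sample(levels, root, rng):
    """Expand the root top-down and return every level of the tree, root first."""
    tree = [[root]]
    for table in levels:
        next_level = []
        for sym in tree[-1]:
            # Pick one of the symbol's production rules and append its children.
            next_level.extend(rng.choice(table[sym]))
        tree.append(next_level)
    return tree

levels = make_rules(vocab_size=4, branching=2, rules_per_symbol=2, depth=3)
tree = sample(levels, root=0, rng=random.Random(1))
tokens = tree[-1]    # observed leaves: branching ** depth tokens
latents = tree[:-1]  # hidden levels that a learner would have to recover
```

In this sketch the number of leaves grows as `branching ** depth`, while a learner only ever sees `tokens`; the question in the thread is how much data is needed to recover `latents`.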

Alessandro Favero

@alesfav

Surprisingly, we found data2vec already does this with a single module. Through its teacher, it implicitly supervises on latents at every level, reaching the same constant-in-depth scaling. 🤯

The hierarchy unfolds during training rather than being stacked into the architecture.

08:30 AM · May 29, 2026
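For context, the teacher in data2vec is an exponential moving average (EMA) of the student's weights, and the student is trained to predict the teacher's representations. A minimal sketch of that EMA step follows, with weights reduced to a flat list of floats; the name `ema_update` and the parameter `tau` are illustrative and not an API from any library.

```python
def ema_update(teacher, student, tau=0.999):
    """Move each teacher weight a small step toward the matching student weight."""
    return [tau * t + (1.0 - tau) * s for t, s in zip(teacher, student)]

# Toy run: with a fixed student, the teacher drifts toward the student's weights.
teacher = [0.0, 0.0]
student = [1.0, -2.0]
for _ in range(3):
    teacher = ema_update(teacher, student, tau=0.5)
```

Because the teacher lags behind the student, its targets change gradually as training proceeds, which is the mechanism the thread credits for supervising on latents without an explicitly stacked architecture.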

Alessandro Favero

@alesfav

This result also suggests that explicit stacking, like H-JEPA, may be redundant.

Many open questions!

📄 Our paper: arxiv.org/abs/2605.27734

08:30 AM · May 29, 2026

Alessandro Favero

@alesfav

@TMoldwin @DanKorchinski @MatthieuWyart Latent prediction avoids that bottleneck by learning one level, then using that learned level as the target/context for the next.

We may write a more accessible blog post version at some point!

09:08 PM · Jun 01, 2026
