Alessandro Favero
@alesfav
AI needs vastly more data than we do. One idea might close the gap: don't predict raw signals (tokens), predict your own abstract latent representation (JEPA, data2vec).

With @DanKorchinski @MatthieuWyart, on a toy model, we prove how much that helps: the gap is exponential.

🧵
08:30 AM · May 29, 2026
Alessandro Favero
@alesfav
We study recovering the hidden latent tree of a hierarchical grammar.

Token-level SSL pays a depth tax: the data it needs grows exponentially with the tree's depth. We prove that iteratively supervising on latents escapes it, recovering the tree with constant-in-depth data!
08:30 AM · May 29, 2026
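To make "the hidden latent tree of a hierarchical grammar" concrete, here is a minimal Python sketch under assumptions of our own. It uses a hypothetical toy grammar with random, unambiguous production rules and fixed branching. The names `make_grammar`, `sample_tree` and `recover_tree`, along with all their parameters, are illustrative and are not taken from the paper. The code is not the paper's model or learning algorithm. The per-level inverse tables are read straight off the grammar rather than learned. They serve only to show the level-by-level shape of recovery, where each recovered level becomes the input for the level above.

```python
import itertools
import random


def make_grammar(depth, vocab=4, rules=2, branching=2, seed=0):
    """Toy hierarchical grammar (hypothetical, for illustration only).

    grammar[l][s] lists the production rules of symbol s at level l; each rule
    is a tuple of `branching` symbols at level l + 1.
    """
    rng = random.Random(seed)
    all_tuples = list(itertools.product(range(vocab), repeat=branching))
    if vocab * rules > len(all_tuples):
        raise ValueError("not enough distinct child tuples for unambiguous rules")
    grammar = []
    for _ in range(depth):
        # Distinct tuples per level, so every child tuple has exactly one parent.
        picked = rng.sample(all_tuples, vocab * rules)
        grammar.append({s: picked[s * rules:(s + 1) * rules] for s in range(vocab)})
    return grammar


def sample_tree(grammar, vocab=4, seed=1):
    """Sample the latent tree top-down as a list of levels: root first, tokens last."""
    rng = random.Random(seed)
    levels = [[rng.randrange(vocab)]]
    for rules_at_level in grammar:
        children = []
        for symbol in levels[-1]:
            children.extend(rng.choice(rules_at_level[symbol]))
        levels.append(children)
    return levels


def recover_tree(tokens, grammar, branching=2):
    """Recover the latent tree bottom-up, one level at a time.

    The inverse table built at each level is an oracle standing in for whatever
    a model has learned about that level; the recovered level is then the input
    for the level above it.
    """
    levels = [list(tokens)]
    for rules_at_level in reversed(grammar):
        inverse = {child: parent
                   for parent, prods in rules_at_level.items()
                   for child in prods}
        seq = levels[0]
        parents = [inverse[tuple(seq[i:i + branching])]
                   for i in range(0, len(seq), branching)]
        levels.insert(0, parents)
    return levels


grammar = make_grammar(depth=3)
tree = sample_tree(grammar)
print(len(tree[-1]))                            # 8 tokens = 2 ** 3
print(recover_tree(tree[-1], grammar) == tree)  # True
```

In this toy setting the number of tokens grows as `branching ** depth`, so it is exponential in depth. Each level's table holds at most `vocab * rules` entries no matter how deep the tree is. The sketch only illustrates the structure of the problem. It says nothing about how much data a learner needs, which is the question the thread's result addresses.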
Alessandro Favero
@alesfav
Surprisingly, we found data2vec already does this with a single module. Through its teacher, it implicitly supervises on latents at every level, reaching the same constant-in-depth scaling. 🤯

The hierarchy unfolds during training rather than being stacked into the architecture.
08:30 AM · May 29, 2026
Alessandro Favero
@alesfav
This result also suggests that explicit stacking, like H-JEPA, may be redundant.

Many open questions!

📄 Our paper: arxiv.org/abs/2605.27734
08:30 AM · May 29, 2026
Alessandro Favero
@alesfav
@TMoldwin @DanKorchinski @MatthieuWyart Latent prediction avoids that bottleneck by learning one level, then using that learned level as the target/context for the next.

We may write a more accessible blog post version at some point!
09:08 PM · Jun 01, 2026