Matthieu wyart
@MatthieuWyart
LLMs learn by predicting tokens. World models (JEPA, data2vec) learn by predicting their own abstractions. Which needs more data? For data with hidden hierarchy, we prove the gap is exponential. arxiv.org/pdf/2605.27734
05:21 AM · Jun 01, 2026
Matthieu wyart
@MatthieuWyart
Why? A network discovers a latent variable from its correlation with a prediction target. Correlations between latents at the same level of abstraction are far stronger than between a latent and raw tokens. Token prediction dilutes the signal that latent prediction amplifies.
05:22 AM · Jun 01, 2026
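
The dilution argument above can be illustrated with a small simulation. This is only a sketch under toy assumptions, not the setup used in the paper: latents are ±1 variables on a tree, each child is a noisy copy of its parent with per-level correlation `rho`, and the names `noisy_copy` and `correlations` are hypothetical. It compares how strongly a latent correlates with a sibling latent at the same level versus a token several levels below that sibling.

```python
import numpy as np

def noisy_copy(parent, rho, rng):
    """Copy +/-1 values, flipping each independently with probability
    (1 - rho) / 2, so that E[child * parent] = rho."""
    keep = rng.random(parent.shape) < (1.0 + rho) / 2.0
    return np.where(keep, parent, -parent)

def correlations(depth, rho=0.6, n_samples=200_000, seed=0):
    """Return (latent-latent, latent-token) empirical correlations.

    Toy assumption: two sibling latents share a parent, and the token
    sits `depth` noisy-copy levels below the second sibling.
    """
    rng = np.random.default_rng(seed)
    root = rng.choice([-1, 1], size=n_samples)
    latent_a = noisy_copy(root, rho, rng)   # the latent to be discovered
    latent_b = noisy_copy(root, rho, rng)   # same-level prediction target
    token = latent_b
    for _ in range(depth):                  # descend to the raw token
        token = noisy_copy(token, rho, rng)
    corr_latent = float(np.mean(latent_a * latent_b))  # about rho**2
    corr_token = float(np.mean(latent_a * token))      # about rho**(2 + depth)
    return corr_latent, corr_token

if __name__ == "__main__":
    for depth in (1, 2, 4, 6):
        c_lat, c_tok = correlations(depth)
        print(f"depth={depth}: latent-latent={c_lat:.3f}, latent-token={c_tok:.4f}")
```

In this toy model the latent-to-latent correlation does not depend on depth, while the latent-to-token correlation shrinks by a factor of `rho` with every extra level.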
Matthieu wyart
@MatthieuWyart
We make this precise on simple context-free grammars. Token-level SSL needs a sample size exponential in the depth of the latent tree. Learning from your own latents is nearly independent of depth. We show that data2vec implicitly does exactly this hierarchical latent prediction.
05:29 AM · Jun 01, 2026
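
A back-of-the-envelope way to see where an exponential in depth could come from, offered as a heuristic under toy assumptions and not as the paper's proof: suppose each level of the hierarchy attenuates correlations by a factor $\rho$ with $0 < \rho < 1$ (a hypothetical parameter), so a latent and a token $L$ levels away have correlation of order $\rho^{L}$, while two latents at the same level are separated by only a fixed number $k$ of levels. An empirical correlation estimated from $n$ samples fluctuates by roughly $1/\sqrt{n}$, so resolving a correlation $c$ takes about $n \gtrsim 1/c^{2}$ samples.

```latex
c_{\text{token}}(L) \sim \rho^{L}
\quad\Longrightarrow\quad
n_{\text{token}} \gtrsim \frac{1}{c_{\text{token}}(L)^{2}} \sim \rho^{-2L},
\qquad
c_{\text{latent}} \sim \rho^{k},\ k = O(1)
\quad\Longrightarrow\quad
n_{\text{latent}} \gtrsim \rho^{-2k}.
```

Under these assumptions the token-level requirement grows exponentially with $L$, while the latent-level requirement does not depend on $L$. The exact rates in the paper may differ.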
Matthieu wyart
@MatthieuWyart
A consequence: if a single latent-prediction module (data2vec) is already implicitly multi-scale, then explicitly stacking such modules (e.g. H-JEPA) is to some extent redundant. Work led by @DanKorchinski & @alesfav.
05:33 AM · Jun 01, 2026
Matthieu wyart
@MatthieuWyart
@DanKorchinski @alesfav See the excellent threads by @DanKorchinski and @alesfav.
05:37 AM · Jun 01, 2026