Matthieu Wyart (@MatthieuWyart)

LLMs learn by predicting tokens. World models (JEPA, data2vec) learn by predicting their own abstractions. Which needs more data? For data with hidden hierarchy, we prove the gap is exponential. https://arxiv.org/pdf/2605.27734 ...

Jun 22, 2026
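As a rough illustration of the two objectives contrasted in the post (not the paper's construction, and not a demonstration of the data-efficiency result), here is a minimal pure-Python sketch. It assumes a toy linear encoder, a plain squared-error loss in embedding space, and a target encoder updated as an exponential moving average (EMA) of the online encoder, as data2vec and I-JEPA do in simplified form. All function names and numbers are hypothetical.

```python
import math

# Toy contrast between token prediction and latent (abstraction) prediction.
# Hypothetical names and values; a simplified sketch, not the paper's setup.

def token_prediction_loss(logits, target_index):
    """Next-token cross-entropy: -log softmax(logits)[target_index]."""
    m = max(logits)  # subtract the max for numerical stability
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_norm - logits[target_index]

def linear_encoder(weights, x):
    """Toy linear encoder: maps an input vector to an embedding vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def ema_update(target_weights, online_weights, decay=0.99):
    """Target encoder weights track an EMA of the online encoder weights."""
    return [[decay * t + (1 - decay) * o for t, o in zip(trow, orow)]
            for trow, orow in zip(target_weights, online_weights)]

def latent_prediction_loss(predicted, target_weights, x_target):
    """Mean squared error between a predicted embedding and the target
    encoder's embedding of the hidden part of the input. In a real training
    loop the target would be treated as a constant (no gradient flows into
    the target encoder); here we only compute the loss value."""
    target = linear_encoder(target_weights, x_target)
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)

if __name__ == "__main__":
    # Token prediction: the model scores every vocabulary item.
    print(token_prediction_loss([2.0, 0.0, 0.0], target_index=0))
    # Latent prediction: the model regresses onto an embedding instead.
    W_target = [[1.0, 0.0], [0.0, 1.0]]
    print(latent_prediction_loss([0.5, 1.5], W_target, x_target=[1.0, 2.0]))  # 0.25
```

The key design difference the sketch tries to show: the token loss is defined over a fixed, observable vocabulary, while the latent loss targets representations produced by the model's own (slowly updated) encoder.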