LLMs learn by predicting tokens. World models (JEPA, data2vec) learn by predicting their own abstractions. Which needs more data? For data with hidden hierarchy, we prove the gap in data requirements is exponential. <a target="_blank" href="https://arxiv.org/pdf/2605.27734" color="blue">arxiv.org/pdf/2605.27734</a> ...
Jun 22, 2026