@alesfav: AI needs vastly more data than...
@alesfav
9 views
Jun 22, 2026
Advertisement
3
Surprisingly, we found data2vec already does this with a single module. Through its teacher, it implicitly supervises on latents at every level, reaching the same constant-in-depth scaling. ๐คฏ
The hierarchy unfolds during training rather than being stacked into the architecture.
The hierarchy unfolds during training rather than being stacked into the architecture.
4
This result also suggests that explicit stacking, like H-JEPA, may be redundant.
Many open questions!
๐ Our paper: arxiv.org/abs/2605.27734
Many open questions!
๐ Our paper: arxiv.org/abs/2605.27734
5
@TMoldwin @DanKorchinski @MatthieuWyart Latent prediction avoids that bottleneck by learning one level, then using that learned level as the target/context for the next.
We may write a more accessible blog post version at some point!
We may write a more accessible blog post version at some point!

