Alessandro Favero

@alesfav

AI needs vastly more data than we do. One idea might close the gap: don't predict raw signals (tokens); predict your own abstract latent representations instead (JEPA, data2vec). With @DanKorchinski and @MatthieuWyart, we prove on a toy model how much that helps: the gap is exponential. 🧵

We study the problem of recovering the hidden latent tree of a hierarchical grammar. Token-level self-supervised learning (SSL) pays a depth tax: the data it needs grows exponentially with the tree's depth. We prove that iteratively supervising on latents escapes this tax, recovering the tree with constant-in-depth data!
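The thread does not spell out the grammar. Below is a minimal sketch that assumes a toy random hierarchical grammar. Every name and parameter in it is hypothetical:

- `v` is the number of symbols per level.
- `m` is the number of production rules per symbol.
- Each rule rewrites a symbol into `s` children.
- No child tuple is shared by two parents, so each level can in principle be inverted.

The observed tokens are the `s**depth` leaves, and the hidden tree is every layer above them.

```python
import random


def decode(n, v, s):
    """Write integer n in base v as an s-tuple of symbols."""
    digits = []
    for _ in range(s):
        n, d = divmod(n, v)
        digits.append(d)
    return tuple(reversed(digits))


def make_grammar(v, s, m, depth, seed=0):
    """Hypothetical toy grammar: one rule table per level, mapping each of the
    v symbols to m distinct s-tuples of children. Tuples are drawn without
    replacement, so no child tuple belongs to two different parents."""
    if v * m > v ** s:
        raise ValueError("need v*m <= v**s for unambiguous rules")
    rng = random.Random(seed)
    levels = []
    for _ in range(depth):
        codes = rng.sample(range(v ** s), v * m)
        tuples = [decode(c, v, s) for c in codes]
        levels.append({sym: tuples[sym * m:(sym + 1) * m] for sym in range(v)})
    return levels


def sample(grammar, rng):
    """Expand a random root level by level; return every layer, root first.
    The last layer holds the observed tokens; the others are the hidden tree."""
    v = len(grammar[0])
    layers = [[rng.randrange(v)]]
    for rules in grammar:
        children = []
        for sym in layers[-1]:
            children.extend(rng.choice(rules[sym]))
        layers.append(children)
    return layers


grammar = make_grammar(v=8, s=2, m=4, depth=3, seed=0)
layers = sample(grammar, random.Random(1))
print([len(layer) for layer in layers])  # [1, 2, 4, 8]
```

Recovering the root from the leaves means undoing `depth` levels of rewriting. That is where a depth-dependent data cost could enter. The exponential and constant-in-depth rates are the paper's results, and this snippet does not reproduce them.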

Surprisingly, we found data2vec already does this with a single module. Through its teacher, it implicitly supervises on latents at every level, reaching the same constant-in-depth scaling. 🤯 The hierarchy unfolds during training rather than being stacked into the architecture.
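For readers unfamiliar with data2vec, here is a rough NumPy sketch of the mechanism the thread refers to. A teacher whose weights are an exponential moving average (EMA) of the student's produces latent targets from the unmasked input. The student then regresses those targets at the masked positions.

This sketch makes several simplifying assumptions:

- A toy tanh MLP stands in for the Transformer.
- Masked positions are zeroed rather than replaced by a learned mask embedding.
- There is no regression head, and the loss is plain MSE.
- `tau` is fixed rather than annealed.
- No optimizer step is shown.

The function names are hypothetical.

```python
import numpy as np


def init_encoder(dim, n_layers, rng):
    """Hypothetical toy encoder: a stack of dim x dim weight matrices."""
    return [rng.normal(scale=1 / np.sqrt(dim), size=(dim, dim)) for _ in range(n_layers)]


def forward(weights, x):
    """Return the output of every layer for x of shape (seq_len, dim)."""
    outputs, h = [], x
    for w in weights:
        h = np.tanh(h @ w)
        outputs.append(h)
    return outputs


def ema_update(teacher, student, tau=0.999):
    """Teacher weights track an exponential moving average of the student's."""
    return [tau * t + (1 - tau) * s for t, s in zip(teacher, student)]


def latent_targets(teacher, x, top_k):
    """data2vec-style target: average of the top-K layer outputs of the teacher
    on the unmasked input, each normalized per position (layer-norm style)."""
    top = forward(teacher, x)[-top_k:]
    normed = [(h - h.mean(axis=-1, keepdims=True)) / (h.std(axis=-1, keepdims=True) + 1e-6)
              for h in top]
    return np.mean(normed, axis=0)


def masked_latent_loss(student, teacher, x, mask, top_k=2):
    """The student sees the masked input and regresses the teacher's latents
    at the masked positions only."""
    targets = latent_targets(teacher, x, top_k)
    x_masked = np.where(mask[:, None], 0.0, x)
    pred = forward(student, x_masked)[-1]
    return float(np.mean((pred[mask] - targets[mask]) ** 2))


rng = np.random.default_rng(0)
student = init_encoder(dim=16, n_layers=4, rng=rng)
teacher = [w.copy() for w in student]   # teacher starts as a copy of the student
x = rng.normal(size=(10, 16))           # 10 positions, 16 features
mask = np.zeros(10, dtype=bool)
mask[::3] = True                        # mask every third position
print(masked_latent_loss(student, teacher, x, mask))
teacher = ema_update(teacher, student)  # applied after each student optimizer step (not shown)
```

Because the teacher is a slowly moving copy of the student, its targets change as the student learns. The sketch only shows this wiring. The constant-in-depth scaling the thread describes is the paper's argument and is not demonstrated here.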

This result also suggests that explicit stacking, like H-JEPA, may be redundant. Many open questions! 📄 Our paper: https://arxiv.org/abs/2605.27734

@TMoldwin @DanKorchinski @MatthieuWyart Latent prediction avoids that bottleneck by learning one level, then using that learned level as the target/context for the next. We may write a more accessible blog post version at some point!
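The level-by-level scheme in this reply can be sketched as a loop with three steps: learn one level, re-encode the data with it, and hand the result to the next level as its input. This is a hypothetical illustration of the control flow only. The stand-in learner `learn_level` simply memorizes each distinct `s`-tuple. An actual latent learner would also have to merge tuples produced by the same parent symbol.

```python
def learn_level(sequences, s):
    """Hypothetical stand-in learner: assign every distinct s-tuple its own id.
    A real latent learner would also merge synonymous tuples."""
    codebook = {}
    for seq in sequences:
        for i in range(0, len(seq), s):
            codebook.setdefault(tuple(seq[i:i + s]), len(codebook))
    return codebook


def encode(seq, codebook, s):
    """Map each block of s symbols to its learned id (assumes len(seq) % s == 0)."""
    return [codebook[tuple(seq[i:i + s])] for i in range(0, len(seq), s)]


def build_hierarchy(sequences, s, depth):
    """Learn one level, re-encode the data with it, then learn the next level
    on the re-encoded data."""
    levels, codebooks = [sequences], []
    for _ in range(depth):
        codebook = learn_level(levels[-1], s)
        codebooks.append(codebook)
        levels.append([encode(seq, codebook, s) for seq in levels[-1]])
    return levels, codebooks


seqs = [[0, 1, 1, 0, 0, 1, 1, 0], [1, 1, 0, 0, 0, 1, 1, 1]]
levels, codebooks = build_hierarchy(seqs, s=2, depth=3)
print(levels[1])  # [[0, 1, 0, 1], [2, 3, 0, 2]]
print(levels[3])  # [[0], [1]]
```

The real statistical work lies in replacing `learn_level` with a learner that groups synonymous tuples. The loop structure stays the same.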