Vatsal Baherwani (@vatsalbaherwani)

Scaling laws predict an LLM's pretraining loss, but not its capabilities. Abilities like in-context learning emerge abruptly and only past a certain scale. Our new paper traces this to one bottleneck: learning which tokens attention should focus on. 🧵 https://arxiv.org/abs/26...

Jun 26, 2026