🔬 Science & Research
Scaling laws predict an LLM's pretraining loss, but not its capabilities. Abilities like in-context learning emerge abruptly and only past a certain scale. Our new paper traces this to one bottleneck: learning which tokens attention should focus on. 🧵<a target="_blank" href="https://arxiv.org/abs/26...
Jun 26, 2026