@godofprompt
Oct 25, 2025
🚨 This is wild.
A new paper from the Ling team just dropped, "Every Attention Matters", and it quietly rewrites how long-context reasoning works in LLMs.
Their new Ring-linear architecture mixes Softmax and Linear Attention, cutting inference cost by 10x while keeping SOTA accuracy up to 128K tokens.
Even crazier:
• Training efficiency +50%
• Inference speed +90%
• Stable RL optimization over ultra-long sequences
Basically, they solved long-context scaling without trillion-parameter overkill.
The future isn’t bigger models. It’s smarter attention.
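For a rough feel of what "mixing Softmax and Linear Attention" can mean, here is a minimal pure-Python sketch. It is not the paper's implementation: the feature map `phi`, the helper `hybrid_stack`, and the three-linear-to-one-softmax layer pattern are all assumptions made for illustration. It shows the general idea that causal softmax attention rescans every cached key at each step, while causal linear attention carries a fixed-size running state.

```python
import math

def softmax_attention(q, k, v):
    """Causal softmax attention: step t rescans all t+1 cached keys/values."""
    out = []
    for t in range(len(q)):
        scores = [sum(a * b for a, b in zip(q[t], k[s])) for s in range(t + 1)]
        m = max(scores)  # subtract the max for numerical stability
        w = [math.exp(x - m) for x in scores]
        z = sum(w)
        out.append([sum(w[s] * v[s][j] for s in range(t + 1)) / z
                    for j in range(len(v[0]))])
    return out

def phi(x):
    """Hypothetical positive feature map, elu(x) + 1 (an assumption)."""
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]

def linear_attention(q, k, v):
    """Causal linear attention as a recurrence over a fixed-size state."""
    d_k, d_v = len(k[0]), len(v[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # running sum of phi(k) v^T
    norm = [0.0] * d_k                         # running sum of phi(k)
    out = []
    for t in range(len(q)):
        fk, fq = phi(k[t]), phi(q[t])
        for i in range(d_k):
            norm[i] += fk[i]
            for j in range(d_v):
                state[i][j] += fk[i] * v[t][j]
        denom = sum(fq[i] * norm[i] for i in range(d_k))
        out.append([sum(fq[i] * state[i][j] for i in range(d_k)) / denom
                    for j in range(d_v)])
    return out

def hybrid_stack(x, pattern=("linear", "linear", "linear", "softmax")):
    """Hypothetical hybrid stack: mostly linear layers, occasional softmax.

    The layer ratio is illustrative only; each layer uses q = k = v = x
    and adds a residual connection.
    """
    for kind in pattern:
        attn = linear_attention if kind == "linear" else softmax_attention
        y = attn(x, x, x)
        x = [[a + b for a, b in zip(row, upd)] for row, upd in zip(x, y)]
    return x
```

In this sketch the linear layers keep a `d_k × d_v` state no matter how long the sequence gets, while the softmax layer's per-step work grows with the number of tokens seen so far. That difference is the usual motivation for hybrid designs on long contexts.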