Now a 3rd paper has landed on this 🤯
"The Illusion of the Illusion of the Illusion of Thinking"
The 1st paper, from Apple, concludes that large reasoning models hit a complexity threshold where accuracy collapses to zero and the models even spend fewer thinking tokens, which it reads as a hard limit on generalizable reasoning.
The 2nd paper counters that the apparent collapse is an illusion caused by token limits and impossible puzzles, so the models' reasoning holds up once evaluations remove those flaws.
The 3rd paper synthesizes both sides: it agrees the collapse was an artifact, yet stresses that models still falter in very long step-by-step executions, exposing lingering brittleness despite the better methodology.
The third author shows that, even after fixing the test design and allowing enough output space, the models still start to lose track of a long step-by-step plan once it stretches into thousands of steps, so a real weakness remains in sustaining very long chains of reasoning.
Read on

Agreements
The 3rd paper endorses 3 key fixes raised by the 2nd paper:
Unsolvable River Crossing cases, with more than 5 actors and a boat of size 3, should never have been graded.
Token budgets cap Tower of Hanoi output long before logic fails.
Calling exponential move count "complexity" confuses length with search difficulty.
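
To make the last two points concrete, here is a rough Python sketch. It is not taken from any of the three papers. The recursive rule for Tower of Hanoi is only a few lines, yet the optimal solution for n disks has 2^n - 1 moves, so the output length explodes while the logic stays the same. The numbers `TOKENS_PER_MOVE` and `OUTPUT_BUDGET` are illustrative assumptions, not figures from the papers or from any specific model.

```python
def hanoi_moves(n, src="A", aux="B", dst="C"):
    """Yield the optimal move sequence for n disks as (disk, from_peg, to_peg)."""
    if n == 0:
        return
    # Move the n-1 smaller disks out of the way, onto the auxiliary peg.
    yield from hanoi_moves(n - 1, src, dst, aux)
    # Move the largest disk to its destination.
    yield (n, src, dst)
    # Move the n-1 smaller disks from the auxiliary peg onto the largest disk.
    yield from hanoi_moves(n - 1, aux, src, dst)


# Both values below are assumptions chosen purely for illustration.
TOKENS_PER_MOVE = 10     # assumed cost of writing out one move
OUTPUT_BUDGET = 64_000   # assumed output token limit


def max_disks_within_budget(budget=OUTPUT_BUDGET, tokens_per_move=TOKENS_PER_MOVE):
    """Largest n whose full move list (2**n - 1 moves) fits in the budget."""
    n = 0
    while (2 ** (n + 1) - 1) * tokens_per_move <= budget:
        n += 1
    return n


if __name__ == "__main__":
    for n in (3, 10, 15, 20):
        moves = 2 ** n - 1
        print(f"{n} disks: {moves} moves, ~{moves * TOKENS_PER_MOVE} tokens")
    print("largest n that fits:", max_disks_within_budget())
```

Under these assumed numbers the full move list stops fitting somewhere past 12 disks, even though the rule that generates it never gets any harder. That is the sense in which length and search difficulty are different things.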

3rd paper - drive.google.com/file/d/1imWKj_…
