Beautiful research from @Apple.

More thinking stops helping once tasks cross a critical depth. Thinking tokens rise, then crash, revealing compute inefficiency. Unexpectedly, standard LLMs beat Large Reasoning Models (LRMs) on easy puzzles.

Researchers stress-test the models on puzzles whose difficulty can be dialed up step by step. Thinking models pull ahead midway, but every model collapses once the puzzle grows past a critical depth. Even stranger, near that point the thinking model writes fewer thoughts despite plenty of allowed tokens, hinting at a built-in ceiling on current inference-time reasoning.

Key findings below.

🧩 Controlled puzzles
Four simulators (Tower of Hanoi, Checker Jumping, River Crossing, Blocks World) raise complexity smoothly while the rules stay fixed. Exact grading of each move stops data leakage.

📈 Three regimes
Low depth: non-thinking LLMs solve faster and spend fewer tokens.
Medium depth: thinking variants win by searching longer.
High depth: both hit zero accuracy.
The boundary shifts with model size but exists for all models.

🤖 Token scaling limit
As puzzles harden, thinking models initially emit more tokens. Near collapse their token output drops, even though the budget is far from the 64k cap. Reasoning effort fails to scale with problem depth.

🔍 Thought patterns
On easy tasks the correct plan appears early, but the model keeps exploring and sometimes changes its mind, wasting compute. At medium depth the right plan surfaces late. After the threshold no correct plan appears at all.

⚠️ Exact step limits
Supplying the Tower of Hanoi algorithm in the prompt should turn reasoning into straight execution. Accuracy still collapses. Large Reasoning Models struggle with straightforward symbolic sequences, hinting at fundamental gaps beyond search.
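
To make "straight execution" and "exact grading of each move" concrete, here is a minimal sketch in Python. It is not the paper's prompt or evaluation code: the names `hanoi_moves` and `grade`, the peg numbering 0 to 2, and the move format are all assumptions made for illustration. It pairs the textbook recursive solution with a tiny simulator that replays a move list and rejects it at the first illegal step.

```python
from typing import Iterator, List, Tuple

# Hypothetical move format: (source peg, target peg), pegs numbered 0..2.
Move = Tuple[int, int]


def hanoi_moves(n: int, src: int = 0, dst: int = 2, aux: int = 1) -> Iterator[Move]:
    """Yield the textbook recursive solution for n disks."""
    if n == 0:
        return
    # Park the n-1 smaller disks on the spare peg, move the largest, then restack.
    yield from hanoi_moves(n - 1, src, aux, dst)
    yield (src, dst)
    yield from hanoi_moves(n - 1, aux, dst, src)


def grade(n: int, moves: List[Move]) -> bool:
    """Replay moves on a simulator.

    Returns True only if every move is legal and all disks end on peg 2.
    """
    # Peg 0 starts with disks n..1, largest at the bottom of the stack.
    pegs: List[List[int]] = [list(range(n, 0, -1)), [], []]
    for src, dst in moves:
        if src not in (0, 1, 2) or dst not in (0, 1, 2) or src == dst:
            return False  # malformed move
        if not pegs[src]:
            return False  # nothing to pick up
        disk = pegs[src][-1]
        if pegs[dst] and pegs[dst][-1] < disk:
            return False  # larger disk placed on a smaller one
        pegs[dst].append(pegs[src].pop())
    return pegs[2] == list(range(n, 0, -1))


if __name__ == "__main__":
    solution = list(hanoi_moves(3))
    print(len(solution), grade(3, solution))
```

Because the grader checks every intermediate state, a single wrong move anywhere in the sequence fails the whole attempt, which is the sense in which grading is exact.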


Puzzles expose a hidden ceiling in thinking models.


More disks make the Tower of Hanoi puzzle harder. At 1-3 disks the plain model is both accurate and brief; at 4-7 disks the thinking version gains accuracy by spending many extra tokens; past about 8 disks both crash to 0% and the thinking model even writes fewer thoughts. This shows current chain-of-thought scaling breaks beyond a small depth.
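
For a sense of how quickly depth grows with disk count, the sketch below uses the standard result that the shortest Tower of Hanoi solution for n disks takes 2^n - 1 moves. The formula is textbook material rather than something stated in the post, and `min_moves` is a name chosen here for illustration. The disk counts sampled are the regime boundaries quoted above.

```python
def min_moves(disks: int) -> int:
    """Length of the shortest Tower of Hanoi solution for the given disk count."""
    return 2 ** disks - 1


if __name__ == "__main__":
    # 3 and 7 close the first two regimes described above; 8 is roughly where both models collapse.
    for disks in (3, 7, 8, 10):
        print(f"{disks} disks -> {min_moves(disks)} moves")
```

Each added disk roughly doubles the number of moves that must all be correct, so a small step in disk count is a large step in required sequence length.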


PAPER - "The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity" https://machinelearning.apple.com/research/illusion-of-thinking