Rohan Paul

@rohanpaul_ai

Beautiful research from @Apple.

More thoughts stop helping once tasks cross a critical depth. Thinking tokens rise, then crash, revealing compute inefficiency. Standard LLMs also beat LRMs on easy puzzles, unexpectedly.

Researchers stress-test the models on puzzles whose difficulty can be dialed up step by step. Thinking models pull ahead mid-way, but every model collapses once the puzzle grows past a critical depth. Even stranger, near that point the thinking model writes fewer thoughts despite plenty of allowed tokens, hinting at a built-in ceiling on current inference-time reasoning.

Key findings below.

🧩 Controlled puzzles
Four simulators (Tower of Hanoi, Checker Jumping, River Crossing, Blocks World) raise complexity smoothly while rules stay fixed. Exact grading of each move stops data leakage.

📈 Three regimes
Low depth: non-thinking LLMs solve faster and spend fewer tokens.
Medium depth: thinking variants win by searching longer.
High depth: both hit zero accuracy.
The boundary shifts with model size but exists for all.

🤖 Token scaling limit
As puzzles harden, thinking models initially emit more tokens. Near collapse their token output drops, even though usage is far below the 64k cap. Reasoning effort fails to scale with problem depth.

🔍 Thought patterns
On easy tasks the correct plan appears early, but the model keeps exploring and sometimes changes its mind, wasting compute. At medium depth the right plan surfaces late. After the threshold no correct plan appears at all.

⚠️ Exact step limits
Supplying the Tower of Hanoi algorithm in the prompt should turn reasoning into straight execution. Accuracy still collapses. Large Reasoning Models struggle with straightforward symbolic sequences, hinting at fundamental gaps beyond search.
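
To make "exact grading of each move" concrete, here is a minimal Python sketch of a Tower of Hanoi simulator that replays a proposed move list and rejects it at the first illegal move. This is an illustration only, not the paper's actual harness: the names solve_hanoi and grade_moves, the peg numbering (0 to 2), and the move format are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical move format: (source peg, target peg), pegs numbered 0..2.
Move = Tuple[int, int]


def solve_hanoi(n: int, src: int = 0, dst: int = 2, aux: int = 1) -> List[Move]:
    """Standard recursive solution: returns the shortest move list for n disks."""
    if n == 0:
        return []
    return (
        solve_hanoi(n - 1, src, aux, dst)    # park the n-1 smaller disks on aux
        + [(src, dst)]                       # move the largest disk
        + solve_hanoi(n - 1, aux, dst, src)  # bring the smaller disks back on top
    )


def grade_moves(n: int, moves: List[Move]) -> bool:
    """Replay moves on a simulated board; True only if every move is legal
    and all n disks end up on peg 2."""
    # Each peg is a stack; larger numbers are larger disks, top of stack is last.
    pegs = [list(range(n, 0, -1)), [], []]
    for src, dst in moves:
        if src not in (0, 1, 2) or dst not in (0, 1, 2) or src == dst:
            return False  # malformed move
        if not pegs[src]:
            return False  # nothing to pick up
        disk = pegs[src][-1]
        if pegs[dst] and pegs[dst][-1] < disk:
            return False  # larger disk placed on a smaller one
        pegs[dst].append(pegs[src].pop())
    return pegs[2] == list(range(n, 0, -1))


if __name__ == "__main__":
    print(grade_moves(3, solve_hanoi(3)))    # legal, complete solution
    print(grade_moves(2, [(0, 2), (0, 2)]))  # second move is illegal
```

Because the checker depends only on the rules of the puzzle, the same grading works at any disk count, which is what lets difficulty be raised while the rules stay fixed.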

Puzzles expose a hidden ceiling in thinking models.

More disks make the Tower of Hanoi puzzle harder. At 1-3 disks the plain model is both accurate and brief; at 4-7 disks the thinking version gains accuracy by spending many extra tokens; past about 8 disks both crash to 0% and the thinking model even writes fewer thoughts. This shows current chain-of-thought scaling breaks beyond a small depth.
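
One way to see why a few extra disks matter so much: the shortest Tower of Hanoi solution for n disks takes 2^n - 1 moves. That is a standard result about the puzzle, not a figure quoted in the thread, and the helper name min_moves below is made up for this sketch.

```python
def min_moves(n_disks: int) -> int:
    """Length of the shortest Tower of Hanoi solution for n_disks disks."""
    return 2 ** n_disks - 1


if __name__ == "__main__":
    # Disk counts picked to span the three ranges described in the post.
    for n in (3, 7, 8, 10):
        print(f"{n} disks: {min_moves(n)} moves")
```

Each added disk roughly doubles the number of moves that must all be correct, so the required output grows exponentially while the rules stay the same.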

PAPER - "The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity"
https://machinelearning.apple.com/research/illusion-of-thinking