andthattoo

@andthatto

Qwen 3.6 is frontier for local. It also thinks forever. I tried a dumb inference-time trick: make its <think> block obey a tiny grammar.

Result:
- HumanEval+: 22x fewer think tokens, no accuracy loss
- LiveCodeBench public slice: +14 points pass@1, ~5x fewer total tokens

No finetuning. Just GBNF-constrained decoding. The constraint is applied only to the reasoning block, not the final answer/code.
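
To make the idea concrete, here is a minimal sketch of what such a grammar could look like. The field names (GOAL, PLAN, EDGE), the one-line-per-field rule, and the helper `follows_grammar` are illustrative assumptions, not the grammar from the writeup. The GBNF text pins down only the <think> block and leaves everything after </think> unconstrained; the regex is a rough Python mirror of it for checking outputs offline.

```python
import re

# Hypothetical GBNF grammar: the think block must be three short labelled
# lines; the answer after </think> is left free-form.
THINK_GRAMMAR = r'''
root   ::= think answer
think  ::= "<think>\n" "GOAL: " line "PLAN: " line "EDGE: " line "</think>\n"
line   ::= [^\n]+ "\n"
answer ::= ( [^\n] | "\n" )*
'''

# Rough regex mirror of the grammar above, for offline checks only.
THINK_SHAPE = re.compile(
    r"<think>\n"
    r"GOAL: [^\n]+\n"
    r"PLAN: [^\n]+\n"
    r"EDGE: [^\n]+\n"
    r"</think>\n"
    r"[\s\S]*"
)


def follows_grammar(output: str) -> bool:
    """Return True if the whole output matches the constrained shape."""
    return THINK_SHAPE.fullmatch(output) is not None
```

A grammar string like this would be handed to a GBNF-aware runtime such as llama.cpp. The exact call depends on the runtime and is not shown here.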

On HumanEval+ with Qwen3.6-35B-A3B:
- Free-form thinking: 92.1% pass@1, 3087 mean think tokens
- Grammar: 92.7% pass@1, 138 mean think tokens

Same accuracy band. ~22x fewer thinking tokens.

Then I tried a recent LiveCodeBench v6 LeetCode slice.
- Free-form: 50% pass@1, 11553 mean think tokens
- Grammar: 64% pass@1, 267 mean think tokens
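
The headline ratios can be re-derived from the numbers in the two result posts. This is a small arithmetic check; the dictionary layout and the `summarize` helper are just for illustration.

```python
# Figures copied from the two result posts: (pass@1 in %, mean think tokens).
RESULTS = {
    "HumanEval+": {"free": (92.1, 3087), "grammar": (92.7, 138)},
    "LiveCodeBench v6 slice": {"free": (50.0, 11553), "grammar": (64.0, 267)},
}


def summarize(free, grammar):
    """Return (pass@1 change in points, think-token reduction factor)."""
    (free_acc, free_tok), (gram_acc, gram_tok) = free, grammar
    return round(gram_acc - free_acc, 1), round(free_tok / gram_tok, 1)


for name, runs in RESULTS.items():
    delta, factor = summarize(runs["free"], runs["grammar"])
    print(f"{name}: {delta:+.1f} points pass@1, {factor}x fewer think tokens")
```

Think tokens drop by roughly 22x on HumanEval+ and roughly 43x on the LiveCodeBench slice. The ~5x figure in the first post refers to total tokens, which the thread does not break down per run.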

This is not “reasoning disappeared.” On harder tasks, some reasoning moved into code comments and post-think answer text. The model also reacts to how the grammar is constructed. I believe task-specific grammars could be discovered through @DSPyOSS-style prompt optimization.
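
One rough way to see whether reasoning has moved out of the think block is to compare how much text sits inside it with how much sits in comments in the answer. The sketch below is an assumption-heavy proxy: it counts whitespace-separated words rather than model tokens, only looks at Python `#` comments, and the helper names `split_output` and `leak_report` are hypothetical.

```python
import re


def split_output(output: str):
    """Split a completion into (think text, answer text)."""
    match = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if match is None:
        return "", output
    return match.group(1), output[match.end():]


def leak_report(output: str) -> dict:
    """Count words in the think block and in '#' comments of the answer."""
    think, answer = split_output(output)
    comment_words = 0
    for line in answer.splitlines():
        # Naive: treats any '#' as a comment start, even inside strings.
        _, sep, comment = line.partition("#")
        if sep:
            comment_words += len(comment.split())
    return {
        "think_words": len(think.split()),
        "comment_words": comment_words,
        "answer_words": len(answer.split()),
    }
```

If `comment_words` rises as `think_words` falls across runs, that would be consistent with reasoning leaking into the answer. A real measurement would use the model's tokenizer.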

My insight is that a lot of verbose CoT is scaffolding, not essential computation. Constrained decoding can force a denser interface to the model’s latent reasoning. But if the task really needs more deliberation, it leaks somewhere else.

I think this is a useful middle ground between:
- verbose CoT at inference
- training models to reason in latent space

Just constrain the text interface.

Full writeup + results: https://andthattoo.dev/blog/structured_cot
Repo: https://github.com/andthattoo/structured-cot