@andthatto: Qwen 3.6 is frontier for local...
@andthatto
Apr 29, 2026
1
Qwen 3.6 is frontier for local.
It also thinks forever.
I tried a dumb inference-time trick: make its <think> block obey a tiny grammar.
Result:
- HumanEval+: 22x fewer think tokens, no accuracy loss
- LiveCodeBench public slice: +14 points pass@1, ~5x fewer total tokens
2
No finetuning.
Just GBNF-constrained decoding.
The constraint is applied only to the reasoning block, not the final answer/code.
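
To make the idea concrete, here is a minimal sketch of what such a grammar could look like. It is not the grammar from the writeup: the rule names, the `goal`/`approach`/`edge` keys, and the 6-line / 80-character limits are all made up for illustration. It assumes llama.cpp-style GBNF with `{m,n}` repetition, and that the model itself emits the opening `<think>` tag (some chat templates pre-fill it, in which case the root rule would start at the first line). The regex is just an offline mirror of the same structure for sanity-checking outputs.

```python
import re

# Hypothetical GBNF grammar (llama.cpp syntax), NOT the one from the writeup.
# The think block is limited to a few short "key: value" lines;
# everything after </think> (the answer/code) is left unconstrained.
THINK_GRAMMAR = r'''
root   ::= "<think>\n" line{1,6} "</think>\n" answer
line   ::= key ": " text "\n"
key    ::= "goal" | "approach" | "edge"
text   ::= [^\n]{1,80}
answer ::= ([^\n] | "\n")*
'''

# Regex mirror of the grammar above, used only to check outputs offline.
THINK_RE = re.compile(
    r"<think>\n"
    r"(?P<think>(?:(?:goal|approach|edge): [^\n]{1,80}\n){1,6})"
    r"</think>\n"
    r"(?P<answer>.*)",
    re.DOTALL,
)


def split_constrained(output: str):
    """Return (think, answer) if output follows the sketch grammar, else None."""
    match = THINK_RE.fullmatch(output)
    if match is None:
        return None
    return match.group("think"), match.group("answer")


sample = (
    "<think>\n"
    "goal: sum the even numbers\n"
    "approach: filter then sum\n"
    "</think>\n"
    "def f(xs):\n"
    "    return sum(x for x in xs if x % 2 == 0)\n"
)
think, answer = split_constrained(sample)
```

The grammar string would be handed to whatever GBNF-aware runtime is in use; only the part before `</think>` is restricted, which matches the "reasoning block only" setup described above.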
3
On HumanEval+ with Qwen3.6-35B-A3B:
Free-form thinking:
92.1% pass@1
3087 mean think tokens
Grammar:
92.7% pass@1
138 mean think tokens
Same accuracy band.
~22x fewer thinking tokens.
4
Then I tried a recent LiveCodeBench v6 LeetCode slice.
Free-form: 50% pass@1 and 11553 mean think tokens
Grammar: 64% pass@1 and 267 mean think tokens
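
For anyone checking the arithmetic, a quick sketch using only the numbers quoted in this thread (the helper name `reduction` is just for illustration):

```python
def reduction(free_tokens: float, grammar_tokens: float) -> float:
    """How many times fewer think tokens the grammar run used."""
    return free_tokens / grammar_tokens


humaneval_ratio = reduction(3087, 138)   # HumanEval+ mean think tokens
lcb_ratio = reduction(11553, 267)        # LiveCodeBench slice mean think tokens

lcb_gain_points = 64 - 50                # absolute gain in percentage points
lcb_gain_relative = (64 - 50) / 50       # relative gain over the free-form baseline

print(f"HumanEval+ think-token reduction: {humaneval_ratio:.1f}x")  # 22.4x
print(f"LiveCodeBench think-token reduction: {lcb_ratio:.1f}x")     # 43.3x
print(f"LiveCodeBench pass@1 gain: {lcb_gain_points} points "
      f"({lcb_gain_relative:.0%} relative)")                        # 14 points (28% relative)
```

The LiveCodeBench think-token ratio (about 43x) is much larger than the ~5x reduction in total tokens, which fits with reasoning showing up outside the think block on harder tasks.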
5
This is not “reasoning disappeared.”
On harder tasks, some reasoning moved into comments / post-think answer text.
The results are also sensitive to how the grammar is constructed.
I suspect task-specific grammars could be discovered through @DSPyOSS-style prompt optimization.
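
One rough way to see the leak is to measure how much of the generated answer is comments. This is a sketch, not how the writeup measures it: `comment_share` is a made-up helper, it only handles Python answers, and it assumes the code tokenizes cleanly (truncated or invalid code would raise).

```python
import io
import tokenize


def comment_share(code: str) -> float:
    """Fraction of non-whitespace characters in `code` that sit in # comments."""
    comment_chars = 0
    for tok in tokenize.generate_tokens(io.StringIO(code).readline):
        if tok.type == tokenize.COMMENT:
            # Count the comment text without whitespace, to match the total below.
            comment_chars += len("".join(tok.string.split()))
    total_chars = len("".join(code.split()))
    return comment_chars / total_chars if total_chars else 0.0


plain = "x = 1\n"
chatty = "# sort first\nx = 1\n"
print(comment_share(plain), comment_share(chatty))
```

Comparing this share between free-form and grammar-constrained runs would give a crude signal of reasoning moving into the answer.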
6
My insight is that a lot of verbose CoT is scaffolding, not essential computation.
Constrained decoding can force a denser interface to the model’s latent reasoning.
But if the task really needs more deliberation, it leaks somewhere else.
7
I think this is a useful middle ground between:
- verbose CoT at inference
- training models to reason in latent space
Just constrain the text interface.
Full writeup + results:
andthattoo.dev/blog/structure…
and repo: github.com/andthattoo/str…