Qwen3.6 35B A3B model. 55+ tokens/sec. $300 GPU.
No, this isn't a server card. It's an RTX 4060 Ti 8GB.
Previously I posted that I got 41 t/s on this GPU, and that post went viral. I went back and made it 34% faster.
And now the speed doesn't drop with context depth at all.
New benchmarks...
May 11, 2026