Anyone running an 8GB or 12GB VRAM setup needs to know that "-ncmoe" is the key flag for boosting performance in llama.cpp.
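For context, llama.cpp documents `-ncmoe N` (long form `--n-cpu-moe N`) as keeping the Mixture-of-Experts weights of the first N layers on the CPU. That lets you offload everything else with `-ngl`, so the GPU keeps the remaining layers and the KV cache. Here's a minimal Python sketch that only assembles a llama-server command line for a setup like mine; it never launches anything. A few assumptions: the model path is hypothetical, `-ngl 99` is just a "put everything else on the GPU" value, and reading "q8_0 context" as a q8_0 cache for both K and V is my interpretation. Depending on your build, a quantized V cache may also need flash attention enabled.

```python
import shlex


def build_llama_server_cmd(model_path: str, n_cpu_moe: int, ctx_size: int = 65536) -> list[str]:
    """Assemble a llama-server argument list (sketch only, nothing is executed)."""
    return [
        "llama-server",
        "-m", model_path,          # hypothetical path to a GGUF model file
        "-c", str(ctx_size),       # 64k context window
        "-ngl", "99",              # offload all layers to the GPU...
        "-ncmoe", str(n_cpu_moe),  # ...but keep the MoE expert weights of the first N layers on the CPU
        "-ctk", "q8_0",            # assumption: q8_0 K cache
        "-ctv", "q8_0",            # assumption: q8_0 V cache
    ]


if __name__ == "__main__":
    # Print a shell-ready command for the "-ncmoe 35" configuration.
    print(shlex.join(build_llama_server_cmd("model.gguf", 35)))
```

The value of N is the knob to tune: a lower N keeps more experts on the GPU, so it's faster but needs more VRAM.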
Here are my results for Qwen3.6 35B A3B with a 64k q8_0 context on an 8GB RTX 3070 Ti:
⚪️ no flag → 8.7 tok/s
RAM: 13.6GB & VRAM: 7.8GB
🔴 -ncmoe 35 → 27.5 tok/s
RAM: 12.1GB & VRAM:...
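To put the two tok/s figures side by side, here's a tiny Python calculation that uses only the numbers reported above. It converts them into a speedup ratio and a time-per-token.

```python
# tok/s figures reported in this post (Qwen3.6 35B A3B, 8GB RTX 3070 Ti)
BASELINE_TPS = 8.7  # no -ncmoe flag
NCMOE_TPS = 27.5    # -ncmoe 35


def speedup(new_tps: float, old_tps: float) -> float:
    """Ratio of new throughput to old throughput."""
    return new_tps / old_tps


def ms_per_token(tps: float) -> float:
    """Convert tokens per second into milliseconds per generated token."""
    return 1000.0 / tps


if __name__ == "__main__":
    print(f"speedup: {speedup(NCMOE_TPS, BASELINE_TPS):.2f}x")  # speedup: 3.16x
    print(f"{ms_per_token(BASELINE_TPS):.1f} ms -> {ms_per_token(NCMOE_TPS):.1f} ms per token")  # 114.9 ms -> 36.4 ms per token
```

So "-ncmoe 35" works out to roughly 3.2x the throughput on this card.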
May 08, 2026