left curve dev (@leftcurvedev_)


Anyone with an 8GB or 12GB VRAM setup needs to understand that "-ncmoe" is the key flag for boosting performance in llama.cpp.

Here are my results for Qwen3.6 35B A3B, with 64k q8_0 context on an 8GB RTX 3070Ti:

⚪️ no flag → 8.7 tok/s
RAM: 13.6GB & VRAM: 7.8GB

🔴 -ncmoe 35 → 27.5 tok/s
RAM: 12.1GB & VRAM:...

May 08, 2026
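A rough sketch of how such a run might be launched, as we understand llama.cpp's options: `-ncmoe N` (long form `--n-cpu-moe N`) keeps the MoE expert weights of the first N layers in system RAM, while `-ngl` offloads the remaining weights to the GPU. The post does not give the full command, so several details here are assumptions. The GGUF filename is hypothetical, "64k q8_0 context" is read as `-c 65536` with a q8_0 KV cache (`-ctk`/`-ctv`), and `-ngl 99` is a guess. The helper also works out the speedup from the two reported numbers.

```python
import shlex
from typing import List

# Hypothetical model path: the post does not say which quant or file was used.
MODEL = "models/Qwen3.6-35B-A3B.gguf"


def llama_server_args(n_cpu_moe: int, ctx: int = 65536) -> List[str]:
    """Build an argv for llama-server (assumed flags, not the author's exact command)."""
    return [
        "llama-server",
        "-m", MODEL,
        "-ngl", "99",             # assumption: offload all layers to the GPU...
        "-ncmoe", str(n_cpu_moe), # ...but keep the experts of the first N layers in RAM
        "-c", str(ctx),           # "64k" read as 65536 tokens of context
        "-ctk", "q8_0",           # assumption: "q8_0 context" means a q8_0 K cache
        "-ctv", "q8_0",           # ...and a q8_0 V cache
    ]


def speedup(baseline_tps: float, tuned_tps: float) -> float:
    """Ratio of tuned to baseline generation speed (tok/s)."""
    return tuned_tps / baseline_tps


if __name__ == "__main__":
    print(shlex.join(llama_server_args(35)))
    # Numbers reported in the post: 8.7 tok/s without the flag, 27.5 tok/s with -ncmoe 35.
    print(f"speedup: {speedup(8.7, 27.5):.2f}x")
```

With the reported figures, the speedup works out to roughly 3.16x. Depending on the llama.cpp build, a quantized V cache may also need flash attention (`-fa`). The best N is hardware-specific: a lower value puts more experts on the GPU, so it is faster until VRAM runs out.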