Georgi Gerganov (@ggerganov)

2 Unrolled Threads

Let me demonstrate the true power of llama.cpp:

- Running on Mac Studio M2 Ultra (3 years old)
- Gemma 4 26B A4B Q8_0 (full quality)
- Built-in WebUI (ships with llama.cpp)
- MCP support out of the box (web-search, HF, github, etc.)
- Prompt speculative decoding

The result: 300t/s (realtime video...

Apr 05, 2026

Thread Archive

pack it up boys, it's over ...

Feb 04, 2025