@ggerganov: Let me demonstrate the true po...

8 views Apr 05, 2026

Let me demonstrate the true power of llama.cpp:

- Running on Mac Studio M2 Ultra (3 years old)
- Gemma 4 26B A4B Q8_0 (full quality)
- Built-in WebUI (ships with llama.cpp)
- MCP support out of the box (web-search, HF, github, etc.)
- Prompt speculative decoding

The result: 300t/s

(realtime video)

Of course, this is a trivial example of prompt-based speculative decoding, because the model recites sections from what is already in the prompt (so don't get too excited 😉).
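
For anyone curious what "prompt-based" drafting means in practice, here is a minimal Python sketch of the general idea. It is not llama.cpp's actual implementation: the function names `draft_from_prompt` and `count_accepted` and the `ngram` / `max_draft` parameters are hypothetical, chosen only for illustration. The draft comes from looking up the most recent tokens earlier in the context and proposing whatever followed them. The model then verifies the draft in one batch and keeps the matching prefix.

```python
def draft_from_prompt(tokens, ngram=3, max_draft=8):
    """Propose draft tokens by matching the last `ngram` tokens earlier in the context.

    Illustrative only: names and defaults are assumptions, not llama.cpp's API.
    """
    if len(tokens) <= ngram:
        return []
    tail = tokens[-ngram:]
    # Scan backwards so the most recent earlier occurrence wins.
    for start in range(len(tokens) - ngram - 1, -1, -1):
        if tokens[start:start + ngram] == tail:
            # Propose the tokens that followed the earlier occurrence.
            return tokens[start + ngram:start + ngram + max_draft]
    return []


def count_accepted(draft, verified):
    """Count how many leading draft tokens agree with the model's own tokens."""
    accepted = 0
    for d, v in zip(draft, verified):
        if d != v:
            break
        accepted += 1
    return accepted


context = "the quick brown fox jumps over the quick brown".split()
draft = draft_from_prompt(context, ngram=3, max_draft=3)
# Pretend the model, when verifying, would have produced these tokens itself.
accepted = count_accepted(draft, ["fox", "jumps", "high"])
```

When the model is reciting text that already sits in the prompt, almost every drafted token gets accepted, which is why the speedup in this demo is so large.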

Still, it's a nice and quick showcase of some of llama.cpp's capabilities.

llama.cpp with its integrated WebUI is effectively the most lightweight and self-contained agent that you can run locally.

Here are a few more examples of using @huggingface MCP to search for models
