@ggerganov
Apr 05, 2026
1
Let me demonstrate the true power of llama.cpp:

- Running on Mac Studio M2 Ultra (3 years old)
- Gemma 4 26B A4B Q8_0 (full quality)
- Built-in WebUI (ships with llama.cpp)
- MCP support out of the box (web-search, HF, github, etc.)
- Prompt speculative decoding

The result: 300t/s

(realtime video)
2
Of course, this is a trivial example of prompt-based speculative decoding, because the model recites sections from what is already in the prompt (so don't get too excited 😉).

Still, it's a nice and quick showcase of some of llama.cpp's capabilities.
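
To make the idea concrete, here is a minimal Python sketch of prompt-based speculative decoding under simplifying assumptions. It is not llama.cpp's implementation: the function names (`draft_from_prompt`, `speculative_step`, `generate`), the toy "model" that simply recites a document, and the integer tokens are all hypothetical and only for illustration. The draft is copied from an earlier spot in the context where the last few tokens already occurred, and the model then checks the draft greedily.

```python
from typing import Callable, List, Sequence, Tuple

def draft_from_prompt(tokens: Sequence[int], ngram: int = 2, max_draft: int = 8) -> List[int]:
    """Find the most recent earlier occurrence of the last `ngram` tokens
    and propose the tokens that followed it as a draft."""
    if len(tokens) <= ngram:
        return []
    tail = list(tokens[-ngram:])
    # Scan backwards over earlier positions (the tail itself is excluded).
    for start in range(len(tokens) - ngram - 1, -1, -1):
        if list(tokens[start:start + ngram]) == tail:
            follow = start + ngram
            return list(tokens[follow:follow + max_draft])
    return []

def speculative_step(tokens: Sequence[int], next_token: Callable[[List[int]], int],
                     ngram: int = 2, max_draft: int = 8) -> List[int]:
    """One decoding step with greedy verification. A real engine would score
    all draft positions in a single batched forward pass; this sketch calls
    the model once per position for clarity."""
    context = list(tokens)
    accepted: List[int] = []
    for guess in draft_from_prompt(tokens, ngram, max_draft):
        target = next_token(context)
        accepted.append(target)
        context.append(target)
        if target != guess:
            # First mismatch: keep the model's own token, drop the rest of the draft.
            return accepted
    # Every draft token matched (or there was no draft): emit one more token.
    accepted.append(next_token(context))
    return accepted

def generate(prompt: Sequence[int], next_token: Callable[[List[int]], int], eos: int,
             ngram: int = 2, max_draft: int = 8) -> Tuple[List[int], int]:
    """Decode until `eos`; return the generated tokens and the number of steps."""
    tokens = list(prompt)
    steps = 0
    while tokens[-1] != eos:
        tokens.extend(speculative_step(tokens, next_token, ngram, max_draft))
        steps += 1
    return tokens[len(prompt):], steps

# Toy setup (assumption): the "model" recites DOC after a separator token.
EOS, SEP = -1, 0
DOC = [5, 6, 7, 8, 9, 10, 11, 12]
PROMPT = DOC + [SEP]

def reciting_model(context: List[int]) -> int:
    i = len(context) - len(PROMPT)
    return DOC[i] if i < len(DOC) else EOS

output, steps = generate(PROMPT, reciting_model, EOS)
print(output, steps)  # 9 tokens produced in 3 steps
```

In this toy run the first two steps produce one token each because nothing in the context matches yet. Once the output starts repeating the prompt, a single step accepts a whole run of draft tokens, which is why recitation-heavy outputs show such high tokens-per-second numbers.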
3
llama.cpp with its integrated WebUI is effectively the most lightweight and self-contained agent that you can run locally.

Here are a few more examples of using @huggingface MCP to search for models