@thestreamingdev: I ran a 35-billion parameter A...

Mar 26, 2026

I ran a 35-billion-parameter AI agent on a $600 Mac mini.
Specs: M4 Mac mini, 16GB RAM

The model doesn't fit in RAM, so its weights are paged in from the SSD as needed, and it still generates 30 tokens/second.
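
Paging like this typically relies on the OS's demand paging of a memory-mapped weights file (llama.cpp memory-maps model files by default). The sketch below is a generic Python illustration of that mechanism, not the author's code: the file, its size, and the `read_layer` helper are hypothetical. Only the pages a computation actually touches are read from disk, and clean pages can be dropped under memory pressure and re-read later.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE  # OS page size (4 KiB on most x86 systems, 16 KiB on Apple Silicon macOS)

def make_fake_weights(path: str, n_pages: int) -> None:
    """Write a stand-in 'weights' file where page i is filled with byte value i % 256."""
    with open(path, "wb") as f:
        for i in range(n_pages):
            f.write(bytes([i % 256]) * PAGE)

def read_layer(mm: mmap.mmap, first_page: int, n_pages: int) -> int:
    """Touch only the pages of one hypothetical layer.

    The first access to each page causes a page fault and the OS reads that page
    from disk. Pages that are never touched are never loaded.
    """
    start = first_page * PAGE
    chunk = mm[start:start + n_pages * PAGE]  # slicing copies, so these pages get read
    return sum(chunk[::PAGE])  # one byte from each touched page

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "fake_model.bin")
    make_fake_weights(path, n_pages=64)
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        total = read_layer(mm, first_page=10, n_pages=4)  # pages 10..13 only
        print(total)
```

The mapping covers the whole 64-page file, but only four pages are ever faulted in; a runtime that maps a model larger than RAM leans on exactly this behavior.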

On an NVIDIA GPU, the same paging gives you 1.6 tok/s. Apple Silicon gives you 30. That's roughly 19x faster.

No cloud. No API keys. $0/month.

Here's what it can do 🧵

→ Search the web for live sports scores and stock prices
→ Find files on my desktop and run shell commands
→ Write code and solve math problems
→ Much of what Claude Code does, for free

The breakthroughs that made this possible:

Apple's "LLM in a Flash" paper showed that models larger than available DRAM can run by loading weights from flash storage on demand. I showed it works in practice on consumer hardware, not just in a research lab.

Google's TurboQuant research showed you can compress the KV cache with zero quality loss. I applied this with two server flags and doubled my context window from 32K to 64K tokens. For free. No code changes.
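
A rough way to see why quantizing the KV cache buys more context: its size grows linearly with both context length and bytes per cached element. This sketch assumes a hypothetical model shape (32 layers, 8 KV heads, head dimension 128) and llama.cpp's q8_0/q4_0 block formats. The post doesn't name its two flags; llama.cpp's server does expose `--cache-type-k` and `--cache-type-v` for this, but those are llama.cpp's own quantization types, not necessarily the TurboQuant algorithm.

```python
# Bytes per cached element. q8_0/q4_0 are llama.cpp block formats: 32 values
# plus one fp16 scale per block (34 and 18 bytes per block respectively).
BYTES_PER_ELEM = {"f16": 2.0, "q8_0": 34 / 32, "q4_0": 18 / 32}
GIB = 1024 ** 3

def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_ctx: int, cache_type: str) -> float:
    """Size of the K and V caches across all layers at a given context length."""
    elements = 2 * n_layers * n_kv_heads * head_dim * n_ctx  # 2 = keys + values
    return elements * BYTES_PER_ELEM[cache_type]

# Hypothetical model shape, chosen only for illustration.
shape = {"n_layers": 32, "n_kv_heads": 8, "head_dim": 128}

for n_ctx, cache_type in [(32_768, "f16"), (65_536, "q8_0"), (65_536, "q4_0")]:
    gib = kv_cache_bytes(n_ctx=n_ctx, cache_type=cache_type, **shape) / GIB
    print(f"{n_ctx:>6} tokens, {cache_type:>4}: {gib:.2f} GiB")
```

Under these assumptions, 64K tokens with an 8-bit cache needs about the same memory as 32K tokens with an f16 cache, which is the kind of trade that turns a fixed memory budget into twice the context.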

The biggest surprise: at 2.6 bits per weight, the 35B model was supposed to have "broken" tool calling. Every agent framework I tried failed: infinite loops, no answers.

I stopped asking the model to generate JSON function calls. Instead, I ask it simple questions.

"Is this a search, shell, or chat?" → one word answer. Works perfectly.

The tool calling wasn't broken. The protocol was wrong.
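
The routing idea can be sketched as a one-word classification prompt plus a forgiving parser. This is a hypothetical illustration, not the agent's actual prompt or code: `ROUTER_PROMPT`, `route`, and `fake_generate` are made-up names, and `generate` stands in for whatever call reaches the local model.

```python
from typing import Callable

ROUTES = ("search", "shell", "chat")

# Hypothetical prompt: ask a simple question instead of requesting a JSON tool call.
ROUTER_PROMPT = (
    "Classify the user's request. Answer with exactly one word: "
    "search, shell, or chat.\n\nRequest: {request}\nAnswer:"
)

def route(request: str, generate: Callable[[str], str]) -> str:
    """Return one of ROUTES based on the model's one-word answer."""
    reply = generate(ROUTER_PROMPT.format(request=request))
    # Small quantized models may add punctuation or extra words, so scan for the
    # first known label and fall back to plain chat if none appears.
    for word in reply.lower().replace(".", " ").replace(",", " ").split():
        if word in ROUTES:
            return word
    return "chat"

# Hypothetical stand-in for a real model call (e.g. a request to a local server).
def fake_generate(prompt: str) -> str:
    return "Search." if "score" in prompt else "chat"

print(route("What's the Lakers score right now?", fake_generate))
print(route("Tell me a joke", fake_generate))
```

Matching against a fixed label set and defaulting to chat means a malformed reply degrades into an ordinary answer rather than a tool-call loop.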

Both models. Full agent. Same $600 computer:

→ 35B MoE: 30 tok/s, about 2x faster than the 9B, stronger reasoning
→ 9B dense: 16 tok/s, 64K context, reads entire codebases
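
Why can a 35B MoE outrun a 9B dense model? Token generation is usually memory-bandwidth bound, so a rough ceiling is bandwidth divided by the bytes of weights read per token, and an MoE reads only its active experts. The inputs here are assumptions, not figures from the post: about 3B active parameters for the MoE, 4.5 bits per weight for the 9B, and 120 GB/s of memory bandwidth for a base M4 Mac mini.

```python
def decode_ceiling_tok_s(active_params: float, bits_per_weight: float,
                         bandwidth_gb_s: float) -> float:
    """Upper bound on tokens/s if each active weight is read once per token."""
    bytes_per_token = active_params * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token

BANDWIDTH_GB_S = 120  # assumed unified-memory bandwidth of a base M4 Mac mini

moe = decode_ceiling_tok_s(3e9, 2.6, BANDWIDTH_GB_S)    # assumed ~3B active params at 2.6 bpw
dense = decode_ceiling_tok_s(9e9, 4.5, BANDWIDTH_GB_S)  # assumed 9B dense at 4.5 bpw
print(f"35B MoE ceiling:  {moe:.0f} tok/s")
print(f"9B dense ceiling: {dense:.0f} tok/s")
```

These are ceilings only; attention, compute, and (for the 35B) SSD paging pull real speeds down, but the ordering agrees with the measured 30 vs 16 tok/s.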

I benchmarked everything:
→ 212 math problems: 86.3% accuracy (3 categories at 100%)
→ 10 web search categories: 10/10 accurate
→ Shell commands: finds videos, checks disk space, reads code
→ MLX vs llama.cpp: tested both, llama.cpp wins for 35B

The scaling path:
16GB Mac mini → 35B agent ($0/month)
48GB Mac → 35B at higher quality + speculative decoding
192GB Mac Studio → 397B frontier model
512GB Mac Studio → 1 TRILLION parameter model

Same agent code. Zero changes. Just swap the model file.
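
A back-of-the-envelope check for the scaling path: weight size is roughly parameters × bits per weight / 8. The sketch assumes every tier uses the same 2.6 bits per weight as the 35B and that about 70% of RAM is usable for weights after macOS, apps, and the KV cache. Both figures are assumptions, and changing them flips the verdicts; when the weights exceed the budget, the agent pages from the SSD instead of failing.

```python
# Assumption: share of RAM actually available for weights (macOS, apps, and
# the KV cache take the rest). This is a tunable guess, not a measured value.
USABLE_FRACTION = 0.7
BPW = 2.6  # assumption: same quantization level as the 35B for every tier

def weights_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight-file size in GB (1e9 bytes), ignoring metadata."""
    return params * bits_per_weight / 8 / 1e9

def fits_in_ram(ram_gb: float, params: float, bits_per_weight: float) -> bool:
    """True if the weights fit inside the assumed usable share of RAM."""
    return weights_gb(params, bits_per_weight) <= ram_gb * USABLE_FRACTION

for ram_gb, params in [(16, 35e9), (48, 35e9), (192, 397e9), (512, 1e12)]:
    size = weights_gb(params, BPW)
    verdict = "fits" if fits_in_ram(ram_gb, params, BPW) else "pages from SSD"
    print(f"{params / 1e9:>5.0f}B on {ram_gb:>3} GB: {size:7.1f} GB of weights -> {verdict}")
```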

Everything is open source. The agent, the benchmarks, the retro Mac web UI, all of it.

🍎 github.com/walter-grace/m…

One ask: I'd love to test this on a Mac Studio or Mac Pro with 192GB+. If you have one collecting dust and want to help push local AI forward, DM me. I'll run a frontier model on it and publish everything.

There are 100 million Macs with Apple Silicon in the world. Every one of them is an untapped AI workstation.

Time to use them.
