Here is gemma-4-26B-A4B-it on an A17 Pro chip w/ 8GB memory (MacBook Neo)
~7 t/s running on AMX (GPU is slower on A17)
Gemma 4's expert is 2.3x larger than Qwen's
See Qwen 35B below
VIDEO
Qwen 3.5-35B-A3B, ~7.5 t/s
with larger cache due to smaller expert
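A rough sketch of why a smaller expert means a larger effective cache: with a fixed memory budget, more experts stay resident. The budget and per-expert sizes below are made-up illustration values, not measurements; only the 2.3x ratio comes from the post, and `experts_in_cache` is a hypothetical helper.

```python
def experts_in_cache(cache_budget_mb: float, expert_size_mb: float) -> int:
    """Number of whole experts that fit in a fixed cache budget."""
    return int(cache_budget_mb // expert_size_mb)


# Hypothetical numbers, for illustration only (not measured on any device).
CACHE_BUDGET_MB = 4000.0                  # assumed memory left for expert weights
SMALL_EXPERT_MB = 10.0                    # assumed size of the smaller expert
LARGE_EXPERT_MB = SMALL_EXPERT_MB * 2.3   # the 2.3x ratio quoted in the post

small_slots = experts_in_cache(CACHE_BUDGET_MB, SMALL_EXPERT_MB)
large_slots = experts_in_cache(CACHE_BUDGET_MB, LARGE_EXPERT_MB)

# The smaller expert gets roughly 2.3x as many cache slots from the same budget.
print(f"small expert slots: {small_slots}, large expert slots: {large_slots}")
```

More resident experts should mean fewer reloads per token, which fits the slightly higher t/s, though the real hit rate depends on routing.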
VIDEO
Correction: the chip is an A18 Pro, as per the screenshot
