AboveSpec

@above_spec

"You need a 24 GB GPU for serious local LLMs in 2026." Everyone repeats this. It's not true anymore. Just ran a 35B-parameter model on an RTX 4060 Ti 8 GB:
• 41 tok/s at 16k context
• 24 tok/s at 200k context
Recipe + benchmarks below 🧵

How to make it work: MoE offload. Qwen3.6-35B-A3B activates only ~3B params per token. Keep attention + shared weights on the GPU and push the cold expert FFNs to system RAM. In llama.cpp: -ngl 99 -ncmoe 99. Use a q8_0 KV cache at ~10 KB/token, and 200k tokens of context fit in 2 GB of VRAM with FlashAttention on.
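
The KV-cache figure can be sanity-checked with quick arithmetic. This is a minimal sketch that takes the ~10 KB/token number from the post at face value and assumes decimal units (1 KB = 1000 bytes); the function name is illustrative, not part of llama.cpp.

```python
def kv_cache_bytes(context_tokens: int, bytes_per_token: int = 10_000) -> int:
    """Total KV-cache size in bytes for a given context depth.

    bytes_per_token defaults to the ~10 KB/token figure quoted in the post
    (assumed to mean 10,000 bytes; the real value depends on the model).
    """
    return context_tokens * bytes_per_token


if __name__ == "__main__":
    for ctx in (16_000, 200_000):
        gb = kv_cache_bytes(ctx) / 1e9
        print(f"{ctx:>7} tokens -> {gb:.2f} GB of KV cache")
```

At 200k tokens this gives 2 GB, which matches the claim and leaves the rest of an 8 GB card for attention and shared weights.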

Receipts. Qwen3.6-35B-A3B Q4_K_S, q8_0 KV, single-batch, FA on. Same machine, same model, varying context depth: prompt processing (PP) barely moves (332 t/s even at 200k tokens). Token generation (TG) decays ~linearly with depth, since the attention scan over the full KV cache is the bottleneck, as expected.

Why RTX 3070 8GB owners shouldn't write the card off: the real bottleneck is host-RAM bandwidth for the MoE experts (~3B active params × ~0.5 bytes each at Q4 ≈ 1.5 GB/token of streaming reads from DDR5), not GPU compute. The 3070 actually has higher memory bandwidth than the 4060 Ti (448 vs 288 GB/s).
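
The bandwidth argument implies a simple ceiling: tokens per second cannot exceed host-RAM bandwidth divided by the bytes streamed per token. A rough sketch, assuming 0.5 bytes per weight for Q4 (ignoring quantization scales) and that all ~3B active params are read from system RAM each token; the 60 GB/s bandwidth below is a placeholder to replace with your own measurement, not a number from the thread.

```python
def bytes_per_token(active_params: float, bytes_per_param: float = 0.5) -> float:
    """Bytes streamed from host RAM per generated token.

    Assumes every active parameter is read once per token and that a Q4
    weight costs about 0.5 bytes (a simplification).
    """
    return active_params * bytes_per_param


def max_tokens_per_second(bandwidth_gb_s: float, active_params: float = 3e9) -> float:
    """Upper bound on generation speed when host-RAM reads are the bottleneck."""
    return bandwidth_gb_s * 1e9 / bytes_per_token(active_params)


if __name__ == "__main__":
    print(f"{bytes_per_token(3e9) / 1e9:.1f} GB read per token")
    # 60 GB/s is a hypothetical host-RAM bandwidth, used only for illustration.
    print(f"ceiling at 60 GB/s: {max_tokens_per_second(60):.0f} tok/s")
```

Real throughput will differ, since part of the active weights stay on the GPU and attention cost grows with context depth, but the ceiling shows why RAM speed matters more than GPU compute here.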

If you have 64 GB RAM and a half-decent CPU, MoE + 8GB is arguably the new home-LLM sweet spot. What setup are you running?

@MikelEcheve made a cool image for this post!
