Thread Truncated (Cap Enforced)

Only the first 20 tweets are unrolled into slides to ensure reliable PDF exporting and high server performance.

Ahmad

@TheAhmadOsman

> You don't pick an inference engine first. You pick a hardware strategy, a workload shape, and a serving model. The engine follows.

That is the most useful way to think about LLM inference engines.

Series note: This is Part 3 in my series teaching Self-hosted LLMs / Local AI.

• Part 1: [GPU Memory Math for LLMs (2026 Edition)](https://x.com/TheAhmadOsman/status/2040103488714068245).

• Part 2: [Memory Bandwidth for Local AI Hardware (2026 Edition)](https://x.com/TheAhmadOsman/status/2041331757329285589).

Those two pieces explain the hardware capacity and bandwidth math.

This one explains the software layer that turns that hardware into usable inference.

## Engines

These tools serve different purposes and occupy different layers:

• Local portability

• Consumer CUDA

• Apple unified-memory workflows

• Quantized inference

• Production serving

• Distributed orchestration

• Vendor-optimized datacenter execution

A useful mental model:

The inference engine is not "the model." It is the traffic cop, memory manager, kernel dispatcher, scheduler, cache accountant, parallelism planner, API surface, and sometimes the deployment framework.

The best engine matches your memory hierarchy, interconnect, quantization format, latency and throughput targets, model architecture, and operational maturity.
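
One way to make that matching concrete is to write your constraints down first and score candidates against them. The sketch below is illustrative only: the six criteria come from the sentence above, but the field values, the two engine profiles (`portable_engine`, `serving_engine`), and the equal-weight scoring are all hypothetical placeholders, not real engines or a recommended rubric.

```python
from dataclasses import dataclass

# The six criteria named in the thread. Values are free-form labels
# chosen for this sketch; they are assumptions, not a standard vocabulary.
CRITERIA = (
    "memory", "interconnect", "quant_format",
    "target", "architecture", "ops_maturity",
)


@dataclass(frozen=True)
class Requirements:
    memory: str         # e.g. "unified", "cpu_ram", "single_gpu", "multi_gpu"
    interconnect: str   # e.g. "none", "pcie", "nvlink"
    quant_format: str   # e.g. "gguf", "awq", "fp8"
    target: str         # "latency" or "throughput"
    architecture: str   # e.g. "dense", "moe"
    ops_maturity: str   # e.g. "hobby", "team", "production"


@dataclass
class EngineProfile:
    name: str
    supports: dict  # criterion name -> set of accepted values


def match_score(req: Requirements, engine: EngineProfile) -> int:
    """Count how many criteria the engine profile satisfies (equal weights)."""
    return sum(
        getattr(req, c) in engine.supports.get(c, set()) for c in CRITERIA
    )


def rank(req: Requirements, engines: list) -> list:
    """Return engines ordered from best to worst match."""
    return sorted(engines, key=lambda e: match_score(req, e), reverse=True)


# Hypothetical profiles, invented for illustration.
engines = [
    EngineProfile("portable_engine", {
        "memory": {"unified", "cpu_ram", "single_gpu"},
        "interconnect": {"none"},
        "quant_format": {"gguf"},
        "target": {"latency"},
        "architecture": {"dense", "moe"},
        "ops_maturity": {"hobby", "team"},
    }),
    EngineProfile("serving_engine", {
        "memory": {"single_gpu", "multi_gpu"},
        "interconnect": {"pcie", "nvlink"},
        "quant_format": {"fp8", "awq"},
        "target": {"latency", "throughput"},
        "architecture": {"dense", "moe"},
        "ops_maturity": {"team", "production"},
    }),
]

laptop = Requirements("unified", "none", "gguf", "latency", "dense", "hobby")
best = rank(laptop, engines)[0]
```

The point of the exercise is the ordering of decisions: the requirements are filled in before any engine is considered, so the engine follows from the constraints. In practice you would weight the criteria unequally, since a format the engine cannot load is a hard blocker while operational maturity is a softer preference.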