Ahmad
@TheAhmadOsman

> **You don't pick an inference engine first. You pick a hardware strategy, a workload shape, and a serving model. The engine follows.**

That is the most useful way to think about LLM inference engines.

**Series note:** This is Part 3 in my series teaching Self-hosted LLMs / Local AI.

• Part 1: **[GPU Memory Math for LLMs (2026 Edition)](https://x.com/TheAhmadOsman/status/2040103488714068245)**
• Part 2: **[Memory Bandwidth for Local AI Hardware (2026 Edition)](https://x.com/TheAhmadOsman/status/2041331757329285589)**

Those two pieces explain the hardware capacity and bandwidth math.

***This one explains the software layer that turns that hardware into usable inference.***

## Engines

These tools serve different purposes and occupy different layers:

• Local portability
• Consumer CUDA
• Apple unified-memory workflows
• Quantized inference
• Production serving
• Distributed orchestration
• Vendor-optimized datacenter execution

**A useful mental model:**

The inference engine is not "the model." It is the traffic cop, memory manager, kernel dispatcher, scheduler, cache accountant, parallelism planner, API surface, and sometimes the deployment framework.

The best engine matches your **memory hierarchy**, **interconnect**, **quantization format**, **latency and throughput targets**, **model architecture**, and **operational maturity**.
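
One way to make those six criteria concrete is to write them down as a checklist and compare each candidate engine against it. The sketch below is only an illustration: `DeploymentProfile`, `EngineCandidate`, `mismatches`, `rank`, the engine names, and every string value are hypothetical placeholders, not real engine capabilities or any library's API.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DeploymentProfile:
    """The six criteria from the text, one field each (values are free-form labels)."""
    memory_hierarchy: str
    interconnect: str
    quantization_format: str
    latency_throughput_target: str
    model_architecture: str
    operational_maturity: str


@dataclass(frozen=True)
class EngineCandidate:
    """A hypothetical engine plus the labels it claims to cover per criterion."""
    name: str
    supports: dict[str, frozenset[str]]


def mismatches(profile: DeploymentProfile, engine: EngineCandidate) -> list[str]:
    """Return the criteria whose required label the engine does not list."""
    missing = []
    for f in fields(profile):
        wanted = getattr(profile, f.name)
        if wanted not in engine.supports.get(f.name, frozenset()):
            missing.append(f.name)
    return missing


def rank(profile: DeploymentProfile, engines: list[EngineCandidate]) -> list[EngineCandidate]:
    """Order candidates by how few criteria they miss (fewest first)."""
    return sorted(engines, key=lambda e: len(mismatches(profile, e)))


# Placeholder profile and candidates, assumed purely for illustration.
profile = DeploymentProfile(
    memory_hierarchy="single-gpu-vram",
    interconnect="pcie",
    quantization_format="4-bit",
    latency_throughput_target="low-latency-single-user",
    model_architecture="dense-transformer",
    operational_maturity="hobby",
)
engine_a = EngineCandidate(
    "engine_a", {name: frozenset({value}) for name, value in vars(profile).items()}
)
engine_b = EngineCandidate("engine_b", {"interconnect": frozenset({"pcie"})})

for engine in rank(profile, [engine_b, engine_a]):
    print(engine.name, "misses:", mismatches(profile, engine))
```

The point of the design is the order of operations from the opening quote: the profile is filled in first, from hardware and workload, and the engine is whatever falls out of the comparison.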