Visualize Thread by @JigarShahDC

Jigar Shah

@JigarShahDC

Everyone's debating 1+ GW data centers like they're the new normal.

How many are there now and how many do we actually need for frontier AI training by 2030? Can inference live at regional hubs? And what if we added compute at every telco tower that can handle 100 kW of compute?

The answers are more nuanced than most think. Let's do the math. 🧵

Jigar Shah

@JigarShahDC

First: how many 1+ GW data centers exist today?

Essentially zero fully operational ones. The largest running AI campuses today are in the 300–600 MW range — xAI Colossus in Memphis (~300 MW operational), Abilene TX Stargate Phase 1 (~600 MW). Northern Virginia's entire data center cluster totals ~16 GW but no single campus crosses 1 GW.

The 1+ GW campus is a 2027–2028 phenomenon. We have hysteria over a threshold we haven't crossed yet.

Jigar Shah

@JigarShahDC

So what does a frontier training run actually consume today?

GPT-4 class (2023): ~25K GPUs → ~17 MW peak
GPT-4o class (2024): ~50K GPUs → ~70 MW
Frontier today (2026): ~200–500K GPUs → 140–350 MW
Next frontier (2028): ~500K–1.5M GPUs → 350 MW – 1 GW
Post-frontier (2030): → 1–5 GW per run

We haven't crossed the 1 GW-per-run threshold yet.
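
One way to sanity-check the list above is to back out the facility power per GPU that each row implies. This is a minimal sketch using only the GPU counts and megawatt figures quoted in this post; the names `RUNS` and `implied_kw_per_gpu` are illustrative, and pairing the low count with the low power (and high with high) is an assumption.

```python
# (label, (low GPUs, high GPUs), (low MW, high MW)), copied from the list above
RUNS = [
    ("GPT-4 class (2023)", (25_000, 25_000), (17, 17)),
    ("GPT-4o class (2024)", (50_000, 50_000), (70, 70)),
    ("Frontier today (2026)", (200_000, 500_000), (140, 350)),
    ("Next frontier (2028)", (500_000, 1_500_000), (350, 1_000)),
]


def implied_kw_per_gpu(gpus, megawatts):
    """Facility kW per GPU implied by one (GPU count, MW) pair."""
    return megawatts * 1_000 / gpus


for label, (gpus_lo, gpus_hi), (mw_lo, mw_hi) in RUNS:
    lo = implied_kw_per_gpu(gpus_lo, mw_lo)
    hi = implied_kw_per_gpu(gpus_hi, mw_hi)
    print(f"{label}: {lo:.2f} to {hi:.2f} kW per GPU")
```

Three of the four rows land near 0.7 kW per GPU. The GPT-4o row implies 1.4 kW per GPU, about double the others; at 0.7 kW it would be closer to 35 MW. The thread does not say whether that row assumes different hardware or overhead.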

Jigar Shah

@JigarShahDC

One run is not one lab's total need.

Labs run parallel experiments simultaneously — ablations, architecture searches, safety evals, fine-tuning. That multiplies compute requirements 3–5× on top of the headline training run number.

And between training runs, those same clusters get repurposed for inference. A 1 GW campus is never idle. It's the minimum viable unit for a frontier lab at full operation.
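
A rough sketch of why 1 GW ends up as the working unit: apply the 3–5× multiplier to the 140–350 MW range quoted earlier for a frontier run today. Reading the multiplier as "total lab need = multiplier × headline run" is an assumption, and `lab_total_mw` is an illustrative name.

```python
def lab_total_mw(run_mw, multiplier):
    """Total lab power need, assuming total = multiplier x headline run."""
    return run_mw * multiplier


# Low end: smallest run with the smallest multiplier.
low = lab_total_mw(140, 3)
# High end: largest run with the largest multiplier.
high = lab_total_mw(350, 5)

print(f"{low} to {high} MW")  # prints "420 to 1750 MW"
```

Under that reading, a single frontier lab already sits on either side of 1 GW today, before any growth in run size.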

Jigar Shah

@JigarShahDC

So: how many 1+ GW training campuses do we need by 2030?

Five labs matter at the frontier: OpenAI, Google, Meta, Amazon, Anthropic. Each needs between one and five geographically distributed campuses for fault tolerance and redundancy.

OpenAI/Stargate: 4–5 sites
Google/Alphabet: 4–5 sites
Meta: 3–4 sites
Amazon/AWS: 3–4 sites
Anthropic: 1–2 sites

Total: ~15–20 campuses industry-wide. Oracle alone has 5 sites at 1.2–2.2 GW. So everything we will need for training in 2030 is already contracted or announced.
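
The industry-wide total is just the sum of the per-lab ranges. A minimal sketch, with the site counts copied from the list above and `SITES` as an illustrative name:

```python
# (low, high) campus counts per lab, as listed above
SITES = {
    "OpenAI/Stargate": (4, 5),
    "Google/Alphabet": (4, 5),
    "Meta": (3, 4),
    "Amazon/AWS": (3, 4),
    "Anthropic": (1, 2),
}

total_low = sum(low for low, _ in SITES.values())
total_high = sum(high for _, high in SITES.values())

print(f"{total_low} to {total_high} campuses")  # prints "15 to 20 campuses"
```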

Jigar Shah

@JigarShahDC

Now inference. The intuition: it's embarrassingly parallel, so scatter it in smaller facilities. Right?

There's a hard wall: model weight size.

A frontier model today is 1–2 TB of weights. You need the entire thing loaded in GPU memory to serve a single request. At 80 GB per H100, that's roughly 13–25 GPUs just to hold the weights, before any memory for KV cache or activations and before you serve a single token.

Frontier inference does not fit in a residential home.
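
As a check on the memory floor, here is a minimal sketch that divides the stated weight sizes by 80 GB per GPU. It counts weights only. KV cache, activations, and extra replicas for throughput all push a real deployment well above this floor. Decimal terabytes and the name `gpus_to_hold_weights` are assumptions for illustration.

```python
import math

H100_MEMORY_GB = 80  # per-GPU memory, as stated in the thread
GB_PER_TB = 1_000    # assumption: decimal terabytes


def gpus_to_hold_weights(weights_tb):
    """Smallest whole number of GPUs whose combined memory holds the weights."""
    return math.ceil(weights_tb * GB_PER_TB / H100_MEMORY_GB)


for size_tb in (1, 2):
    print(f"{size_tb} TB of weights: {gpus_to_hold_weights(size_tb)} GPUs minimum")
```

For 1 TB and 2 TB of weights this gives 13 and 25 GPUs respectively.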

Jigar Shah

@JigarShahDC

Inference is actually four distinct problems at four different scales. Conflating them is where most analysis goes wrong.

Tier 1 · 100–500 MW regional hubs (~50–100 sites)
Frontier model serving. Full weights in GPU memory. Hyperscaler cloud regions, CoreWeave clusters.

Tier 2 · 5–50 MW metro nodes (~500–1,000 sites)
Distilled 7B–405B models. Telco MEC aggregation points. Most enterprise AI workloads.

Tier 3 · 50–500 kW tower sites (~100K sites)
Upgraded telco towers. Latency-critical applications only.

Tier 4 · sub-1 kW on-device (billions of endpoints)
Apple Neural Engine, Qualcomm NPU. Quantized 7B models. No network call needed.

Jigar Shah

@JigarShahDC

The key insight about those four tiers: they serve completely different things.

By request count: Tiers 3 and 4 handle the vast majority — billions of lightweight queries, voice commands, on-device autocomplete.

By compute consumed: Tier 1 dominates — a small number of agentic, frontier-model sessions eating the vast bulk of GPU-hours.

Most requests are cheap. Most compute goes to a few expensive sessions. Both statements are simultaneously true.

Jigar Shah

@JigarShahDC

The agentic workload problem blows up the distributed inference thesis. It's too expensive to serve.

Claude Code grew 70× in under one year post Sonnet and Opus 4 launch. OpenAI Codex grew 7× in six months post GPT-5 launch.

Agentic sessions run for minutes to hours. They require the full frontier model — not a distilled version. They generate 10–100× more tokens per session than a chat message.

You cannot serve this from small distributed nodes. The model is too large, sessions too long, and throughput requirements too high.

Jigar Shah

@JigarShahDC

What if we upgraded every telco tower to 100 kW of AI compute?

Nokia, Ericsson, and the major carriers are genuinely studying this. Here's the hardware math:

100 kW minus 40% overhead = 60 kW, or ~85 H100s at 700 W each
85 × 80 GB = 6.8 TB VRAM — a quantized 405B model fits

Throughput on a quantized frontier model: ~2,000–5,000 tokens/second total

At 500–2,000 tok/s per agentic session: 2–10 simultaneous users max

Here's the problem: that tower covers ~10,000 active users. At peak, far more than 10 want AI. Throughput vs user density is a structural 1,000× mismatch.
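
The tower arithmetic can be reproduced in a few lines. This is a sketch under stated assumptions: 700 W per H100 is assumed (the thread gives only the resulting GPU count), integer division is used to count whole GPUs and whole sessions, and all constant names are illustrative.

```python
TOWER_W = 100_000                     # 100 kW site budget
OVERHEAD_PCT = 40                     # cooling, networking, power conversion
H100_W = 700                          # assumption: per-GPU board power
H100_MEMORY_GB = 80
TOWER_TOKENS_PER_S = (2_000, 5_000)   # total throughput range from the thread
SESSION_TOKENS_PER_S = (500, 2_000)   # per agentic session, from the thread
ACTIVE_USERS = 10_000                 # users covered by one tower

usable_w = TOWER_W * (100 - OVERHEAD_PCT) // 100
gpus = usable_w // H100_W
vram_tb = gpus * H100_MEMORY_GB / 1_000

# Worst case: slowest tower, hungriest sessions. Best case: the reverse.
min_sessions = TOWER_TOKENS_PER_S[0] // SESSION_TOKENS_PER_S[1]
max_sessions = TOWER_TOKENS_PER_S[1] // SESSION_TOKENS_PER_S[0]

# Covered users per concurrent session, using the best case.
mismatch = ACTIVE_USERS // max_sessions

print(f"{gpus} GPUs, {vram_tb} TB VRAM")
print(f"{min_sessions} to {max_sessions} concurrent agentic sessions")
print(f"{mismatch}x more covered users than sessions")
```

The extreme pairings give 1 to 10 concurrent sessions, close to the 2–10 quoted above, and the 1,000× figure corresponds to 10,000 covered users against the best case of 10 sessions.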

Jigar Shah

@JigarShahDC

Don't dismiss the telco tower idea entirely though. For latency-critical applications, 100 kW towers are genuinely compelling.

Works well: real-time voice AI (<50ms required), AR/VR spatial AI (motion sickness threshold ~20ms), autonomous vehicles (safety requires sub-50ms), industrial IoT anomaly detection

Partial: chat and coding assist with distilled models — frontier needs a hub

Doesn't work: agentic long sessions, frontier general serving — throughput mismatch vs user density is structural, not fixable with better hardware

Jigar Shah

@JigarShahDC

Three other challenges for 100 kW tower upgrades:

Solvable — Thermal: 100 kW of compute means roughly 100 kW of heat to reject, since nearly all of the electrical power ends up as heat. Current towers handle 5–10 kW. Liquid cooling at street level is hard, but Nokia and Ericsson are prototyping it.

Solvable — Reliability: Telco uptime is 99.999%. AI inference only needs 99.9%. Telco power infrastructure is actually over-spec'd for this use case.

Hard — Economics: Upgrading 10% of US towers (~40,000 sites) costs $20 in hardware alone. Revenue model is unclear. Building more Tier 2 metro nodes almost certainly wins on unit economics for general inference.

Jigar Shah

@JigarShahDC

What about residential nodes? The math is bleak.

A home has ~200A service = ~48 kW total. Say 15 kW is left for AI after HVAC, EV charging, and appliances. That buys roughly 20 H100s and ~1.6 TB of VRAM.

You can run a quantized 70B model. You cannot run a frontier 1T+ model — it won't fit. And you have zero redundancy, no commercial SLA path, no adequate cooling, and residential power quality problems.

Residential inference is structurally impossible above 70B distilled models. These constraints don't go away with better hardware.
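
The residential numbers follow from the same kind of arithmetic. A minimal sketch, assuming 240 V split-phase service and 700 W per H100 with no allowance for cooling or power-conversion overhead (neither figure is stated in the thread); constant names are illustrative.

```python
SERVICE_AMPS = 200
SERVICE_VOLTS = 240      # assumption: US split-phase residential service
AI_BUDGET_W = 15_000     # what the thread leaves over for AI
H100_W = 700             # assumption: per-GPU board power, no overhead
H100_MEMORY_GB = 80

service_kw = SERVICE_AMPS * SERVICE_VOLTS / 1_000
gpus = AI_BUDGET_W // H100_W
vram_tb = gpus * H100_MEMORY_GB / 1_000

print(f"Service capacity: {service_kw} kW")
print(f"{gpus} GPUs, {vram_tb} TB VRAM")
```

That comes to 48 kW of service, 21 GPUs, and about 1.7 TB of VRAM, in line with the rounded figures above. Applying the 40% overhead used in the tower estimate would cut the GPU count further.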

Jigar Shah

@JigarShahDC

The binding constraint through 2028 isn't demand or capital. It's physical supply.

Gas turbine prices are up 195% since 2019, with 6-year lead times on large units. GPUs, memory, and CPUs are still hard to get. The existing grid can handle the 50 GW, but the rest of the supply chain isn't there yet.

600 GW of projects are pissing everyone off and making it hard for anyone to sign contracts.

You cannot overbuild what you cannot build.

Jigar Shah

@JigarShahDC

The full picture:

🏗️ Today: zero operational 1+ GW campuses. The largest are 300–600 MW. The first 1+ GW campuses should arrive in 2027–28, hopefully.

🏭 15–20 giant campuses (1–5 GW) needed for frontier training by 2030. Already under construction. So the other 600 GW of projects should stop chasing.

🏢 50–100 regional hubs (100–500 MW) for frontier inference. Also non-negotiable, because model weights are simply too large for anything smaller. These are also data centers that have been identified.

📡 100 kW telco towers — real value for voice AI, AR/VR, autonomous vehicles. Not a general solution. Throughput vs user density is structurally broken for frontier workloads.

🏠 Residential nodes — viable only for distilled models up to about 70B. Not part of the frontier inference stack at all.

The distributed inference dream is real. It lives at Tier 4 (on-device) and Tier 3 (latency-critical edge). Frontier training and agentic workloads will remain centralized — not because of ideology, but because physics and model size demand it.
