Jigar Shah
@JigarShahDC
Everyone's debating 1+ GW data centers like they're the new normal.

How many are there now and how many do we actually need for frontier AI training by 2030? Can inference live at regional hubs? And what if we added compute at every telco tower that can handle 100 kW of compute?

The answers are more nuanced than most think. Let's do the math. 🧵
Jigar Shah
@JigarShahDC
First: how many 1+ GW data centers exist today?

Essentially zero fully operational ones. The largest running AI campuses today are in the 300–600 MW range: xAI Colossus in Memphis (~300 MW operational) and Stargate Phase 1 in Abilene, TX (~600 MW). Northern Virginia's entire data center cluster totals ~16 GW, but no single campus crosses 1 GW.

The 1+ GW campus is a 2027–2028 phenomenon. We have hysteria over a threshold we haven't crossed yet.
Jigar Shah
@JigarShahDC
So what does a frontier training run actually consume today?

GPT-4 class (2023): ~25K GPUs → ~17 MW peak
GPT-4o class (2024): ~50K GPUs → ~70 MW
Frontier today (2026): ~200–500K GPUs → 140–350 MW
Next frontier (2028): ~500K–1.5M GPUs → 350 MW – 1 GW
Post-frontier (2030): → 1–5 GW per run

We haven't crossed the 1 GW-per-run threshold yet.
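A rough check of that table, as a sketch rather than the author's method: most rows are consistent with about 0.7 kW of all-in power per GPU. That per-GPU figure is an assumption inferred from the rows, not a number stated in the thread, and the 2024 row (~50K GPUs, ~70 MW) implies roughly twice that ratio.

```python
# Assumption: ~0.7 kW all-in per GPU, inferred from most rows of the table above.
KW_PER_GPU = 0.7


def run_power_mw(gpus: int, kw_per_gpu: float = KW_PER_GPU) -> float:
    """Peak power of one training run in MW for a given GPU count."""
    return gpus * kw_per_gpu / 1000.0


if __name__ == "__main__":
    for label, gpus in [
        ("GPT-4 class (2023)", 25_000),
        ("Frontier today, low (2026)", 200_000),
        ("Frontier today, high (2026)", 500_000),
        ("Next frontier, high (2028)", 1_500_000),
    ]:
        print(f"{label}: {run_power_mw(gpus):,.1f} MW")
```

At this ratio, 1.5M GPUs lands just above 1 GW, which is where the 2028 row tops out.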
Jigar Shah
@JigarShahDC
One run is not one lab's total need.

Labs run many experiments in parallel: ablations, architecture searches, safety evals, fine-tuning. That multiplies compute requirements 3–5× on top of the headline training run number.

And between training runs, those same clusters get repurposed for inference. A 1 GW campus is never idle. It's the minimum viable unit for a frontier lab at full operation.
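To see why 1 GW reads as a floor, here is a small sketch that applies the 3–5× multiplier to today's 140–350 MW frontier run. Pairing the low run with the low multiplier and the high run with the high multiplier is an illustrative choice, not something the thread specifies.

```python
def lab_power_mw(run_mw: float, multiplier: float) -> float:
    """Total lab demand in MW: headline run scaled by the parallel-experiment multiplier."""
    return run_mw * multiplier


low = lab_power_mw(140, 3)    # smallest run, smallest multiplier
high = lab_power_mw(350, 5)   # largest run, largest multiplier

if __name__ == "__main__":
    print(f"Implied lab demand today: {low:.0f} MW to {high:.0f} MW")
```

Under these pairings the range runs from 420 MW to 1,750 MW, so a single lab already straddles the 1 GW mark before the 2028 runs arrive.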
Jigar Shah
@JigarShahDC
So: how many 1+ GW training campuses do we need by 2030?

Five labs matter at the frontier: OpenAI, Google, Meta, Amazon, Anthropic. Each needs between 1 and 5 geographically distributed campuses for fault tolerance.

OpenAI/Stargate: 4–5 sites
Google/Alphabet: 4–5 sites
Meta: 3–4 sites
Amazon/AWS: 3–4 sites
Anthropic: 1–2 sites

Total: ~15–20 campuses industry-wide. Oracle alone has 5 sites at 1.2–2.2 GW. So everything we will need for training in 2030 is already contracted or announced.
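The industry total is just the sum of the per-lab ranges. A minimal sketch, using only the site counts listed above:

```python
# (low, high) campus counts per lab, as listed in the thread.
SITES = {
    "OpenAI/Stargate": (4, 5),
    "Google/Alphabet": (4, 5),
    "Meta": (3, 4),
    "Amazon/AWS": (3, 4),
    "Anthropic": (1, 2),
}


def total_range(sites: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum the low ends and the high ends separately."""
    low = sum(lo for lo, _ in sites.values())
    high = sum(hi for _, hi in sites.values())
    return low, high


if __name__ == "__main__":
    print(total_range(SITES))  # (15, 20)
```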
Jigar Shah
@JigarShahDC
Now inference. The intuition: it's embarrassingly parallel, so scatter it in smaller facilities. Right?

There's a hard wall: model weight size.

A frontier model today is 1–2 TB of weights. You need the entire thing loaded in GPU memory to serve a single request. At 80 GB per H100, that's 13–25 GPUs just to hold the weights, and more once KV cache and activations are counted, before you serve a single token.

Frontier inference does not fit at a residential home.
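A minimal sketch of the weights-only floor, assuming decimal units (1 TB = 1,000 GB) and ignoring KV cache, activations, and any replication for throughput. Those are exactly the things that push a real deployment above this floor, so treat the result as a lower bound only.

```python
import math


def min_gpus_for_weights(weights_tb: float, gpu_mem_gb: float = 80.0) -> int:
    """Fewest GPUs whose combined memory can hold the raw weights."""
    return math.ceil(weights_tb * 1000.0 / gpu_mem_gb)


if __name__ == "__main__":
    for weights_tb in (1.0, 2.0):
        print(f"{weights_tb} TB of weights -> at least {min_gpus_for_weights(weights_tb)} x 80 GB GPUs")
```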
Jigar Shah
@JigarShahDC
Inference is actually four distinct problems at four different scales. Conflating them is where most analysis goes wrong.

Tier 1 · 100–500 MW regional hubs (~50–100 sites)
Frontier model serving. Full weights in GPU memory. Hyperscaler cloud regions, CoreWeave clusters.

Tier 2 · 5–50 MW metro nodes (~500–1,000 sites)
Distilled 7B–405B models. Telco MEC aggregation points. Most enterprise AI workloads.

Tier 3 · 50–500 kW tower sites (~100K sites)
Upgraded telco towers. Latency-critical applications only.

Tier 4 · sub-1 kW on-device (billions of endpoints)
Apple Neural Engine, Qualcomm NPU. Quantized 7B models. No network call needed.
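For a sense of scale, the site counts and per-site power ranges above can be multiplied out. This is a sketch using only the thread's figures. Tier 4 is left out because the thread gives no endpoint count beyond "billions", and pairing low counts with low power (and high with high) is an illustrative choice.

```python
# tier name -> ((low sites, high sites), (low kW per site, high kW per site))
TIERS = {
    "Tier 1 regional hubs": ((50, 100), (100_000, 500_000)),
    "Tier 2 metro nodes": ((500, 1_000), (5_000, 50_000)),
    "Tier 3 tower sites": ((100_000, 100_000), (50, 500)),
}


def capacity_gw(sites: tuple[int, int], kw: tuple[int, int]) -> tuple[float, float]:
    """Aggregate capacity range in GW (1 GW = 1,000,000 kW)."""
    return sites[0] * kw[0] / 1e6, sites[1] * kw[1] / 1e6


if __name__ == "__main__":
    for name, (sites, kw) in TIERS.items():
        low, high = capacity_gw(sites, kw)
        print(f"{name}: {low:g} to {high:g} GW")
```

Each tier's high end works out to about 50 GW, so the tiers differ in how the power is packaged rather than in total footprint.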
Jigar Shah
@JigarShahDC
The key insight about those four tiers: they serve completely different things.

By request count: Tiers 3 and 4 handle the vast majority (billions of lightweight queries, voice commands, on-device autocomplete).

By compute consumed: Tier 1 dominates, with a small number of agentic, frontier-model sessions eating the vast bulk of GPU-hours.

Most requests are cheap. Most compute goes to a few expensive sessions. Both statements are simultaneously true.
Jigar Shah
@JigarShahDC
The agentic workload problem blows up the distributed inference thesis. It's too expensive to serve.

Claude Code grew 70× in under one year after the Sonnet and Opus 4 launch. OpenAI Codex grew 7× in six months after the GPT-5 launch.

Agentic sessions run for minutes to hours. They require the full frontier model, not a distilled version. They generate 10–100× more tokens per session than a chat message.

You cannot serve this from small distributed nodes. The model is too large, sessions too long, and throughput requirements too high.
Jigar Shah
@JigarShahDC
What if we upgraded every telco tower to 100 kW of AI compute?

Nokia, Ericsson, and the major carriers are genuinely studying this. Here's the hardware math:

100 kW minus 40% overhead = ~85 H100s available
85 × 80 GB = 6.8 TB VRAM, so a quantized 405B model fits

Throughput on a quantized frontier model: ~2,000–5,000 tokens/second total

At 500–2,000 tok/s per agentic session: 2–10 simultaneous users max

Here's the problem: that tower covers ~10,000 active users. At peak, far more than 10 want AI. Throughput vs user density is a structural 1,000× mismatch.
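The tower math can be reproduced in a few lines. This is a sketch, not a capacity model: the 0.7 kW per H100 is an assumption (it is what 60 kW of usable power and ~85 GPUs imply), and the function names are illustrative. Pairing the throughput extremes gives roughly 1 to 10 concurrent sessions, in line with the thread's 2–10.

```python
def tower_gpus(site_kw: float = 100.0, overhead: float = 0.40,
               kw_per_gpu: float = 0.7) -> int:
    """Whole GPUs that fit in the power left after overhead (kw_per_gpu is assumed)."""
    usable_kw = site_kw * (1.0 - overhead)
    return int(usable_kw / kw_per_gpu)


def vram_tb(gpus: int, gpu_mem_gb: float = 80.0) -> float:
    """Combined GPU memory in TB (1 TB = 1,000 GB)."""
    return gpus * gpu_mem_gb / 1000.0


def concurrent_sessions(total_tps: int, per_session_tps: int) -> int:
    """Agentic sessions a site can sustain at a given per-session token rate."""
    return total_tps // per_session_tps


if __name__ == "__main__":
    gpus = tower_gpus()
    worst = concurrent_sessions(2_000, 2_000)  # low throughput, heavy sessions
    best = concurrent_sessions(5_000, 500)     # high throughput, light sessions
    print(gpus, vram_tb(gpus), worst, best)
    print("users per session slot:", 10_000 // best)
```

Even in the best case, 10,000 active users share 10 session slots, which is where the 1,000× figure comes from.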
Jigar Shah
@JigarShahDC
Don't dismiss the telco tower idea entirely though. For latency-critical applications, 100 kW towers are genuinely compelling.

Works well: real-time voice AI (<50ms required), AR/VR spatial AI (motion sickness threshold ~20ms), autonomous vehicles (safety requires sub-50ms), industrial IoT anomaly detection

Partial: chat and coding assist with distilled models; frontier needs a hub

Doesn't work: agentic long sessions, frontier general serving. The throughput mismatch vs user density is structural, not fixable with better hardware
Jigar Shah
@JigarShahDC
Three other challenges for 100 kW tower upgrades:

Solvable · Thermal: 100 kW generates ~40 kW of waste heat. Current towers handle 5–10 kW. Liquid cooling at street level is hard, but Nokia and Ericsson are prototyping it.

Solvable · Reliability: Telco uptime is 99.999%. AI inference only needs 99.9%. Telco power infrastructure is actually over-spec'd for this use case.

Hard · Economics: Upgrading 10% of US towers (~40,000 sites) costs $20 in hardware alone. Revenue model is unclear. Building more Tier 2 metro nodes almost certainly wins on unit economics for general inference.
Jigar Shah
@JigarShahDC
What about residential nodes? The math is bleak.

A home has ~200A service = ~48 kW total. Dedicate 15 kW to AI after HVAC, EV charging, appliances. That buys roughly 20 H100s and 1.5 TB of VRAM.

You can run a quantized 70B model. You cannot run a frontier 1T+ model because it won't fit. And you have zero redundancy, no commercial SLA path, no adequate cooling, and residential power quality problems.

Residential inference is structurally impossible above 70B distilled models. These constraints don't go away with better hardware.
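The household budget works the same way. A sketch under two assumptions not stated in the thread: a 240 V service voltage (which is what turns 200 A into ~48 kW) and the same ~0.7 kW per H100 used for the tower estimate. The result lands close to the thread's rough "20 H100s and 1.5 TB".

```python
def service_kw(amps: float = 200.0, volts: float = 240.0) -> float:
    """Total service capacity in kW (240 V is an assumption)."""
    return amps * volts / 1000.0


def home_gpus(ai_budget_kw: float = 15.0, kw_per_gpu: float = 0.7) -> int:
    """Whole GPUs that fit in the power set aside for AI (kw_per_gpu is assumed)."""
    return int(ai_budget_kw / kw_per_gpu)


if __name__ == "__main__":
    gpus = home_gpus()
    print(f"service: {service_kw():.0f} kW, GPUs: {gpus}, VRAM: {gpus * 80 / 1000:.2f} TB")
```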
Jigar Shah
@JigarShahDC
The binding constraint through 2028 isn't demand or capital. It's physical supply.

Gas turbine prices are up 195% since 2019, with 6-year lead times on large units. Still can't get GPUs, memory, or CPUs. The existing grid can handle the 50 GW, but the rest of the supply chain isn't there yet.

600 GW of projects are pissing everyone off and making it hard for anyone to sign contracts.

You cannot overbuild what you cannot build.
Jigar Shah
@JigarShahDC
The full picture:

๐Ÿ—๏ธ Today: zero operational 1+ GW campuses. Largest are 300โ€“600 MW. Hopefully in 2027โ€“28.

๐Ÿญ 15โ€“20 giant campuses (1โ€“5 GW) needed for frontier training by 2030. Already under construction. So the other 600GW should stop chasing.

๐Ÿข 50โ€“100 regional hubs (100โ€“500 MW) for frontier inference. Also non-negotiable โ€” model weights are simply too large for anything smaller. The are also data centers that have been identified.

๐Ÿ“ก 100 kW telco towers โ€” real value for voice AI, AR/VR, autonomous vehicles. Not a general solution. Throughput vs user density is structurally broken for frontier workloads.

๐Ÿ  Residential nodes โ€” viable only for on-device 7B models. Not part of the frontier inference stack at all.

The distributed inference dream is real. It lives at Tier 4 (on-device) and Tier 3 (latency-critical edge). Frontier training and agentic workloads will remain centralized โ€” not because of ideology, but because physics and model size demand it.