> One Claude talks to you. Five more can be working behind it, on five different jobs, and you never see a single one.

Those five are sub-agents. A sub-agent is a second Claude the main chat hires for one job, runs in its own window, and never lets near your conversation. Its own context window. Its own model. Its own permissions. Running in parallel. It does the messy work somewhere else and hands back one clean answer. That gap is the whole game.
Point five of them at the same draft. One reads it as a complete beginner. One as a software engineer. One as a business owner. One as a publisher. They run at the same time, in separate windows, each with its own lens, and the main chat just collects what comes back. Five full reviews off one prompt, and your context never fills up.
The main chat is the orchestrator. It is the only thing that talks to you. Everything else is a worker it spins up and points at a job: go read these files, go research this, go roast this plan. The worker does the job in its own session and reports back. Sub-agents answer to the orchestrator, not to you, and not to each other.
Here's the part almost nobody gets right. The win was never that Claude does more work. It is that the heavy work stops happening in the window you are sitting in.
---
# First, the receipt that makes the rest matter
A reviewer sub-agent in one real run burned 22.8K tokens reading a plan and tearing it apart. What came back to the main session was a few lines. The main context never saw the 22.8K. It saw the verdict.
That is the mechanic. Your main window fills as you work, and the more it fills, the more polluted it gets, and the dumber the model runs for the rest of the chat. A sub-agent sidesteps that. It wakes with a clean window, does the heavy reading in its own session, and slides back only the part you needed.
Here is the same job, both ways:
WITHOUT a sub-agent
Main session reads 300 pages → 22.8K+ tokens land in main context
Context now polluted, the model gets dumber for the rest of the chat
All of it runs on Opus pricing
WITH a sub-agent
Haiku worker reads the 300 pages in an isolated window
Returns 3 facts to the main session
Main context stays clean. Opus stays sharp. Bill drops.
Main session reads 300 pages → 22.8K+ tokens land in main context
Context now polluted, the model gets dumber for the rest of the chat
All of it runs on Opus pricing
WITH a sub-agent
Haiku worker reads the 300 pages in an isolated window
Returns 3 facts to the main session
Main context stays clean. Opus stays sharp. Bill drops.
Now put the economics on top:
THE FLEET ECONOMICS
Main session (orchestrator) > Opus > talks to you, plans, decides
Research sub-agent > Haiku > reads 300 pages, returns 3 facts
Test-writer sub-agent > Sonnet > writes the suite, returns the diff
Doc-writer sub-agent > Haiku > drafts the README, returns the file
Smart boss on the expensive model. Cheap workers on the cheap ones.
Main session (orchestrator) > Opus > talks to you, plans, decides
Research sub-agent > Haiku > reads 300 pages, returns 3 facts
Test-writer sub-agent > Sonnet > writes the suite, returns the diff
Doc-writer sub-agent > Haiku > drafts the README, returns the file
Smart boss on the expensive model. Cheap workers on the cheap ones.
You don't run Opus to read a 300-page report and pull three numbers. You send Haiku. It burns its own context, returns the summary, and Opus stays sharp for the calls that actually need it. The win was never "Claude does more." It's that the expensive thinking and the cheap reading stop happening in the same window.

---
# Built-in agents, custom agents, and the one file between them
Those five personas were not custom anything. They were the built-in general-purpose agent, prompted five different ways. Claude ships with built-in agents. A research one fires on its own when you ask it to dig through a codebase or pull facts off the web. For a one-off, that is the right tool, and you've probably watched it kick in without asking.
A custom sub-agent is something you write. And it is smaller than people expect: a single Markdown file with YAML front matter, sitting in .claude/agents/. Structurally it is almost the same object as a skill file. The only real differences are the clean context window, the parallel run, and the model it sits on.
Here is the whole thing:
Generated by Thread Navigator
Press ⌘ + S to quick-export
