How to Build Claude Subagents Better Than 99% of People

One Claude talks to you. Five more can be working behind it, on five different jobs, and you never see a single one.

Those five are sub-agents. A sub-agent is a second Claude the main chat hires for one job, runs in its own window, and never lets near your conversation. Its own context window. Its own model. Its own permissions. Running in parallel. It does the messy work somewhere else and hands back one clean answer. That gap is the whole game.

Point five of them at the same draft. One reads it as a complete beginner. One as a software engineer. One as a business owner. One as a publisher. They run at the same time, in separate windows, each with its own lens, and the main chat just collects what comes back. Five full reviews off one prompt, and your context never fills up.

The main chat is the orchestrator. It is the only thing that talks to you. Everything else is a worker it spins up and points at a job: go read these files, go research this, go roast this plan. The worker does the job in its own session and reports back. Sub-agents answer to the orchestrator, not to you, and not to each other.

Here's the part almost nobody gets right. The win was never that Claude does more work. It is that the heavy work stops happening in the window you are sitting in.

First, the receipt that makes the rest matter

A reviewer sub-agent in one real run burned 22.8K tokens reading a plan and tearing it apart. What came back to the main session was a few lines. The main context never saw the 22.8K. It saw the verdict.

That is the mechanic. Your main window fills as you work, and the more it fills, the more polluted it gets, and the dumber the model runs for the rest of the chat. A sub-agent sidesteps that. It wakes with a clean window, does the heavy reading in its own session, and slides back only the part you needed.

Here is the same job, both ways:

WITHOUT a sub-agent
Main session reads 300 pages → 22.8K+ tokens land in main context
Context now polluted, the model gets dumber for the rest of the chat
All of it runs on Opus pricing

WITH a sub-agent
Haiku worker reads the 300 pages in an isolated window
Returns 3 facts to the main session
Main context stays clean. Opus stays sharp. Bill drops.

Now put the economics on top:

THE FLEET ECONOMICS

Main session (orchestrator) > Opus > talks to you, plans, decides
Research sub-agent > Haiku > reads 300 pages, returns 3 facts
Test-writer sub-agent > Sonnet > writes the suite, returns the diff
Doc-writer sub-agent > Haiku > drafts the README, returns the file

Smart boss on the expensive model. Cheap workers on the cheap ones.

You don't run Opus to read a 300-page report and pull three numbers. You send Haiku. It burns its own context, returns the summary, and Opus stays sharp for the calls that actually need it. The win was never "Claude does more." It's that the expensive thinking and the cheap reading stop happening in the same window.

Built-in agents, custom agents, and the one file between them

Those five personas were not custom anything. They were the built-in general-purpose agent, prompted five different ways. Claude ships with built-in agents. A research one fires on its own when you ask it to dig through a codebase or pull facts off the web. For a one-off, that is the right tool, and you've probably watched it kick in without asking.

A custom sub-agent is something you write. And it is smaller than people expect: a single Markdown file with YAML front matter, sitting in .claude/agents/. Structurally it is almost the same object as a skill file. The only real differences are the clean context window, the parallel run, and the model it sits on.

Here is the whole thing:

---
name: plan-roaster
description: Use proactively to stress-test any plan or design doc before
  implementation. Reads the plan, hunts for the weakest assumption, returns
  a short verdict. Use when a plan looks finished and you want an unbiased
  second pass.
tools: Read, Grep, Glob   # no Write, no Edit, no Bash — read-only is enforced, not requested
model: sonnet
color: orange
---

You are a cold, unbiased plan reviewer. You do not flatter.

Your job:
1. Read the plan the orchestrator hands you.
2. Name the single weakest assumption — the one that breaks the rest if it's wrong.
3. List 2-3 concrete failure modes, each in one line.
4. Return a 4-line verdict: the weak point, the failure modes, a fix, a score.

Rules:
- Read-only. You never edit files. You report.
- No praise padding. Lead with the problem.
- If the plan is solid, say so in one line and stop.

Drop that in .claude/agents/ and Claude can hand plan-roasting to a clean-context reviewer on Sonnet. The file is doing four things at once. A tight role. A fixed output shape. Read-only tools. A chosen model. Every line is a lever, and the rest of this is about pulling them right.

The levers nobody tells you

1. The description is the trigger. Tune it like it's the only thing that matters.

When Claude decides whether to fire a sub-agent, it reads the front matter and nothing else. The name and the description. Not the body. That is progressive disclosure: it skims the resume, and only pulls the full file in when the resume fits the job.

So the description is the agent. A vague one fires when you don't want it and stays quiet when you do. Tighten it until the trigger is unambiguous, then test it in the wild. When it misbehaves, ask the one question that fixes it. Why didn't it fire? Then sharpen.

2. Keep the description short. A bloated trigger taxes every decision.

The "generate with Claude" option hands you a description three sizes too big. Trim it hard. The front matter gets read on every routing decision, so a fat description is a tax Claude pays again and again, not once. This is the rare place where less writing is literally faster.

3. Close your quotes. An open quote in the YAML kills the trigger silently.

This one cost real time. A single unclosed quote in the front matter, and the trigger quietly stops working. There is no error and no warning. It simply never fires.

The fix wasn't judgment or clever prompting. It was mechanical: close the quote. Before you debug your "broken" agent's logic, check the front matter is even valid. The gotcha is almost always a stray character, not your reasoning.

4. Enforce read-only at the tool layer. Asking nicely is not a permission system.

Most people write "please don't modify any files" in the prompt and think they built a read-only agent. They built a suggestion.

Real safety lives in the tools field. List only Read, Grep, Glob, or explicitly drop the write tools, and the agent physically cannot touch your data. The mindset that separates the good ones: if your AI can touch data, assume it will. Build the wall, don't ask politely.

5. Run a read-only verifier over any agent file you didn't write.

Because a sub-agent is just Markdown, you can download ones other people built. API designers, backend specialists, SQL experts, a whole library of them. You can also download a prompt injection. A malicious agent file is only text that tells Claude to do something it shouldn't.

So point a read-only verifier at any file you didn't write and have it scan for injection or anything trying to escalate past its stated job. Vet the Markdown before it joins the fleet.

6. Cap the loop with max_turns before it eats your session.

A sub-agent can get stuck circling. Re-reading, re-trying, never landing. Set max_turns and it stops after that many exchanges and reports what it has. Cheap insurance that turns a runaway into a bounded task.

7. Match the model to the task. Don't run Opus to fetch three facts.

The default move is to run the best model on everything. That is how you set money on fire. The orchestrator runs Opus because it plans and decides. The research worker runs Haiku because it reads and summarizes.

Model is a per-agent field. Set it on purpose, and only let a worker inherit the parent's model when the job genuinely needs that much horsepower. Smart boss, cheap workers. Hand a 300-page read to Haiku and get back the three numbers you actually wanted.

8. Pick your invocation path on purpose.

There are four ways a sub-agent fires, and most people know one.

Automatic. Claude reads the description and decides.

Proactive. Put "use proactively" in the description and it jumps in without being asked.

Explicit. Name it: "use the plan-roaster sub-agent."

Direct launch. Start a session as that sub-agent straight from the CLI.

Proactive is the one people miss. It is how you get a reviewer that shows up every time a plan looks done, instead of waiting for you to remember it exists.

9. Project-level for the repo. Global for you. Moving between them is a copy-paste.

Project agents live in the repo's .claude/agents/ and ship with the code, so anyone who clones it gets them. Global agents live at the user level, follow you across every project, and stay private.

Because it's just a Markdown file, promoting a personal agent to a shared one is moving a file. Build it global while you tune it, then drop it in the repo once it has earned a spot on the team.

10. Skills and sub-agents aren't competitors. They compose.

People burn hours on "should this be a skill or a sub-agent?" Wrong question. A skill runs inside the main session. A sub-agent runs in a clean, isolated, parallel session on its own model. That is the only real difference, and it means they stack. Skills can call sub-agents. Sub-agents can call skills.

Use a skill when the work belongs in the main thread. Use a sub-agent when you want the work out of it.

11. Dynamic workflows can spin up a swarm. They also eat sessions alive.

On the newest Claude, one orchestration can spin up sub-agents in parallel. 3 in one test. 40 in another. 210 in a single run. That is a fleet doing order-independent work all at once.

The power has a bill attached: that many parallel sessions chew through your limits fast. The trigger word even got renamed from the generic "workflow" to a deliberate one so it wouldn't kick off by accident. Reach for the swarm when the parallelism is real, not as a reflex.

12. The default general-purpose agent is fine - until you've done the job twice.

There's no shame in the built-in agent. The five-persona demo everyone passes around is just the general-purpose agent wearing five hats. For a one-off, that's correct.

The line to cross is the second time you do the job. Repetition is the signal. A narrow specialist you reuse beats a generic agent you re-brief every single time.

When to delegate, and when it quietly hurts you

This is where fleets fall apart. People force sub-agents onto work that's wrong for them, and the agents make everything slower and pricier.

The clean line: delegate when the work is independent and either disposable or repeated. Keep it in the main session when the work is sequential, conversational, or needs you. One question covers most of it — is this about to dump a pile of output into my chat that I'll never read again? If yes, send it out.

| Delegate to a sub-agent when…  | Keep it in the main session when…  |
| --- | --- |
|  You're about to dump output you'll never reread | It's a quick one-off edit  |
| You're about to read many files  | Steps depend on each other (1 > 2 > 3 > 4)  |
| The job repeats - build a specialist for it  | The agents would need to talk to each other  |
| Work is independent and parallel (review 15 chapters at once)  | The agent needs the whole conversation's context  |
| You want an unbiased fresh-context reviewer (AI is a sycophant by default)  | The agent needs to ask you a question  |
| You want to save money (300-page read > Haiku > 3 facts)  | The task is genuinely one linear chain  |

Two traps worth naming flat. Sub-agents can't talk to each other. Each one is one-to-one with the main session. If your work needs agents coordinating, that is agent-teams with a shared task list, a different and pricier tool. Don't fake it with siloed sub-agents.

And only the orchestrator talks to you. A sub-agent that needs to ask a clarifying question is stuck, because it can't reach you. If the task can't be fully specified up front, it belongs in the main thread.

The reviewer trick that pays for itself

Here's the one nobody leads with. By default, an AI is a sycophant. Ask the same session that helped you build a plan whether the plan is good, and it leans toward yes. It is already invested.

A fresh-context reviewer carries none of that. It never watched you build the plan, so it has nothing to defend and no framing to protect. The clean context is the objectivity. You can't get an honest second opinion from the session that wrote the first one.

That's why the plan-roaster runs read-only with one job: find the weakest assumption and say it cold. The same logic scales to a security auditor, a test writer, a docs writer, a database specialist. Each one a narrow silo, each great at its single thing, each blind to your ego about the rest. That's the assembly line.

The shape of it

Stop picturing a sub-agent as a smarter Claude. Picture a clean room with one door, one worker, one job.

The orchestrator holds the conversation and the context. The specialists do the heavy reading and the parallel grunt-work in their own rooms, on their own cheap models, and slide the answer back under the door.

The people building these better than everyone else aren't writing cleverer prompts. They're drawing tighter job descriptions, walling off permissions at the tool layer, and letting one smart lead run a fleet of cheap, narrow workers.

The bottleneck was never how smart the model is. It was how clearly you can divide the work.

Hope you found something useful.
- Jey

FOLLOW FOR MORE AI PLAYBOOKS.