The Only Claude Code Agent Team Setup You Need (Full Config Inside)

One agent can't catch its own mistakes. It wrote them, so it can't see them.

A team can: one builds, another tears it apart, a third tests it, and a lead keeps them moving. That's the difference between a demo and real work.

Full config below: the roles, the handoffs, and what keeps them honest.

Here's the full setup 👇

Before we dive in, I share daily notes on AI & vibe coding in my Telegram channel: https://t.me/zodchixquant🧠

Why a team beats one agent

A single agent reviewing its own code is the same mind that wrote the bug grading its own work.

It writes tests that match the implementation instead of the spec, and "reviews" with the exact blind spots that created the problem.

A team breaks that. The writer writes. A separate reviewer, blind to the writer's reasoning, hunts for what's wrong.

A tester builds from the spec, not the code. Each one is sharper because its job is narrow, and because no agent checks its own work.

The config is 4 roles, a handoff file, and a lead that runs them.

The roster: 4 roles, one job each

Three specialists plus a lead. Drop the specialists into .claude/agents/.

writer.md:

---
name: writer
description: Implements features. Writes code, nothing else.
tools: Read, Write, Edit, Glob, Grep, Bash
model: sonnet
---

You write code that ships. You do not test or review.

1. Read the brief and the files you need, fully.
2. Implement, matching existing style.
3. Confirm the build isn't broken.
4. Summarize what you wrote with file:line refs.

You do not write tests. You do not review yourself. Stay in your lane.

reviewer.md:

---
name: reviewer
description: Reviews code from this session. Finds problems, never edits.
tools: Read, Grep, Glob, Bash
model: sonnet
---

You review code you did not write.

1. Run git diff to see what changed.
2. Check for bugs, edge cases, security holes, broken conventions.
3. Output: Critical, Important, Nitpick, each with file:line.

Find nothing critical? Say so. Don't invent issues to look useful.

tester.md:

---
name: tester
description: Writes tests from the spec, blind to the implementation.
tools: Read, Write, Edit, Bash
model: sonnet
---

You write tests that catch real bugs.

1. Read the spec, not just the code.
2. Cover every branch, edge case, and error path.
3. Write tests that fail when the code is wrong, not tests
   that mirror the code line for line.
4. Run them. Never weaken a test to make it pass.

Tight tools are deliberate. The reviewer has no Write, so it can't "helpfully" fix things and blur the line between reviewing and writing. Narrow roles, sharp work.

How they divide the work

Two specialists working at once is a collision waiting to happen unless you give each a territory. Put this in CLAUDE.md:

## Agent territories
When agents work in parallel, each owns one area:

- writer:   src/ (the implementation)
- tester:   tests/ (never touches src/)
- reviewer: read-only everywhere, writes to nothing

No agent edits another's territory. Cross-area requests go in
handoff.md, never a direct edit into someone else's files.

The split is what makes parallel real. The writer builds in src/ while the tester writes in tests/ at the same time, and because their territories don't overlap, neither overwrites the other.

Speed without the collisions.

How they hand off

Agents need somewhere to pass work that isn't editing each other's files. That's handoff.md, a shared scratchpad:

## handoff.md
Append-only. Each agent writes, none deletes another's note.

[writer] auth refactored. tester: the token now expires in 15m,
test the expiry path. reviewer: I changed the session shape,
double-check downstream callers.

When the writer changes something that affects the others, it leaves a note instead of reaching into their territory.

The tester reads it and knows exactly what to cover. The reviewer reads it and knows where to look. Clean handoffs, no stepping on toes.

The lead that runs the play

This is the orchestrator you actually call.

Drop into .claude/commands/ship.md:

---
description: Run the writer, tester, and reviewer as a team on one task
argument-hint: <task>
allowed-tools: Read, Grep, Glob, Bash, Task
model: opus
---

Ship: $ARGUMENTS

1. Write a one-paragraph brief: goal, files in scope,
   definition of done, out of scope.
2. Dispatch writer and tester in parallel with the brief.
   The tester designs from the spec, not the writer's code.
3. When the writer finishes, dispatch the reviewer on the diff.
4. Collect all three reports into one summary:
   - Writer: what shipped (file:line)
   - Tester: tests written, pass/fail
   - Reviewer: critical, important, nitpick
5. Produce a PR, never a direct push to main. I approve the merge.

You type /ship add rate limiting to login and the lead writes the brief, runs the roster, gathers the reports, and hands you one clean summary.

Step 5 is the only guardrail you need day one: the team proposes, you approve the merge. Everything else is coordination.

Common mistakes

Letting the writer review itself. The star player grading his own game. The whole reason for a team is a second mind with different blind spots. Keep the reviewer separate.

The tester reads the code first. Then tests mirror the implementation and pass even when the logic is wrong. Spec first, always, so the tests check what the code should do, not what it does.

No brief. Without the lead writing a shared brief, each agent interprets the task differently and they drift apart. The brief is the playbook every agent runs from.

No handoff file. When an agent needs something outside its territory and has nowhere to write it, it either edits anyway and causes a collision, or stalls. The handoff file gives cross-area work a home.

Same tools for everyone. A reviewer with Write access stops reviewing and starts fixing. Tight, role-specific tools keep each agent in its lane.

The 10-minute setup

3 minutes: create writer, reviewer, and tester in .claude/agents/.

2 minutes: define territories and create handoff.md.

3 minutes: create the lead at .claude/commands/ship.md.

2 minutes: run /ship on a real task and watch the roster play.

One agent taking every shot was never the move. Four roles, clean handoffs, and a lead that runs them. The team didn't get smarter, it just stopped doing everything in one head.

Thanks for reading!

I share daily notes on AI, finance, and vibe coding in my Telegram channel: https://t.me/zodchixquant