How to Build a Claude Code Agent Team That Runs in Loops (Exact Setup Inside)

Most setups run agents once and hand you whatever comes out. A team that runs in loops keeps going until the work actually passes.

Below is the setup in 3 parts: the agents, the loop that drives them, and the rules that tell them when to stop.

Here's the full setup 👇

Before we dive in, I share daily notes on AI & vibe coding in my Telegram channel: https://t.me/zodchixquant 🧠

Why a one-shot team isn't enough

A team that runs once is a relay race with no finish-line check. The writer writes, the tester tests, the reviewer reviews, and then it all lands on you, broken parts included.

A looping team closes that gap. When the tester finds a failure, it doesn't just report it and quit. The loop sends it back to the writer, who fixes it, and the cycle runs again. You only step in when everything actually passes, or when the team hits a wall it can't get past.

The setup has 3 parts: the agents (two small files), the loop, and the stop rules.

File 1: the agents

Two specialists, each with one job. Drop both files into .claude/agents/.

builder.md:

---
name: builder
description: Writes and fixes code. Invoke to implement a task or to fix failures the checker found.
tools: Read, Write, Edit, Glob, Grep, Bash
model: sonnet
---

You build and you fix. Nothing else.

- On a new task: implement it, matching existing style.
- On a fix request: read the failure, find the cause, fix that cause only.
- Never weaken a test to make it pass. Fix the code.
- Report what you changed in one line.

checker.md:

---
name: checker
description: Runs all checks and reports what failed. Invoke after the builder. Never edits code.
tools: Read, Grep, Glob, Bash
model: sonnet
---

You check, you never fix.

Run all three, in order:
1. Tests: `npm test` (or `pytest -q`, `cargo test --quiet`)
2. Types: `npx tsc --noEmit` (or `pyright`, `cargo check`)
3. Lint: `npm run lint` (or `ruff check`, `cargo clippy`)

Then report in this exact format:
- All pass: "ALL GREEN"
- Any fail: "FAILED" then each cause as
  `file:line - what broke - which check caught it`

Never paraphrase a failure. Copy the real error. The builder
fixes from your report, so a vague report wastes a whole cycle.
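
To see why the exact format matters, here is a minimal Python sketch of how a report in that shape could be read mechanically. `Failure` and `parse_report` are hypothetical names made up for this illustration, not part of Claude Code. In the real setup the orchestrator just reads the report as text, but the same idea applies: a strict format is something the next agent can act on without guessing.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Failure:
    """One cause from a checker report (hypothetical helper type)."""
    file: str
    line: int
    message: str
    check: str


# Matches lines shaped like: file:line - what broke - which check caught it
FAILURE_LINE = re.compile(
    r"^\s*(?P<file>[^:\s]+):(?P<line>\d+) - (?P<message>.+) - (?P<check>\w+)\s*$"
)


def parse_report(report: str) -> list[Failure]:
    """Return the failures in a checker report; an empty list means ALL GREEN."""
    lines = [ln for ln in report.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty report")
    status = lines[0].strip()
    if status.startswith("ALL GREEN"):
        return []
    if status != "FAILED":
        raise ValueError(f"unexpected status line: {status!r}")
    failures = []
    for ln in lines[1:]:
        m = FAILURE_LINE.match(ln)
        if m is None:
            # A paraphrased or vague line is rejected instead of guessed at.
            raise ValueError(f"malformed failure line: {ln!r}")
        failures.append(Failure(m["file"], int(m["line"]), m["message"], m["check"]))
    if not failures:
        raise ValueError("FAILED report lists no causes")
    return failures
```

A report that drifts from the format ("some tests seem broken") raises immediately, which is the mechanical version of "a vague report wastes a whole cycle".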

File 2: the loop

This is the orchestrator that drives the cycle. Save it as .claude/commands/loop.md:

---
description: Run the builder and checker in a loop until all checks pass
argument-hint: <task>
allowed-tools: Read, Grep, Glob, Bash, Task
model: opus
---

Run this task as a loop: $ARGUMENTS

1. Write a one-line brief: goal, files in scope, definition of done.
2. Dispatch the builder to implement the task.
3. Dispatch the checker to run all checks.
4. If checker says ALL GREEN: stop, show me the result.
5. If checker says FAILED: send the failures to the builder to fix,
   then go back to step 3.
6. Repeat up to 5 cycles. Track the cycle count out loud.

Stop conditions are in CLAUDE.md. Follow them exactly.

The loop is the whole idea: build, check, and if it failed, build again.

The orchestrator passes failures from checker to builder automatically, so the team keeps going without you relaying messages.
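
If it helps to see the control flow as code, here is a rough Python model of the steps above. It is only a sketch of the logic: `build` and `check` are hypothetical callables standing in for the builder and checker subagents, and in the real setup Claude follows the prose steps in loop.md instead of executing anything like this.

```python
from typing import Callable, Optional

MAX_CYCLES = 5  # mirrors "Repeat up to 5 cycles" in loop.md


def run_loop(
    task: str,
    build: Callable[[str, Optional[str]], None],
    check: Callable[[], str],
    max_cycles: int = MAX_CYCLES,
) -> tuple[bool, int, str]:
    """Build, check, and rebuild until the checker says ALL GREEN or the cap is hit.

    Returns (passed, cycles_used, last_report).
    """
    failures: Optional[str] = None  # None on cycle 1: a fresh implementation
    report = ""
    for cycle in range(1, max_cycles + 1):
        build(task, failures)  # step 2 on the first cycle, step 5 afterwards
        report = check()       # step 3
        if report.startswith("ALL GREEN"):
            return True, cycle, report  # step 4: stop and show the result
        failures = report      # hand the raw report back to the builder
    return False, max_cycles, report  # cap reached: report what still fails
```

Note that the builder gets the checker's raw report, not a summary. That is the part you used to do by hand.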

File 3: the stop rules

A loop without brakes runs forever or fakes a pass. Put these in CLAUDE.md:

## Loop stop rules

The team loops until one of these is true:

- ALL GREEN: every check passes. Stop and report success with proof.
- 5 cycles used: stop. Report what still fails and what was tried.
- Same failure twice in a row: stop. The builder is guessing, not
  fixing. Escalate to me.
- A fix makes a previously passing check fail: stop. Something is
  being broken to fix something else.

Never report success without checker output from the final cycle.
Never weaken or delete a check to reach ALL GREEN.

These rules are what separate a useful loop from an agent spinning in circles burning tokens.

The "same failure twice" rule is the most important: two identical failures mean the team is stuck, and a human should look.
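
The four stop rules can also be written down as one small decision function. This is a sketch under assumptions, not how Claude evaluates CLAUDE.md: `stop_reason` and `failing_checks` are hypothetical helpers, reports are assumed to follow the checker's exact format, and "a previously passing check fails" is approximated as "a check name shows up in this report that was absent from the previous one".

```python
from typing import Optional

MAX_CYCLES = 5  # same cap as in the stop rules


def failing_checks(report: str) -> set[str]:
    """Names of the checks (tests, types, lint) listed in a FAILED report."""
    return {
        ln.rsplit(" - ", 1)[-1].strip()
        for ln in report.splitlines()[1:]
        if " - " in ln
    }


def stop_reason(reports: list[str], max_cycles: int = MAX_CYCLES) -> Optional[str]:
    """Return why the loop should stop after the latest report, or None to continue.

    `reports` holds one checker report per completed cycle, oldest first.
    """
    if not reports:
        return None
    latest = reports[-1]
    if latest.startswith("ALL GREEN"):
        return "ALL GREEN"
    if len(reports) >= 2:
        previous = reports[-2]
        if latest.strip() == previous.strip():
            return "same failure twice"  # the builder is guessing, escalate
        if failing_checks(latest) - failing_checks(previous):
            return "regression"  # a check that passed last cycle now fails
    if len(reports) >= max_cycles:
        return "cycle cap"
    return None
```

The order of the checks is a design choice in this sketch: a repeated failure is reported as such even on the last allowed cycle, because "stuck" tells you more than "out of cycles".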

What you actually see when it runs

You type one line:

/loop add rate limiting to the login route, 5 attempts per IP per minute

Then you watch the team cycle on its own:

Cycle 1
  builder  → wrote rateLimiter.ts, edited login.ts
  checker  → FAILED
             login.test.ts:42 - expected 429, got 200 - tests
             rateLimiter.ts:18 - 'window' possibly undefined - types

Cycle 2
  builder  → fixed the 429 response, added null guard on window
  checker  → FAILED
             login.test.ts:51 - counter not reset after window - tests

Cycle 3
  builder  → reset counter on window expiry
  checker  → ALL GREEN (14/14 tests, types clean, lint clean)

Done in 3 cycles. Review the diff?

That's the whole point on screen. You never relayed a single failure. The checker found them, the builder fixed them, and the loop ran to green in 3 cycles without you touching the keyboard.

Common mistakes

No cycle cap. Without "5 cycles max" a stuck team loops until your tokens run out. The cap turns an infinite loop into a clear report.

Letting the builder check itself. Same agent writing and judging means it grades its own work with the same blind spots that made the bug. Keep builder and checker separate.

No "same failure twice" rule. Two identical failures in a row means guessing, not fixing. That's the moment to stop and look, not to spend cycle 4.

Checks the loop can cheat. If the builder can reach ALL GREEN by deleting or weakening a test, it eventually will. The stop rules forbid weakening checks for a reason.
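
If you want more than a written rule here, one option is to fingerprint the test files before the loop starts and compare afterwards. The sketch below is an assumption of mine, not part of the setup above: `snapshot_tests` and `tampered_tests` are hypothetical helpers, and the `**/*.test.ts` pattern is just a guess based on the file names in the example run. A modified test file is not always cheating (a task may legitimately add tests), so treat the result as a prompt to read the diff, not as a verdict.

```python
import hashlib
from pathlib import Path


def snapshot_tests(root: Path, pattern: str = "**/*.test.ts") -> dict[str, str]:
    """Map each test file (path relative to root) to a SHA-256 of its contents."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.glob(pattern))
        if p.is_file()
    }


def tampered_tests(before: dict[str, str], after: dict[str, str]) -> dict[str, list[str]]:
    """List test files that were deleted or modified between two snapshots."""
    deleted = sorted(set(before) - set(after))
    modified = sorted(p for p in before if p in after and before[p] != after[p])
    return {"deleted": deleted, "modified": modified}
```

Take one snapshot before running /loop and one after it reports ALL GREEN. Anything under "deleted" deserves a look before you accept the result.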

The 10-minute setup

3 minutes: create builder.md and checker.md in .claude/agents/.

3 minutes: create the loop orchestrator at .claude/commands/loop.md.

2 minutes: add the stop rules to CLAUDE.md.

2 minutes: run /loop on a real task and watch it cycle: build, check, fix, pass.
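
Before the first run, a quick sanity check that everything landed in the right place can save a confusing first cycle. This is a small hypothetical helper, not a Claude Code feature. It only assumes the paths and the "## Loop stop rules" heading used in this article.

```python
from pathlib import Path

# The files this article asks you to create, relative to the project root.
REQUIRED = (
    ".claude/agents/builder.md",
    ".claude/agents/checker.md",
    ".claude/commands/loop.md",
    "CLAUDE.md",
)


def missing_setup_files(project_root: Path) -> list[str]:
    """Return the setup files that are absent under project_root."""
    return [rel for rel in REQUIRED if not (project_root / rel).is_file()]


def has_stop_rules(project_root: Path) -> bool:
    """True if CLAUDE.md exists and contains the stop rules heading."""
    claude_md = project_root / "CLAUDE.md"
    return claude_md.is_file() and "## Loop stop rules" in claude_md.read_text(encoding="utf-8")
```

Calling `missing_setup_files(Path.cwd())` from the project root should return an empty list once the setup is complete.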

You stop relaying failures back and forth. The team runs the loop, you read the result. The team didn't get smarter. It just stopped quitting before the job was done.

Thanks for reading!
