
> I had a swarm running. 80 agents on the same task, the kind where you can check the answer at the end. About a third of them were quietly garbage.


I did what everyone does. Averaged all 80. Throw a pile of agents at it, average, the mess washes out. Error came back at 0.99. Useless.

So I tried something else. I had the agents grade each other against a small set of questions where I already knew the answer, then fired the worst. Cut the bad ones, average who's left.

0.135.

86% of the error, gone. Same agents. I didn't add anything. I removed.

## Why more agents was never the answer

If your agents are wrong in random, independent ways, adding more cancels the wrongness out. That's the whole pitch, and it's true.

But they all came off the same model. So they miss together. Same hallucinated convention, same misread of the spec, all leaning the same way. Averaging a stack of numbers that lean the same way doesn't move the lean.
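You can watch the lean survive in a toy simulation. This is a minimal sketch, not the setup behind the numbers above: the function name `swarm_error`, the Gaussian noise, and the shared bias of 1.0 are all assumptions picked for illustration.

```python
import random
import statistics


def swarm_error(n_agents, shared_bias, noise_sd=1.0, truth=0.0, seed=0):
    """Absolute error of the plain average over n_agents simulated answers.

    Hypothetical error model: every agent reports
    truth + shared_bias + its own independent Gaussian noise.
    """
    rng = random.Random(seed)
    answers = [
        truth + shared_bias + rng.gauss(0.0, noise_sd)
        for _ in range(n_agents)
    ]
    return abs(statistics.fmean(answers) - truth)


# Independent errors only: the average tends to tighten as the swarm grows.
independent_80 = swarm_error(80, shared_bias=0.0)
independent_4000 = swarm_error(4000, shared_bias=0.0)

# Same noise plus a lean every agent shares: the average keeps the lean.
leaning_80 = swarm_error(80, shared_bias=1.0)
leaning_4000 = swarm_error(4000, shared_bias=1.0)

print(independent_80, independent_4000, leaning_80, leaning_4000)
```

In this model the independent part of the error shrinks roughly like one over the square root of the agent count. The shared part does not shrink at all, so going from 80 agents to 4000 leaves the leaning swarm about as wrong as it started.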

Agent 300, agent 400, doesn't matter. The agent count on the slide is the most worthless number in the system, and nobody wants to hear it.

## So you cut instead

Stop trying to drown the bad agents. Remove them.

You need a verify gate. A few questions where you know the truth. Tests, anchors, whatever you have. Score every agent, cut the worst, average the survivors. 0.99 to 0.135.
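The gate itself is small. Here is a minimal sketch, assuming the answers are numbers you can average and that an agent's score is its mean absolute error on the anchor questions. The names `anchor_error` and `fire_and_average` and all the data are made up for illustration.

```python
import statistics


def anchor_error(anchor_answers, anchor_truths):
    """Mean absolute error of one agent on the questions with known answers."""
    return statistics.fmean(
        [abs(a - t) for a, t in zip(anchor_answers, anchor_truths)]
    )


def fire_and_average(task_answers, anchor_answers, anchor_truths, cut_fraction):
    """Score every agent on the anchors, drop the worst, average the survivors."""
    if not 0.0 <= cut_fraction < 1.0:
        raise ValueError("cut_fraction must be in [0, 1)")
    n = len(task_answers)
    scores = [anchor_error(a, anchor_truths) for a in anchor_answers]
    ranked = sorted(range(n), key=lambda i: scores[i])  # best agents first
    n_keep = max(1, n - round(n * cut_fraction))        # always keep at least one
    survivors = ranked[:n_keep]
    return statistics.fmean([task_answers[i] for i in survivors]), survivors


# Made-up data: five agents, two anchor questions with known answers 10 and 20.
anchor_truths = [10.0, 20.0]
anchor_answers = [
    [10.1, 19.9],
    [9.8, 20.3],
    [10.0, 20.1],
    [13.0, 24.0],  # badly off on both anchors
    [14.0, 23.5],  # badly off on both anchors
]
task_answers = [1.0, 1.1, 0.9, 5.0, 6.0]

estimate, survivors = fire_and_average(
    task_answers, anchor_answers, anchor_truths, cut_fraction=0.4
)
print(estimate, sorted(survivors))
```

On this toy data the plain average of all five task answers is 2.8. The gate fires agents 3 and 4, and the average of the three survivors is 1.0.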

A plain median on the same dirty swarm gives 0.56. A 20% trimmed mean, 0.82. The firing, 0.135.



Median and trim are blind. They cut a fixed amount and hope. Firing isn't blind. Same idea as trimming, except it knows where the bodies are buried.
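To see what "blind" means, here is a small sketch on made-up data. It assumes a 20% trimmed mean drops 20% from each end (the usual convention, but an assumption here). The last line is a hand-picked stand-in for a cut that already knows which three agents lean. None of this reproduces the figures above.

```python
import statistics


def trimmed_mean(values, trim_fraction):
    """Drop trim_fraction of the values from EACH end, then average the middle."""
    if not 0.0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    k = round(len(values) * trim_fraction)
    ordered = sorted(values)
    return statistics.fmean(ordered[k:len(ordered) - k])


# Made-up swarm: truth is 0, seven decent agents, three that lean high together.
answers = [0.0, 0.1, -0.1, 0.2, -0.2, 0.0, 0.1, 3.0, 3.1, 2.9]

plain = statistics.fmean(answers)
median = statistics.median(answers)
trimmed = trimmed_mean(answers, 0.2)

# Stand-in for an informed cut: keep exactly the seven agents that do not lean.
informed = statistics.fmean(answers[:7])

print(plain, median, trimmed, informed)
```

On this toy data the plain mean is off by 0.91, the trimmed mean by 0.55, the median by 0.10, and the informed cut by about 0.014. The trim spends half its budget on the low end, where nothing is wrong, and still lets the 2.9 through on the high end.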

## But you can't just crank it

Firing is not a slider you push to 100.

I pushed it. Error dropped, bottomed out, then climbed straight back up. 128% above the bottom by the time I'd gutted nearly everyone. Cut too deep and four agents are holding the whole answer, and four agents is loud and shaky.

The bottom sits further out than your gut says. 30% of my agents were bad. The best cut was 70%.
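Since the bottom is hard to guess, one option is to sweep the cut and look. The sketch below does that on a simulated swarm. The error model is an assumption throughout: 80 agents, 24 of them sharing a bias of 1.0, unit Gaussian noise on every answer, five anchor questions. The names `one_trial` and `sweep` are hypothetical, and the curve it produces is not the one from the run described above.

```python
import random
import statistics


def one_trial(rng, n_agents, n_bad, cut_fractions, bad_bias=1.0, n_anchors=5):
    """Error of the survivors' average at each cut level, for one simulated swarm."""
    biases = [bad_bias] * n_bad + [0.0] * (n_agents - n_bad)
    # Anchor score: mean absolute error on n_anchors questions with known truth 0.
    anchor_err = [
        statistics.fmean([abs(b + rng.gauss(0.0, 1.0)) for _ in range(n_anchors)])
        for b in biases
    ]
    # Answers on the real task (truth is 0), with fresh noise.
    answers = [b + rng.gauss(0.0, 1.0) for b in biases]
    ranked = sorted(range(n_agents), key=lambda i: anchor_err[i])  # best first
    errors = []
    for cut in cut_fractions:
        n_keep = max(1, n_agents - round(n_agents * cut))
        errors.append(abs(statistics.fmean([answers[i] for i in ranked[:n_keep]])))
    return errors


def sweep(cut_fractions, trials=400, seed=0):
    """Average the per-cut error over many simulated swarms."""
    rng = random.Random(seed)
    totals = [0.0] * len(cut_fractions)
    for _ in range(trials):
        for j, err in enumerate(one_trial(rng, 80, 24, cut_fractions)):
            totals[j] += err
    return [t / trials for t in totals]


CUTS = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
errors = sweep(CUTS)
best_cut = CUTS[errors.index(min(errors))]

for cut, err in zip(CUTS, errors):
    print(f"cut {cut:.2f}  error {err:.3f}")
print("best cut:", best_cut)
```

The last entry in `CUTS` leaves four survivors out of 80, which is the regime where the average gets noisy again. In practice you would run the sweep on held-out questions with known answers rather than on a simulation, and pick the cut from that curve.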

