Visualize Thread by @gemchange_ltd

gemchanger

@gemchange_ltd

> I had a swarm running. 80 agents on the same task, the kind where you can check the answer at the end. About a third of them were quietly garbage.

I did what everyone does. Averaged all 80. Throw a pile of agents at it, average, the mess washes out. Error came back at 0.99. Useless.

So I tried something else. I let the agents grade each other against a small set of questions where I already knew the answer, and fired the worst. Cut the bad ones, average who's left.

0.135.

86% of the error, gone. Same agents. I didn't add anything. I removed.

## Why more agents was never the answer

If your agents are wrong in random, independent ways, adding more cancels the wrongness out. That's the whole pitch, and it's true.

But they all came off the same model. So they miss together. Same hallucinated convention, same misread of the spec, all leaning the same way. Averaging a stack of numbers that lean the same way doesn't move the lean.
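
A toy simulation makes the difference concrete. It is a minimal sketch, not the setup from the thread: `mean_error`, `shared_bias`, and `noise_sd` are hypothetical names, and the Gaussian noise model is an assumption. With `shared_bias=0.0` the agents err independently; with `shared_bias=1.0` they all lean the same way.

```python
import random
import statistics


def mean_error(n_agents, shared_bias, noise_sd, truth=0.0, trials=500, seed=0):
    """Average absolute error of the plain swarm mean over many trials.

    Assumed model: every agent answers truth + shared_bias + private noise.
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        answers = [
            truth + shared_bias + rng.gauss(0.0, noise_sd)
            for _ in range(n_agents)
        ]
        errors.append(abs(statistics.fmean(answers) - truth))
    return statistics.fmean(errors)


# Compare swarm sizes: independent noise only vs. a lean shared by every agent.
for n in (10, 80, 400):
    independent = mean_error(n, shared_bias=0.0, noise_sd=1.0)
    correlated = mean_error(n, shared_bias=1.0, noise_sd=1.0)
    print(n, round(independent, 3), round(correlated, 3))
```

Under these assumptions the independent column should shrink as `n` grows, while the correlated column should stay pinned near the shared bias no matter how many agents are added.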

Agent 300, agent 400, doesn't matter. The agent count on the slide is the most worthless number in the system, and nobody wants to hear it.

## So you cut instead

Stop trying to drown the bad agents. Remove them.

You need a verify gate. A few questions where you know the truth. Tests, anchors, whatever you have. Score every agent, cut the worst, average the survivors. 0.99 to 0.135.
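
One way the gate could look in code, as a rough sketch. The thread does not show its implementation, so `fire_and_average`, its parameters, and the mean-absolute-error scoring are all assumptions, and the numbers in the usage example are made up for illustration.

```python
import statistics


def fire_and_average(final_answers, anchor_answers, anchor_truth, cut_fraction):
    """Score agents on known-answer anchors, fire the worst, average the rest.

    final_answers:  one numeric answer per agent on the real task.
    anchor_answers: per agent, a list of answers to the anchor questions.
    anchor_truth:   the known correct answers to those anchor questions.
    cut_fraction:   share of agents to fire, in [0, 1).
    Returns (estimate, indices of surviving agents).
    """
    if not 0.0 <= cut_fraction < 1.0:
        raise ValueError("cut_fraction must be in [0, 1)")
    n = len(final_answers)
    # Assumed scoring rule: mean absolute error on the anchors (lower is better).
    scores = [
        statistics.fmean([abs(a - t) for a, t in zip(answers, anchor_truth)])
        for answers in anchor_answers
    ]
    n_keep = max(1, n - round(cut_fraction * n))
    ranked = sorted(range(n), key=lambda i: scores[i])
    survivors = ranked[:n_keep]
    estimate = statistics.fmean([final_answers[i] for i in survivors])
    return estimate, survivors


# Illustrative data: three agents track the anchors, one is far off.
truth_on_anchors = [1.0, 2.0, 3.0]
anchors = [
    [1.0, 2.1, 2.9],
    [1.1, 1.9, 3.0],
    [0.9, 2.0, 3.1],
    [4.0, 6.0, 9.0],  # the agent the gate should fire
]
finals = [10.0, 10.2, 9.8, 30.0]
estimate, kept = fire_and_average(finals, anchors, truth_on_anchors, cut_fraction=0.25)
print(estimate, sorted(kept))
```

The gate only sees anchor performance, so it works to the extent that being wrong on the anchors predicts being wrong on the real task.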

A plain median on the same dirty swarm gives 0.56. A 20% trimmed mean, 0.82. The firing, 0.135.

Median and trim are blind. They cut a fixed amount and hope. Firing isn't blind. Same idea as trimming, except it knows where the bodies are buried.
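
A small made-up swarm shows what "blind" means here. This is a sketch on invented numbers, not the data behind the figures above; `trimmed_mean` is a hypothetical helper, and reading "20% trimmed" as 20% off each end is an assumption. The bad agents all lean high, so trimming symmetrically throws away good low answers along with bad high ones.

```python
import statistics


def trimmed_mean(values, trim_each_side):
    """Mean after dropping a fixed share of values from both ends."""
    if not 0.0 <= trim_each_side < 0.5:
        raise ValueError("trim_each_side must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(trim_each_side * len(ordered))
    return statistics.fmean(ordered[k:len(ordered) - k])


# Invented swarm: the truth is 10. Six agents sit near it, four lean high together.
truth = 10.0
good = [9.8, 9.9, 10.0, 10.0, 10.1, 10.2]
bad = [14.0, 15.0, 15.5, 16.0]
swarm = good + bad

plain_mean = statistics.fmean(swarm)
median = statistics.median(swarm)
trimmed = trimmed_mean(swarm, 0.2)
fired = statistics.fmean(good)  # stands in for a gate that caught every bad agent

for name, value in [("mean", plain_mean), ("trimmed", trimmed),
                    ("median", median), ("fired", fired)]:
    print(name, round(abs(value - truth), 3))
```

On this toy data the ordering matches the thread's: plain mean worst, trimmed mean next, median better, firing best. The `fired` line assumes a perfect gate, which a real set of anchors will not give you.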

## But you can't just crank it

Firing is not a slider you push to 100.

I pushed it. Error dropped, bottomed out, then climbed straight back up. 128% above the bottom by the time I'd gutted nearly everyone. Cut too deep and four agents are holding the whole answer, and four agents is loud and shaky.

The bottom sits further out than your gut says. 30% of my agents were bad. The best cut was 70%.
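
The drop-then-climb shape can be reproduced in a toy sweep. This is a sketch under loud assumptions, not the experiment from the thread: `swarm_error` is a hypothetical function, the bias of 3.0, the noise levels, and the noisy anchor score are invented, and it will not reproduce the numbers above. The point is only that when the anchor score is noisy, some bad agents survive a cut equal to the bad share, so the best cut can sit well past it.

```python
import random
import statistics


def swarm_error(cut_fraction, n_agents=80, bad_share=0.3, trials=200, seed=0):
    """Mean absolute error of the survivor average at one cut level (truth = 0)."""
    rng = random.Random(seed)  # same seed per cut level: same swarms, fair comparison
    n_bad = int(bad_share * n_agents)
    n_keep = max(1, n_agents - round(cut_fraction * n_agents))
    errors = []
    for _ in range(trials):
        agents = []
        for i in range(n_agents):
            bias = 3.0 if i < n_bad else 0.0                # assumed lean of bad agents
            final = bias + rng.gauss(0.0, 1.0)              # answer on the real task
            anchor_score = abs(bias + rng.gauss(0.0, 1.5))  # noisy error on the anchors
            agents.append((anchor_score, final))
        agents.sort(key=lambda pair: pair[0])               # lowest anchor error first
        survivors = [final for _, final in agents[:n_keep]]
        errors.append(abs(statistics.fmean(survivors)))
    return statistics.fmean(errors)


cuts = [i / 20 for i in range(20)]  # 0.00, 0.05, ..., 0.95
curve = {c: swarm_error(c) for c in cuts}
best_cut = min(curve, key=curve.get)
for c in cuts:
    print(f"{c:.2f} {curve[c]:.3f}")
print("best cut:", best_cut)
```

In practice the cut level would be picked from a sweep like this on held-out anchors rather than set to the estimated share of bad agents.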
