@lihanc02: An agent that beats Claude Myt...

11 views Apr 10, 2026

An agent that beats Claude Mythos on Terminal Bench and SWE-bench Verified?

🎉We are excited to share Terminator-1, our newest agent that achieved 95+% on SWE-bench Verified and Terminal-Bench with @MogicianTony!

We show that besides model capabilities, well-designed harness could actually boost the accuracy by 3x in coding tasks.

Well if you really wanted you could get 100% accuracy without solving a single task.

The actual finding is that most AI benchmarks can be easily reward-hacked with simple exploits. Read more about the same 7 design flaws that almost every evaluation has ⬇️

View Tweet

It is just a hack

@lihanc02: An agent that beats Claude Myt...

Actions

What You Can Do