Paweł Huryn
@PawelHuryn
After an interview with @karpathy, everyone is talking about what AI agents can/can't do.

But an opinion without data is just a hypothesis.

So I tested 3x185 workflow executions of a market research agent.

The results shocked me 🧵

I tested three variants:

I. LLM Workflow: No agency; the entire logic is carefully orchestrated.

As expected:
- The LLM workflow was 2x faster than the AI agent (same model).
- The LLM workflow consumed 12x fewer tokens than the AI agent.

The 3/185 "errors" were minor formatting issues.

II. Agentic Workflow: Deterministic logic moved to the orchestration layer.

More time, more tokens.
100% task success.

GPT-5 (a reasoning model) consumed fewer tokens than GPT-4o due to better compression.

None of this was surprising. But then:

III. AI Agent: Full autonomy, no predefined steps, just an objective.

I was staring at the screen.

An AI agent without predefined reasoning steps succeeded 185/185 times (100%).

This is different from my previous observations for the same models:

Conclusions & learnings:

1. For simple use cases, we can already achieve 99%+ reliability
2. A verifier agent with a high TPR (true positive rate) would push it even further
3. For complex or critical processes, you still need orchestration
4. Orchestration is faster, cheaper, and more reliable

@karpathy might be right.

We might need 10 years to achieve true AI intelligence.

But autonomy and reliability for most processes seem more like ~12 months away.

Agree? Disagree?

Let me know in the comments.

P.S....

A. Free n8n templates I used for testing: productcompass.pm/p/the-ultimate…

B. Enjoy this?

- Follow me @PawelHuryn for deeply researched AI & PM content
- Share this thread with others

I appreciate it!
