After an interview with @karpathy, everyone is talking about what AI agents can/can't do.
But an opinion without data is just a hypothesis.
So, I tested 3x185 workflow executions for a market researcher agent.
The results shocked me 🧵

I tested three variants:
I. LLM Workflow: No agency, the entire logic carefully orchestrated.
What was expected:
- An LLM workflow was 2x faster than an AI Agent (using the same model).
- An LLM workflow consumed 12x fewer tokens than an AI Agent.
3/185 "errors" were minor formatting issues.

II. Agentic Workflow: Deterministic logic moved to the orchestration layer.
More time, more tokens.
100% task success.
GPT-5 (a reasoning model) consumed fewer tokens than GPT-4o due to better compression.
None of this was surprising. But then:

III. AI Agent: Full autonomy without steps to take, just an objective
I was staring at the screen.
An AI agent without predefined reasoning steps succeeded 185/185 times (100%).

Conclusions & learnings:
1. For simple use cases, we can already achieve 99%+ reliability
2. A verifier agent with a high TPR would push it even further
3. For complex or critical processes, you still need orchestration
4. Orchestration is faster, cheaper, and more reliable
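A quick back-of-the-envelope check of points 1 and 2, as a minimal sketch. It assumes the 3 "errors" belong to the LLM Workflow and count as failures, and it uses a hypothetical verifier with a 90% TPR (not something I measured). It also assumes every error the verifier catches gets fixed and that the verifier adds no new errors.

```python
from fractions import Fraction

# Run counts taken from the thread: 185 executions per variant.
# Assumption: the 3 "errors" belong to the LLM Workflow and count as failures.
RUNS = 185
failures = {"LLM Workflow": 3, "Agentic Workflow": 0, "AI Agent": 0}


def success_rate(failed: int, total: int) -> Fraction:
    """Share of executions that succeeded."""
    return Fraction(total - failed, total)


def residual_error(error_rate: Fraction, verifier_tpr: Fraction) -> Fraction:
    """Error rate left after a verifier catches a share `verifier_tpr` of errors.

    Simplification: every caught error is fixed, and the verifier
    introduces no new errors.
    """
    return error_rate * (1 - verifier_tpr)


per_variant = {name: success_rate(f, RUNS) for name, f in failures.items()}
pooled = success_rate(sum(failures.values()), RUNS * len(failures))

for name, rate in per_variant.items():
    print(f"{name}: {float(rate):.2%}")
print(f"Pooled: {float(pooled):.2%}")

# Hypothetical verifier with a 90% true positive rate (illustrative only).
worst_error = 1 - per_variant["LLM Workflow"]
after_verifier = 1 - residual_error(worst_error, Fraction(9, 10))
print(f"LLM Workflow with verifier: {float(after_verifier):.2%}")
```

Under these assumptions the weakest variant lands at about 98.4%, the pooled rate across all 555 runs at about 99.5%, and the hypothetical verifier lifts the weakest variant to about 99.8%.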
@karpathy might be right.
We might need 10 years to achieve true AI intelligence.
But autonomy and reliability for most processes seem more like ~12 months away.
Agree? Disagree?
Let me know in the comments.
P.S....
A. Free n8n templates I used for testing: productcompass.pm/p/the-ultimate…
B. Enjoy this?
- Follow me @PawelHuryn for deeply researched AI & PM content
- Share this thread with others
I appreciate it!
