Visualize Thread by @PawelHuryn

Paweł Huryn

@PawelHuryn

After an interview with @karpathy, everyone is talking about what AI agents can/can't do.

But an opinion without data is just a hypothesis.

So, I tested 3x185 workflow executions for a market researcher agent.

The results shocked me 🧵

I tested three variants:

I. LLM Workflow: No agency, the entire logic carefully orchestrated.

What was expected:
- The LLM workflow was 2x faster than the AI agent (same model).
- The LLM workflow consumed 12x fewer tokens than the AI agent.

The 3/185 "errors" were minor formatting issues.
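For reference, the run counts above translate into rates as follows. This is a minimal sketch that assumes the 3/185 figure refers to the LLM workflow variant; the function and variable names are illustrative, not taken from the n8n templates.

```python
# Figures quoted in the thread: 3 variants, 185 executions each,
# and 3 flagged "errors" (assumed here to belong to the LLM workflow).
RUNS_PER_VARIANT = 185
VARIANTS = 3


def success_rate(failures: int, runs: int) -> float:
    """Return the fraction of runs that finished without a flagged error."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    if not 0 <= failures <= runs:
        raise ValueError("failures must be between 0 and runs")
    return (runs - failures) / runs


total_runs = VARIANTS * RUNS_PER_VARIANT             # 3 x 185 executions
workflow_rate = success_rate(3, RUNS_PER_VARIANT)    # 182 of 185 runs

print(f"total runs: {total_runs}")
print(f"LLM workflow success rate: {workflow_rate:.1%}")
```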

II. Agentic Workflow: Deterministic logic moved to the orchestration layer.

More time, more tokens.
100% task success.

GPT-5 (a reasoning model) consumed fewer tokens than GPT-4o due to better compression.

None of this was surprising. But then:

III. AI Agent: Full autonomy, no predefined steps, just an objective.

I was staring at the screen.

An AI agent without predefined reasoning steps succeeded 185/185 times (100%).
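To make the contrast between variant I and variant III concrete, here is a rough sketch of the two control-flow styles. Everything in it is a hypothetical stand-in: `fake_llm`, `TOOLS`, and `choose_next_action` are placeholders for real model and tool calls, and the actual tests ran as n8n workflows, not as this code.

```python
from typing import Callable, Dict, List


def fake_llm(prompt: str) -> str:
    """Hypothetical placeholder for a model call; returns a canned string."""
    return f"answer({prompt})"


# Hypothetical tools a market researcher agent might have access to.
TOOLS: Dict[str, Callable[[str], str]] = {
    "search": lambda query: f"results for {query}",
    "summarize": lambda text: f"summary of {text}",
}


def llm_workflow(topic: str) -> str:
    """Variant I: the code fixes every step; the model has no agency."""
    found = TOOLS["search"](topic)
    summary = TOOLS["summarize"](found)
    return fake_llm(summary)


def choose_next_action(objective: str, history: List[str]) -> str:
    """Hypothetical stand-in for the model deciding what to do next."""
    if not history:
        return "search"
    if len(history) == 1:
        return "summarize"
    return "finish"


def ai_agent(objective: str, max_steps: int = 5) -> str:
    """Variant III: only an objective is given; the loop picks its own steps."""
    history: List[str] = []
    current = objective
    for _ in range(max_steps):
        action = choose_next_action(objective, history)
        if action == "finish":
            return fake_llm(current)
        current = TOOLS[action](current)
        history.append(action)
    raise RuntimeError("agent did not finish within max_steps")


print(llm_workflow("EV charging market"))
print(ai_agent("EV charging market"))
```

In the workflow, a wrong step order is a bug in the code. In the agent, the step order is a runtime decision, which is why a step budget such as `max_steps` is a common safeguard.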

This is different from my previous observations for the same models.

Conclusions & learnings:

1. For simple use cases, we can already achieve 99%+ reliability
2. A verifier agent with a high TPR would push it even further
3. For complex or critical processes, you still need orchestration
4. Orchestration is faster, cheaper, and more reliable
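Point 2 can be illustrated with a small back-of-the-envelope calculation. This is a sketch under loudly stated assumptions: TPR is read as the fraction of failed runs the verifier flags, every flagged run is then fixed or retried successfully, and the verifier never damages a correct run. The 182/185 base rate comes from the thread; the TPR values are hypothetical.

```python
def reliability_with_verifier(base_success: float, verifier_tpr: float) -> float:
    """Estimate reliability after adding a verifier.

    Assumptions (simplifications, not measured in the thread):
    - verifier_tpr is the share of failed runs the verifier catches
    - every caught failure is then corrected
    - correct runs are left untouched
    """
    if not 0.0 <= base_success <= 1.0:
        raise ValueError("base_success must be in [0, 1]")
    if not 0.0 <= verifier_tpr <= 1.0:
        raise ValueError("verifier_tpr must be in [0, 1]")
    missed_failures = (1.0 - base_success) * (1.0 - verifier_tpr)
    return 1.0 - missed_failures


base = 182 / 185  # success rate implied by 3 errors in 185 runs

for tpr in (0.5, 0.9, 0.99):  # hypothetical verifier quality levels
    print(f"TPR {tpr:.2f} -> reliability {reliability_with_verifier(base, tpr):.2%}")
```

Under these assumptions, the remaining failure rate is simply the base failure rate multiplied by the share of failures the verifier misses.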

@karpathy might be right.

We might need 10 years to achieve true AI intelligence.

But autonomy and reliability for most processes seem more like ~12 months away.

Agree? Disagree?

Let me know in the comments.

P.S....

A. Free n8n templates I used for testing: productcompass.pm/p/the-ultimate…

B. Enjoy this?

- Follow me @PawelHuryn for deeply researched AI & PM content
- Share this thread with others

I appreciate it!
