Visualize Thread by @rohanpaul_ai

Rohan Paul

@rohanpaul_ai

The paper shows how an LLM agent keeps improving by learning from its own memory, without changing the base model.

It ranks top on GAIA validation at 87.88% Pass@3 and scores 79.40% on the private test set.

Most agent systems either rely on fixed workflows that never adapt, or burn compute to fine-tune model weights.

AgentFly stores each solved attempt as a case in episodic memory, then picks similar cases to guide the next plan.
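A minimal sketch of that store-then-retrieve idea, not the paper's implementation. The `Case` and `CaseMemory` names, the bag-of-words `embed` function, and the example tasks are all assumptions made for illustration; a real system would use a proper text encoder.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Case:
    task: str      # the task description that was attempted
    plan: str      # the plan that was used (hypothetical free-text form)
    reward: float  # outcome signal recorded for the attempt


def embed(text: str) -> Counter:
    # Toy stand-in for a real text encoder: lowercase bag-of-words counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


@dataclass
class CaseMemory:
    cases: list = field(default_factory=list)

    def write(self, task: str, plan: str, reward: float) -> None:
        # Append a finished attempt to episodic memory.
        self.cases.append(Case(task, plan, reward))

    def retrieve(self, task: str, k: int = 2) -> list:
        # Rank stored cases by similarity to the new task and keep the top k.
        query = embed(task)
        ranked = sorted(
            self.cases, key=lambda c: cosine(query, embed(c.task)), reverse=True
        )
        return ranked[:k]


memory = CaseMemory()
memory.write("find the population of Paris", "search web, read census page", 1.0)
memory.write("convert 10 miles to km", "use calculator tool", 1.0)
memory.write("find the population of Berlin", "search web, read wiki infobox", 1.0)

top = memory.retrieve("find the population of Rome", k=2)
for case in top:
    print(case.task, "->", case.plan)
```

Here the two population lookups are retrieved and the unit conversion is ignored, so the planner would see only the plans that worked for similar tasks.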

The authors cast this as a memory-augmented decision process, where a learned retrieval policy scores which past cases to reuse.

That policy learns online from task rewards, using either simple similarity or a small neural scorer, so case choice keeps improving.
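One way to picture a retrieval policy that learns online from rewards is a tiny logistic scorer updated one step at a time. This is a sketch under stated assumptions, not the paper's scorer: the two features (query-case similarity and whether the stored case succeeded), the synthetic reward rule, and the `RetrievalScorer` name are all invented for illustration.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class RetrievalScorer:
    """Logistic scorer estimating how likely reusing a case leads to reward."""

    def __init__(self, n_features: int, lr: float = 0.5):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def score(self, x: list) -> float:
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def update(self, x: list, reward: float) -> None:
        # One online gradient step on binary cross-entropy for this observation.
        err = self.score(x) - reward
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


random.seed(0)
scorer = RetrievalScorer(n_features=2)

for _ in range(2000):
    similarity = random.random()
    case_succeeded = float(random.random() < 0.5)
    # Synthetic environment (an assumption): reusing a case pays off only
    # when it is similar to the query AND the stored attempt had succeeded.
    reward = float(similarity > 0.5 and case_succeeded == 1.0)
    scorer.update([similarity, case_succeeded], reward)

good = scorer.score([0.9, 1.0])  # similar case that succeeded
bad = scorer.score([0.9, 0.0])   # equally similar case that failed
print(good > bad)
```

Pure similarity would rank the two candidate cases equally; the learned scorer separates them because the reward signal taught it that past success matters too.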

A planner proposes subtasks with those cases, an executor runs tools via the Model Context Protocol, and case, subtask, and tool memories track progress.
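The planner/executor split with its three memories can be sketched as a short loop. Everything here is a hypothetical stand-in: the `plan` stub replaces an LLM-backed planner, the `TOOLS` dictionary of plain Python callables replaces tools that a real executor would reach over the Model Context Protocol, and the reward of 1.0 written back at the end is assumed rather than measured.

```python
from dataclasses import dataclass, field


@dataclass
class Memories:
    cases: list = field(default_factory=list)     # (task, plan, reward) from past runs
    subtasks: list = field(default_factory=list)  # subtasks proposed during this run
    tool_log: list = field(default_factory=list)  # (tool, argument, result) per call


def plan(task, retrieved_cases):
    # Hypothetical planner stub: reuse the best retrieved plan if there is one.
    # A real planner would prompt a frozen LLM with the task and the cases.
    if retrieved_cases:
        return list(retrieved_cases[0][1])
    return [("search", task)]


# Plain callables standing in for real tools (an assumption for this sketch).
TOOLS = {
    "search": lambda query: f"results for {query}",
    "summarize": lambda text: text[:20],
}


def run(task, memories, retrieved_cases):
    steps = plan(task, retrieved_cases)
    result = None
    for tool, arg in steps:
        memories.subtasks.append((tool, arg))
        # An argument of None means "feed in the previous step's result".
        arg = result if arg is None else arg
        result = TOOLS[tool](arg)
        memories.tool_log.append((tool, arg, result))
    return result, steps


mem = Memories()
past = [("look up AgentFly", [("search", "AgentFly paper"), ("summarize", None)], 1.0)]
answer, steps = run("look up AgentFly", mem, past)

# Write the finished attempt back so later tasks can retrieve it.
mem.cases.append(("look up AgentFly", steps, 1.0))
print(answer)
```

The model itself never changes in this loop; only the contents of `Memories` grow from run to run.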

Because only memory and the retrieval policy update, the base LLM stays frozen, cost stays low, and the agent adapts continuously.

Across research and question-answering tasks, the case memory lifts out-of-distribution accuracy by 4.7% to 9.6%, and the agent hits 95.0% on SimpleQA.

The takeaway is practical: teach the agent which past experiences matter and it will plan better without fiddling with weights.

----

Paper – arxiv.org/abs/2508.16153

Paper Title: "AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs"
