Rohan Paul
@rohanpaul_ai
The paper shows how an LLM agent keeps improving by learning from its own memory, without changing the base model.

It ranks top on GAIA validation at 87.88% Pass@3, with 79.40% on the private test.
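
For readers unfamiliar with the metric, here is a minimal sketch of how a Pass@k score is usually counted: a task is solved if any of its first k attempts succeeds. The `pass_at_k` helper and the toy outcomes are hypothetical illustrations, not the GAIA evaluation code or data.

```python
def pass_at_k(results, k):
    """Fraction of tasks where at least one of the first k attempts succeeded.

    results: one list of booleans per task, one boolean per attempt.
    """
    solved = sum(1 for attempts in results if any(attempts[:k]))
    return solved / len(results)


# Made-up outcomes for four tasks with three attempts each (illustration only).
outcomes = [
    [False, True, False],
    [False, False, False],
    [True, True, True],
    [False, False, True],
]

score = pass_at_k(outcomes, 3)
print(f"Pass@3 = {score:.2%}")
```

Under this counting, extra attempts can only raise the score, which is why a Pass@3 figure is not directly comparable to a single-attempt number.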

Most agent systems either rely on fixed workflows that never adapt, or burn compute to fine-tune model weights.

AgentFly stores each solved attempt as a case in episodic memory, then picks similar cases to guide the next plan.
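
The store-then-retrieve idea can be sketched in a few lines. This is an assumption-heavy toy, not the paper's code: the `Case` and `CaseMemory` names are hypothetical, and a bag-of-words cosine similarity stands in for whatever embedding the real system uses.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Case:
    task: str      # the task description that was attempted
    plan: str      # the plan that was used
    reward: float  # outcome signal for the attempt


def embed(text):
    # Toy embedding: word counts. A real system would use a learned encoder.
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class CaseMemory:
    def __init__(self):
        self.cases = []

    def write(self, task, plan, reward):
        self.cases.append(Case(task, plan, reward))

    def read(self, task, k=2):
        # Rank stored cases by similarity to the new task and keep the top k.
        query = embed(task)
        ranked = sorted(
            self.cases,
            key=lambda c: cosine(query, embed(c.task)),
            reverse=True,
        )
        return ranked[:k]


memory = CaseMemory()
memory.write("find the population of Paris", "search web, read census page", 1.0)
memory.write("convert a pdf table to csv", "run pdf parser, export rows", 1.0)
memory.write("find the population of Tokyo", "search web, read statistics page", 1.0)

top = memory.read("find the population of Berlin", k=2)
for case in top:
    print(case.task, "->", case.plan)
```

The retrieved plans would then be placed in the planner's prompt as worked examples for the new task.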

They cast it as a memory-augmented decision process, where a learned retrieval policy scores which past cases to reuse.

That policy learns online from task rewards, using either simple similarity or a small neural scorer, so case choice keeps improving.
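
To make "learns online from task rewards" concrete, here is a rough sketch of a tiny scorer updated after each task. It is a stand-in under stated assumptions: `RetrievalScorer` is a hypothetical name, the model is plain logistic regression trained with one gradient step per reward, and the two features and the simulated reward rule are invented for the demo. The paper's actual objective may differ.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class RetrievalScorer:
    """Scores a (task, case) feature vector; higher means 'reuse this case'."""

    def __init__(self, n_features, lr=0.5):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def score(self, features):
        z = sum(w * x for w, x in zip(self.w, features)) + self.b
        return sigmoid(z)

    def update(self, features, reward):
        # One online gradient step on binary cross-entropy,
        # treating the task reward (0 or 1) as the label.
        error = reward - self.score(features)
        self.w = [w + self.lr * error * x for w, x in zip(self.w, features)]
        self.b += self.lr * error


random.seed(0)
scorer = RetrievalScorer(n_features=2)

# Simulated experience (assumption): reusing a case helps only when its
# similarity feature is high; the second feature is pure noise.
for _ in range(500):
    similarity = random.random()
    noise = random.random()
    reward = 1.0 if similarity > 0.5 else 0.0
    scorer.update([similarity, noise], reward)

high = scorer.score([0.9, 0.5])
low = scorer.score([0.1, 0.5])
print(f"high-similarity case: {high:.2f}, low-similarity case: {low:.2f}")
```

Only the scorer's few weights change here, which mirrors the thread's point that adaptation happens in the retrieval policy rather than in the LLM.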

A planner proposes subtasks with those cases, an executor runs tools via the Model Context Protocol, and case, subtask, and tool memories track progress.
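
The planner and executor split can be sketched as a simple loop. Everything named here is hypothetical: `Memories`, `plan`, and `run` are illustrative, the planner is hard-coded instead of prompting an LLM, and a dictionary of plain Python functions stands in for tools that the real system would reach through the Model Context Protocol.

```python
from dataclasses import dataclass, field


@dataclass
class Memories:
    cases: list = field(default_factory=list)       # finished tasks
    subtasks: list = field(default_factory=list)    # planned steps
    tool_calls: list = field(default_factory=list)  # executed calls and results


# Stand-in tool registry (assumption): local functions, not MCP servers.
TOOLS = {
    "search": lambda query: f"results for {query}",
    "calc": lambda expr: str(sum(int(x) for x in expr.split("+"))),
}


def plan(task, retrieved_cases):
    # Hypothetical planner: a real one would prompt the frozen LLM
    # with the task plus the retrieved cases.
    return [("search", task), ("calc", "2+3")]


def run(task, memories):
    subtasks = plan(task, memories.cases)
    outputs = []
    for tool_name, arg in subtasks:
        memories.subtasks.append((tool_name, arg))
        result = TOOLS[tool_name](arg)  # executor step
        memories.tool_calls.append((tool_name, arg, result))
        outputs.append(result)
    # Store the finished attempt so later plans can reuse it.
    memories.cases.append((task, subtasks, outputs))
    return outputs


mem = Memories()
out = run("GAIA score of AgentFly", mem)
print(out)
```

Keeping the three memories separate makes it easy to inspect what was planned, what was actually executed, and what was kept as a reusable case.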

Because only memory and the retrieval policy update, the base LLM stays frozen, cost stays low, and the agent adapts continuously.

Across research and question-answering tasks, the case memory lifts out-of-distribution accuracy by +4.7% to +9.6%, and the agent hits 95.0% on SimpleQA.

The takeaway is practical: teach the agent which past experiences matter and it will plan better without fiddling with weights.

----

Paper – arxiv.org/abs/2508.16153

Paper Title: "AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs"