@rohanpaul_ai: The paper shows how an LLM age...

45 views Aug 26, 2025

The paper shows how an LLM agent keeps improving by learning from its own memory, without changing the base model.

It ranks first on the GAIA validation set at 87.88% Pass@3 and scores 79.40% on the private test set.

Most agent systems either rely on fixed workflows that never adapt, or burn compute to fine-tune model weights.

AgentFly stores each solved attempt as a case in episodic memory, then picks similar cases to guide the next plan.

The authors cast it as a memory-augmented decision process, where a learned retrieval policy scores which past cases to reuse.

That policy learns online from task rewards, using either simple similarity or a small neural scorer, so case choice keeps improving.
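One way to picture a retrieval scorer that learns online from rewards is the sketch below. It is an illustration under loud assumptions, not the paper's training objective: `RetrievalScorer` is a hypothetical name, the only input feature is a case-to-task similarity, and the update is a plain logistic-regression step that treats the task reward (1.0 or 0.0) as the label.

```python
import math


class RetrievalScorer:
    """Tiny logistic scorer: estimates how likely a retrieved case is to help."""

    def __init__(self, lr: float = 0.5):
        self.w = 0.0  # weight on the similarity feature
        self.b = 0.0  # bias term
        self.lr = lr  # learning rate (assumed value)

    def score(self, similarity: float) -> float:
        # Sigmoid of a linear function of the similarity.
        return 1.0 / (1.0 + math.exp(-(self.w * similarity + self.b)))

    def update(self, similarity: float, reward: float) -> None:
        # One stochastic-gradient step on binary cross-entropy,
        # using the task reward as the target.
        error = reward - self.score(similarity)
        self.w += self.lr * error * similarity
        self.b += self.lr * error
```

After each task, the agent would call `update` for every case it used, so cases that tend to precede success get ranked higher next time. The base LLM is never touched.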

A planner proposes subtasks with those cases, an executor runs tools via the Model Context Protocol, and case, subtask, and tool memories track progress.
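The planner and executor split can be sketched as a simple loop. Everything here is a placeholder: `run_agent` is a hypothetical function, the planner is any callable that returns subtasks of the form `"tool: argument"`, and plain Python callables stand in for tools that a real system would reach through a Model Context Protocol client.

```python
from typing import Callable, Dict, List, Tuple


def run_agent(
    task: str,
    planner: Callable[[str, List[str]], List[str]],
    tools: Dict[str, Callable[[str], str]],
    retrieved_cases: List[str],
) -> Dict[str, list]:
    subtask_memory: List[str] = []             # what the planner asked for
    tool_memory: List[Tuple[str, str]] = []    # which tool ran and what it returned

    # The planner sees the task plus the retrieved cases and proposes subtasks.
    for subtask in planner(task, retrieved_cases):
        subtask_memory.append(subtask)
        # Assumed subtask format: "tool_name: argument".
        tool_name, _, argument = subtask.partition(":")
        result = tools[tool_name.strip()](argument.strip())
        tool_memory.append((tool_name.strip(), result))

    return {"subtasks": subtask_memory, "tool_results": tool_memory}
```

The returned record, together with the final reward, is what would be written back to case memory once the task ends.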

Because only memory and the retrieval policy update, the base LLM stays frozen, cost stays low, and the agent adapts continuously.

Across research and question answering, the case memory lifts out-of-distribution accuracy by +4.7% to +9.6%, and hits 95.0% on SimpleQA.

The takeaway is practical: teach the agent which past experiences matter and it will plan better without fiddling with weights.

----

Paper: arxiv.org/abs/2508.16153

Paper Title: "AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs"
