elvis
@omarsar0
Fine-tuning LLM Agents without Fine-tuning LLMs

Catchy title and very cool memory technique to improve deep research agents.

Great for continuous, real-time learning without gradient updates.

Here are my notes:
Overview

Proposes a memory‑based learning framework that lets deep‑research agents adapt online without updating model weights.

The agent is cast as a memory‑augmented MDP with case‑based reasoning, implemented in a planner–executor loop over MCP tools.
Method

Decisions are guided by a learned case‑retrieval policy over an episodic Case Bank.

Non‑parametric memory retrieves the Top‑K most similar cases; parametric memory learns a Q‑function (via soft Q‑learning, or single‑step cross‑entropy (CE) training in deep‑research settings) to rank cases for reuse and revision.
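
To make the two retrieval modes concrete, here is a minimal sketch, not the paper's implementation. It assumes cases are stored with a plain embedding vector, that similarity is cosine similarity, and that the learned Q-values are turned into a retrieval distribution with a temperature softmax (the usual form of a soft Q-learning policy). The names `Case`, `top_k`, and `soft_retrieval_probs` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Case:
    state: list      # embedding of the task (assumed representation)
    plan: str        # the plan that was tried for that task
    reward: float    # outcome signal recorded for the case


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(case_bank, query, k):
    """Non-parametric retrieval: the k cases most similar to the query."""
    ranked = sorted(case_bank, key=lambda c: cosine(c.state, query), reverse=True)
    return ranked[:k]


def soft_retrieval_probs(q_values, alpha=1.0):
    """Parametric ranking: softmax over Q-values with temperature alpha."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / alpha) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


bank = [
    Case([1.0, 0.0], "A", 1.0),
    Case([0.8, 0.6], "B", 0.0),
    Case([0.0, 1.0], "C", 1.0),
]
nearest = top_k(bank, [1.0, 0.0], k=2)
probs = soft_retrieval_probs([1.0, 2.0, 0.5])
```

In this sketch the Q-values are passed in directly; in the setup described above they would come from a learned function of the current state and each candidate case.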
Architecture

Planner (LLM CBR) + Executor (LLM MCP client) with three memories: Case, Subtask, Tool.

It involves planning, tool execution, writing/reading of cases, and a replay buffer. Tools span search, crawl, multimodal document parsing, code execution, and math utilities.
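
A rough sketch of how such a planner-executor loop with three memories might be wired together. Everything here is an assumption for illustration: the planner, executor, and judge are passed in as plain callables standing in for LLM and MCP tool calls, retrieval is reduced to exact task matching, and the replay buffer is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Memories:
    case: list = field(default_factory=list)     # (task, plan, reward) records
    subtask: list = field(default_factory=list)  # (subtask, result) records
    tool: list = field(default_factory=list)     # (subtask, tool name, output) logs


def run_episode(task, plan_fn, execute_fn, judge_fn, mem):
    """One planner-executor pass that reads past cases and writes a new one."""
    # Read: placeholder retrieval that only matches identical tasks.
    similar = [c for c in mem.case if c[0] == task]
    # Plan: the planner sees the task plus retrieved cases.
    plan = plan_fn(task, similar)
    results = []
    for subtask in plan:
        # Execute: the executor picks a tool and returns its output.
        tool_name, output = execute_fn(subtask)
        mem.tool.append((subtask, tool_name, output))
        mem.subtask.append((subtask, output))
        results.append(output)
    answer = results[-1] if results else None
    # Write: store the finished episode as a new case.
    mem.case.append((task, plan, judge_fn(task, answer)))
    return answer


# Stub components standing in for the LLM planner and tool-calling executor.
def stub_planner(task, similar_cases):
    return ["search " + task, "summarize"]


def stub_executor(subtask):
    return ("search" if subtask.startswith("search") else "writer", subtask.upper())


def stub_judge(task, answer):
    return 1.0 if answer else 0.0


mem = Memories()
answer = run_episode("llm agents", stub_planner, stub_executor, stub_judge, mem)
```

Keeping the three memories as separate lists mirrors the Case, Subtask, and Tool split named above; a real system would replace the exact-match lookup with similarity-based retrieval.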
Results

• GAIA: 87.88% Pass@3 on validation and 79.40% on test, competitive with or above open‑source agent frameworks
• DeepResearcher: 66.6 F1 and 80.4 PM average across seven open‑domain QA sets
• SimpleQA: 95.0% accuracy, beating recent web‑agent baselines
• HLE: 24.4 PM, close to GPT‑5 and ahead of several strong baselines
Practical takeaways for agent builders:

• Use a compact, curated case memory with adaptive retrieval rather than growing prompts.

• Keep planning concise. A fast planner outperforms slow‑think planners for multi‑step tool use on GAIA by avoiding verbose or shortcut plans.

• Separate planning and execution with explicit Subtask and Tool memories to coordinate long‑horizon work and reduce hallucinations.
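
One possible reading of the "compact, curated case memory" takeaway, sketched under assumptions that are not from the paper: the bank has a fixed capacity and evicts the lowest-reward case when full. The class name `BoundedCaseBank` and the eviction rule are hypothetical.

```python
import heapq


class BoundedCaseBank:
    """Fixed-size case memory that keeps only the highest-reward cases."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []   # min-heap of (reward, insertion order, case)
        self._count = 0   # tie-breaker so cases themselves are never compared

    def add(self, case, reward):
        entry = (reward, self._count, case)
        self._count += 1
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
        else:
            # Push the new entry, then drop whichever entry is now lowest.
            heapq.heappushpop(self._heap, entry)

    def cases(self):
        """Stored cases, highest reward first."""
        return [case for _, _, case in sorted(self._heap, reverse=True)]


bank = BoundedCaseBank(capacity=2)
bank.add("a", 0.1)
bank.add("b", 0.9)
bank.add("c", 0.5)
kept = bank.cases()
```

The point of the cap is that the prompt only ever sees a bounded number of retrieved cases, instead of growing with every episode.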

Paper: arxiv.org/abs/2508.16153