Visualize Thread by @omarsar0

elvis

@omarsar0

Fine-tuning LLM Agents without Fine-tuning LLMs

Catchy title and very cool memory technique to improve deep research agents.

Great for continuous, real-time learning without gradient updates.

Here are my notes:

Overview

Proposes a memory‑based learning framework that lets deep‑research agents adapt online without updating model weights.

The agent is cast as a memory‑augmented MDP with case‑based reasoning, implemented in a planner–executor loop over MCP tools.

Method

Decisions are guided by a learned case‑retrieval policy over an episodic Case Bank.

Non‑parametric memory retrieves Top‑K similar cases; parametric memory learns a Q‑function (soft Q‑learning or single‑step CE training in deep‑research settings) to rank cases for reuse and revision.
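A minimal sketch of the two memory types, assuming each case is stored with a precomputed embedding. The `Case` fields, the cosine similarity, and the softmax over Q-values are illustrative assumptions, not the paper's implementation. In the paper the Q-function is learned; here the Q-values are passed in as plain numbers.

```python
import math
from dataclasses import dataclass


@dataclass
class Case:
    task: str               # description of a past task
    plan: str               # plan that was used for it
    reward: float           # 1.0 if the plan succeeded, 0.0 otherwise
    embedding: list[float]  # precomputed task embedding (assumed to exist)


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_top_k(query: list[float], bank: list[Case], k: int = 2) -> list[Case]:
    """Non-parametric memory: return the K cases most similar to the query."""
    return sorted(bank, key=lambda c: cosine(query, c.embedding), reverse=True)[:k]


def soft_policy(q_values: list[float], alpha: float = 1.0) -> list[float]:
    """Stand-in for parametric memory: softmax over Q-values with temperature alpha."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / alpha) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


bank = [
    Case("find paper citation count", "search then crawl", 1.0, [1.0, 0.0]),
    Case("solve integral", "use math tool", 1.0, [0.0, 1.0]),
    Case("find author homepage", "search only", 0.0, [0.9, 0.1]),
]
top = retrieve_top_k([1.0, 0.05], bank, k=2)
probs = soft_policy([0.8, 0.1])  # hypothetical Q-values for the two retrieved cases
```

Similarity narrows the Case Bank to a few candidates, and the Q-values then decide which candidate is most worth reusing or revising.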

Architecture

Planner (LLM CBR) + Executor (LLM MCP client) with three memories: Case, Subtask, Tool.

It involves planning, tool execution, writing/reading of cases, and a replay buffer. Tools span search, crawl, multimodal document parsing, code execution, and math utilities.
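The loop can be sketched roughly as follows. Everything here is a hypothetical stand-in: `TOOLS` replaces real MCP tools, `planner` replaces the LLM planner, exact task matching replaces learned retrieval, and the replay buffer is left out.

```python
from typing import Callable

# Hypothetical tool registry standing in for MCP tools.
TOOLS: dict[str, Callable[..., object]] = {
    "add": lambda a, b: a + b,
    "upper": lambda s: s.upper(),
}


def planner(task: str, similar_cases: list[dict]) -> list[dict]:
    """Stub for the LLM planner: returns subtasks expressed as tool calls.

    A real planner would prompt an LLM with the task and the retrieved cases.
    """
    if similar_cases:
        return similar_cases[0]["plan"]  # reuse the plan from a past case
    return [{"tool": "add", "args": (2, 3)}]


def run_episode(task: str, case_bank: list[dict]) -> dict:
    subtask_memory: list[dict] = []  # what was planned
    tool_memory: list[dict] = []     # what each tool call returned
    similar = [c for c in case_bank if c["task"] == task]  # read cases
    for step in planner(task, similar):
        subtask_memory.append(step)
        result = TOOLS[step["tool"]](*step["args"])  # executor runs the tool
        tool_memory.append({"tool": step["tool"], "result": result})
    case = {"task": task, "plan": subtask_memory, "result": tool_memory[-1]["result"]}
    case_bank.append(case)  # write the finished case back to memory
    return case


case_bank: list[dict] = []
first = run_episode("sum two numbers", case_bank)
second = run_episode("sum two numbers", case_bank)  # reuses the first plan
```

The second episode finds the case written by the first and reuses its plan, so behavior changes between episodes without any weight update.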

Results:

• GAIA: 87.88% Pass@3 on validation and 79.40% on test, competitive with or above open‑source agent frameworks
• DeepResearcher: 66.6 F1 and 80.4 PM average across seven open‑domain QA sets
• SimpleQA: 95.0% accuracy, beating recent web‑agent baselines
• HLE: 24.4 PM, close to GPT‑5 and ahead of several strong baselines

Practical takeaways for agent builders:

• Use a compact, curated case memory with adaptive retrieval rather than growing prompts.

• Keep planning concise. On GAIA, a fast planner outperforms slow‑think planners for multi‑step tool use because it avoids overly verbose plans and shortcut plans.

• Separate planning and execution with explicit Subtask and Tool memories to coordinate long‑horizon work and reduce hallucinations.
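One way to keep a case memory compact, as a sketch only: the `CompactCaseBank` class, its fixed capacity, and the evict-lowest-reward rule are assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class CompactCaseBank:
    capacity: int = 2
    cases: dict[str, float] = field(default_factory=dict)  # task -> best reward seen

    def write(self, task: str, reward: float) -> None:
        # Curate: keep one entry per task, preferring the higher reward.
        self.cases[task] = max(reward, self.cases.get(task, float("-inf")))
        # Stay compact: evict the lowest-reward case once over capacity.
        if len(self.cases) > self.capacity:
            worst = min(self.cases, key=self.cases.get)
            del self.cases[worst]


memory = CompactCaseBank(capacity=2)
memory.write("a", 1.0)
memory.write("b", 0.0)
memory.write("c", 0.5)  # over capacity, so "b" (lowest reward) is evicted
memory.write("a", 0.2)  # a lower reward does not overwrite the stored 1.0
```

The memory stays at a fixed size no matter how many episodes run, so the retrieved context does not grow the way an ever-longer prompt would.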

Paper: arxiv.org/abs/2508.16153
