Fine-tuning LLM Agents without Fine-tuning LLMs

Catchy title and a very cool memory technique for improving deep‑research agents. Great for continuous, real-time learning without gradient updates. Here are my notes:


Overview: Proposes a memory‑based learning framework that lets deep‑research agents adapt online without updating model weights. The agent is cast as a memory‑augmented MDP with case‑based reasoning, implemented as a planner–executor loop over MCP (Model Context Protocol) tools.
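One way to write down the "memory‑augmented MDP with case‑based reasoning" framing is shown below. The notation is my own shorthand and may not match the paper's exactly. 𝓑 is the Case Bank of stored (state, action, reward) records, μ is the learned case‑retrieval policy, and p_LLM is the frozen model conditioned on the retrieved case. Learning happens only in μ, which is why no LLM weights change.

```latex
\begin{aligned}
&\text{M-MDP: } \langle \mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R}, \gamma, \mathcal{B} \rangle,
  \qquad \mathcal{B} \subseteq (\mathcal{S} \times \mathcal{A} \times \mathbb{R})^{*} \\
&\pi(a \mid s, \mathcal{B}) = \sum_{c \in \mathcal{B}} \mu(c \mid s, \mathcal{B})\, p_{\mathrm{LLM}}(a \mid s, c)
\end{aligned}
```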


Method: Decisions are guided by a learned case‑retrieval policy over an episodic Case Bank. Non‑parametric memory retrieves the Top‑K most similar cases; parametric memory learns a Q‑function that ranks cases for reuse and revision, trained with soft Q‑learning or, in deep‑research settings, with single‑step cross‑entropy (CE) training.
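A toy sketch of the two memories, assuming 2‑D embeddings and a logistic scorer. Everything here is hypothetical: `Case`, `top_k`, `CaseScorer`, and `selection_probs` are illustrative names, not the paper's code. Cosine Top‑K plays the non‑parametric role. A logistic Q(s, c) over concatenated [query; case] features, trained with single‑step binary cross‑entropy on observed rewards, plays the parametric role. Cases are then sampled from a Boltzmann distribution, exp(Q/α), in the spirit of soft Q‑learning.

```python
import math
from dataclasses import dataclass

@dataclass
class Case:
    state: list[float]  # hypothetical embedding of the task that produced this case
    plan: str           # stored plan/trajectory summary to reuse or revise
    label: str          # short identifier for the case

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(bank, query, k):
    """Non-parametric memory: the k stored cases most similar to the query."""
    return sorted(bank, key=lambda c: cosine(c.state, query), reverse=True)[:k]

class CaseScorer:
    """Parametric memory sketch: logistic Q(s, c) over [query ; case] features,
    trained with single-step binary cross-entropy on observed rewards."""

    def __init__(self, dim, lr=0.5):
        self.w = [0.0] * (2 * dim)
        self.b = 0.0
        self.lr = lr

    def q(self, query, case):
        feats = query + case.state  # list concatenation: [query ; case]
        z = self.b + sum(w * f for w, f in zip(self.w, feats))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, query, case, reward):
        # d(BCE)/d(logit) = p - y, so take one gradient step per observation.
        err = self.q(query, case) - reward
        feats = query + case.state
        self.w = [w - self.lr * err * f for w, f in zip(self.w, feats)]
        self.b -= self.lr * err

def selection_probs(scorer, query, candidates, alpha=0.1):
    """Boltzmann distribution over candidates, proportional to exp(Q / alpha)."""
    logits = [scorer.q(query, c) / alpha for c in candidates]
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

bank = [
    Case([1.0, 0.0], "search, then crawl top hits", "A"),
    Case([0.8, 0.6], "answer from parametric knowledge", "B"),
    Case([0.0, 1.0], "write and run code", "C"),
]
query = [1.0, 0.1]
candidates = top_k(bank, query, k=2)  # non-parametric recall

scorer = CaseScorer(dim=2)
# Hypothetical replay log: reusing A on this query succeeded, reusing B failed.
for _ in range(200):
    scorer.update(query, bank[0], 1.0)
    scorer.update(query, bank[1], 0.0)

probs = selection_probs(scorer, query, candidates)
best = candidates[probs.index(max(probs))]
print([c.label for c in candidates], [round(p, 3) for p in probs], best.label)
```

Similarity alone ranks B as a close second. The learned scorer is what pushes selection toward the case that actually paid off.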

Architecture: A planner (an LLM doing case‑based reasoning, CBR) and an executor (an LLM acting as an MCP client) share three memories: Case, Subtask, and Tool. The loop covers planning, tool execution, writing and reading cases, and a replay buffer. Tools span search, crawling, multimodal document parsing, code execution, and math utilities.
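A stripped‑down sketch of the planner–executor loop with the three memories. The planner and executor here are plain functions standing in for the two LLMs, and `TOOLS` is a hypothetical dictionary standing in for MCP servers. The replay buffer is left out. The sketch only shows how the Case, Subtask, and Tool memories are read and written around one task.

```python
from dataclasses import dataclass, field

@dataclass
class Memories:
    cases: list = field(default_factory=list)     # Case memory: (task, plan, success)
    subtasks: dict = field(default_factory=dict)  # Subtask memory: step index -> status
    tools: list = field(default_factory=list)     # Tool memory: (tool, arg, output)

# Hypothetical tool registry standing in for MCP servers (search, crawl, ...).
TOOLS = {
    "search": lambda arg: f"links for '{arg}'",
    "crawl": lambda arg: f"page text for '{arg}'",
}

def planner(task, mem):
    """Stand-in for the LLM planner: reuse a successful case, else a default plan."""
    for past_task, past_plan, ok in reversed(mem.cases):
        if ok and past_task == task:
            return list(past_plan)
    return [("search", task), ("crawl", task)]

def executor(plan, mem):
    """Stand-in for the LLM MCP client: run each subtask and log tool I/O."""
    out = ""
    for i, (tool, arg) in enumerate(plan):
        mem.subtasks[i] = "running"
        out = TOOLS[tool](arg)
        mem.tools.append((tool, arg, out))
        mem.subtasks[i] = "done"
    return out

def solve(task, mem, judge):
    plan = planner(task, mem)  # read Case memory
    mem.subtasks.clear()
    result = executor(plan, mem)
    mem.cases.append((task, tuple(plan), judge(result)))  # write the case back
    return result

mem = Memories()
first = solve("GAIA-style question", mem, judge=lambda r: "page text" in r)
second = solve("GAIA-style question", mem, judge=lambda r: True)  # reuses stored plan
print(first, len(mem.cases), len(mem.tools))
```

Keeping subtask status and tool I/O in separate stores, rather than in one growing transcript, is the design point the notes highlight for long‑horizon work.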


Results:
• GAIA: 87.88% Pass@3 on validation and 79.40% on test, competitive with or above open‑source agent frameworks
• DeepResearcher: 66.6 F1 and 80.4 PM averaged across seven open‑domain QA sets
• SimpleQA: 95.0% accuracy, beating recent web‑agent baselines
• HLE: 24.4 PM, close to GPT‑5 and ahead of several strong baselines


Practical takeaways for agent builders:
• Use a compact, curated case memory with adaptive retrieval rather than ever‑growing prompts.
• Keep planning concise. On GAIA, a fast planner outperforms slow‑think planners for multi‑step tool use because it avoids verbose or shortcut plans.
• Separate planning and execution, with explicit Subtask and Tool memories, to coordinate long‑horizon work and reduce hallucinations.

Paper: https://arxiv.org/abs/2508.16153
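For the first takeaway (compact, curated case memory), here is a toy version of a bounded case bank. It is an assumption‑laden sketch, not the paper's mechanism. The class names, the Laplace‑smoothed success rate used as "utility", and the word‑overlap relevance function are all hypothetical choices. The prompt only ever receives k cases, and the bank evicts its least useful entry once it exceeds `capacity`.

```python
from dataclasses import dataclass

@dataclass
class StoredCase:
    task: str
    plan: str
    uses: int = 0
    wins: int = 0

    def utility(self):
        # Laplace-smoothed success rate, so untested cases start at 0.5.
        return (self.wins + 1) / (self.uses + 2)

class CompactCaseBank:
    """Hypothetical bounded case memory: never more than `capacity` cases,
    and retrieval hands at most k of them to the prompt."""

    def __init__(self, capacity, k):
        self.capacity = capacity
        self.k = k
        self.cases = []

    def add(self, case):
        self.cases.append(case)
        if len(self.cases) > self.capacity:
            # Evict the case with the lowest smoothed success rate.
            self.cases.remove(min(self.cases, key=StoredCase.utility))

    def record(self, case, success):
        case.uses += 1
        case.wins += int(success)

    def retrieve(self, relevance):
        """Return the k cases ranked by relevance(task) * utility."""
        ranked = sorted(self.cases, key=lambda c: relevance(c.task) * c.utility(), reverse=True)
        return ranked[: self.k]

def word_overlap(query):
    # Hypothetical relevance: Jaccard overlap of lowercase word sets.
    q = set(query.lower().split())
    def score(task):
        t = set(task.lower().split())
        return len(q & t) / len(q | t) if q | t else 0.0
    return score

bank = CompactCaseBank(capacity=2, k=1)
a = StoredCase("find paper release date", "search arxiv then crawl abstract page")
b = StoredCase("find paper author list", "guess from title")
bank.add(a)
bank.add(b)
bank.record(a, True)
bank.record(b, False)
bank.add(StoredCase("compute dataset size", "download then run code"))  # evicts b
top = bank.retrieve(word_overlap("find paper venue"))
print([c.task for c in bank.cases], [c.task for c in top])
```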
