@omarsar0: Fine-tuning LLM Agents without...
@omarsar0
Aug 26, 2025
Method
Decisions are guided by a learned case-retrieval policy over an episodic Case Bank.
Non-parametric memory retrieves Top-K similar cases; parametric memory learns a Q-function (soft Q-learning or single-step CE training in deep-research settings) to rank cases for reuse and revision.
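A minimal sketch of the two memory modes described above, not the paper's implementation. Everything here is an assumption made for illustration: the `Case` and `CaseBank` names, the toy hashing `embed` function (a stand-in for a real text encoder), and a per-case scalar Q-value in place of a learned Q-function over (state, case) pairs.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Case:
    task: str
    plan: str
    reward: float            # outcome of the stored attempt, e.g. 1.0 for success
    embedding: List[float]
    q: float = 0.0           # assumed scalar estimate of how useful reuse is


def embed(text: str, dim: int = 16) -> List[float]:
    """Toy bag-of-words hashing embedding (hypothetical stand-in for an encoder)."""
    vec = [0.0] * dim
    for tok in text.lower().split():
        vec[sum(ord(c) for c in tok) % dim] += 1.0
    return vec


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class CaseBank:
    """Episodic store of past attempts."""

    def __init__(self) -> None:
        self.cases: List[Case] = []

    def write(self, task: str, plan: str, reward: float) -> Case:
        case = Case(task, plan, reward, embed(task))
        self.cases.append(case)
        return case

    def top_k(self, query: str, k: int = 4) -> List[Case]:
        # Non-parametric retrieval: rank stored cases by similarity to the query.
        q = embed(query)
        ranked = sorted(self.cases, key=lambda c: cosine(q, c.embedding), reverse=True)
        return ranked[:k]


def update_q(case: Case, reward: float, lr: float = 0.5) -> None:
    # Parametric part, simplified: move the estimate toward the observed reward.
    case.q += lr * (reward - case.q)


def soft_policy(candidates: List[Case], alpha: float = 0.5) -> List[float]:
    # Soft Q-learning style distribution: p(case) proportional to exp(Q / alpha).
    m = max(c.q for c in candidates)
    weights = [math.exp((c.q - m) / alpha) for c in candidates]
    z = sum(weights)
    return [w / z for w in weights]
```

In use, an agent would call `top_k` for a new task, sample or pick a case with `soft_policy`, then call `update_q` with the reward from the resulting attempt.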
Results:
• GAIA: 87.88% Pass@3 on validation and 79.40% on test, competitive with or above open-source agent frameworks
• DeepResearcher: 66.6 F1 and 80.4 PM average across seven open-domain QA sets
• SimpleQA: 95.0% accuracy, beating recent web-agent baselines
• HLE: 24.4 PM, close to GPT-5 and ahead of several strong baselines
Practical takeaways for agent builders:
• Use a compact, curated case memory with adaptive retrieval rather than growing prompts.
• Keep planning concise. A fast planner outperforms slow-think planners for multi-step tool use on GAIA by avoiding verbose or shortcut plans.
• Separate planning and execution with explicit Subtask and Tool memories to coordinate long-horizon work and reduce hallucinations.
Paper: arxiv.org/abs/2508.16153
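One way to picture the planner/executor split from the last takeaway is sketched below. This is an illustrative assumption, not the paper's code: `fast_planner` is a stub where a real system would call an LLM, the two tools are toy functions, and the `SubtaskMemory` and `ToolMemory` classes are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Subtask:
    description: str
    tool: str
    argument: str
    result: str = ""
    done: bool = False


@dataclass
class SubtaskMemory:
    """What the planner decided and how far execution has progressed."""
    items: List[Subtask] = field(default_factory=list)

    def pending(self) -> List[Subtask]:
        return [s for s in self.items if not s.done]


@dataclass
class ToolMemory:
    """Log of every tool call, kept separate from the plan itself."""
    log: List[Dict[str, str]] = field(default_factory=list)

    def record(self, tool: str, argument: str, output: str) -> None:
        self.log.append({"tool": tool, "argument": argument, "output": output})


# Toy tool registry (assumed for the example).
TOOLS: Dict[str, Callable[[str], str]] = {
    "word_count": lambda text: str(len(text.split())),
    "lowercase": lambda text: text.lower(),
}


def fast_planner(task: str) -> List[Subtask]:
    # Hypothetical stub: returns a short, fixed plan instead of calling an LLM.
    return [
        Subtask("count the words in the task", tool="word_count", argument=task),
        Subtask("normalise the task text", tool="lowercase", argument=task),
    ]


def execute(subtasks: SubtaskMemory, tool_log: ToolMemory) -> None:
    # The executor only reads pending subtasks and writes results back.
    for sub in subtasks.pending():
        output = TOOLS[sub.tool](sub.argument)
        tool_log.record(sub.tool, sub.argument, output)
        sub.result, sub.done = output, True


def run(task: str):
    subtasks = SubtaskMemory(fast_planner(task))
    tool_log = ToolMemory()
    execute(subtasks, tool_log)
    return subtasks, tool_log
```

Keeping the plan state and the tool log in separate structures means the planner can be re-invoked with only the unfinished subtasks, and every claimed result can be traced back to a recorded tool call.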




