@omarsar0: Overview of Self-Evolving Agen...
@omarsar0
Aug 31, 2025
This survey defines self-evolving AI agents and argues for a shift from static, hand-crafted systems to lifelong, adaptive agentic ecosystems.
It maps the field’s trajectory, proposes “Three Laws” to keep evolution safe and useful, and organizes techniques across single-agent, multi-agent, and domain-specific settings.
LLM-centric learning paradigms:
MOP (Model Offline Pretraining): Static pretraining on large corpora; no adaptation after deployment.
MOA (Model Online Adaptation): Post-deployment updates via fine-tuning, adapters, or RLHF.
MAO (Multi-Agent Orchestration): Multiple agents coordinate through message exchange or debate, without changing model weights.
MASE (Multi-Agent Self-Evolving): Agents interact with their environment and continually optimize prompts, memory, tools, and workflows.
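One way to read the four paradigms is as points along two axes: whether several agents are involved, and what (if anything) changes after deployment. The sketch below encodes that reading as plain data. The field names and the single-agent/multi-agent flags for MOP and MOA are assumptions made for illustration, not definitions taken from the survey.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paradigm:
    name: str
    multi_agent: bool           # assumption: inferred from the paradigm's name
    adapts_after_deploy: bool   # does anything change once the system is live?
    what_changes: tuple         # components updated post-deployment


PARADIGMS = [
    Paradigm("MOP", False, False, ()),
    Paradigm("MOA", False, True, ("weights",)),
    Paradigm("MAO", True, False, ()),
    Paradigm("MASE", True, True, ("prompts", "memory", "tools", "workflows")),
]


def is_self_evolving(p: Paradigm) -> bool:
    # Under this reading, only systems that are both multi-agent and
    # adaptive after deployment count as self-evolving.
    return p.multi_agent and p.adapts_after_deploy


for p in PARADIGMS:
    changes = ", ".join(p.what_changes) or "nothing"
    print(f"{p.name}: changes {changes} after deployment")
```

Filtering `PARADIGMS` with `is_self_evolving` leaves only MASE, which matches the thread's framing of it as the end point of the progression.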
Single-agent optimization toolbox
Techniques are grouped into:
(i) LLM behavior (training for reasoning; test-time scaling with search and verification),
(ii) prompt optimization (edit, generate, text-gradient, evolutionary),
(iii) memory optimization (short-term compression and retrieval; long-term RAG, graphs, and control policies), and
(iv) tool use and tool creation.
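To make the "evolutionary" flavor of prompt optimization concrete, here is a toy sketch: a population of candidate prompts is mutated, scored, and pruned each generation. Everything in it is hypothetical. `PHRASES`, `TARGET`, and `score` stand in for a real instruction pool and a real evaluation on a dev set, and the loop is a generic mutate-and-select scheme, not any specific method from the survey.

```python
import random

# Hypothetical pool of instructions a prompt may contain.
PHRASES = ["think step by step", "cite sources", "be concise",
           "check your answer", "use a table"]
# Hypothetical: the instructions a hidden evaluation happens to reward.
TARGET = {"think step by step", "check your answer", "be concise"}


def score(prompt: frozenset) -> float:
    # Stand-in for running the agent on a dev set: +1 per helpful
    # instruction, -0.5 per instruction that only adds noise.
    return len(prompt & TARGET) - 0.5 * len(prompt - TARGET)


def mutate(prompt: frozenset, rng: random.Random) -> frozenset:
    # Toggle one randomly chosen instruction on or off.
    return prompt ^ {rng.choice(PHRASES)}


def evolve(generations: int = 30, pop_size: int = 6, seed: int = 0) -> frozenset:
    rng = random.Random(seed)
    population = [frozenset()]
    for _ in range(generations):
        children = [mutate(p, rng) for p in population]
        # Keep parents and children together so the best candidate is
        # never lost; ties are broken by content for reproducibility.
        pool = set(population + children)
        population = sorted(pool, key=lambda p: (-score(p), sorted(p)))[:pop_size]
    return population[0]


best = evolve()
print("best prompt:", ". ".join(sorted(best)), "| score:", score(best))
```

Because parents compete with their children, the best score never decreases across generations. In a real system the expensive part is `score`, which is why the survey's other prompt-optimization families (edit, generate, text-gradient) matter as cheaper ways to propose candidates.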
Multi-agent workflows that self-improve
Beyond manual pipelines, the survey treats prompts, topologies, and backbones as searchable spaces.
It distinguishes code-level workflows from communication-graph topologies, covers unified optimization that jointly tunes prompts and structure, and describes backbone training for better cooperation.
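Treating a topology as a searchable space can be illustrated with a very small example: enumerate every directed communication graph over three agents and keep the one that maximizes a utility. The agent names, `REQUIRED_FLOWS`, and the utility (10 points per information flow served, minus 1 per edge as a communication cost) are all assumptions chosen for illustration. Real systems would score a topology by running the workflow, and would use a smarter search than brute force.

```python
from itertools import combinations, permutations

AGENTS = ["planner", "coder", "reviewer"]
# Hypothetical information flows the task needs (source must reach target).
REQUIRED_FLOWS = [("planner", "coder"), ("coder", "reviewer"), ("reviewer", "coder")]


def reachable(edges, src, dst):
    # Depth-first search over directed edges.
    frontier, seen = [src], {src}
    while frontier:
        node = frontier.pop()
        for a, b in edges:
            if a == node and b not in seen:
                seen.add(b)
                frontier.append(b)
    return dst in seen


def utility(edges):
    # Stand-in objective: reward served flows, penalize every channel.
    served = sum(reachable(edges, s, d) for s, d in REQUIRED_FLOWS)
    return 10 * served - len(edges)


def search():
    candidates = list(permutations(AGENTS, 2))  # all 6 directed edges
    subsets = (subset
               for k in range(len(candidates) + 1)
               for subset in combinations(candidates, k))
    return max(subsets, key=utility)  # exhaustive over 2**6 graphs


best = search()
print("edges:", best, "| utility:", utility(best))
```

With three agents the space has only 64 graphs, so exhaustive search is fine. The space grows as 2^(n(n-1)) with n agents, which is the practical reason topology optimization needs learned or heuristic search.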
Evaluation, safety, and open problems
Benchmarks span tools, web navigation, GUI agents, collaboration, and specialized domains; LLM-as-judge and Agent-as-judge reduce evaluation cost while tracking process quality.
The paper stresses continuous, evolution-aware safety monitoring and highlights challenges such as stable reward modeling, efficiency-effectiveness trade-offs, and transfer of optimized prompts/topologies to new models or domains.
Paper: arxiv.org/abs/2508.07407








