elvis
@omarsar0
Open-Ended Evolution of Self-Improving Agents

Can AI systems endlessly improve themselves?

This work shows the potential of self-improving AI, inspired by biological evolution and open-ended exploration.

This is a must-read!

Here are my notes:
What's the high-level idea?

This work presents the Darwin Gödel Machine (DGM), a system that advances the vision of self-improving AI by combining self-referential code modification with open-ended evolutionary search...
Unlike the original Gödel machine, which requires provable benefits for code changes (a practically intractable constraint), the DGM adopts an empirical approach: it modifies its own codebase and evaluates improvements on coding benchmarks.
Self-referential self-improvement loop

The DGM starts with a single coding agent that edits its own Python-based codebase to improve its ability to read, write, and execute code using frozen foundation models (FMs).

Each modification is evaluated on benchmarks like SWE-bench and Polyglot, with only successful agents retained for further iterations.
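To make the loop concrete, here is a minimal toy sketch of "modify yourself, evaluate, keep what works". Everything in it is an assumption for illustration: `Agent`, `self_modify`, and `evaluate` are hypothetical names, a single `skill` number stands in for the whole codebase, and the benchmark is a clamp rather than SWE-bench or Polyglot. It also uses a simple greedy keep-if-better rule, which is a simplification and not the DGM's actual retention policy.

```python
import random
from dataclasses import dataclass


@dataclass
class Agent:
    """Toy stand-in for a coding agent; `skill` abstracts its entire codebase."""
    skill: float
    generation: int = 0


def self_modify(agent, rng):
    """Hypothetical self-edit: the agent proposes a changed copy of itself."""
    return Agent(skill=agent.skill + rng.uniform(-0.1, 0.15),
                 generation=agent.generation + 1)


def evaluate(agent):
    """Hypothetical benchmark: fraction of tasks solved, clamped to [0, 1]."""
    return max(0.0, min(1.0, agent.skill))


def run_loop(iterations=20, seed=0):
    """Greedy variant of the loop: a child replaces its parent only if it scores higher."""
    rng = random.Random(seed)
    agent = Agent(skill=0.2)
    history = [evaluate(agent)]
    for _ in range(iterations):
        child = self_modify(agent, rng)
        if evaluate(child) > evaluate(agent):
            agent = child  # the modification is kept
        history.append(evaluate(agent))
    return agent, history


if __name__ == "__main__":
    final_agent, scores = run_loop()
    print(f"start={scores[0]:.2f} end={scores[-1]:.2f} generation={final_agent.generation}")
```

In the real system the "edit" is a change to Python source made with a frozen foundation model, and the evaluation is a benchmark run; the toy only shows the control flow.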
Recursive bootstrapping without a meta-agent

Unlike other approaches (e.g., ADAS) that separate the meta-agent from the worker agent, the DGM is monolithic and self-contained; it improves itself using itself.

Implication: This removes a key bottleneck in scaling agentic systems, suggesting that future agents may not need human-designed meta-learning infrastructure at all, potentially accelerating autonomous innovation.
Open-ended exploration via evolutionary archive

Inspired by Darwinian evolution, the system maintains an archive of all prior agents and samples parents based on performance and novelty.

This enables exploration beyond local optima and supports continual innovation, including revisiting previously suboptimal variants that become valuable stepping stones later.

This mirrors how scientific ideas or mutations in nature may have value only many steps down the line, making it crucial for future AI systems to incorporate curiosity, diversity, and memory in search strategies.
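A rough sketch of what archive-based parent sampling could look like. The weighting below is an assumption chosen for illustration, not the paper's exact formula: performance passes through a sigmoid, and "novelty" is approximated by down-weighting agents that have already produced many children. The names `ArchiveEntry`, `parent_weight`, and `sample_parent`, plus the `sharpness` and `midpoint` parameters, are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class ArchiveEntry:
    name: str
    score: float       # benchmark success rate in [0, 1]
    children: int = 0  # how often this agent has already served as a parent


def parent_weight(entry, sharpness=10.0, midpoint=0.5):
    """Assumed weighting: sigmoid of performance times a novelty bonus."""
    performance = 1.0 / (1.0 + math.exp(-sharpness * (entry.score - midpoint)))
    novelty = 1.0 / (1.0 + entry.children)  # fewer children means less explored
    return performance * novelty


def sample_parent(archive, rng):
    """Pick one parent at random, proportionally to its weight."""
    weights = [parent_weight(entry) for entry in archive]
    return rng.choices(archive, weights=weights, k=1)[0]


if __name__ == "__main__":
    archive = [
        ArchiveEntry("initial", score=0.20, children=5),
        ArchiveEntry("variant_a", score=0.35, children=1),
        ArchiveEntry("variant_b", score=0.50),
    ]
    rng = random.Random(0)
    parent = sample_parent(archive, rng)
    parent.children += 1  # record that this agent was expanded
    print("selected parent:", parent.name)
```

Because every weight stays strictly positive, even a weak agent keeps a nonzero chance of being picked, which is what lets older, lower-scoring variants act as stepping stones later.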
Empirical performance gains

Across 80 iterations, DGM boosts coding success on SWE-bench from 20.0% to 50.0% and on Polyglot from 14.2% to 30.7%, outperforming strong baselines that lack either self-improvement or open-endedness.

Its best agents match or exceed leading human-designed, open-source coding agents.
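For a quick sense of scale, the reported numbers can be turned into absolute and relative gains. This is plain arithmetic on the figures quoted above; the helper name `gains` is just for illustration.

```python
# Success rates (in percent) before and after self-improvement, as quoted in the thread.
results = {
    "SWE-bench": (20.0, 50.0),
    "Polyglot": (14.2, 30.7),
}


def gains(before, after):
    """Return (absolute gain in percentage points, ratio of after to before)."""
    return after - before, after / before


for name, (before, after) in results.items():
    absolute, ratio = gains(before, after)
    print(f"{name}: +{absolute:.1f} points, {ratio:.2f}x the starting rate")
```

That works out to +30.0 points (2.50x) on SWE-bench and +16.5 points (about 2.16x) on Polyglot.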
Emergent tool and workflow improvements

Through self-improvement, DGM enhances its capabilities by evolving more granular editing tools, retry and evaluation mechanisms, history-aware patch generation, and code summarization for long contexts.

DGM offers a glimpse into how future AI systems might invent their own software development practices, potentially surpassing current human conventions.
Generalization across models and tasks

Agents discovered by DGM generalize well when transferred across foundation models (e.g., from Claude 3.5 Sonnet to Claude 3.7 Sonnet or o3-mini) and across programming languages, which suggests the improvements are robust and not overfit to a particular setup.
Safety-conscious design

All experiments were sandboxed, monitored, and scoped to confined domains.

The paper also discusses how future self-improving AI systems could evolve safer, more interpretable behaviors if these traits are part of the evaluation criteria.

The code has also been open-sourced.

Code: github.com/jennyzzt/dgm
Paper: arxiv.org/abs/2505.22954