Improve an agent

This is the part where owning the platform pays off. Your trace data, your agent code, your running platform, and your iteration tool all live in one repo. Claude Code can see every part of it.

Run the prompt

Open Claude Code in the repo and paste:

Run docs/improve-agent.md

Claude will:

Pick an agent. Either the one you just built, or one you point it at.
Pull recent traces. Hits the running AgentOS at http://localhost:8000 via curl and reads runs, sessions, and tool calls.
Identify failure modes. Reads through the traces and flags weak prompts, bad tool selections, missing instructions, hallucinations.
Propose changes. Edits the agent’s instructions, adds or removes tools, tightens the prompt, adds guardrails.
Re-run and verify. Tests the changed agent against the same prompts, compares before/after, asks if you want to keep the changes.

Why this works

Three things have to be true for a coding agent to improve another agent:

Requirement	Where it lives
The trace data	Your Postgres, queryable via the AgentOS API.
The agent code	`agents/*.py` in this repo.
A way to test changes	The running platform, hot-reloads on file change.

When all three live in one place, the loop closes. When they’re split across three SaaS products, it doesn’t.

When to run it

After a batch of real usage (not just your own testing).
Before deploying a major change.
When users report that an agent is missing the point.
Periodically as part of routine maintenance.

What changes, what doesn’t

Claude Code edits agent code in place: instructions, tools, guardrails, model choice. It does not touch:

The platform code (app/, db/).
Other agents not under review.
Your .env or any secrets.

Every change is a regular file edit you can git diff and revert.

Lock in behavior with evals →

Welcome

Tutorials

Runtime

Demo OS

Improve an agent

Run the prompt

Why this works

When to run it

What changes, what doesn’t

Next

Welcome

Tutorials

Runtime

Demo OS

Documentation Index

​Run the prompt

​Why this works

​When to run it

​What changes, what doesn’t

​Next

Run the prompt

Why this works

When to run it

What changes, what doesn’t

Next