You have an agent, AgentOS running it, and a Postgres holding state. What’s left to ship a product? Less than you think. Most of the gap between “AgentOS works” and “users can pay” is product work, not infrastructure. This page covers two things the rest of the section doesn’t: picking your starting point, and the operating loop that keeps your agent working once real users hit it.

Pick the right starting point

Don’t build AgentOS from scratch. Start from a template that’s closest to what you want to ship.
| Template | Right when |
| --- | --- |
| Scout | You’re building a context agent that pulls from external systems (Slack, Drive, MCP, custom) |
| Dash | You’re building a data agent that answers questions from your database |
| Coda | You’re building a code companion that lives in Slack |
| Demo OS | You want a kitchen-sink reference with every feature wired up |
| Bare AgentOS | You want full control and don’t mind wiring it yourself |
All four templates ship with Docker Compose, Railway scripts, Slack manifest, JWT setup, and a .env.production flow. You’re 90% of the way to deploy on day one.

Replace the demo data

Templates ship with synthetic data so the first run works. Replace it with your own:
| Template | Swap |
| --- | --- |
| Scout | Configure context providers in scout/contexts.py. Point at your S3 buckets, your Drive folders, your MCP servers. |
| Dash | Replace scripts/generate_data.py with a loader for your dataset. Then rewrite knowledge/ for your tables. |
| Coda | Edit repos.yaml to point at your repos. Make sure your GITHUB_ACCESS_TOKEN has the right scopes. |
| Demo OS | Fork the agent that’s closest to yours. Modify its instructions, knowledge, tools. Remove the agents you don’t need. |
The goal is a single agent that does one useful thing on real data. Then iterate.
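For the Dash-style swap, a replacement for scripts/generate_data.py is usually just a loader that writes your real rows into the tables the agent queries. A minimal sketch, using sqlite as a stand-in for Postgres; the metrics table and column names here are illustrative, not the template's actual schema:

```python
import csv
import io
import sqlite3

# Hypothetical replacement for scripts/generate_data.py: instead of
# synthesizing demo rows, parse your own export and insert it into the
# table the agent will query. Table/column names are made up for the sketch.
def load_metrics(conn: sqlite3.Connection, csv_text: str) -> int:
    conn.execute("CREATE TABLE IF NOT EXISTS metrics (day TEXT, revenue REAL)")
    rows = [
        (r["day"], float(r["revenue"]))
        for r in csv.DictReader(io.StringIO(csv_text))
    ]
    conn.executemany("INSERT INTO metrics VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")
n = load_metrics(conn, "day,revenue\n2024-01-01,120.5\n2024-01-02,98.0\n")
```

In the real template you would point this at Postgres and then rewrite knowledge/ so the agent's table descriptions match the loaded schema.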

The shipping checklist

Each row maps to a page in this section. Don’t re-read them here; this is the punch list.
| Step | Do | Where |
| --- | --- | --- |
| Pick interfaces | Slack for B2B, Telegram for personal, WhatsApp for support, AG-UI for browser, custom HTTP for everything else. Start with one. | Interfaces |
| Wire auth | RUNTIME_ENV=prd and JWT_VERIFICATION_KEY. Control plane issues tokens, your service verifies. | Security & Auth |
| Turn on tracing | tracing=True from day one. The first time a user reports a bad answer, you’ll have the trace tree to debug from. | Observability |
| Gate irreversible actions | requires_confirmation=True for user approval, @approval for admin approval. Don’t add approval everywhere — friction kills adoption. | Human Approval |
| Schedule proactive work | Register recurring jobs in the app lifespan so they survive restarts. | Scheduling |
| Deploy | Railway via template scripts, AWS/GCP/Azure via container + managed Postgres + secrets, or self-hosted Docker Compose. | Deploy |
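The "wire auth" split — control plane issues tokens, your service verifies them — can be illustrated with a stdlib-only HS256 sketch. This is not AgentOS's verification code (which likely uses a proper JWT library and its own key format); it only shows the issue/verify contract around JWT_VERIFICATION_KEY:

```python
import base64
import hashlib
import hmac
import json

def _b64url(data: bytes) -> bytes:
    return base64.urlsafe_b64encode(data).rstrip(b"=")

# What the control plane side does: sign claims into a compact JWT.
def issue(payload: dict, key: bytes) -> str:
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = _b64url(json.dumps(payload).encode())
    sig = _b64url(hmac.new(key, header + b"." + body, hashlib.sha256).digest())
    return (header + b"." + body + b"." + sig).decode()

# What your service does: recompute the signature and reject mismatches.
def verify(token: str, key: bytes) -> dict:
    header, body, sig = token.encode().split(b".")
    expected = _b64url(hmac.new(key, header + b"." + body, hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    return json.loads(base64.urlsafe_b64decode(body + b"=" * (-len(body) % 4)))

key = b"stand-in-for-JWT_VERIFICATION_KEY"
claims = verify(issue({"sub": "user-123"}, key), key)
```

In production, use an established JWT library and check expiry and audience claims as well; the point here is only that the verifying side never issues tokens, it just holds the verification key.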

Iterate from real usage

The first version is wrong about things, and that’s expected. The interesting question is how fast you can find what’s wrong and fix it. The loop:
  1. A user reports a bad answer (Slack, support ticket, your own dogfooding).
  2. Find the run. Filter agno_sessions by user_id and created_at to narrow it down. The session ID gives you the full thread.
  3. Pull the trace from agno_traces and agno_spans. Look at what tools were called, what the model saw, what came back.
  4. Replay it locally. Run the same input through the same agent in a script. Reproduce.
  5. Patch the prompt, swap the tool, add a learning, fix the knowledge. Re-run.
  6. Add the case to your eval suite so the regression can’t come back.
  7. Ship.
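Steps 2 and 3 of the loop are a pair of queries. A sketch with sqlite standing in for Postgres; the agno_sessions and agno_spans columns used here are assumptions based on this page, not the exact Agno schema:

```python
import sqlite3

# Toy copies of the session/span tables so the two lookup queries run.
# Column names are assumed from the descriptions above, not the real schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE agno_sessions (session_id TEXT, user_id TEXT, created_at TEXT);
CREATE TABLE agno_spans (session_id TEXT, tool TEXT, status TEXT);
INSERT INTO agno_sessions VALUES ('s1', 'u42', '2024-06-01T10:03:00');
INSERT INTO agno_spans VALUES ('s1', 'sql_query', 'error');
""")

# Step 2: find the run by user and time window.
(session_id,) = conn.execute(
    "SELECT session_id FROM agno_sessions "
    "WHERE user_id = ? AND created_at >= ?",
    ("u42", "2024-06-01"),
).fetchone()

# Step 3: pull the spans to see which tools ran and how each ended.
spans = conn.execute(
    "SELECT tool, status FROM agno_spans WHERE session_id = ?",
    (session_id,),
).fetchall()
```

Once you have the session ID, the same ID keys the trace tables, so replaying locally (step 4) is a matter of feeding the recorded input back through the agent.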
Steps 1-3 should take five minutes once you know your data model. The signals that matter:
| Signal | Where |
| --- | --- |
| Wrong answers users complained about | agno_sessions joined with the feedback table you build |
| Tools failing in production | agno_spans filtered by status='error' |
| Slow runs | agno_spans end_time - start_time per agent |
| Cost spikes | agno_sessions.total_tokens grouped by agent_id and day |
| Regression after a change | Run evals before deploying |
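The "slow runs" and "cost spikes" rows translate directly into aggregate queries. Again a sketch with sqlite standing in for Postgres, and column names (start_time, end_time, total_tokens, agent_id) assumed from the table above rather than taken from the real schema:

```python
import sqlite3

# Minimal fixtures so the two aggregate queries have something to chew on.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE agno_spans (agent_id TEXT, start_time REAL, end_time REAL);
CREATE TABLE agno_sessions (agent_id TEXT, created_at TEXT, total_tokens INT);
INSERT INTO agno_spans VALUES ('dash', 0.0, 4.2), ('dash', 0.0, 1.1);
INSERT INTO agno_sessions VALUES ('dash', '2024-06-01', 900),
                                 ('dash', '2024-06-01', 300);
""")

# Slow runs: average span duration per agent.
slow = conn.execute(
    "SELECT agent_id, AVG(end_time - start_time) FROM agno_spans "
    "GROUP BY agent_id"
).fetchall()

# Cost spikes: tokens per agent per day.
cost = conn.execute(
    "SELECT agent_id, created_at, SUM(total_tokens) FROM agno_sessions "
    "GROUP BY agent_id, created_at"
).fetchall()
```

Wire queries like these into a dashboard or a daily cron and the signals table above stops being a manual checklist.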

Keep the agent learning

The agents that get better over time have feedback loops baked in. The agents that get worse are the ones nobody is watching. Three patterns the templates use:

LearningMachine. Agno’s built-in pattern for agents that store discovered facts and retrieve them on the next run. When the agent figures out that the revenue_v2 table replaced revenue last March, it writes that to a learnings table; the next time someone asks a revenue question, that learning is in the prompt. Dash uses this end-to-end. See Learning.

Knowledge updates as a feedback loop. When you find a wrong-answer pattern, add the right answer to knowledge. The user who asked the question gets a fix; every subsequent user gets the right answer the first time. Pal and Scout both lean on this.

Eval-gated deploys. Every deploy runs an eval suite. Regressions block the merge. The eval suite grows with the bugs you find — every postmortem ends with a new eval case. Over time the suite becomes the institutional memory of what your agent should and shouldn’t do.
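The store-then-inject shape of the LearningMachine pattern fits in a few lines. A toy illustration only — this is not Agno's actual LearningMachine API, just the mechanic of writing a discovered fact and surfacing it in the next prompt:

```python
# Toy learnings store: in the real pattern this is a database table,
# and retrieval would filter learnings relevant to the question.
learnings: list[str] = []

def record_learning(fact: str) -> None:
    # Dedupe so the same discovery isn't injected twice.
    if fact not in learnings:
        learnings.append(fact)

def build_prompt(question: str) -> str:
    # Prepend stored learnings so the model sees them on the next run.
    prefix = "\n".join(f"Learning: {fact}" for fact in learnings)
    return f"{prefix}\n{question}" if prefix else question

record_learning("The revenue_v2 table replaced revenue last March.")
prompt = build_prompt("What was revenue last quarter?")
```

The eval-gated-deploys pattern closes the same loop from the other side: every fact the agent had to learn the hard way becomes an eval case, so the next deploy can't unlearn it.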

What’s left

You shipped. The rest is normal product work that AgentOS doesn’t try to solve:
| Concern | Where |
| --- | --- |
| Pricing, billing, rate limits | A layer in front of AgentOS. Stripe, your own metering, your own limits. |
| Multi-tenant isolation | Per-tenant db instances or schema-per-tenant within one db |
| Compliance (SOC 2, HIPAA, etc.) | RBAC + audit logs + data residency choices in your storage backend |
| Custom UIs | AG-UI for the chat surface; build the rest as a normal web app calling AgentOS |
AgentOS gives you the runtime. The product is yours to ship.

See it work end-to-end

| Template | Product |
| --- | --- |
| Scout tutorial | Enterprise context agent over S3 + Slack + Drive + MCP |
| Dash tutorial | Self-learning data agent over a SaaS metrics dataset |
| Coda tutorial | Code companion that lives in Slack and triages issues |
Pick the one closest to your product and follow it end-to-end. Then swap the data and ship.