Use Agent as Judge evaluation to assess responses as a background task
This example demonstrates how to use Agent as Judge evaluation to assess the main agent’s output as a background task. Unlike blocking validation, background evaluation:
- Does NOT block the response to the user
- Logs evaluation results for monitoring and analytics
- Can trigger alerts or store metrics without affecting latency
Use cases:
- Quality monitoring in production
- Compliance auditing
- Detecting hallucinations or other inappropriate content
1. Create a Python file named `background_output_evaluation.py`:
```python
from agno.agent import Agent
from agno.db.sqlite import AsyncSqliteDb
from agno.eval.agent_as_judge import AgentAsJudgeEval
from agno.models.openai import OpenAIResponses
from agno.os import AgentOS

# Setup database for agent and evaluation storage
db = AsyncSqliteDb(db_file="tmp/evaluation.db")

# Create the evaluator using Agent as Judge
evaluator = AgentAsJudgeEval(
    db=db,
    name="Response Quality Check",
    model=OpenAIResponses(id="gpt-5.2"),
    criteria="Response should be helpful, accurate, and well-structured",
    additional_guidelines=[
        "Evaluate if the response addresses the user's question directly",
        "Check if the information provided is correct and reliable",
        "Assess if the response is well-organized and easy to understand",
    ],
    threshold=7,
    run_in_background=True,  # Runs evaluation without blocking the response
)

# Create the main agent with Agent as Judge evaluation
main_agent = Agent(
    id="support-agent",
    name="CustomerSupportAgent",
    model=OpenAIResponses(id="gpt-5.2"),
    instructions=[
        "You are a helpful customer support agent.",
        "Provide clear, accurate, and friendly responses.",
        "If you don't know something, say so honestly.",
    ],
    db=db,
    post_hooks=[evaluator],  # Automatically evaluates each response
    markdown=True,
)

# Create AgentOS
agent_os = AgentOS(agents=[main_agent])
app = agent_os.get_app()

if __name__ == "__main__":
    agent_os.serve(app="background_output_evaluation:app", port=7777, reload=True)
```
2. Set up your virtual environment:
```bash
uv venv --python 3.12
source .venv/bin/activate
```
3. Install dependencies:
```bash
uv pip install -U agno openai uvicorn
```
4. Export your OpenAI API key:
```bash
export OPENAI_API_KEY="your_openai_api_key_here"
```
5. Run the server:
```bash
python background_output_evaluation.py
```
6. Test the endpoint:
```bash
curl -X POST http://localhost:7777/agents/support-agent/runs \
  -F "message=How do I reset my password?" \
  -F "stream=false"
```
The response will be returned immediately. The evaluation runs in the background and results are stored in the database.
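To verify that the background evaluation ran, you can inspect the SQLite database directly. The sketch below is a generic inspection script, not part of the Agno API: it discovers table names at runtime rather than assuming them, since the exact schema Agno creates may vary by version.

```python
import sqlite3

# Minimal sketch: inspect tmp/evaluation.db to locate the stored eval results.
# Table names are discovered at runtime because the exact schema Agno
# creates is an assumption here, not something this example documents.
conn = sqlite3.connect("tmp/evaluation.db")

tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
]
print("tables:", tables)

# Print a few rows from each table to find the evaluation records.
for table in tables:
    print(f"--- {table} ---")
    for row in conn.execute(f"SELECT * FROM {table} LIMIT 3"):
        print(row)

conn.close()
```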
Common ways to act on the stored results:

- Alerting: use the `on_fail` callback to send alerts when evaluations fail (a minimal sketch follows this list)
- Observability: log to platforms like Datadog or OpenTelemetry
- A/B testing: compare response quality across model versions
- Training data: build datasets for fine-tuning
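Here is a minimal sketch of the alerting pattern, assuming `AgentAsJudgeEval` accepts an `on_fail` callback as mentioned above. The callback signature and the shape of the result object are assumptions; check the Agno reference for the exact interface.

```python
import logging

from agno.db.sqlite import AsyncSqliteDb
from agno.eval.agent_as_judge import AgentAsJudgeEval
from agno.models.openai import OpenAIResponses

logger = logging.getLogger("eval-alerts")

db = AsyncSqliteDb(db_file="tmp/evaluation.db")


def alert_on_failed_eval(result) -> None:
    # Assumption: Agno passes an object describing the failed evaluation.
    # Swap this logging call for a Slack, PagerDuty, or Datadog client.
    logger.warning("Evaluation below threshold: %s", result)


evaluator = AgentAsJudgeEval(
    db=db,
    name="Response Quality Check",
    model=OpenAIResponses(id="gpt-5.2"),
    criteria="Response should be helpful, accurate, and well-structured",
    threshold=7,
    run_in_background=True,
    on_fail=alert_on_failed_eval,  # invoked when the score falls below threshold
)
```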
Background evaluation is ideal for quality monitoring without impacting user experience. For scenarios where you need to block bad responses, use synchronous hooks instead.
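For contrast, a sketch of the synchronous variant: flipping `run_in_background=False` on the same evaluator makes the hook complete before the response is returned. Whether a failing score actually blocks the response depends on Agno's hook semantics, which this sketch assumes rather than documents.

```python
from agno.db.sqlite import AsyncSqliteDb
from agno.eval.agent_as_judge import AgentAsJudgeEval
from agno.models.openai import OpenAIResponses

db = AsyncSqliteDb(db_file="tmp/evaluation.db")

# Same evaluator as above, but run synchronously: the evaluation finishes
# before the user sees the response, adding its latency to every request.
blocking_evaluator = AgentAsJudgeEval(
    db=db,
    name="Response Quality Gate",
    model=OpenAIResponses(id="gpt-5.2"),
    criteria="Response should be helpful, accurate, and well-structured",
    threshold=7,
    run_in_background=False,  # block the response until evaluation completes
)
```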