# Background Output Evaluation

> Use Agent as Judge evaluation to assess responses as a background task

This example demonstrates how to use Agent as Judge evaluation to assess the main agent's output as a background task. Unlike blocking validation, background evaluation:

* Does NOT block the response to the user
* Logs evaluation results for monitoring and analytics
* Can trigger alerts or store metrics without affecting latency

**Use cases:**

* Quality monitoring in production
* Compliance auditing
* Detecting hallucinations or other inappropriate content

<Steps>
  <Step title="Create a Python file">
    ```python background_output_evaluation.py theme={null}
    from agno.agent import Agent
    from agno.db.sqlite import AsyncSqliteDb
    from agno.eval.agent_as_judge import AgentAsJudgeEval
    from agno.models.openai import OpenAIResponses
    from agno.os import AgentOS

    # Set up the database for agent and evaluation storage
    db = AsyncSqliteDb(db_file="tmp/evaluation.db")

    # Create the evaluator using Agent as Judge
    evaluator = AgentAsJudgeEval(
        db=db,
        name="Response Quality Check",
        model=OpenAIResponses(id="gpt-5.2"),
        criteria="Response should be helpful, accurate, and well-structured",
        additional_guidelines=[
            "Evaluate if the response addresses the user's question directly",
            "Check if the information provided is correct and reliable",
            "Assess if the response is well-organized and easy to understand",
        ],
        threshold=7,
        run_in_background=True,  # Runs evaluation without blocking the response
    )

    # Create the main agent with Agent as Judge evaluation
    main_agent = Agent(
        id="support-agent",
        name="CustomerSupportAgent",
        model=OpenAIResponses(id="gpt-5.2"),
        instructions=[
            "You are a helpful customer support agent.",
            "Provide clear, accurate, and friendly responses.",
            "If you don't know something, say so honestly.",
        ],
        db=db,
        post_hooks=[evaluator],  # Automatically evaluates each response
        markdown=True,
    )

    # Create AgentOS
    agent_os = AgentOS(agents=[main_agent])
    app = agent_os.get_app()


    if __name__ == "__main__":
        agent_os.serve(app="background_output_evaluation:app", port=7777, reload=True)
    ```
  </Step>

  <Snippet file="create-venv-step.mdx" />

  <Step title="Install dependencies">
    ```bash theme={null}
    uv pip install -U agno openai uvicorn
    ```
  </Step>

  <Step title="Export your OpenAI API key">
    <CodeGroup>
      ```bash Mac/Linux theme={null}
      export OPENAI_API_KEY="your_openai_api_key_here"
      ```

      ```powershell Windows theme={null}
      $Env:OPENAI_API_KEY="your_openai_api_key_here"
      ```
    </CodeGroup>
  </Step>

  <Step title="Run the server">
    <CodeGroup>
      ```bash Mac/Linux theme={null}
      python background_output_evaluation.py
      ```

      ```bash Windows theme={null}
      python background_output_evaluation.py
      ```
    </CodeGroup>
  </Step>

  <Step title="Test the endpoint">
    ```bash theme={null}
    curl -X POST http://localhost:7777/agents/support-agent/runs \
      -F "message=How do I reset my password?" \
      -F "stream=false"
    ```

    The response is returned immediately. The evaluation runs in the background, and its results are stored in the database.
  </Step>
</Steps>
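
The test request can also be sent from Python. The sketch below uses only the standard library and mirrors the `curl` call above. It assumes the endpoint accepts URL-encoded form fields in addition to the multipart form that `curl -F` sends; `build_run_request` and `send` are hypothetical helper names, not part of Agno.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_run_request(
    message: str,
    agent_id: str = "support-agent",
    base_url: str = "http://localhost:7777",
) -> Request:
    """Build a POST request with the same form fields as the curl example."""
    # Assumption: the server accepts application/x-www-form-urlencoded bodies.
    body = urlencode({"message": message, "stream": "false"}).encode("utf-8")
    return Request(
        url=f"{base_url}/agents/{agent_id}/runs",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def send(request: Request, timeout: float = 60.0) -> str:
    """Send the request and return the raw response body as text."""
    with urlopen(request, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


request = build_run_request("How do I reset my password?")
# With the server from the previous steps running, call: print(send(request))
```

Building the request separately from sending it makes it possible to inspect the URL and body before any network call is made.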

## What Happens

1. The user sends a request to the agent
2. The agent processes the request and generates a response
3. The response is sent to the user **immediately**
4. The evaluation runs in the background:
   * `AgentAsJudgeEval` automatically evaluates the response against the criteria
   * It scores the response on a scale of 1-10
   * It stores the results in the database
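
The flow above can be illustrated with plain `asyncio`. This is a minimal sketch of the general respond-first, evaluate-later pattern, not Agno's internal implementation; `generate_response`, `evaluate`, and `handle_request` are hypothetical stand-ins, and the score is hard-coded.

```python
import asyncio


async def generate_response(message: str) -> str:
    """Stand-in for the main agent; a real agent would call a model here."""
    await asyncio.sleep(0.01)
    return f"Answer to: {message}"


async def evaluate(response: str, results: list) -> None:
    """Stand-in for the judge; records a hard-coded score after a delay."""
    await asyncio.sleep(0.05)
    results.append({"response": response, "score": 8})


async def handle_request(message: str, results: list, background: set) -> str:
    response = await generate_response(message)
    # Schedule the evaluation without awaiting it, so the caller is not blocked.
    task = asyncio.create_task(evaluate(response, results))
    background.add(task)  # keep a reference so the task is not garbage-collected
    task.add_done_callback(background.discard)
    return response


async def main() -> tuple:
    results: list = []
    background: set = set()
    response = await handle_request("How do I reset my password?", results, background)
    stored_at_return = len(results)  # the evaluation has not finished yet
    # A long-running server keeps its event loop alive; here we wait explicitly.
    await asyncio.gather(*background)
    return response, stored_at_return, results


response, stored_at_return, results = asyncio.run(main())
```

The response is available before any evaluation result is stored, which is why a slow or failing judge does not add latency to the user-facing request.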

### Production Extensions

In production, you could extend this pattern to:

| Extension            | Description                                                     |
| -------------------- | --------------------------------------------------------------- |
| **Database Storage** | Store evaluations for analytics dashboards                      |
| **Alerting**         | Use the `on_fail` callback to send alerts when evaluations fail |
| **Observability**    | Log to platforms like Datadog or OpenTelemetry                  |
| **A/B Testing**      | Compare response quality across model versions                  |
| **Training Data**    | Build datasets for fine-tuning                                  |
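
As an illustration of the alerting row, the sketch below shows the kind of logic a failure handler might run. It does not use the `on_fail` signature, which is not shown on this page; `EvalOutcome` and `check_and_alert` are hypothetical names, and the rule that a score at or above the threshold passes is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalOutcome:
    """Hypothetical record of one evaluation; not an Agno type."""

    name: str
    score: int
    threshold: int

    @property
    def passed(self) -> bool:
        # Assumption: a score at or above the threshold counts as a pass.
        return self.score >= self.threshold


def check_and_alert(outcome: EvalOutcome, send_alert: Callable[[str], None]) -> bool:
    """Return True if the evaluation passed; otherwise send an alert and return False."""
    if outcome.passed:
        return True
    send_alert(
        f"Evaluation '{outcome.name}' failed: "
        f"score {outcome.score} < threshold {outcome.threshold}"
    )
    return False


# Collect alerts in a list here; in production this could post to a chat or paging tool.
alerts: List[str] = []
check_and_alert(EvalOutcome("Response Quality Check", score=9, threshold=7), alerts.append)
check_and_alert(EvalOutcome("Response Quality Check", score=4, threshold=7), alerts.append)
```

Passing the alert sender in as a callable keeps the threshold check independent of any particular notification service.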

<Tip>
  Background evaluation is ideal for quality monitoring without impacting user experience. For scenarios where you need to block bad responses, use synchronous hooks instead.
</Tip>

## Related Examples

<CardGroup cols={2}>
  <Card title="Global Background Hooks" icon="gear" href="/agent-os/usage/background-hooks-global">
    Run all hooks as background tasks
  </Card>

  <Card title="Per-Hook Background" icon="code" href="/agent-os/usage/background-hooks-decorator">
    Mix synchronous and background hooks
  </Card>
</CardGroup>
