Binary Agent as Judge

This example demonstrates binary evaluation mode, in which the judge returns a PASS or FAIL verdict instead of a numeric score.

Add the following code to your Python file:

agent_as_judge_binary.py

from agno.agent import Agent
from agno.db.sqlite import SqliteDb
from agno.eval.agent_as_judge import AgentAsJudgeEval
from agno.models.openai import OpenAIChat

# Setup database to persist eval results
db = SqliteDb(db_file="tmp/agent_as_judge_binary.db")

agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    instructions="You are a customer service agent. Respond professionally.",
    db=db,
)

# Generate the response that the judge will evaluate
response = agent.run("I need help with my account")

evaluation = AgentAsJudgeEval(
    name="Professional Tone Check",
    criteria="Response must maintain professional tone without informal language or slang",
    db=db,
)

result = evaluation.run(
    input="I need help with my account",
    output=str(response.content),
    print_results=True,
    print_summary=True,
)

# Report the PASS/FAIL verdict of the first evaluation result
print(f"Result: {'PASSED' if result.results[0].passed else 'FAILED'}")
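
When the same check is run on several responses, the individual PASS/FAIL verdicts can be rolled up into a pass rate. The following is a minimal sketch, not part of the Agno API: `summarize_binary` and `FakeJudgement` are hypothetical names, and the only assumption about Agno is the one the example above already relies on, namely that each entry in `result.results` exposes a boolean `passed` attribute.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class FakeJudgement:
    """Hypothetical stand-in for one entry of `result.results`."""

    passed: bool


def summarize_binary(results: Iterable) -> dict:
    """Count PASS/FAIL verdicts from objects exposing a boolean `passed`."""
    verdicts = [bool(r.passed) for r in results]
    total = len(verdicts)
    passed = sum(verdicts)
    return {
        "total": total,
        "passed": passed,
        "failed": total - passed,
        # Avoid division by zero when no results were collected
        "pass_rate": passed / total if total else 0.0,
    }


if __name__ == "__main__":
    sample = [FakeJudgement(True), FakeJudgement(False), FakeJudgement(True)]
    print(summarize_binary(sample))
```

In the script above, the same helper could presumably be called as `summarize_binary(result.results)`, assuming the evaluation returned a result.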

Create a virtual environment

Open a terminal and create a Python virtual environment.

python3 -m venv .venv
source .venv/bin/activate

Install libraries

pip install -U agno openai

Export your OpenAI API key

export OPENAI_API_KEY="your_openai_api_key_here"
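
If the key is not exported, the script will likely fail only once the first model call is made. As an optional safeguard, a small check at the top of the script can fail early with a clearer message. This is a sketch using only the standard library; `require_env` is a hypothetical helper, not something Agno or the OpenAI client provides.

```python
import os


def require_env(name: str, environ=None) -> str:
    """Return the value of an environment variable or raise if it is unset or blank."""
    # `environ` can be overridden for testing; defaults to the real environment
    env = os.environ if environ is None else environ
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running the example.")
    return value


# Usage at the top of agent_as_judge_binary.py:
# require_env("OPENAI_API_KEY")
```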

Run the example

python agent_as_judge_binary.py
