This example demonstrates AgentAsJudgeEval in binary PASS/FAIL mode, where the judge returns a verdict without a numeric score.
1

Add the following code to your Python file

agent_as_judge_binary.py
from agno.agent import Agent
from agno.db.sqlite import SqliteDb
from agno.eval.agent_as_judge import AgentAsJudgeEval
from agno.models.openai import OpenAIChat

# Setup database to persist eval results
db = SqliteDb(db_file="tmp/agent_as_judge_binary.db")

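# Agent under test: a customer service agent backed by gpt-4o
agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    instructions="You are a customer service agent. Respond professionally.",
    db=db,
)

# Generate the response that will be judged
response = agent.run("I need help with my account")

# Judge evaluation: the criteria describe what a passing response looks like
evaluation = AgentAsJudgeEval(
    name="Professional Tone Check",
    criteria="Response must maintain professional tone without informal language or slang",
    db=db,
)

# Evaluate the agent's response against the criteria and print the results
result = evaluation.run(
    input="I need help with my account",
    output=str(response.content),
    print_results=True,
    print_summary=True,
)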

print(f"Result: {'PASSED' if result.results[0].passed else 'FAILED'}")
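
The final print statement above indexes the first entry of the results list, which raises an IndexError in Python if that list is empty. As a rough sketch of a more defensive summary, the snippet below reduces a list of pass/fail entries to a single label. The Verdict class and the summarize helper are hypothetical stand-ins written for this illustration; they are not part of Agno, and only the passed attribute mirrors the example. Whether Agno can actually return an empty list is not stated here.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Verdict:
    """Hypothetical stand-in for one evaluation entry; only `passed` is modeled."""

    passed: bool


def summarize(verdicts: Iterable[Verdict]) -> str:
    """Return 'PASSED' only if at least one verdict exists and all of them passed."""
    items = list(verdicts)
    if not items:
        # No verdicts to inspect, so avoid indexing into an empty list
        return "NO RESULT"
    return "PASSED" if all(v.passed for v in items) else "FAILED"


print(f"Result: {summarize([Verdict(True), Verdict(True)])}")
print(f"Result: {summarize([Verdict(True), Verdict(False)])}")
print(f"Result: {summarize([])}")
```

In the example script, a helper like this could be applied to result.results in place of the direct index, assuming each entry exposes passed as it does above.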

2

Create a virtual environment

Open a terminal and create a Python virtual environment.
python3 -m venv .venv
source .venv/bin/activate
3

Install libraries

pip install -U agno openai
4

Export your OpenAI API key

export OPENAI_API_KEY="your_openai_api_key_here"
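
If the key is missing, the failure may only surface once the agent makes its first model call. A small preflight check, sketched below, can confirm the variable is set before running the example. The has_openai_key helper is a hypothetical name introduced for this illustration and is not part of Agno or the OpenAI SDK; it only inspects the environment and does not validate the key itself.

```python
import os
from typing import Mapping


def has_openai_key(env: Mapping[str, str] = os.environ) -> bool:
    """Return True if OPENAI_API_KEY is present and not blank in the given mapping."""
    return bool(env.get("OPENAI_API_KEY", "").strip())


if not has_openai_key():
    # Only checks presence; a set but invalid key would still fail later
    print("OPENAI_API_KEY is not set; export it before running the example.")
```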
5

Run the example

python agent_as_judge_binary.py