Binary Agent as Judge

This example demonstrates binary evaluation mode: the judge returns a PASS or FAIL verdict for a response instead of a numeric score.

Add the following code to your Python file:

agent_as_judge_binary.py

from agno.agent import Agent
from agno.db.sqlite import SqliteDb
from agno.eval.agent_as_judge import AgentAsJudgeEval
from agno.models.openai import OpenAIResponses

# Setup database to persist eval results
db = SqliteDb(db_file="tmp/agent_as_judge_binary.db")

agent = Agent(
    model=OpenAIResponses(id="gpt-5.2"),
    instructions="You are a customer service agent. Respond professionally.",
    db=db,
)

# Generate the response that the judge will evaluate
response = agent.run("I need help with my account")

# Define the judge: a plain-language criterion that the response either meets (PASS) or not (FAIL)
evaluation = AgentAsJudgeEval(
    name="Professional Tone Check",
    criteria="Response must maintain professional tone without informal language or slang",
    db=db,
)

# Judge the agent's reply against the criterion and print the results and summary
result = evaluation.run(
    input="I need help with my account",
    output=str(response.content),
    print_results=True,
    print_summary=True,
)

# Each binary result carries a boolean `passed` flag
print(f"Result: {'PASSED' if result.results[0].passed else 'FAILED'}")
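The final `print` above reads a boolean `passed` flag from the first result. If you judge several responses, you can aggregate those flags into a pass rate. The sketch below assumes only that each result exposes `passed`. `JudgeVerdict` and `HasPassed` are hypothetical stand-ins, not Agno classes, so the snippet runs without an API key.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


class HasPassed(Protocol):
    """Anything exposing a boolean `passed` attribute, like the results read in the example."""
    passed: bool


@dataclass
class JudgeVerdict:
    """Hypothetical stand-in for one binary eval result (not part of Agno)."""
    passed: bool


def pass_rate(verdicts: Iterable[HasPassed]) -> float:
    """Return the fraction of verdicts that passed, or 0.0 for an empty input."""
    items = list(verdicts)
    if not items:
        return 0.0
    return sum(v.passed for v in items) / len(items)


def label(verdict: HasPassed) -> str:
    """Map the boolean flag to the PASSED/FAILED label used in the example."""
    return "PASSED" if verdict.passed else "FAILED"


if __name__ == "__main__":
    sample = [JudgeVerdict(True), JudgeVerdict(False), JudgeVerdict(True), JudgeVerdict(True)]
    print([label(v) for v in sample])  # ['PASSED', 'FAILED', 'PASSED', 'PASSED']
    print(f"Pass rate: {pass_rate(sample):.0%}")  # Pass rate: 75%
```

Because `pass_rate` only reads `.passed`, it should also accept real eval results that expose that attribute. It has not been checked against Agno's result types.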

Set up your virtual environment

uv venv --python 3.12
source .venv/bin/activate

Install dependencies

uv pip install -U agno openai

Export your OpenAI API key

export OPENAI_API_KEY="your_openai_api_key_here"
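The OpenAI client reads `OPENAI_API_KEY` from the environment at runtime. If the variable is missing, the run fails with a less obvious error. The sketch below is a minimal standard-library check that fails fast with a clear message. The helper name `require_env` is hypothetical and is not part of Agno or the OpenAI SDK.

```python
import os
from typing import Mapping, Optional


def require_env(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the value of `name` from the environment, or raise if it is unset or blank."""
    source = os.environ if env is None else env
    value = source.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running the example.")
    return value


if __name__ == "__main__":
    # In agent_as_judge_binary.py you would call require_env("OPENAI_API_KEY") directly,
    # before creating the Agent, so a missing key stops the run immediately.
    try:
        require_env("OPENAI_API_KEY")
        print("OPENAI_API_KEY is set")
    except RuntimeError as exc:
        print(exc)
```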

Run the example

python agent_as_judge_binary.py
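Because both the agent and the eval are created with `db=db`, the run writes to `tmp/agent_as_judge_binary.db`. The sketch below is a minimal standard-library way to check what was persisted. It lists each table in the SQLite file with its row count. It assumes nothing about Agno's table names or schema and reads them from `sqlite_master`. The helper name `table_row_counts` is hypothetical.

```python
import sqlite3
from pathlib import Path


def table_row_counts(db_path: str) -> dict[str, int]:
    """Return {table_name: row_count} for every user table in an existing SQLite file."""
    # sqlite3.connect would silently create an empty file, so check for it first.
    if not Path(db_path).exists():
        raise FileNotFoundError(db_path)
    conn = sqlite3.connect(db_path)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
            )
        ]
        counts = {}
        for name in names:
            quoted = name.replace('"', '""')  # escape embedded quotes in the identifier
            counts[name] = conn.execute(f'SELECT COUNT(*) FROM "{quoted}"').fetchone()[0]
        return counts
    finally:
        conn.close()


if __name__ == "__main__":
    path = "tmp/agent_as_judge_binary.db"
    if Path(path).exists():
        for table, count in table_row_counts(path).items():
            print(f"{table}: {count} rows")
    else:
        print(f"{path} not found; run the example first.")
```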
