# What is Evals

> Evals measure the quality of your Agents and Teams.<br/> Agno lets you evaluate Agents across multiple dimensions.

Learn how to evaluate your Agno Agents and Teams across four dimensions: **accuracy** (simple correctness checks), **agent as judge** (custom quality criteria), **performance** (runtime and memory), and **reliability** (tool calls).

## Evaluation Dimensions

<CardGroup cols={2}>
  <Card title="Accuracy" icon="bullseye" href="/evals/accuracy/overview">
    Check the Agent's response against an expected output using LLM-as-a-judge methodology.
  </Card>

  <Card title="Agent as Judge" icon="scale-balanced" href="/evals/agent-as-judge/overview">
    Evaluate custom quality criteria using LLM-as-a-judge with scoring.
  </Card>

  <Card title="Performance" icon="stopwatch" href="/evals/performance/overview">
    Measure the cost of producing the Agent's response, including latency and memory footprint.
  </Card>

  <Card title="Reliability" icon="shield-check" href="/evals/reliability/overview">
    Check that the Agent behaves as expected, including tool calls and error handling.
  </Card>
</CardGroup>
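
Performance and reliability are the two dimensions that need no LLM judge, so the idea behind them can be illustrated with the standard library alone. The sketch below is not Agno's implementation: `measure`, `missing_tool_calls`, `fake_agent_run`, and the tool names are hypothetical stand-ins, used only to show what "runtime and memory" and "tool calls" mean as measurable checks.

```python
import time
import tracemalloc
from typing import Callable, List, Tuple


def measure(fn: Callable[[], object]) -> Tuple[float, int]:
    """Return (elapsed seconds, peak traced bytes) for one call to fn."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        fn()
    finally:
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return elapsed, peak


def missing_tool_calls(expected: List[str], actual: List[str]) -> List[str]:
    """Return the expected tool names that never appear in the actual calls."""
    return [name for name in expected if name not in actual]


# Hypothetical stand-in for an agent run: it records the tools it "called".
calls_made: List[str] = []


def fake_agent_run() -> None:
    calls_made.extend(["multiply", "exponentiate"])
    _ = [str(i) for i in range(10_000)]  # allocate something measurable


elapsed, peak = measure(fake_agent_run)
missing = missing_tool_calls(["multiply", "exponentiate"], calls_made)
print(f"runtime: {elapsed:.4f}s, peak memory: {peak / 1024:.1f} KiB")
print(f"missing tool calls: {missing}")
```

In a real evaluation the stand-in function would be replaced by an actual agent run, and the measurement would typically be repeated several times, since a single timing is noisy.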

## Quick Start

Here's a simple example of running an accuracy evaluation:

```python quick_eval.py
from typing import Optional
from agno.agent import Agent
from agno.eval.accuracy import AccuracyEval, AccuracyResult
from agno.models.openai import OpenAIResponses
from agno.tools.calculator import CalculatorTools

# Create an evaluation
evaluation = AccuracyEval(
    # Model that judges the agent's output
    model=OpenAIResponses(id="gpt-5.2"),
    # Agent under evaluation
    agent=Agent(model=OpenAIResponses(id="gpt-5.2"), tools=[CalculatorTools()]),
    input="What is 10*5 then to the power of 2? do it step by step",
    expected_output="2500",
    additional_guidelines="Agent output should include the steps and the final answer.",
)

# Run the evaluation
result: Optional[AccuracyResult] = evaluation.run(print_results=True)
```

## Best Practices

* **Start Simple:** Begin with basic accuracy tests before progressing to complex performance and reliability evaluations
* **Use Multiple Test Cases:** Don't rely on a single test case; build test suites that cover edge cases
* **Track Over Time:** Re-run your evals and compare the metrics as you iterate on your agents
* **Combine Dimensions:** Evaluate across all four dimensions for a holistic view of agent quality
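
The "multiple test cases" and "track over time" practices can be sketched as a small suite runner that averages per-case scores and compares the result to a stored baseline. Everything here is illustrative: `Case`, `run_suite`, the toy `exact_match_score` scorer, the 0 to 10 scale, and the `baseline` value are assumptions for the sketch, not part of Agno's API. In practice the scorer would wrap a real eval run.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List


@dataclass
class Case:
    input: str
    expected_output: str


def run_suite(cases: List[Case], score_case: Callable[[Case], float]) -> float:
    """Score every case and return the average score."""
    return mean(score_case(case) for case in cases)


def exact_match_score(case: Case) -> float:
    # Hypothetical scorer: a toy "agent" that only knows one answer.
    answers = {"What is 10*5?": "50"}
    return 10.0 if answers.get(case.input) == case.expected_output else 0.0


cases = [
    Case("What is 10*5?", "50"),
    Case("What is 50 squared?", "2500"),
]
average = run_suite(cases, exact_match_score)

baseline = 5.0  # hypothetical average recorded from a previous run
regressed = average < baseline
print(f"average score: {average:.1f}, regressed: {regressed}")
```

Keeping the scorer as a plain callable means the same suite can be re-run against each new agent configuration and the averages compared run to run.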

## Guides

Dive deeper into each evaluation dimension:

1. **[Accuracy Evals](/evals/accuracy/overview)** - Learn LLM-as-a-judge techniques and multiple test case strategies
2. **[Agent as Judge Evals](/evals/agent-as-judge/overview)** - Define custom quality criteria with flexible scoring strategies
3. **[Performance Evals](/evals/performance/overview)** - Measure latency, memory usage, and compare different configurations
4. **[Reliability Evals](/evals/reliability/overview)** - Test tool calls, error handling, and rate limiting behavior
