Evals - Agno

Example	Description
Accuracy	Accuracy examples evaluate how well responses match expected outputs.
Agent As Judge	Agent-as-judge examples evaluate output quality with model-based scoring.
Performance	Performance examples benchmark runtime and memory impact for agents and teams.
Reliability	Reliability examples validate whether expected tool calls are made correctly.
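
As a rough illustration of what three of these categories measure, here is a minimal standard-library sketch. It does not use Agno's eval API: the helper names `accuracy_score`, `missing_tool_calls`, and `measure` are hypothetical and exist only for this example. Agent-as-judge scoring is left out because it needs a model call.

```python
import time
import tracemalloc
from typing import Callable, Iterable


def accuracy_score(response: str, expected: str) -> float:
    """Accuracy (assumed metric): exact match after case/whitespace normalization."""
    def normalize(text: str) -> str:
        return " ".join(text.lower().split())

    return 1.0 if normalize(response) == normalize(expected) else 0.0


def missing_tool_calls(actual: Iterable[str], expected: Iterable[str]) -> list[str]:
    """Reliability: return the expected tool names that were never called."""
    called = set(actual)
    return [name for name in expected if name not in called]


def measure(fn: Callable[[], object]) -> tuple[float, int]:
    """Performance: return (elapsed seconds, peak traced bytes) for one call of fn."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        fn()
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
    finally:
        # Always stop tracing, even if fn raises.
        tracemalloc.stop()
    return elapsed, peak
```

In practice the inputs would come from an agent run: the response text for the accuracy check, the recorded tool-call names for the reliability check, and a callable wrapping the agent invocation for the performance measurement.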
