# Metrics

Understanding agent run and session metrics in Agno

## Overview
When you run an agent in Agno, the response you get (`RunResponse`) includes detailed metrics about the run. These metrics help you understand resource usage (such as token usage and time), performance, and other aspects of the model and tool calls.
Metrics are available at multiple levels:
- Per-message: Each message (assistant, tool, etc.) has its own metrics.
- Per-tool call: Each tool execution has its own metrics.
- Aggregated: The `RunResponse` aggregates metrics across all messages in the run.
## Where Metrics Live

- `RunResponse.metrics`: Aggregated metrics for the whole run, as a dictionary.
- `ToolExecution.metrics`: Metrics for each tool call.
- `Message.metrics`: Metrics for each message (assistant, tool, etc.).
## Example Usage

Suppose you have an agent that performs some tasks and you want to analyze the metrics after running it. The sketch below shows one way to create and run such an agent; it assumes an OpenAI model and Agno's bundled `YFinanceTools`, but any model and toolset will produce metrics the same way:
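```python
from agno.agent import Agent
from agno.models.openai import OpenAIChat
from agno.tools.yfinance import YFinanceTools
from rich.pretty import pprint

# An agent with a model and a tool, so the run produces both
# assistant messages and tool calls, each carrying its own metrics.
agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    tools=[YFinanceTools(stock_price=True)],
    markdown=True,
)

run_response = agent.run("What is the stock price of NVDA?")
pprint(run_response.content)
```

The snippets in the sections below continue from this run, reusing `run_response` and `pprint`.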
You’d see output containing the following information:
### Tool Execution Metrics

Each tool execution reports its own metrics, with details about the resource usage and performance of that individual tool call. Continuing from the run above, and assuming the run's tool calls are exposed as `ToolExecution` objects on `run_response.tools`, you can print them like this:
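```python
# Per-tool-call metrics (ToolExecution.metrics).
if run_response.tools:
    for tool_execution in run_response.tools:
        print(f"Tool: {tool_execution.tool_name}")
        pprint(tool_execution.metrics)
```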
### Message Metrics

Here you can see the metrics for each message response from the agent. All “assistant” responses carry metrics like this, helping you understand performance and resource usage at the message level. For example:
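```python
# Per-message metrics (Message.metrics): every assistant message
# in the run carries its own metrics object.
if run_response.messages:
    for message in run_response.messages:
        if message.role == "assistant":
            pprint(message.metrics)
```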
### Aggregated Run Metrics

The aggregated metrics provide a comprehensive view of the entire run, summarizing all messages and tool calls to give you an overall picture of the agent’s performance and resource usage:
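```python
# Aggregated metrics for the whole run, as a dictionary.
pprint(run_response.metrics)
```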
Similarly, the session metrics aggregate across all runs in the session, giving you insight into the agent’s overall performance and resource usage across multiple runs:
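```python
# Aggregated metrics across all runs in the session (SessionMetrics).
pprint(agent.session_metrics)
```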
## How Metrics Are Aggregated
- Per-message: Each message (assistant, tool, etc.) has its own metrics object.
- Run-level: `RunResponse.metrics` is a dictionary where each key (e.g., `input_tokens`) maps to a list of values from all assistant messages in the run, as the sketch after this list illustrates.
- Session-level: `SessionMetrics` (see `agent.session_metrics`) aggregates metrics across all runs in the session.
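Because each run-level key holds a list of per-message values, a run total is just the sum of that list. A minimal sketch (the dictionary shape shown in the comment is illustrative, not actual output):

```python
# run_response.metrics looks roughly like:
#   {"input_tokens": [18, 407], "output_tokens": [68, 22], "time": [1.1, 0.6], ...}
# so a run total for any field is the sum of its list.
total_input_tokens = sum(run_response.metrics.get("input_tokens", []))
print(f"Total input tokens for the run: {total_input_tokens}")
```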
## MessageMetrics Params
| Field | Description |
| --- | --- |
| `input_tokens` | Number of tokens in the prompt/input to the model. |
| `output_tokens` | Number of tokens generated by the model as output. |
| `total_tokens` | Total tokens used (input + output). |
| `prompt_tokens` | Tokens in the prompt (same as `input_tokens` in the case of OpenAI). |
| `completion_tokens` | Tokens in the completion (same as `output_tokens` in the case of OpenAI). |
| `audio_tokens` | Total audio tokens (if using audio input/output). |
| `input_audio_tokens` | Audio tokens in the input. |
| `output_audio_tokens` | Audio tokens in the output. |
| `cached_tokens` | Tokens served from cache (if caching is used). |
| `cache_write_tokens` | Tokens written to cache. |
| `reasoning_tokens` | Tokens used for reasoning steps (if enabled). |
| `prompt_tokens_details` | Dict with a detailed breakdown of prompt tokens (used by OpenAI). |
| `completion_tokens_details` | Dict with a detailed breakdown of completion tokens (used by OpenAI). |
| `additional_metrics` | Any extra metrics provided by the model/tool (e.g., latency, cost). |
| `time` | Time taken to generate the message (in seconds). |
| `time_to_first_token` | Time until the first token is generated (in seconds). |
Note: Not all fields are present on every run; availability depends on the model, the tools used, and the run itself.