GET /eval-runs

Example response:

{
  "data": [
    {
      "id": "a03fa2f4-900d-482d-afe0-470d4cd8d1f4",
      "agent_id": "basic-agent",
      "model_id": "gpt-4o",
      "model_provider": "OpenAI",
      "name": "Test ",
      "eval_type": "reliability",
      "eval_data": {
        "eval_status": "PASSED",
        "failed_tool_calls": [],
        "passed_tool_calls": [
          "multiply"
        ]
      },
      "eval_input": {
        "expected_tool_calls": [
          "multiply"
        ]
      },
      "created_at": "2025-08-27T15:41:59Z",
      "updated_at": "2025-08-27T15:41:59Z"
    }
  ]
}
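
A minimal request sketch in Python using the requests library. The base URL and token are placeholders for your deployment; all query parameters are optional and documented below.

import requests

BASE_URL = "http://localhost:7777"  # placeholder; point at your deployment

resp = requests.get(
    f"{BASE_URL}/eval-runs",
    headers={"authorization": "<token>"},  # optional header (string | null)
    params={"agent_id": "basic-agent", "type": "agent"},
)
resp.raise_for_status()
for run in resp.json()["data"]:
    print(run["id"], run["eval_type"], run["eval_data"]["eval_status"])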

Headers

authorization
string | null

Query Parameters

agent_id
string | null

Agent ID

team_id
string | null

Team ID

workflow_id
string | null

Workflow ID

model_id
string | null

Model ID

type
enum<string> | null

Filter by run type

Available options: agent, team, workflow

limit
integer | null
default:20

Number of eval runs to return

page
integer | null
default:1

Page number

sort_by
string | null
default:created_at

Field to sort by

sort_order
enum<string> | null
default:desc

Sort order (asc or desc)

Available options: asc, desc

db_id
string | null

The ID of the database to use

eval_types
string | null

Comma-separated eval types (accuracy,performance,reliability)
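
Putting the filters and pagination parameters together, here is a hedged sketch that pages through all matching runs. It assumes an empty data array marks the last page, which this reference does not state explicitly.

import requests

BASE_URL = "http://localhost:7777"  # placeholder; point at your deployment
params = {
    "eval_types": "accuracy,reliability",  # comma-separated, per the description above
    "model_id": "gpt-4o",
    "limit": 50,             # default: 20
    "sort_by": "created_at",
    "sort_order": "asc",     # default: desc
}

page, runs = 1, []
while True:
    resp = requests.get(f"{BASE_URL}/eval-runs", params={**params, "page": page})
    resp.raise_for_status()
    batch = resp.json()["data"]
    if not batch:
        break  # assumption: an empty page means we are past the last page
    runs.extend(batch)
    page += 1
print(f"Fetched {len(runs)} eval runs")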

Response

Evaluation runs retrieved successfully

The response is an object containing a data array of eval run records, as shown in the example above.
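
For type-checked client code, the sample payload above suggests roughly the following shape. These TypedDicts are inferred from that single example, not from a published schema, so treat the field types as assumptions.

from typing import Any, Dict, List, Optional, TypedDict

class EvalRun(TypedDict):
    id: str
    agent_id: Optional[str]     # set when the run belongs to an agent
    model_id: str
    model_provider: str
    name: str
    eval_type: str              # "accuracy", "performance", or "reliability"
    eval_data: Dict[str, Any]   # result payload; shape varies by eval_type
    eval_input: Dict[str, Any]  # input payload, e.g. expected_tool_calls
    created_at: str             # ISO 8601 timestamp
    updated_at: str

class EvalRunsResponse(TypedDict):
    data: List[EvalRun]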