A single model is one point of failure. Run two labelers from different providers, diff their outputs, and adjudicate only where they disagree. The disagreement record is the audit trail.

```python
from typing import List, Optional

from agno.agent import Agent
from agno.models.anthropic import Claude
from agno.models.openai import OpenAIResponses
from pydantic import BaseModel, Field


class Contact(BaseModel):
    name: Optional[str] = None
    email: Optional[str] = None
    company: Optional[str] = None


class FieldDisagreement(BaseModel):
    field: str = Field(..., description="Top-level Contact field name")
    value_a: Optional[str] = None
    value_b: Optional[str] = None
    reason: str = Field(..., description="Why this field needs adjudication")


class DisagreementReport(BaseModel):
    disagreements: List[FieldDisagreement] = Field(default_factory=list)
    needs_adjudication: bool = Field(..., description="True if any field disagrees")


class FinalLabel(BaseModel):
    contact: Contact
    notes: Optional[str] = None


LABELER = "Extract contact info. Use exactly what the text shows. Null if missing."

labeler_a = Agent(model=OpenAIResponses(id="gpt-5.5"), instructions=LABELER, output_schema=Contact)
labeler_b = Agent(model=Claude(id="claude-sonnet-4-5"), instructions=LABELER, output_schema=Contact)

reviewer = Agent(
    model=OpenAIResponses(id="gpt-5.5"),
    instructions=(
        "Compare two labelers' Contact outputs field by field. A field "
        "needs adjudication when both are non-null but differ. Set "
        "needs_adjudication=true if any field does."
    ),
    output_schema=DisagreementReport,
)

adjudicator = Agent(
    model=OpenAIResponses(id="gpt-5.5"),
    instructions=(
        "Re-read the original text and resolve every reported "
        "disagreement. Return a FinalLabel with the correct values."
    ),
    output_schema=FinalLabel,
)


def label_with_quality_review(text: str) -> FinalLabel:
    a = labeler_a.run(text).content
    b = labeler_b.run(text).content

    report = reviewer.run(
        f"Labeler A:\n{a.model_dump_json()}\n\nLabeler B:\n{b.model_dump_json()}"
    ).content

    if not report.needs_adjudication:
        return FinalLabel(contact=a, notes="Labelers agreed.")

    return adjudicator.run(
        f"Original input:\n{text}\n\n"
        f"Labeler A:\n{a.model_dump_json()}\n\n"
        f"Labeler B:\n{b.model_dump_json()}\n\n"
        f"Reviewer report:\n{report.model_dump_json()}"
    ).content
```

The flow

  1. Two labelers, two providers. Provider diversity is the signal. When the OpenAI and Anthropic labelers disagree on a field, either the text is ambiguous there or one model misread it; both cases merit a second look.
  2. Reviewer diffs them. It emits one FieldDisagreement per conflicting field and a single needs_adjudication flag. Agreement short-circuits the expensive step.
  3. Adjudicator runs only on disagreement. It re-reads the original input with both labels and the reviewer’s report, then returns the final record.
The DisagreementReport is worth persisting. Disagreement rate by field, by vendor, and over time tells you where the schema or the prompt is weak.
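
The reviewer's rule (both values non-null but different) is mechanical enough to check without a model, and the per-field rate can be tallied in a few lines. The sketch below is an illustration, not part of the Agno example: `ContactRow`, `conflicting_fields`, and `disagreement_rate_by_field` are hypothetical names, a dataclass stands in for the `Contact` model, and the tally runs over raw label pairs rather than persisted reports.

```python
from collections import Counter
from dataclasses import dataclass, fields
from typing import Dict, List, Optional, Tuple


@dataclass
class ContactRow:
    """Hypothetical stand-in for the Contact model above."""
    name: Optional[str] = None
    email: Optional[str] = None
    company: Optional[str] = None


def conflicting_fields(a: ContactRow, b: ContactRow) -> List[str]:
    """Apply the reviewer's rule: both values non-null, but different."""
    conflicts = []
    for f in fields(ContactRow):
        va, vb = getattr(a, f.name), getattr(b, f.name)
        if va is not None and vb is not None and va != vb:
            conflicts.append(f.name)
    return conflicts


def disagreement_rate_by_field(
    pairs: List[Tuple[ContactRow, ContactRow]],
) -> Dict[str, float]:
    """Fraction of items on which the two labelers conflict, per field."""
    counts: Counter = Counter()
    for a, b in pairs:
        counts.update(conflicting_fields(a, b))
    total = len(pairs)
    return {
        f.name: (counts[f.name] / total if total else 0.0)
        for f in fields(ContactRow)
    }


# Made-up label pairs: the first conflicts on company, the second has
# one null email, which the reviewer's rule does not count as a conflict.
pairs = [
    (
        ContactRow("Ada Lovelace", "ada@example.com", "Analytical Engines"),
        ContactRow("Ada Lovelace", "ada@example.com", "Analytical Engines Ltd"),
    ),
    (
        ContactRow("Alan Turing", None, "Bletchley"),
        ContactRow("Alan Turing", "alan@example.com", "Bletchley"),
    ),
]
rates = disagreement_rate_by_field(pairs)
```

The same tally works on stored reports by counting the `field` of each `FieldDisagreement`. A field whose rate stays high is a candidate for a tighter schema description or a more specific prompt.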

Production composition

The example above runs its calls sequentially (both labelers, then the reviewer, then the adjudicator if needed) so the pattern is readable. For a million-document job, wrap the labelers in a Parallel step and gate the adjudicator behind a Condition in a Workflow. See parallel workflows and conditional workflows.
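
The shape of that composition can be sketched in plain `asyncio`, without Agno's Workflow API: fan out to both labelers concurrently, then run the expensive step only behind a gate. Everything here is an assumption made for illustration. `label_a`, `label_b`, and `adjudicate` are hypothetical stubs standing in for the agents' async calls, and a deterministic `has_conflict` check replaces the LLM reviewer.

```python
import asyncio
from typing import Dict, Optional

Label = Dict[str, Optional[str]]


async def label_a(text: str) -> Label:
    """Hypothetical stub for the first labeler's async call."""
    await asyncio.sleep(0)
    return {"name": "Ada Lovelace", "email": "ada@example.com"}


async def label_b(text: str) -> Label:
    """Hypothetical stub for the second labeler's async call."""
    await asyncio.sleep(0)
    return {"name": "Ada Lovelace", "email": "ada@example.org"}


async def adjudicate(text: str, a: Label, b: Label) -> Label:
    """Hypothetical stub for the adjudicator; this one simply prefers A."""
    await asyncio.sleep(0)
    return dict(a)


def has_conflict(a: Label, b: Label) -> bool:
    """True if any field is non-null in both labels and the values differ."""
    return any(
        a.get(k) is not None and b.get(k) is not None and a.get(k) != b.get(k)
        for k in a.keys() | b.keys()
    )


async def label_one(text: str) -> dict:
    # Parallel step: both labelers run concurrently.
    a, b = await asyncio.gather(label_a(text), label_b(text))
    # Condition gate: skip the adjudicator when nothing conflicts.
    if not has_conflict(a, b):
        return {"label": a, "adjudicated": False}
    final = await adjudicate(text, a, b)
    return {"label": final, "adjudicated": True}


result = asyncio.run(label_one("Ada Lovelace <ada@example.com>"))
```

With real agents, the stubs would await the labelers and the adjudicator instead of returning fixed values. The control flow stays the same.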

Production checklist

Agno gives you the orchestration primitives. These concerns are yours to add, and the cookbook is explicit about each.
| Concern | What to add |
| --- | --- |
| Rate limiting | Wrap agent.arun with a per-provider limiter, or front it with a gateway. Agno does not throttle outbound calls. |
| Bounded concurrency | An asyncio.Semaphore around the batch fan-out. |
| Dead-letter queue | Record failed item IDs and re-run them through a stricter pass. |
| Idempotency | A deterministic session_id per item, so re-runs upsert instead of duplicate. |
| Provider Batch APIs | For non-urgent jobs, call the provider batch endpoints directly for the discount. Not wrapped by Agno. |
| Prompt versioning | Track a prompt_version in run metadata so historical labels stay joinable. |
| Authoritative cost | RunMetrics.cost is populated only when the provider returns it. Attach a token-rate table downstream if you need exact numbers. |
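
Three of these concerns (bounded concurrency, a dead-letter list, and deterministic per-item IDs) fit in one short sketch. It is a minimal illustration under stated assumptions, not Agno code: `label_item` is a hypothetical stand-in for the labeling pipeline, `session_id_for` is one possible way to derive a stable ID from an item ID, and an in-memory dict and list stand in for a real store and queue.

```python
import asyncio
import hashlib
from typing import Dict, List, Tuple


def session_id_for(item_id: str) -> str:
    """Deterministic ID: the same item always maps to the same key."""
    digest = hashlib.sha256(item_id.encode("utf-8")).hexdigest()
    return f"label-{digest[:16]}"


async def label_item(item_id: str, text: str) -> str:
    """Hypothetical stand-in for the pipeline; fails on empty input."""
    await asyncio.sleep(0)
    if not text.strip():
        raise ValueError("empty document")
    return text.upper()


async def run_batch(
    items: Dict[str, str], max_concurrency: int = 4
) -> Tuple[Dict[str, str], List[Tuple[str, str]]]:
    sem = asyncio.Semaphore(max_concurrency)  # caps in-flight items
    results: Dict[str, str] = {}  # keyed by session ID, so re-runs overwrite
    dead_letters: List[Tuple[str, str]] = []  # (item_id, error) for a later pass

    async def worker(item_id: str, text: str) -> None:
        async with sem:
            try:
                results[session_id_for(item_id)] = await label_item(item_id, text)
            except Exception as exc:
                # Record the failure instead of aborting the whole batch.
                dead_letters.append((item_id, repr(exc)))

    await asyncio.gather(*(worker(i, t) for i, t in items.items()))
    return results, dead_letters


results, dead_letters = asyncio.run(run_batch({"doc-1": "hello", "doc-2": "  "}))
```

Because results are keyed by the derived ID, running the same batch twice leaves one entry per item. The dead-letter list holds only the IDs that need the stricter pass.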

Next steps

| Task | Guide |
| --- | --- |
| Build the labelers | Structured extraction |
| Compose as a workflow | Workflows |
| Run agents concurrently | Async execution |
