An intelligent invoice processing agent that extracts structured data from invoice documents (PDF, images) using vision capabilities.

What You’ll Learn

Concept | Description
Vision Extraction | Using LLM vision to understand document layouts
Structured Output | Extracting complex nested data (vendor, line items, totals)
Data Validation | Verifying extracted data for accuracy
Document Processing | Handling PDFs and images

Prerequisites

  • Python 3.12+
  • OpenAI API key
  • poppler (system dependency for PDF to image conversion)

Setup

1. Clone the repository

git clone https://github.com/agno-agi/agno.git
cd agno

2. Create and activate virtual environment

uv venv --python 3.12
source .venv/bin/activate

3. Install system dependencies

# macOS
brew install poppler

# Ubuntu/Debian
apt-get install poppler-utils

4. Install Python dependencies

uv pip install -r cookbook/01_showcase/01_agents/invoice_analyst/requirements.in

5. Set environment variables

export OPENAI_API_KEY=your-openai-key

Run the Agent

Single Invoice Extraction

Extract data from a single invoice:
python cookbook/01_showcase/01_agents/invoice_analyst/examples/extract_invoice.py path/to/invoice.pdf
Demonstrates:
  • Loading an invoice document
  • Extracting structured data
  • Accessing vendor, line items, and totals

Data Validation

Validate extracted invoice data:
python cookbook/01_showcase/01_agents/invoice_analyst/examples/validate_data.py
Demonstrates:
  • Validating line item math
  • Checking subtotal and total calculations
  • Identifying data quality issues

Batch Processing

Process multiple invoices:
python cookbook/01_showcase/01_agents/invoice_analyst/examples/batch_process.py
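
The batch script itself is not reproduced on this page. As a rough sketch of the general pattern, the helper below walks a folder, hands each supported file to an extraction function, and records failures per file so that one bad invoice does not stop the run. The names `find_invoices`, `process_batch`, and the `extract` callable are hypothetical, and the suffix list is an assumption rather than something taken from the cookbook.

```python
from pathlib import Path
from typing import Any, Callable

# The page says the agent handles PDFs and images; this exact suffix list
# is an assumption made for the sketch.
SUPPORTED_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg"}


def find_invoices(folder: Path) -> list[Path]:
    """Return the supported invoice files in `folder`, sorted by path."""
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )


def process_batch(
    folder: Path, extract: Callable[[Path], Any]
) -> tuple[dict[str, Any], dict[str, str]]:
    """Run `extract` on every invoice and keep going when one file fails."""
    results: dict[str, Any] = {}
    errors: dict[str, str] = {}
    for path in find_invoices(folder):
        try:
            results[path.name] = extract(path)
        except Exception as exc:  # record the failure and continue
            errors[path.name] = str(exc)
    return results, errors
```

In practice `extract` would wrap a call to the invoice agent; any callable that accepts a `Path` works for trying the loop out.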

Agent Configuration

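from agno.agent import Agent
from agno.models.openai import OpenAIResponses
from agno.tools.reasoning import ReasoningTools

# SYSTEM_MESSAGE and InvoiceData are defined elsewhere in the cookbook.
invoice_agent = Agent(
    name="Invoice Analyst",
    model=OpenAIResponses(id="gpt-5.2"),
    system_message=SYSTEM_MESSAGE,
    output_schema=InvoiceData,  # responses are parsed into this Pydantic model
    tools=[
        ReasoningTools(add_instructions=True),
    ],
    add_datetime_to_context=True,
    add_history_to_context=True,
    num_history_runs=5,
    enable_agentic_memory=True,
    markdown=True,
)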
Parameter | Purpose
model | GPT-5.2 with vision capabilities
output_schema | Pydantic model for structured invoice data
ReasoningTools | Plan extraction approach and validate data
The agent uses GPT-5.2’s native vision capabilities. No additional vision tools are needed.

How It Works

Extraction Workflow

1. Load invoice document
2. Convert to image(s) if PDF
3. Send to GPT-5.2 with vision capabilities
4. Extract fields using visual understanding
5. Parse line items table
6. Validate totals and calculations
7. Return structured data with confidence score
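
Steps 1 and 2 above can be sketched as follows. This is a minimal illustration, not the cookbook's own loader: it assumes the `pdf2image` package (which relies on poppler) for PDF conversion, and the function name `load_page_images`, the suffix list, and the 200 dpi default are all hypothetical choices.

```python
import io
from pathlib import Path

# Assumed list of accepted image types; not taken from the cookbook.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def load_page_images(path: Path, dpi: int = 200) -> list[bytes]:
    """Return one image (as raw bytes) per page of the invoice document."""
    suffix = path.suffix.lower()
    if suffix == ".pdf":
        # Imported lazily so image-only use does not need pdf2image or poppler.
        from pdf2image import convert_from_path

        pages = []
        for page in convert_from_path(str(path), dpi=dpi):
            buffer = io.BytesIO()
            page.save(buffer, format="PNG")  # each page is a PIL image
            pages.append(buffer.getvalue())
        return pages
    if suffix in IMAGE_SUFFIXES:
        return [path.read_bytes()]  # images are passed through unchanged
    raise ValueError(f"Unsupported invoice format: {suffix or path.name}")
```

Returning a list in both branches lets the later steps treat a single image and a multi-page PDF the same way.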

Output Schema

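from datetime import date
from decimal import Decimal

from pydantic import BaseModel

# Vendor and LineItem are nested models defined elsewhere (not shown here).
class InvoiceData(BaseModel):
    invoice_number: str
    invoice_date: date
    due_date: date | None
    vendor: Vendor
    line_items: list[LineItem]
    subtotal: Decimal
    tax_amount: Decimal | None
    total_amount: Decimal
    currency: str
    confidence_score: float
    warnings: list[str]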
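
The `Vendor` and `LineItem` models are not shown on this page. To illustrate how a nested result of this shape might be read, the sketch below uses hypothetical stand-ins built from plain dataclasses: `Vendor` with only a `name` field, `LineItem` with `description`, `quantity`, `unit_price`, and `amount`, and `InvoiceSketch` holding a subset of the `InvoiceData` fields. The real models may have different or additional fields.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Vendor:  # hypothetical stand-in; the real fields are not shown
    name: str


@dataclass
class LineItem:  # hypothetical stand-in
    description: str
    quantity: Decimal
    unit_price: Decimal
    amount: Decimal


@dataclass
class InvoiceSketch:  # subset of the InvoiceData fields, for illustration
    invoice_number: str
    vendor: Vendor
    line_items: list[LineItem]
    total_amount: Decimal
    currency: str


def summarize(invoice: InvoiceSketch) -> list[str]:
    """Build a short, human-readable summary from the nested fields."""
    lines = [f"Invoice {invoice.invoice_number} from {invoice.vendor.name}"]
    for item in invoice.line_items:
        lines.append(
            f"  {item.description}: {item.quantity} x {item.unit_price} = {item.amount}"
        )
    lines.append(f"Total: {invoice.total_amount} {invoice.currency}")
    return lines
```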

Validation Rules

Check | Formula
Line item math | quantity × unit_price = amount
Subtotal | Sum of line items ≈ subtotal
Total | subtotal + tax - discount + shipping ≈ total
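
The three checks above can be sketched in a few lines. This is an illustration under stated assumptions, not the cookbook's validator: the function name `validate_invoice`, the tuple layout for line items, and the 0.01 tolerance used for the ≈ comparisons are all hypothetical.

```python
from decimal import Decimal

TOLERANCE = Decimal("0.01")  # assumed allowance for rounding differences


def validate_invoice(
    line_items: list[tuple[Decimal, Decimal, Decimal]],
    subtotal: Decimal,
    total: Decimal,
    tax: Decimal = Decimal("0"),
    discount: Decimal = Decimal("0"),
    shipping: Decimal = Decimal("0"),
) -> list[str]:
    """Return warnings; each line item is (quantity, unit_price, amount)."""
    warnings: list[str] = []
    items_sum = Decimal("0")
    for index, (quantity, unit_price, amount) in enumerate(line_items, start=1):
        # Check 1: quantity x unit_price = amount
        if abs(quantity * unit_price - amount) > TOLERANCE:
            warnings.append(f"Line {index}: {quantity} x {unit_price} != {amount}")
        items_sum += amount
    # Check 2: sum of line items vs. subtotal
    if abs(items_sum - subtotal) > TOLERANCE:
        warnings.append(f"Subtotal mismatch: items sum to {items_sum}, got {subtotal}")
    # Check 3: subtotal + tax - discount + shipping vs. total
    expected_total = subtotal + tax - discount + shipping
    if abs(expected_total - total) > TOLERANCE:
        warnings.append(f"Total mismatch: expected {expected_total}, got {total}")
    return warnings
```

Using `Decimal` rather than `float` keeps the comparisons free of binary floating-point error, so the tolerance only has to absorb rounding done on the invoice itself.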

Troubleshooting

If PDF conversion fails, ensure poppler is installed:
# macOS
brew install poppler

# Ubuntu
apt-get install poppler-utils

If extraction from scanned documents is inaccurate: scans may have poor image quality, skewed alignment, or background noise. Try improving the scan quality.

If validation reports mismatched totals: common causes are rounding differences, hidden fees, or multi-page invoices with partial totals. Review the warnings and verify manually if needed.

Source Code