Response caching stores model responses and serves them again for identical requests, which can significantly improve response times and reduce API costs during development and testing.
For a detailed overview of response caching, see Response Caching.
This is different from Anthropic’s prompt caching feature. Response caching stores and replays the entire model response, while prompt caching caches the prompt prefix (such as a long system prompt) on Anthropic’s side to reduce input processing time and cost; the model still generates a fresh response on every call.
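
Conceptually, the mechanism is a lookup keyed by the full request: hash the model id and messages, and if the same key is seen again, return the stored response instead of calling the API. The sketch below illustrates the idea only; it is not Agno's implementation, and the names (response_cache, cache_key, cached_call) are hypothetical.

import hashlib
import json

# Hypothetical in-memory cache keyed by a hash of the request (illustration only)
response_cache: dict[str, str] = {}

def cache_key(model_id: str, messages: list[dict]) -> str:
    # Identical requests (same model and messages) map to the same key
    payload = json.dumps({"model": model_id, "messages": messages}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def cached_call(model_id: str, messages: list[dict], call_api) -> str:
    key = cache_key(model_id, messages)
    if key in response_cache:
        return response_cache[key]            # cache hit: no API call
    response = call_api(model_id, messages)   # cache miss: call the provider
    response_cache[key] = response
    return response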

Basic Usage

Enable caching by setting cache_response=True when initializing the model. The first call will hit the API and cache the response, while subsequent identical calls will return the cached result.
cache_model_response.py
import time

from agno.agent import Agent
from agno.models.anthropic import Claude

# Enable response caching on the model with cache_response=True
agent = Agent(model=Claude(id="claude-sonnet-4-5", cache_response=True))

# Run the same query twice to demonstrate caching
for i in range(1, 3):
    print(f"\n{'=' * 60}")
    print(
        f"Run {i}: {'Cache Miss (First Request)' if i == 1 else 'Cache Hit (Cached Response)'}"
    )
    print(f"{'=' * 60}\n")

    response = agent.run(
        "Write me a short story about a cat that can talk and solve problems."
    )
    print(response.content)
    print(f"\n Elapsed time: {response.metrics.duration:.3f}s")

    # Small delay between iterations for clarity
    if i == 1:
        time.sleep(0.5)
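
Response caching is configured on the model, so the same flag works with other model classes as well. For example, with an OpenAI model (this assumes OPENAI_API_KEY is set and the openai package is installed):

from agno.agent import Agent
from agno.models.openai import OpenAIChat

# cache_response=True enables the same response caching for OpenAI models
agent = Agent(model=OpenAIChat(id="gpt-4o", cache_response=True))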

Usage

1. Create a virtual environment

Open the Terminal and create a Python virtual environment.

python3 -m venv .venv
source .venv/bin/activate

2. Set your API key

export ANTHROPIC_API_KEY=xxx

3. Install libraries

pip install -U anthropic agno

4. Run the agent

python cache_model_response.py