Interact with Ollama models using the OpenAI Responses API. This uses Ollama’s OpenAI-compatible /v1/responses endpoint, added in Ollama v0.13.3.

Requirements

  • Ollama v0.13.3 or later
  • For local usage: Ollama server running at http://localhost:11434
  • For Ollama Cloud: Set OLLAMA_API_KEY environment variable
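
For local usage, the model is typically pulled and the server started with the standard Ollama CLI before running any of the examples below:

ollama pull gpt-oss:20b   # download the model used in the examples
ollama serve              # start the server on http://localhost:11434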

Key Features

  • Dual Deployment: Run locally for privacy or use Ollama Cloud for scalability
  • Auto-configuration: When using an API key, the host automatically defaults to Ollama Cloud
  • Stateless API: Each request is independent; there is no previous_response_id chaining (see the sketch after this list)
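
Because the API is stateless, any earlier turns must be resent with each request. Below is a minimal sketch of what this means at the wire level, using the OpenAI Python client pointed directly at Ollama's /v1 endpoint (the client usage and message shapes here are illustrative assumptions, not Agno's API):

from openai import OpenAI

# Point the OpenAI client at Ollama's OpenAI-compatible endpoint.
# A local server does not check the key; "ollama" is a placeholder.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

# First turn.
first = client.responses.create(
    model="gpt-oss:20b",
    input="Name one famous horror author.",
)

# Follow-up turn: there is no previous_response_id to chain, so the
# earlier exchange is included in the new request's input.
follow_up = client.responses.create(
    model="gpt-oss:20b",
    input=[
        {"role": "user", "content": "Name one famous horror author."},
        {"role": "assistant", "content": first.output_text},
        {"role": "user", "content": "Summarize one of their books in one sentence."},
    ],
)
print(follow_up.output_text)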

Parameters

Parameter  Type            Default            Description
id         str             "gpt-oss:20b"      The ID of the Ollama model to use
name       str             "OllamaResponses"  The name of the model
provider   str             "Ollama"           The provider of the model
host       Optional[str]   None               The Ollama server host (defaults to http://localhost:11434)
api_key    Optional[str]   None               The API key for Ollama Cloud (not required for local)
store      Optional[bool]  False              Whether to store responses
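
Putting the table together, a fully explicit configuration might look like the sketch below; the values shown are the documented defaults or placeholders:

from agno.agent import Agent
from agno.models.ollama import OllamaResponses

model = OllamaResponses(
    id="gpt-oss:20b",               # Ollama model tag
    name="OllamaResponses",         # display name (the default)
    host="http://localhost:11434",  # local server; omit to use the default
    api_key=None,                   # set for Ollama Cloud; not needed locally
    store=False,                    # whether to store responses
)

agent = Agent(model=model, markdown=True)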

Usage

Local Usage

from agno.agent import Agent
from agno.models.ollama import OllamaResponses

agent = Agent(
    model=OllamaResponses(id="gpt-oss:20b"),
    markdown=True,
)

agent.print_response("Share a 2 sentence horror story")

Ollama Cloud

Set the OLLAMA_API_KEY environment variable:

export OLLAMA_API_KEY=your-api-key

With the key set, the host automatically defaults to Ollama Cloud, and the code is otherwise the same as for local usage:

from agno.agent import Agent
from agno.models.ollama import OllamaResponses

agent = Agent(
    model=OllamaResponses(id="gpt-oss:20b"),
    markdown=True,
)

agent.print_response("Share a 2 sentence horror story")
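
The key can also be passed explicitly via the api_key parameter from the table above, for example read from the environment; per the auto-configuration behavior, the host then defaults to Ollama Cloud (a sketch, equivalent to setting the environment variable):

import os

from agno.agent import Agent
from agno.models.ollama import OllamaResponses

agent = Agent(
    model=OllamaResponses(
        id="gpt-oss:20b",
        api_key=os.getenv("OLLAMA_API_KEY"),  # explicit Ollama Cloud key
    ),
    markdown=True,
)

agent.print_response("Share a 2 sentence horror story")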

Custom Host

from agno.agent import Agent
from agno.models.ollama import OllamaResponses

agent = Agent(
    model=OllamaResponses(
        id="gpt-oss:20b",
        host="http://my-ollama-server:11434",
    ),
    markdown=True,
)

agent.print_response("Hello!")

Developer Resources