`OllamaResponses` uses Ollama's OpenAI-compatible `/v1/responses` endpoint, added in Ollama v0.13.3.
## Requirements

- Ollama v0.13.3 or later
- For local usage: an Ollama server running at `http://localhost:11434`
- For Ollama Cloud: set the `OLLAMA_API_KEY` environment variable
## Key Features

- Dual Deployment: Run locally for privacy or use Ollama Cloud for scalability
- Auto-configuration: When using an API key, the host automatically defaults to Ollama Cloud
- Stateless API: Each request is independent (no `previous_response_id` chaining)
## Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `id` | `str` | `"gpt-oss:20b"` | The ID of the Ollama model to use |
| `name` | `str` | `"OllamaResponses"` | The name of the model |
| `provider` | `str` | `"Ollama"` | The provider of the model |
| `host` | `Optional[str]` | `None` | The Ollama server host (defaults to `http://localhost:11434`) |
| `api_key` | `Optional[str]` | `None` | The API key for Ollama Cloud (not required for local) |
| `store` | `Optional[bool]` | `False` | Whether to store responses |
## Usage

### Local Usage
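Below is a minimal sketch of local usage. The import paths are assumptions and may differ in your installation; the constructor arguments follow the parameters table above.

```python
# A minimal local-usage sketch. The import paths are assumptions;
# adjust them to wherever OllamaResponses lives in your installed package.
from agno.agent import Agent
from agno.models.ollama import OllamaResponses

# With no api_key set, requests go to the local Ollama server
# at http://localhost:11434 by default.
agent = Agent(model=OllamaResponses(id="gpt-oss:20b"))
agent.print_response("Share a two sentence horror story.")
```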
### Ollama Cloud

Set the `OLLAMA_API_KEY` environment variable:
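A hedged sketch under the same import assumptions as above. Because an API key is present, auto-configuration points the host at Ollama Cloud without an explicit `host` argument.

```python
# Assumes OLLAMA_API_KEY is already exported in your shell, e.g.:
#   export OLLAMA_API_KEY="your-api-key"
import os

from agno.agent import Agent  # import paths are assumptions
from agno.models.ollama import OllamaResponses

# With an API key set, the host automatically defaults to Ollama Cloud.
agent = Agent(
    model=OllamaResponses(
        id="gpt-oss:20b",
        api_key=os.getenv("OLLAMA_API_KEY"),
    )
)
agent.print_response("Share a two sentence horror story.")
```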
### Custom Host
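To target an Ollama server on another machine or port, pass `host` explicitly. This sketch uses the same import assumptions as above, and the server address is a placeholder.

```python
from agno.agent import Agent  # import paths are assumptions
from agno.models.ollama import OllamaResponses

# "http://my-ollama-server:11434" is a placeholder; substitute
# the address of your own Ollama server.
agent = Agent(
    model=OllamaResponses(
        id="gpt-oss:20b",
        host="http://my-ollama-server:11434",
    )
)
agent.print_response("Share a two sentence horror story.")
```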
## Developer Resources
- Ollama Responses API Documentation
- Ollama (Chat Completion API)