The Interactions API is an alternative to Gemini's generateContent endpoint. Instead of sending the full conversation history on every turn, it stores prior turns server-side and references them via previous_interaction_id. This reduces token costs and latency through implicit caching.
See the Interactions API documentation for more details.
Installation
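A minimal install sketch; the agno package is the framework itself, and google-genai as the underlying SDK dependency is an assumption:

```shell
pip install -U agno google-genai
```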
Authentication
Set the GOOGLE_API_KEY environment variable. You can get one from Google AI Studio.
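For example, in your shell (replace the placeholder with your own key):

```shell
export GOOGLE_API_KEY="your-api-key"
```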
Example
View more examples here.
How It Works
- On the first turn, the agent sends the user message and receives a response along with an interaction_id.
- On subsequent turns, only the new message is sent with previous_interaction_id referencing the prior turn.
- The server reconstructs the full context from stored history, applying implicit caching to reduce cost.
The Agent class handles interaction_id tracking automatically.
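The turn-by-turn flow above can be sketched with a toy stand-in for the server (no real API involved; the real API also stores model responses, which this sketch omits):

```python
import itertools

class FakeInteractionsServer:
    """Toy stand-in: stores history per interaction id, like the real API."""
    def __init__(self):
        self._ids = itertools.count(1)
        self._history = {}  # interaction_id -> list of messages

    def create(self, message, previous_interaction_id=None):
        # Reconstruct context from stored history instead of resent messages.
        context = list(self._history.get(previous_interaction_id, []))
        context.append(message)
        new_id = f"int_{next(self._ids)}"
        self._history[new_id] = context
        return {"interaction_id": new_id, "context_size": len(context)}

server = FakeInteractionsServer()
first = server.create("Hello")  # full context is just this one message
second = server.create(
    "Follow-up", previous_interaction_id=first["interaction_id"]
)
# Only "Follow-up" was sent, yet the server sees both turns.
```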
Capabilities
- Multi-turn: server-side history management
- Thinking: reasoning with thinking levels
- Google Search: built-in web search
- Tool Use: function calling
- Structured Output: Pydantic schema enforcement
- Background Execution: long-running tasks
Multi-turn Conversations
This is the key advantage of the Interactions API: prior turns are stored server-side and referenced by ID, so only the new message is sent each turn.

Thinking
Enable extended reasoning with the thinking_level parameter. It accepts "low" or "high".
Google Search
Enable built-in Google Search by setting search=True. No external tool is needed.
Tool Use
Function calling works the same as with the Gemini class.
Structured Output
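The schema side of this feature is ordinary Pydantic; a minimal sketch of a schema class (how it is attached to the agent varies by Agno version, so that wiring is omitted here):

```python
from pydantic import BaseModel

class MovieScript(BaseModel):
    title: str
    genre: str
    logline: str

# The JSON schema the API would be asked to enforce on the response:
schema = MovieScript.model_json_schema()
```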
Use Pydantic models to enforce a JSON schema on the response.

Background Execution
For long-running tasks like Deep Research, enable background execution. The API offloads the task and returns results when complete.

Interactions API vs generateContent
| Feature | GeminiInteractions | Gemini |
|---|---|---|
| Conversation history | Server-side, referenced by ID | Client-side, resent each turn |
| Caching | Implicit on prior turns | Manual via context caching API |
| Token cost on multi-turn | Lower (only new message sent) | Higher (full history resent) |
| Background execution | Supported | Not supported |
| Response format | Typed execution steps | Generic content parts |
Params
| Parameter | Type | Default | Description |
|---|---|---|---|
| id | str | "gemini-3-flash-preview" | The model identifier |
| name | str | "GeminiInteractions" | The name of the model |
| provider | str | "Google" | The provider of the model |
| api_key | Optional[str] | None | Google API key (defaults to GOOGLE_API_KEY env var) |
| temperature | Optional[float] | None | Controls randomness (0.0-2.0) |
| top_p | Optional[float] | None | Nucleus sampling threshold |
| max_output_tokens | Optional[int] | None | Maximum tokens in the response |
| stop_sequences | Optional[list[str]] | None | Sequences that stop generation |
| seed | Optional[int] | None | Random seed for reproducibility |
| response_modalities | Optional[list[str]] | None | Output types (e.g., ["text", "image"]) |
| store | Optional[bool] | None | Persist interactions server-side (default: True) |
| background | Optional[bool] | None | Offload to background execution |
| thinking_level | Optional[str] | None | Reasoning intensity: "low" or "high" |
| search | bool | False | Enable built-in Google Search |
| url_context | bool | False | Enable URL context extraction |
| code_execution | bool | False | Enable code execution |
| service_tier | Optional[str] | None | Inference tier: "flex", "standard", or "priority" |
| timeout | Optional[float] | None | Request timeout in seconds |
| client_params | Optional[Dict[str, Any]] | None | Additional client parameters |
GeminiInteractions is a subclass of the Model class and has access to the same params.