> ## Documentation Index
> Fetch the complete documentation index at: https://docs.agno.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Context Compression

> Compress tool call results to save context space while preserving critical information.

<Badge icon="code-branch" color="orange">
  <Tooltip tip="Introduced in v2.2.3" cta="View release notes" href="https://github.com/agno-agi/agno/releases/tag/v2.2.3">v2.2.3</Tooltip>
</Badge>

Context Compression allows you to manage your agent context while it is running, helping the agent stay within its context window and avoid rate limits or decreases in response quality.

Think of it like a research assistant who reads lengthy reports and gives you the key bullet points instead of the full documents.

## The Problem: Verbose Tool Results

If you are using tools with large response sizes, without compression, tool results quickly consume your context window:

| Component     | Cumulative Token Count | Notes             |
| ------------- | ---------------------- | ----------------- |
| System Prompt | 1,200 tokens           |                   |
| User Message  | 1,300 tokens           |                   |
| LLM Response  | 1,500 tokens           |                   |
| Tool Call 1   | 2,500 tokens           |                   |
| Tool Call 2   | 5,700 tokens           | 2,500 + 3,200 new |
| Tool Call 3   | 8,500 tokens           | 5,700 + 2,800 new |
| Tool Call 4   | 12,000 tokens          | 8,500 + 3,500 new |

This quickly becomes expensive and hits context limits during complex workflows.

## The Solution: Automatic Compression

Context compression summarizes tool results after a threshold:

```
Tool Call 1: 2,500 tokens
Tool Call 2: 5,700 tokens
Tool Call 3: 8,500 tokens
[Compression triggered]
Tool Call 4: 1,300 tokens (800 compressed + 500 new)
```

**Benefits:**

* Dramatically reduced token costs
* Stay within context window limits
* Preserve critical facts and data
* Automatic compression

## How It Works

Context compression follows a simple pattern:

<Steps>
  <Step title="Enable Compression">
    Set `compress_tool_results=True` on your agent or team, or provide a `CompressionManager`. The system monitors tool call results as they come in.
  </Step>

  <Step title="Threshold Reached">
    After the threshold is reached, compression is triggered. Each uncompressed tool call result is individually summarized.
  </Step>

  <Step title="Intelligent Summarization">
    The compression model preserves key facts (numbers, dates, entities, URLs) while removing boilerplate, redundancy, and filler text.
  </Step>

  <Step title="The LLM loop continues">
    The compressed tool results are used in the next LLM executions, reducing token usage and extending the life of your context window.
  </Step>
</Steps>

<Note>
  When using `arun` on `Agent` or `Team`, compression is handled asynchronously and the uncompressed tool call results are summarised concurrently.
</Note>

## Enable Compression

Turn on `compress_tool_results=True` to automatically compress tool results. This comes with a default threshold of 3 tool calls.

For example:

<CodeGroup>
  ```python Agent theme={null}
  from agno.agent import Agent
  from agno.models.openai import OpenAIResponses
  from agno.tools.hackernews import HackerNewsTools

  agent = Agent(
      model=OpenAIResponses(id="gpt-5.2"),
      tools=[HackerNewsTools()],
      compress_tool_results=True,
  )

  agent.print_response("Get the top stories on HackerNews about AI, ML, startups, and tech trends")
  ```

  ```python Team theme={null}
  from agno.agent import Agent
  from agno.models.openai import OpenAIResponses
  from agno.team import Team
  from agno.tools.hackernews import HackerNewsTools

  web_agent = Agent(
      name="HackerNews Researcher",
      tools=[HackerNewsTools()],
  )

  team = Team(
      model=OpenAIResponses(id="gpt-5.2"),
      members=[web_agent],
      compress_tool_results=True,
  )

  team.print_response("Get the top stories on HackerNews about AI, ML, startups, and tech trends")
  ```
</CodeGroup>

<Info>
  You can also enable `compress_tool_results=True` on individual team members to compress their tool results independently.
</Info>

## Custom Compression

Provide a [`CompressionManager`](/reference/compression/compression-manager) to customize the compression behavior:

<CodeGroup>
  ```python Agent theme={null}
  from agno.agent import Agent
  from agno.compression.manager import CompressionManager
  from agno.models.openai import OpenAIResponses
  from agno.tools.hackernews import HackerNewsTools

  compression_manager = CompressionManager(
      model=OpenAIResponses(id="gpt-5.2"),  # Use a faster model for compression
      compress_tool_results_limit=2,  # Compress after 2 tool calls (default: 3)
      compress_tool_call_instructions="Your custom compression prompt here...",
  )

  agent = Agent(
      model=OpenAIResponses(id="gpt-5.2"),
      tools=[HackerNewsTools()],
      compression_manager=compression_manager,
  )

  agent.print_response("Find stories about AI startup funding on HackerNews")
  ```

  ```python Team theme={null}
  from agno.agent import Agent
  from agno.compression.manager import CompressionManager
  from agno.models.openai import OpenAIResponses
  from agno.team import Team
  from agno.tools.hackernews import HackerNewsTools

  compression_manager = CompressionManager(
      model=OpenAIResponses(id="gpt-5.2"),  # Use a faster model for compression
      compress_tool_results_limit=2,  # Compress after 2 tool calls (default: 3)
      compress_tool_call_instructions="Your custom compression prompt here...",
  )

  web_agent = Agent(
      name="HackerNews Researcher",
      tools=[HackerNewsTools()],
  )

  team = Team(
      model=OpenAIResponses(id="gpt-5.2"),
      members=[web_agent],
      compression_manager=compression_manager,
  )

  team.print_response("Find stories about AI startup funding on HackerNews")
  ```
</CodeGroup>

<Tip>
  Use a faster, cheaper model like `gpt-4o-mini` for compression to reduce latency and cost while using a more capable model as your Agent's main model.
</Tip>

## Compression Triggers

The `CompressionManager` supports two types of thresholds for triggering compression:

| Mode            | Parameter                     | Use Case                                                                                         |
| --------------- | ----------------------------- | ------------------------------------------------------------------------------------------------ |
| **Count-Based** | `compress_tool_results_limit` | Predictable tool call patterns. Triggers after N uncompressed tool results.                      |
| **Token-Based** | `compress_token_limit`        | Variable result sizes or strict context limits. Triggers when context exceeds a token threshold. |

<Note>
  If neither threshold is set, `compress_tool_results_limit` defaults to `3`.
</Note>

### Tool-Based Compression

Set `compress_tool_results_limit` when you have predictable tool call patterns and want compression to trigger after a fixed number of tool call results.

### Token-Based Compression

Use `compress_token_limit` when you need precise control over context size, especially when tool results vary significantly in size:

<CodeGroup>
  ```python Agent theme={null}
  from agno.agent import Agent
  from agno.compression.manager import CompressionManager
  from agno.models.openai import OpenAIResponses
  from agno.tools.hackernews import HackerNewsTools

  compression_manager = CompressionManager(
      model=OpenAIResponses(id="gpt-5.2"),
      compress_tool_results=True,
      compress_token_limit=5000,  # or compress_tool_results_limit
  )

  agent = Agent(
      model=OpenAIResponses(id="gpt-5.2"),
      tools=[HackerNewsTools()],
      compression_manager=compression_manager,
  )

  agent.print_response("Find HackerNews discussions about OpenAI, Anthropic, Google DeepMind, and Meta AI")
  ```

  ```python Team theme={null}
  from agno.agent import Agent
  from agno.compression.manager import CompressionManager
  from agno.models.openai import OpenAIResponses
  from agno.team import Team
  from agno.tools.hackernews import HackerNewsTools

  compression_manager = CompressionManager(
      model=OpenAIResponses(id="gpt-5.2"),
      compress_tool_results=True,
      compress_token_limit=5000,  # or compress_tool_results_limit
  )

  web_agent = Agent(
      name="HackerNews Researcher",
      tools=[HackerNewsTools()],
  )

  team = Team(
      model=OpenAIResponses(id="gpt-5.2"),
      members=[web_agent],
      compression_manager=compression_manager,
  )

  team.print_response("Find HackerNews discussions about OpenAI, Anthropic, Google DeepMind, and Meta AI")
  ```
</CodeGroup>

<Info>
  Token counting includes messages, tool definitions, and output schemas. See [Token Counting](/compression/token-counting) for details.
</Info>

## When to Use Context Compression

**Perfect for:**

* Agents with tools that return verbose results (web search, APIs)
* Multi-step workflows with many tool calls
* Long-running sessions where context accumulates
* Production systems where cost matters

## Developer Resources

* [CompressionManager Reference](/reference/compression/compression-manager) - Full CompressionManager documentation
* [Agent Reference](/reference/agents/agent) - Agent parameter documentation
* [Team Reference](/reference/teams/team) - Team parameter documentation
