The Problem: Growing Token Costs
Without summaries, every message adds to your context window, so each new request resends a longer history and costs more tokens.
The Solution: Automatic Summaries
Session summaries condense your history:
- ✅ Dramatically reduced token costs
- ✅ Avoid context window limits
- ✅ Maintain conversation continuity
- ✅ Automatic creation and updates
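To make the savings concrete, here is a rough back-of-the-envelope sketch in plain Python. It is a simplified model and not how Agno counts tokens: it assumes every message is the same size, ignores model responses and the cost of generating the summary itself, and the numbers (200 tokens per message, a 300-token summary, 20 turns) are purely illustrative.

```python
def full_history_tokens(turns: int, tokens_per_message: int) -> int:
    """Total input tokens when every request resends all earlier messages."""
    # Request k carries k messages: the k-1 earlier ones plus the new one.
    return sum(k * tokens_per_message for k in range(1, turns + 1))


def summarized_tokens(turns: int, tokens_per_message: int, summary_tokens: int) -> int:
    """Total input tokens when each request sends one summary plus the new message."""
    if turns == 0:
        return 0
    # The first request has nothing to summarize yet, so it carries only the new message.
    return tokens_per_message + (turns - 1) * (summary_tokens + tokens_per_message)


# Hypothetical figures chosen only for illustration.
print(full_history_tokens(turns=20, tokens_per_message=200))  # 42000
print(summarized_tokens(turns=20, tokens_per_message=200, summary_tokens=300))  # 9700
```

Under these assumptions the full-history cost grows quadratically with the number of turns, while the summarized cost grows linearly.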
How It Works
Session summaries follow a simple three-step pattern:
1. Enable Summary Generation
Set enable_session_summaries=True on your agent or team. Summaries are automatically created and updated after runs when there are meaningful messages to summarize, then stored in your database.
2. Use Summaries in Context
Set add_session_summary_to_context=True to include the summary in your messages (this is enabled by default if you enable session summary generation). Instead of sending dozens of historical messages, only the condensed summary is sent, dramatically reducing tokens while maintaining context.
3. Customize (Optional)
Use SessionSummaryManager to control summary generation: use a cheaper model, customize prompts, or change the summary format. This lets you optimize costs by using a lightweight model for summaries while keeping your main agent powerful.
Enable Session Summaries
Set enable_session_summaries=True to have Agno maintain a rolling summary for each session. Summaries are stored alongside the session history and can be reused later to save tokens.
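A minimal sketch of what enabling summaries might look like. Only `enable_session_summaries` comes from the text above; the import paths, the SQLite database class, the `db` argument, the model class, and the model id are assumptions about a typical Agno setup and may differ in your version.

```python
def build_summarizing_agent(db_file: str = "tmp/agno.db"):
    """Return an agent that keeps a rolling summary for each session."""
    # Imported inside the function so the sketch loads even without Agno installed.
    # These import paths are assumptions; check them against your Agno version.
    from agno.agent import Agent
    from agno.db.sqlite import SqliteDb
    from agno.models.openai import OpenAIChat

    return Agent(
        model=OpenAIChat(id="gpt-4o-mini"),  # assumed model id, swap in your own
        db=SqliteDb(db_file=db_file),        # summaries need a database to be stored in
        enable_session_summaries=True,       # the setting described above
    )
```

After a few runs that share the same session id, the stored summary should be retrievable with `get_session_summary`, assuming it accepts that session id as an argument.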
Customizing Generation
- Provide a SessionSummaryManager to specify a cheaper model or custom prompt
- Run summary generation out-of-band by instantiating a lightweight Agent that just calls get_session_summary across all sessions
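The first option could be sketched as follows. `SessionSummaryManager` is named in the text, but its import path, its `model` argument, and the `session_summary_manager` argument on the agent are assumptions here, as are the model ids. The prompt customization is left out because the text does not name the parameter for it.

```python
def build_agent_with_cheap_summaries(db_file: str = "tmp/agno.db"):
    """Return an agent whose summaries are written by a lighter model."""
    # Imported inside the function so the sketch loads even without Agno installed.
    # Import paths and argument names below are assumptions, not confirmed API.
    from agno.agent import Agent
    from agno.db.sqlite import SqliteDb
    from agno.models.openai import OpenAIChat
    from agno.session.summary import SessionSummaryManager

    # A lightweight model handles summarization only.
    summary_manager = SessionSummaryManager(model=OpenAIChat(id="gpt-4o-mini"))

    return Agent(
        model=OpenAIChat(id="gpt-4o"),  # the main agent keeps the stronger model
        db=SqliteDb(db_file=db_file),
        enable_session_summaries=True,
        session_summary_manager=summary_manager,
    )
```

The design choice is to spend the stronger model only on user-facing answers, since a summary of earlier turns usually tolerates a cheaper model.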
Use Summary in Context
add_session_summary_to_context defaults to True when session summary generation is enabled. To use existing summaries in context without generating new ones, leave summary generation disabled and set add_session_summary_to_context=True. To generate summaries without adding them to the context, set add_session_summary_to_context=False.
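The interaction between the two settings can be modeled in a few lines of plain Python. This is an illustrative model of the behavior described above, not Agno's internal code; the function names and the summary prefix string are hypothetical.

```python
from typing import List, Optional


def resolve_add_summary_to_context(
    enable_session_summaries: bool,
    add_session_summary_to_context: Optional[bool] = None,
) -> bool:
    """Decide whether a stored summary is added to the context."""
    # When the flag is not set explicitly, it follows the generation setting.
    if add_session_summary_to_context is None:
        return enable_session_summaries
    return add_session_summary_to_context


def build_context(new_message: str, stored_summary: Optional[str], use_summary: bool) -> List[str]:
    """Assemble the messages for one request (hypothetical helper)."""
    context = []
    if use_summary and stored_summary:
        context.append(f"Summary of earlier conversation: {stored_summary}")
    context.append(new_message)
    return context


# Generation on, flag unset: the summary is used.
print(resolve_add_summary_to_context(True))         # True
# Generation off, flag set: an existing summary is still used.
print(resolve_add_summary_to_context(False, True))  # True
# Generation on, flag off: summaries are stored but not sent.
print(resolve_add_summary_to_context(True, False))  # False
```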
When to Use Session Summaries
✅ Perfect for:
- Long-running customer support conversations
- Multi-day or multi-week interactions
- Conversations with 10+ turns
- Production systems where cost matters
❌ Not ideal for:
- Short conversations (fewer than 5 turns)
- When full detail is critical
- Real-time chat with recent context only