Memory is powerful, but without careful configuration, it can lead to unexpected token consumption, behavioral issues, and high costs. This guide shows you what to watch out for and how to optimize your memory usage for production.

Quick Reference

  • Default to automatic memory (enable_user_memories=True) unless you have a specific reason for agentic control
  • Always provide user_id; don’t rely on the implicit “default” user
  • Use cheaper models for memory operations when using agentic memory
  • Implement pruning for long-running applications
  • Monitor token usage in production to catch memory-related cost spikes
  • Test with realistic data: 100+ memories behave very differently than 5 memories
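Taken together, a minimal production setup following these defaults might look like this (a sketch assuming the same Agent, OpenAIChat, and db objects used in the examples below):
from agno.agent import Agent
from agno.models.openai import OpenAIChat

# Automatic memory plus an explicit user_id: sensible production defaults
agent = Agent(
    db=db,
    model=OpenAIChat(id="gpt-4o"),
    enable_user_memories=True  # processed once per run
)

# Always scope requests to a real user, never the implicit "default"
agent.print_response("I prefer vegetarian recipes", user_id="user_123")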

The Agentic Memory Token Trap

The Problem: When you use enable_agentic_memory=True, every memory operation triggers a separate, nested LLM call. This architecture can cause token usage to explode, especially as memories accumulate. Here’s what happens under the hood:
  1. User sends a message → Main LLM call processes it
  2. Agent decides to update memory → Calls update_user_memory tool
  3. Nested LLM call fires with:
    • Detailed system prompt (~50 lines)
    • ALL existing user memories loaded into context
    • Memory management instructions and tools
  4. Memory LLM makes tool calls (add, update, delete)
  5. Control returns to main conversation
Real-world impact:
# Scenario: User with 100 existing memories
from agno.agent import Agent
from agno.models.openai import OpenAIChat

agent = Agent(
    db=db,
    enable_agentic_memory=True,
    model=OpenAIChat(id="gpt-4o")
)

# 10-message conversation where agent updates memory 7 times:
# Normal conversation: 10 × 500 tokens = 5,000 tokens
# With agentic memory: (10 × 500) + (7 × 5,000) = 40,000 tokens
# Cost increase: 8x more expensive!
As memories accumulate, each memory operation gets more expensive. With 200 memories, a single memory update could consume 10,000+ tokens just loading context.
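To make that scaling concrete, here is a rough back-of-envelope estimator; the per-memory and prompt-overhead token figures are illustrative assumptions, not measured values:
def estimate_memory_update_tokens(num_memories,
                                  tokens_per_memory=40,
                                  prompt_overhead=1500):
    """Rough cost of one agentic memory update: the nested LLM call
    reloads the system prompt plus every existing memory."""
    return prompt_overhead + num_memories * tokens_per_memory

print(estimate_memory_update_tokens(100))  # ~5,500 tokens per update
print(estimate_memory_update_tokens(200))  # ~9,500 tokens per update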

Mitigation Strategy #1: Use Automatic Memory

For most use cases, automatic memory is your best bet—it’s significantly more efficient:
# Recommended: Single memory processing after conversation
agent = Agent(
    db=db,
    enable_user_memories=True  # Processes memories once at end
)

# Only use agentic memory when you specifically need:
# - Real-time memory updates during conversation
# - User-directed memory commands ("forget my address")
# - Complex memory reasoning within the conversation flow
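With automatic memory, extraction happens in a single pass after each run completes. A sketch of typical usage, assuming the agent defined above:
# Each run triggers exactly one memory-processing pass at the end
agent.print_response("I moved to Berlin last month", user_id="user_123")

# The extracted memories are available to later runs by the same user
for memory in agent.get_user_memories(user_id="user_123"):
    print(memory)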

Mitigation Strategy #2: Use a Cheaper Model for Memory Operations

If you do need agentic memory, use a less expensive model for memory management while keeping a powerful model for conversation:
from agno.agent import Agent
from agno.memory import MemoryManager
from agno.models.openai import OpenAIChat

# Substantially cheaper model for memory operations
memory_manager = MemoryManager(
    db=db,
    model=OpenAIChat(id="gpt-4o-mini")
)

# Expensive model for main conversations
agent = Agent(
    db=db,
    model=OpenAIChat(id="gpt-4o"),
    memory_manager=memory_manager,
    enable_agentic_memory=True
)
This approach can cut memory-related costs substantially while maintaining conversation quality.

Mitigation Strategy #3: Guide Memory Behavior with Instructions

Add explicit instructions to prevent frivolous memory updates:
agent = Agent(
    db=db,
    enable_agentic_memory=True,
    instructions=[
        "Only update memories when users share significant new information.",
        "Don't create memories for casual conversation or temporary states.",
        "Batch multiple memory updates together when possible."
    ]
)

Mitigation Strategy #4: Implement Memory Pruning

Prevent memory bloat by periodically cleaning up old or irrelevant memories:
from datetime import datetime, timedelta

def prune_old_memories(db, user_id, days=90):
    """Remove memories older than 90 days"""
    cutoff_timestamp = int((datetime.now() - timedelta(days=days)).timestamp())
    
    memories = db.get_user_memories(user_id=user_id)
    for memory in memories:
        if memory.updated_at and memory.updated_at < cutoff_timestamp:
            db.delete_user_memory(memory_id=memory.memory_id)

# Run periodically or before high-cost operations
prune_old_memories(db, user_id="john_doe@example.com")
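Age is not the only useful criterion. A variant sketch, reusing the same db methods as above, caps each user at a fixed number of most recently updated memories (the cap of 200 is an illustrative choice):
def cap_user_memories(db, user_id, max_memories=200):
    """Keep only the most recently updated memories for a user."""
    memories = db.get_user_memories(user_id=user_id)
    if len(memories) <= max_memories:
        return
    # Sort oldest first; treat a missing updated_at as oldest
    memories.sort(key=lambda m: m.updated_at or 0)
    for memory in memories[:len(memories) - max_memories]:
        db.delete_user_memory(memory_id=memory.memory_id)

cap_user_memories(db, user_id="john_doe@example.com")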

Mitigation Strategy #5: Set Tool Call Limits

Prevent runaway memory operations by limiting tool calls per conversation:
agent = Agent(
    db=db,
    enable_agentic_memory=True,
    tool_call_limit=5  # Prevents excessive memory operations
)

Common Pitfalls

The user_id Pitfall

The Problem: Forgetting to set user_id causes all memories to default to user_id="default", mixing different users’ memories together.
# ❌ Bad: All users share the same memories
agent.print_response("I love pizza")
agent.print_response("I'm allergic to dairy")

# ✅ Good: Each user has isolated memories
agent.print_response("I love pizza", user_id="user_123")
agent.print_response("I'm allergic to dairy", user_id="user_456")
Best practice: Always pass user_id explicitly, especially in multi-user applications.
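One defensive pattern is a thin wrapper (a hypothetical helper, not part of Agno) that refuses to run without an explicit user_id:
def run_for_user(agent, message, user_id):
    """Guard against silently falling back to user_id='default'."""
    if not user_id:
        raise ValueError("user_id is required in multi-user applications")
    return agent.run(message, user_id=user_id)

response = run_for_user(agent, "I love pizza", user_id="user_123")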

The Double-Enable Pitfall

The Problem: Using both enable_user_memories=True and enable_agentic_memory=True doesn’t give you both—agentic mode overrides automatic mode.
# ❌ Doesn't work as expected - automatic memory is disabled
agent = Agent(
    db=db,
    enable_user_memories=True,
    enable_agentic_memory=True  # This disables automatic behavior
)

# ✅ Choose one approach
agent = Agent(db=db, enable_user_memories=True)  # Automatic
# OR
agent = Agent(db=db, enable_agentic_memory=True)  # Agentic

Memory Growth Monitoring

Track memory counts to catch issues early:
from agno.agent import Agent

agent = Agent(db=db, enable_user_memories=True)

# Check memory count for a user
memories = agent.get_user_memories(user_id="user_123")
print(f"User has {len(memories)} memories")

# Alert if memory count is unusually high
if len(memories) > 500:
    print("⚠️ Warning: User has excessive memories. Consider pruning.")