Memory is powerful, but without careful configuration, it can lead to unexpected token consumption, behavioral issues, and high costs. This guide shows you what to watch out for and how to optimize your memory usage for production.
Quick Reference
- Default to automatic memory (update_memory_on_run=True) unless you have a specific reason for agentic control
- Always provide user_id, don’t rely on the default “default” user
- Use cheaper models for memory operations when using agentic memory
- Implement pruning for long-running applications
- Monitor token usage in production to catch memory-related cost spikes
- Test with realistic data: 100+ memories behave very differently than 5 memories
The Agentic Memory Token Trap
The Problem: When you use enable_agentic_memory=True, every memory operation triggers a separate, nested LLM call. This architecture can cause token usage to explode, especially as memories accumulate.
Here’s what happens under the hood:
- User sends a message → Main LLM call processes it
- Agent decides to update memory → calls the update_user_memory tool
- Nested LLM call fires with:
- Detailed system prompt (~50 lines)
- ALL existing user memories loaded into context
- Memory management instructions and tools
- Memory LLM makes tool calls (add, update, delete)
- Control returns to main conversation
Real-world impact:
```python
# Scenario: User with 100 existing memories
agent = Agent(
    db=db,
    enable_agentic_memory=True,
    model=OpenAIResponses(id="gpt-5.2"),
)

# 10-message conversation where the agent updates memory 7 times:
# Normal conversation: 10 × 500 tokens = 5,000 tokens
# With agentic memory: (10 × 500) + (7 × 5,000) = 40,000 tokens
# Cost increase: 8x more expensive!
```
As memories accumulate, each memory operation gets more expensive. With 200 memories, a single memory update could consume 10,000+ tokens just loading context.
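To see how this scales, here is a rough back-of-the-envelope model. The ~500-token prompt overhead and ~50 tokens per stored memory are illustrative assumptions, not measured figures:

```python
# Rough cost model for agentic memory in one conversation.
# Assumptions (illustrative): each nested memory call pays ~500 tokens of
# system-prompt overhead plus ~50 tokens per stored memory loaded into context.
def nested_memory_tokens(num_memories, memory_updates,
                         tokens_per_memory=50, prompt_overhead=500):
    per_update = prompt_overhead + num_memories * tokens_per_memory
    return memory_updates * per_update

# 100 memories, 7 updates → 7 × (500 + 100 × 50) = 38,500 tokens,
# in the same ballpark as the ~5,000 tokens per update cited above
print(nested_memory_tokens(100, 7))
# Doubling the memory count roughly doubles every update's cost
print(nested_memory_tokens(200, 7))
```

The key takeaway: cost grows linearly with stored memories *and* with the number of updates per conversation, so the product of the two is what explodes.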
Mitigation Strategy #1: Use Automatic Memory
For most use cases, automatic memory is your best bet—it’s significantly more efficient:
```python
# Recommended: single memory-processing pass after the conversation
agent = Agent(
    db=db,
    update_memory_on_run=True,  # Processes memories once at the end
)

# Only use agentic memory when you specifically need:
# - Real-time memory updates during the conversation
# - User-directed memory commands ("forget my address")
# - Complex memory reasoning within the conversation flow
```
Mitigation Strategy #2: Use a Cheaper Model for Memory Operations
If you do need agentic memory, use a less expensive model for memory management while keeping a powerful model for conversation:
```python
from agno.memory import MemoryManager
from agno.models.openai import OpenAIResponses

# Cheap model for memory operations (up to ~60x less expensive)
memory_manager = MemoryManager(
    db=db,
    model=OpenAIResponses(id="gpt-5-mini"),  # any lower-cost model works here
)

# Expensive model for main conversations
agent = Agent(
    db=db,
    model=OpenAIResponses(id="gpt-5.2"),
    memory_manager=memory_manager,
    enable_agentic_memory=True,
)
```
This approach can reduce memory-related costs by 98% while maintaining conversation quality.
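As a quick sanity check on that figure, assuming the memory model is ~60x cheaper per token and memory operations dominate memory-related spend:

```python
# If the memory model costs ~1/60th as much per token (the "60x less
# expensive" assumption above), memory-related cost drops by:
reduction = 1 - (1 / 60)
print(f"{reduction:.1%}")  # ≈ 98.3%
```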
Mitigation Strategy #3: Guide Memory Behavior with Instructions
Add explicit instructions to prevent frivolous memory updates:
```python
agent = Agent(
    db=db,
    enable_agentic_memory=True,
    instructions=[
        "Only update memories when users share significant new information.",
        "Don't create memories for casual conversation or temporary states.",
        "Batch multiple memory updates together when possible.",
    ],
)
```
Mitigation Strategy #4: Implement Memory Pruning
Prevent memory bloat by periodically cleaning up old or irrelevant memories:
```python
from datetime import datetime, timedelta

def prune_old_memories(db, user_id, days=90):
    """Remove memories older than `days` days (default 90)."""
    cutoff_timestamp = int((datetime.now() - timedelta(days=days)).timestamp())
    memories = db.get_user_memories(user_id=user_id)
    for memory in memories:
        if memory.updated_at and memory.updated_at < cutoff_timestamp:
            db.delete_user_memory(memory_id=memory.memory_id)

# Run periodically or before high-cost operations
prune_old_memories(db, user_id="john_doe@example.com")
```
You can also prevent runaway memory operations by limiting the number of tool calls per conversation:
```python
agent = Agent(
    db=db,
    enable_agentic_memory=True,
    tool_call_limit=5,  # Prevents excessive memory operations
)
```
Common Pitfalls
The user_id Pitfall
The Problem: Forgetting to set user_id causes all memories to default to user_id="default", mixing different users’ memories together.
```python
# ❌ Bad: all users share the same memories
agent.print_response("I love pizza")
agent.print_response("I'm allergic to dairy")

# ✅ Good: each user has isolated memories
agent.print_response("I love pizza", user_id="user_123")
agent.print_response("I'm allergic to dairy", user_id="user_456")
```
Best practice: Always pass user_id explicitly, especially in multi-user applications.
The Double-Enable Pitfall
The Problem: Using both update_memory_on_run=True and enable_agentic_memory=True doesn’t give you both—agentic mode overrides automatic mode.
```python
# ❌ Doesn't work as expected: automatic memory is disabled
agent = Agent(
    db=db,
    update_memory_on_run=True,
    enable_agentic_memory=True,  # This disables the automatic behavior
)

# ✅ Choose one approach
agent = Agent(db=db, update_memory_on_run=True)   # Automatic
# OR
agent = Agent(db=db, enable_agentic_memory=True)  # Agentic
```
Memory Growth Monitoring
Track memory counts to catch issues early:
```python
from agno.agent import Agent

agent = Agent(db=db, update_memory_on_run=True)

# Check the memory count for a user
memories = agent.get_user_memories(user_id="user_123")
print(f"User has {len(memories)} memories")

# Alert if the memory count is unusually high
if len(memories) > 500:
    print("⚠️ Warning: User has excessive memories. Consider pruning.")
```
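Monitoring pairs naturally with cleanup. As a sketch, a count-based cap can reuse the same `get_user_memories` / `delete_user_memory` methods shown in the pruning example; the `cap_memories` helper and its 500-memory default are illustrative, not part of the library:

```python
def cap_memories(db, user_id, max_memories=500):
    """Keep only the `max_memories` most recent memories for a user.

    Illustrative helper, not an Agno API; assumes each memory exposes
    `updated_at` and `memory_id`, as in the pruning example above.
    """
    memories = db.get_user_memories(user_id=user_id)
    if len(memories) <= max_memories:
        return 0
    memories.sort(key=lambda m: m.updated_at or 0)  # oldest first
    excess = memories[: len(memories) - max_memories]
    for memory in excess:
        db.delete_user_memory(memory_id=memory.memory_id)
    return len(excess)
```

Unlike age-based pruning, a count cap bounds the worst-case context size of agentic memory calls regardless of how active a user is.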
Developer Resources