When you are developing or testing new features, it is common to hit the model with the same query multiple times. In these cases you usually don't need the model to generate a fresh answer each time, and can cache the response to save on tokens. Response caching stores model responses locally, avoiding repeated API calls and reducing costs when the same query is made multiple times.
Response Caching vs. Prompt Caching: Response caching (covered here) caches the entire model response locally to avoid API calls. Prompt caching caches the system prompt on the model provider’s side to reduce processing time and costs.
Why Use Response Caching?
Response caching provides several benefits:
- Faster Development: Avoid waiting for API responses during iterative development
- Cost Reduction: Eliminate redundant API calls for identical queries
- Consistent Testing: Ensure test cases receive the same responses across runs
- Offline Development: Work with cached responses when API access is limited
- Rate Limit Management: Reduce the number of API calls to stay within rate limits
How It Works
When response caching is enabled:
- Cache Key Generation: A unique key is generated based on the request parameters (messages, response format, tools, etc.)
- Cache Lookup: Before making an API call, Agno checks if a cached response exists for that key
- Cache Hit: If found, the cached response is returned immediately
- Cache Miss: If not found, the API is called and the response is cached for future use
- TTL Expiration: Cached responses respect the configured time-to-live (TTL) and expire automatically
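The steps above can be sketched in plain Python. This is an illustrative model of the mechanism, not Agno's actual implementation:

```python
import hashlib
import json
import time

class ResponseCache:
    """Sketch of a local response cache with TTL expiry (illustrative only)."""

    def __init__(self, ttl=None):
        self.ttl = ttl      # seconds; None means entries never expire
        self._store = {}    # key -> (cached_at, response)

    def make_key(self, messages, response_format=None, tools=None):
        # Cache key generation: hash the request parameters deterministically
        payload = json.dumps(
            {"messages": messages, "response_format": response_format, "tools": tools},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode()).hexdigest()

    def get(self, key):
        # Cache lookup: return the cached response if present and not expired
        entry = self._store.get(key)
        if entry is None:
            return None
        cached_at, response = entry
        if self.ttl is not None and time.time() - cached_at > self.ttl:
            del self._store[key]  # TTL expiration
            return None
        return response

    def put(self, key, response):
        # Cache miss: after calling the API, store the response for future use
        self._store[key] = (time.time(), response)

cache = ResponseCache(ttl=3600)
key = cache.make_key([{"role": "user", "content": "Hello"}])
if cache.get(key) is None:      # miss: the real API call would happen here
    cache.put(key, "Hi there!")
print(cache.get(key))           # hit: "Hi there!" returned without an API call
```

Because the key is derived from all request parameters, changing the messages, tools, or response format produces a different key and therefore a cache miss.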
Basic Usage
Enable response caching by setting cache_response=True when initializing your model:
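A minimal sketch of what this might look like, assuming the OpenAIChat model class (substitute the model class for your provider):

```python
from agno.agent import Agent
from agno.models.openai import OpenAIChat

# cache_response=True enables local response caching for this model
agent = Agent(model=OpenAIChat(id="gpt-4o-mini", cache_response=True))

agent.print_response("Tell me a joke")  # first call hits the API
agent.print_response("Tell me a joke")  # identical call is served from the cache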
Configuration Options
Cache Time-to-Live (TTL)
Control how long responses remain cached using cache_ttl (in seconds):
If cache_ttl is not specified (or set to None), cached responses never expire.
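For example, the following sketch (again assuming the OpenAIChat model class) caches responses for one hour:

```python
from agno.agent import Agent
from agno.models.openai import OpenAIChat

# Cached responses expire after 3600 seconds (one hour)
agent = Agent(
    model=OpenAIChat(id="gpt-4o-mini", cache_response=True, cache_ttl=3600)
)
```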
Custom Cache Directory
Store cached responses in a specific location using cache_dir:
By default, cached responses are stored in ~/.agno/cache/model_responses in your home directory.
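A sketch of overriding the default location, using a hypothetical project-local directory:

```python
from agno.models.openai import OpenAIChat

# Store cached responses under a project-local directory instead of the default
model = OpenAIChat(
    id="gpt-4o-mini",
    cache_response=True,
    cache_dir="tmp/response_cache",  # hypothetical path for illustration
)
```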
Usage with Agents
Response caching is configured at the model level and works automatically with agents:
Usage with Teams
Response caching works with Team as well. You can enable it on individual team members and on the team leader model:
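A sketch of team-level usage, assuming Agno's Team class and the OpenAIChat model class (member names and the query are illustrative):

```python
from agno.agent import Agent
from agno.models.openai import OpenAIChat
from agno.team import Team

# Enable caching on each member's model and on the team leader's model
researcher = Agent(
    name="Researcher",
    model=OpenAIChat(id="gpt-4o-mini", cache_response=True),
)
writer = Agent(
    name="Writer",
    model=OpenAIChat(id="gpt-4o-mini", cache_response=True),
)

team = Team(
    members=[researcher, writer],
    model=OpenAIChat(id="gpt-4o-mini", cache_response=True),
)

team.print_response("Summarize the latest AI news")
```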