Response caching stores model responses so that repeated identical requests are served from the cache instead of hitting the API, which can significantly improve response times and reduce API costs during development and testing.
Basic Usage
Enable caching by setting cache_response=True when initializing the model. The first call will hit the API and cache the response, while subsequent identical calls will return the cached result.
import time

from agno.agent import Agent
from agno.models.openai import OpenAIResponses

agent = Agent(model=OpenAIResponses(id="gpt-5.2", cache_response=True))

# Run the same query twice to demonstrate caching
for i in range(1, 3):
    print(f"\n{'=' * 60}")
    print(
        f"Run {i}: {'Cache Miss (First Request)' if i == 1 else 'Cache Hit (Cached Response)'}"
    )
    print(f"{'=' * 60}\n")

    response = agent.run(
        "Write me a short story about a cat that can talk and solve problems."
    )
    print(response.content)
    print(f"\nElapsed time: {response.metrics.duration:.3f}s")

    # Small delay between iterations for clarity
    if i == 1:
        time.sleep(0.5)
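If you want to measure the speedup directly, the sketch below times an uncached call against a cached one. It uses only the Agent and OpenAIResponses classes from the example above; the prompt is arbitrary and the timings are illustrative, varying with your model and network.

import time

from agno.agent import Agent
from agno.models.openai import OpenAIResponses

agent = Agent(model=OpenAIResponses(id="gpt-5.2", cache_response=True))
prompt = "Write me a short story about a cat that can talk and solve problems."

# First call: cache miss, the request goes to the API
start = time.perf_counter()
agent.run(prompt)
uncached = time.perf_counter() - start

# Second identical call: cache hit, served locally
start = time.perf_counter()
agent.run(prompt)
cached = time.perf_counter() - start

print(f"Uncached: {uncached:.3f}s, cached: {cached:.3f}s")

The second call should complete in a small fraction of the first call's time, since the response is served from the cache without a network round trip.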
Usage
Save the example above as cache_model_response.py, then follow these steps:

1. Set up your virtual environment:

uv venv --python 3.12
source .venv/bin/activate

2. Set your API key:

export OPENAI_API_KEY=xxx

3. Install dependencies:

uv pip install -U openai agno

4. Run the agent:

python cache_model_response.py
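When the script runs, the first iteration takes as long as a normal API call, while the second, identical request should return almost instantly with the same content, confirming the cache hit.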