Audio Streaming

This example shows how to stream audio responses from an Agno agent backed by OpenAI's GPT-4o audio preview model. It prints the transcript as it arrives and writes the audio to a WAV file.

```python
import base64
import wave
from pathlib import Path
from typing import Iterator

from agno.agent import Agent, RunOutputEvent
from agno.models.openai import OpenAIChat

# Audio configuration (must match the PCM16 stream the model returns)
SAMPLE_RATE = 24000  # Hz (24 kHz)
CHANNELS = 1  # Mono (change to 2 for stereo)
SAMPLE_WIDTH = 2  # Bytes per sample (16-bit)

# Configure the agent to return both text and audio
agent = Agent(
    model=OpenAIChat(
        id="gpt-4o-audio-preview",
        modalities=["text", "audio"],
        audio={
            "voice": "alloy",
            "format": "pcm16",  # Only pcm16 is supported with streaming
        },
    ),
)
output_stream: Iterator[RunOutputEvent] = agent.run(
    "Tell me a 10 second story", stream=True
)

filename = Path("tmp/response_stream.wav")
filename.parent.mkdir(parents=True, exist_ok=True)  # make sure tmp/ exists

# Open the WAV file once in write-binary mode and add frames as chunks arrive
with wave.open(str(filename), "wb") as wav_file:
    wav_file.setnchannels(CHANNELS)
    wav_file.setsampwidth(SAMPLE_WIDTH)
    wav_file.setframerate(SAMPLE_RATE)

    # Iterate over the streamed events
    for response in output_stream:
        # Not every event is guaranteed to carry audio, so read it defensively
        response_audio = getattr(response, "response_audio", None)
        if response_audio:
            # Print each transcript fragment as it arrives
            if response_audio.transcript:
                print(response_audio.transcript, end="", flush=True)
            # Decode the base64-encoded PCM16 chunk and write it as raw frames
            if response_audio.content:
                try:
                    pcm_bytes = base64.b64decode(response_audio.content)
                    wav_file.writeframes(pcm_bytes)
                except Exception as e:
                    print(f"Error decoding audio: {e}")
print()
```
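The decode-and-write loop above can be exercised without an API key. The sketch below is a stand-in under stated assumptions: it synthesizes a short 440 Hz tone, splits it into base64-encoded PCM16 chunks to mimic what arrives in `response_audio.content`, and slips in one deliberately malformed chunk. The helpers `make_fake_chunks` and `write_chunks` are hypothetical illustrations, not part of Agno.

```python
import base64
import binascii
import math
import os
import struct
import tempfile
import wave
from typing import Iterable, List

SAMPLE_RATE = 24000  # Hz, same as the example's configuration
CHANNELS = 1         # mono
SAMPLE_WIDTH = 2     # bytes per sample (16-bit PCM)


def make_fake_chunks(seconds: float, chunk_ms: int = 100) -> List[str]:
    """Hypothetical stand-in for streamed audio: a 440 Hz tone as base64 PCM16 chunks."""
    total = int(SAMPLE_RATE * seconds)
    samples = [int(8000 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)) for n in range(total)]
    pcm = struct.pack(f"<{total}h", *samples)  # little-endian signed 16-bit samples
    step = SAMPLE_RATE * SAMPLE_WIDTH * chunk_ms // 1000  # bytes per chunk
    return [base64.b64encode(pcm[i:i + step]).decode("ascii") for i in range(0, len(pcm), step)]


def write_chunks(path: str, chunks: Iterable[str]) -> int:
    """Decode each chunk and write its frames; return how many bad chunks were skipped."""
    errors = 0
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(SAMPLE_WIDTH)
        wav_file.setframerate(SAMPLE_RATE)
        for chunk in chunks:
            try:
                wav_file.writeframes(base64.b64decode(chunk))
            except binascii.Error as e:
                # Report and skip the bad chunk instead of aborting the whole stream
                errors += 1
                print(f"Error decoding audio: {e}")
    return errors


if __name__ == "__main__":
    chunks = make_fake_chunks(0.5)
    chunks.insert(2, "abcde")  # deliberately malformed base64
    path = os.path.join(tempfile.mkdtemp(), "fake_stream.wav")
    skipped = write_chunks(path, chunks)
    with wave.open(path, "rb") as wav_file:
        print(skipped, wav_file.getnframes() / wav_file.getframerate())  # 1 0.5
```

Because each chunk is decoded inside its own try/except, the malformed chunk is reported and skipped while the remaining 0.5 seconds of audio still reach the file.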

Key Features

Real-time Audio Streaming: Streams audio chunks as OpenAI's audio preview model generates them
PCM16 Audio Format: Uses raw 16-bit PCM (pcm16), the only audio format supported when streaming
Transcript Generation: Prints a text transcript of the audio alongside the audio itself
WAV File Creation: Writes the streamed audio chunks into a single WAV file
Error Handling: Catches decoding errors per chunk and reports them instead of aborting the stream

Use Cases

Interactive voice assistants
Real-time storytelling applications
Audio content generation
Voice-enabled chatbots
Dynamic audio responses for applications

Technical Details

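The example configures the audio stream with a 24 kHz sample rate, a single (mono) channel, and a 16-bit sample width. Because the audio arrives in chunks while it is being generated, it could be played back before the response finishes; this example instead writes each chunk to a WAV file. PCM16 is uncompressed, so the saved file contains exactly the samples that were streamed.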
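These settings fix the raw data rate: 24,000 samples/s × 1 channel × 2 bytes = 48,000 bytes per second, so a 10-second story is roughly 480,000 bytes of PCM data. The sketch below encodes that arithmetic and reads a WAV header back with the standard `wave` module; the helper names are hypothetical, chosen for illustration.

```python
import wave

SAMPLE_RATE = 24000  # Hz
CHANNELS = 1         # mono
SAMPLE_WIDTH = 2     # bytes per sample (16-bit)


def bytes_per_second(rate: int = SAMPLE_RATE, channels: int = CHANNELS,
                     width: int = SAMPLE_WIDTH) -> int:
    """Raw PCM data rate: samples per second x channels x bytes per sample."""
    return rate * channels * width


def pcm_duration_seconds(num_bytes: int) -> float:
    """Playback length of a raw PCM16 buffer under the example's settings."""
    return num_bytes / bytes_per_second()


def wav_duration_seconds(path: str) -> float:
    """Duration recorded in a WAV file's header (frames / frame rate)."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


print(bytes_per_second())              # 48000
print(pcm_duration_seconds(480_000))   # 10.0
```

After running the example, `wav_duration_seconds("tmp/response_stream.wav")` gives a quick check of how long the generated story actually turned out to be.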
