Skip to main content

Documentation Index

Fetch the complete documentation index at: https://agno-v2-shaloo-ai-support-link.mintlify.app/llms.txt

Use this file to discover all available pages before exploring further.

This example demonstrates how to create an agent that can transcribe audio conversations, identifying different speakers and providing accurate transcriptions.

Code

audio_to_text.py
import requests
from agno.agent import Agent
from agno.media import Audio
from agno.models.google import Gemini

agent = Agent(
    model=Gemini(id="gemini-2.0-flash-exp"),
    markdown=True,
)

url = "https://agno-public.s3.us-east-1.amazonaws.com/demo_data/QA-01.mp3"

response = requests.get(url)
audio_content = response.content

# Give a transcript of this audio conversation. Use speaker A, speaker B to identify speakers.

agent.print_response(
    "Give a transcript of this audio conversation. Use speaker A, speaker B to identify speakers.",
    audio=[Audio(content=audio_content)],
    stream=True,
)

Usage

1

Set up your virtual environment

uv venv --python 3.12
source .venv/bin/activate
2

Install dependencies

uv pip install -U agno google-generativeai requests
3

Export your GOOGLE API key

  export GOOGLE_API_KEY="your_google_api_key_here"
4

Run Agent

python audio_to_text.py