Agents can process images, audio, video, and files as input, and generate images and audio as output. This section introduces Multimodal I/O. Check out the full guide for more details.

Media Classes

Class    Parameters
Image    url, filepath, content (bytes)
Audio    url, filepath, content (bytes), format
Video    url, filepath, content (bytes)
File     url, filepath, content (bytes)
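
The table lists three ways to supply a media source. As a rough sketch, the helper below maps a source value to one of those parameters. `media_kwargs` is a hypothetical name and not part of Agno; the sketch assumes the three parameters are alternatives and that any string starting with `http://` or `https://` is a URL.

```python
from pathlib import Path
from typing import Union


def media_kwargs(source: Union[str, bytes, Path]) -> dict:
    """Map a source to one of the url / filepath / content parameters.

    Hypothetical helper for illustration; not part of Agno.
    """
    # Raw bytes go to the `content` parameter.
    if isinstance(source, (bytes, bytearray)):
        return {"content": bytes(source)}
    # Path objects are treated as local files.
    if isinstance(source, Path):
        return {"filepath": str(source)}
    if isinstance(source, str):
        # Assumption: http(s) strings are URLs, everything else is a path.
        if source.startswith(("http://", "https://")):
            return {"url": source}
        return {"filepath": source}
    raise TypeError(f"unsupported media source: {type(source).__name__}")
```

With Agno installed, the result could presumably be unpacked into any of the classes above, for example `Image(**media_kwargs("./photo.jpg"))`.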

Quickstart

Pass images via URL, file path, or base64 content:

from agno.agent import Agent
from agno.media import Image
from agno.models.openai import OpenAIResponses

agent = Agent(model=OpenAIResponses(id="gpt-5.2"))


# From URL
agent.run(
    "What's in this image?",
    images=[Image(url="https://example.com/photo.jpg")]
)

# From file
agent.run(
    "Describe this image",
    images=[Image(filepath="./photo.jpg")]
)

# Multiple images
agent.run(
    "Compare these two images",
    images=[
        Image(url="https://example.com/photo1.jpg"),
        Image(url="https://example.com/photo2.jpg")
    ]
)
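
The Quickstart mentions base64 content, while the class table lists `content` as bytes. A minimal sketch of bridging the two, assuming `content` expects raw bytes: decode the base64 string first. `decode_image_payload` is a hypothetical helper, and the Agno call is shown only as a comment.

```python
import base64


def decode_image_payload(payload: str) -> bytes:
    """Decode a base64 string (optionally a data URI) into raw bytes."""
    # Drop a "data:image/...;base64," prefix if one is present.
    if payload.startswith("data:") and "," in payload:
        payload = payload.split(",", 1)[1]
    # validate=True rejects characters outside the base64 alphabet.
    return base64.b64decode(payload, validate=True)


# Stand-in payload: the 8-byte PNG signature, base64-encoded.
encoded = base64.b64encode(b"\x89PNG\r\n\x1a\n").decode("ascii")
raw = decode_image_payload(encoded)

# Assumed usage with Agno installed (not executed here):
# agent.run("What's in this image?", images=[Image(content=raw)])
```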

Learn More

For more multimodal input and output examples, see the Multimodal documentation.