Agents can process images, audio, video, and files as input, and generate images and audio as output. This section introduces Multimodal I/O. Check out the full guide for more details.

Media Classes

Class    Parameters
Image    url, filepath, content (bytes)
Audio    url, filepath, content (bytes), format
Video    url, filepath, content (bytes)
File     url, filepath, content (bytes)
Fileurl, filepath, content (bytes)

Quickstart

Pass images via URL, file path, or raw content (bytes):

from agno.agent import Agent
from agno.media import Image
from agno.models.openai import OpenAIResponses

agent = Agent(model=OpenAIResponses(id="gpt-5.2"))

# From URL
agent.run(
    "What's in this image?",
    images=[Image(url="https://example.com/photo.jpg")]
)

# From file
agent.run(
    "Describe this image",
    images=[Image(filepath="./photo.jpg")]
)

# Multiple images
agent.run(
    "Compare these two images",
    images=[
        Image(url="https://example.com/photo1.jpg"),
        Image(url="https://example.com/photo2.jpg")
    ]
)
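
The examples above pass URLs and file paths but not raw bytes. When sources arrive in mixed form, a small helper can pick the matching keyword from the Media Classes table. The sketch below is hypothetical (media_kwargs is not part of agno); it assumes any string starting with http:// or https:// is a URL, any other string or Path is a local file path, and bytes go to content.

```python
from pathlib import Path
from typing import Union

Source = Union[str, Path, bytes]


def media_kwargs(source: Source) -> dict:
    """Map a raw source to the url / filepath / content keyword (hypothetical helper)."""
    if isinstance(source, bytes):
        # Raw bytes, e.g. Path("./photo.jpg").read_bytes()
        return {"content": source}
    if isinstance(source, str) and source.startswith(("http://", "https://")):
        return {"url": source}
    if isinstance(source, (str, Path)):
        # Assumption: any other string is treated as a local file path.
        return {"filepath": str(source)}
    raise TypeError(f"unsupported source type: {type(source).__name__}")
```

With agno installed, each result could then be unpacked into a media class, for instance Image(**media_kwargs("./photo.jpg")), assuming the constructors accept these keywords as the table lists.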

Learn More

For more multimodal input and output examples, see the Multimodal documentation.