Multimodal
Generate Images with Intermediate Steps
Code
from typing import Iterator

from agno.agent import Agent, RunResponse
from agno.models.openai import OpenAIChat
from agno.tools.dalle import DalleTools
from agno.utils.common import dataclass_to_dict
from rich.pretty import pprint

image_agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    tools=[DalleTools()],
    description="You are an AI agent that can create images using DALL-E.",
    instructions=[
        "When the user asks you to create an image, use the DALL-E tool to create an image.",
        "The DALL-E tool will return an image URL.",
        "Return the image URL in your response in the following format: `![image description](image URL)`",
    ],
    markdown=True,
)

run_stream: Iterator[RunResponse] = image_agent.run(
    "Create an image of a yellow siamese cat",
    stream=True,
    stream_intermediate_steps=True,
)
for chunk in run_stream:
    pprint(dataclass_to_dict(chunk, exclude={"messages"}))
    print("---" * 20)
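The instructions above ask the agent to return the image as a Markdown link of the form `![image description](image URL)`. A minimal sketch of how that URL could be pulled back out of the response text is shown below. The helper name `extract_image_urls` is hypothetical and not part of agno, and how you obtain the final response text from the stream is left out because it depends on the agno version in use.

```python
import re
from typing import List

# Matches Markdown images of the form ![alt text](url) and captures the URL.
# Assumption: the URL contains no whitespace and no closing parenthesis.
_MD_IMAGE = re.compile(r"!\[[^\]]*\]\((\S+?)\)")


def extract_image_urls(text: str) -> List[str]:
    """Return every image URL found in Markdown image syntax, in order."""
    return _MD_IMAGE.findall(text)


if __name__ == "__main__":
    # Illustrative response text; the URL is a placeholder, not real output.
    sample = "Here is your cat: ![a yellow siamese cat](https://example.com/cat.png)"
    print(extract_image_urls(sample))
```

A regular expression is enough here because the prompt fixes the output format. If the model deviates from that format, the function simply returns an empty list.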
Usage
1. Create a virtual environment
   Open the Terminal and create a Python virtual environment.
2. Set your API key
   export OPENAI_API_KEY=xxx
3. Install libraries
   pip install -U openai rich agno
4. Run Agent
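Before running the agent, it can help to confirm that the API key from the earlier step is actually visible to Python. The following is a small preflight sketch using only the standard library. The function name `missing_api_key` is hypothetical, and the check only tests that the variable is set and non-blank, not that the key is valid.

```python
import os
from typing import Mapping, Optional


def missing_api_key(
    env: Optional[Mapping[str, str]] = None,
    name: str = "OPENAI_API_KEY",
) -> bool:
    """Return True if the variable is unset or blank in the given environment."""
    # Default to the real process environment when no mapping is passed in.
    env = os.environ if env is None else env
    return not env.get(name, "").strip()


if __name__ == "__main__":
    if missing_api_key():
        raise SystemExit("OPENAI_API_KEY is not set; export it before running the agent.")
    print("OPENAI_API_KEY found.")
```

Passing the environment in as a mapping keeps the check easy to test without touching the real process environment.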