OpenAI

OpenAITools lets an Agent call OpenAI models for audio transcription, image generation, and text-to-speech.

Prerequisites

Before using OpenAITools, ensure you have the openai library installed and your OpenAI API key configured.

Install the library:
```
pip install -U openai
```
Set your API key: obtain your API key from OpenAI and set it as an environment variable.
```
export OPENAI_API_KEY="your-openai-api-key"
```
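
A missing key tends to surface only when the first tool call fails, so it can help to check for it at startup. The following is a minimal sketch; the helper name `require_openai_api_key` is hypothetical and is not part of agno or the openai library.

```python
import os
from typing import Mapping, Optional


def require_openai_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the OpenAI API key from the environment or raise a clear error.

    Hypothetical helper for illustration; not part of agno or openai.
    """
    # Default to the real process environment; a mapping can be passed for testing.
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set. Export it before creating the agent."
        )
    return key
```

Calling `require_openai_api_key()` once before constructing the Agent turns a late authentication failure into an immediate, readable error.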

Initialization

Import OpenAITools and add it to your Agent’s tool list.

```
from agno.agent import Agent
from agno.tools.openai import OpenAITools

agent = Agent(
    name="OpenAI Agent",
    tools=[OpenAITools()],
    show_tool_calls=True,
    markdown=True,
)
```

Usage Examples

1. Transcribing Audio

This example demonstrates an agent that transcribes an audio file.

transcription_agent.py

```
from pathlib import Path
from agno.agent import Agent
from agno.tools.openai import OpenAITools
from agno.utils.media import download_file

audio_url = "https://agno-public.s3.amazonaws.com/demo_data/sample_conversation.wav"

# Download the sample audio to a local file, creating tmp/ if it does not exist yet
local_audio_path = Path("tmp/sample_conversation.wav")
local_audio_path.parent.mkdir(parents=True, exist_ok=True)
download_file(audio_url, local_audio_path)

agent = Agent(
    name="OpenAI Transcription Agent",
    tools=[OpenAITools(transcription_model="whisper-1")],
    show_tool_calls=True,
    markdown=True,
)

# Ask the agent to transcribe the downloaded file
agent.print_response(f"Transcribe the audio file located at '{local_audio_path}'")
```
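
Before handing a path to the agent, a quick local check can catch missing or unsupported files. The sketch below assumes the file types and the 25 MB upload limit that OpenAI documents for its transcription endpoint at the time of writing, so verify both against the current documentation. `check_audio_file` is a hypothetical helper and is not part of agno.

```python
from pathlib import Path

# Assumed from OpenAI's speech-to-text documentation; verify before relying on it.
SUPPORTED_SUFFIXES = {
    ".flac", ".m4a", ".mp3", ".mp4", ".mpeg", ".mpga", ".ogg", ".wav", ".webm",
}
MAX_BYTES = 25 * 1024 * 1024  # assumed 25 MB upload limit


def check_audio_file(path: Path) -> Path:
    """Raise early if the file is missing, has an unexpected type, or is too large."""
    if not path.is_file():
        raise FileNotFoundError(f"Audio file not found: {path}")
    if path.suffix.lower() not in SUPPORTED_SUFFIXES:
        raise ValueError(f"Unsupported audio type: {path.suffix}")
    if path.stat().st_size > MAX_BYTES:
        raise ValueError(f"File exceeds {MAX_BYTES} bytes: {path}")
    return path
```

Running `check_audio_file(local_audio_path)` right after the download would fail fast on a broken download instead of spending a model call on it.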

2. Generating Images

This example demonstrates an agent that generates an image based on a text prompt.

image_generation_agent.py

```
from agno.agent import Agent
from agno.tools.openai import OpenAITools
from agno.utils.media import save_base64_data

agent = Agent(
    name="OpenAI Image Generation Agent",
    tools=[OpenAITools(image_model="dall-e-3")],
    show_tool_calls=True,
    markdown=True,
)

# Run the agent and keep the response so the generated image can be saved
response = agent.run("Generate a photorealistic image of a cozy coffee shop interior")

# Save the first generated image, if any was returned
if response.images:
    save_base64_data(response.images[0].content, "tmp/coffee_shop.png")
```
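
The example above passes the image content to `save_base64_data`, which suggests the data is base64-encoded. For readers curious what that saving step involves, here is a standard-library sketch. `write_base64_file` is a hypothetical stand-in, not agno's implementation, and unlike the call above it explicitly creates the parent directory.

```python
import base64
from pathlib import Path


def write_base64_file(data: str, output_path: str) -> Path:
    """Decode a base64 string and write the raw bytes to output_path.

    Hypothetical helper for illustration; not agno's save_base64_data.
    """
    path = Path(output_path)
    # Create tmp/ (or any other parent directory) if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(base64.b64decode(data))
    return path
```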

3. Generating Speech

This example demonstrates an agent that generates speech from text.

speech_synthesis_agent.py

```
from agno.agent import Agent
from agno.tools.openai import OpenAITools
from agno.utils.media import save_base64_data

agent = Agent(
    name="OpenAI Speech Agent",
    tools=[OpenAITools(
        text_to_speech_model="tts-1",
        text_to_speech_voice="alloy",
        text_to_speech_format="mp3"
    )],
    show_tool_calls=True,
    markdown=True,
)

agent.print_response("Generate audio for the text: 'Hello, this is a synthesized voice example.'")

# Read the response of the run that print_response just executed
response = agent.run_response
# Save the first audio clip, if any was returned
if response and response.audio:
    save_base64_data(response.audio[0].base64_audio, "tmp/hello.mp3")
```

Customization

You can customize the underlying OpenAI models used for transcription, image generation, and TTS:

```
OpenAITools(
    transcription_model="whisper-1",
    image_model="dall-e-3",
    text_to_speech_model="tts-1-hd",
    text_to_speech_voice="nova",
    text_to_speech_format="wav"
)
```
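
A typo in a voice or format name would otherwise only show up when the speech tool is called. The sketch below validates those settings up front. The voice and format lists are assumptions taken from OpenAI's text-to-speech documentation at the time of writing (newer values may exist), and `tts_settings` is a hypothetical helper, not part of agno.

```python
from typing import Dict

# Assumed from OpenAI's text-to-speech documentation; newer voices or formats may exist.
KNOWN_VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
KNOWN_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}


def tts_settings(model: str = "tts-1", voice: str = "alloy", fmt: str = "mp3") -> Dict[str, str]:
    """Build the text-to-speech keyword arguments, rejecting unrecognized values."""
    if voice not in KNOWN_VOICES:
        raise ValueError(f"Unknown voice {voice!r}; expected one of {sorted(KNOWN_VOICES)}")
    if fmt not in KNOWN_FORMATS:
        raise ValueError(f"Unknown format {fmt!r}; expected one of {sorted(KNOWN_FORMATS)}")
    # Keys match the OpenAITools parameter names shown above.
    return {
        "text_to_speech_model": model,
        "text_to_speech_voice": voice,
        "text_to_speech_format": fmt,
    }
```

The result can be unpacked into the toolkit, for example `OpenAITools(**tts_settings(model="tts-1-hd", voice="nova", fmt="wav"))`.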

Toolkit Functions

The OpenAITools toolkit provides the following functions:

| Function | Description |
| --- | --- |
| `transcribe_audio` | Transcribes audio from a local file path or a public URL |
| `generate_image` | Generates images based on a text prompt |
| `generate_speech` | Synthesizes speech from text |
