This emotion-aware translation agent translates text, analyzes its emotional tone, selects an appropriate voice, and generates localized audio output using Cartesia TTS.

Multi-Step Workflow

The agent follows a sequential workflow:
  1. Identify: Parse text and target language
  2. Translate: Convert text preserving meaning
  3. Analyze Emotion: Detect emotional tone
  4. Get Language Code: Map to 2-letter code
  5. List Voices: Get available Cartesia voices
  6. Select Voice: Choose voice matching language + emotion
  7. Localize Voice: Create language-specific clone
  8. Generate Audio: Create TTS output

Prerequisites

  • Python 3.12+
  • OpenAI API key
  • Cartesia API key

Setup

1. Clone the repository

git clone https://github.com/agno-agi/agno.git
cd agno

2. Create and activate virtual environment

uv venv --python 3.12
source .venv/bin/activate

3. Install dependencies

uv pip install -r cookbook/01_showcase/01_agents/translation_agent/requirements.in

4. Get Cartesia API key

Sign up at Cartesia to get an API key.

5. Set environment variables

export OPENAI_API_KEY=sk-***
export CARTESIA_API_KEY=your-cartesia-api-key
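
Before running the examples, it can help to confirm that both keys are visible to Python. The following is a minimal sketch, not part of the cookbook: the helper name `missing_keys` and the constant `REQUIRED_KEYS` are hypothetical, and only the two variable names come from the setup step above.

```python
import os
from typing import Mapping

# The two variables exported in the setup step.
REQUIRED_KEYS = ("OPENAI_API_KEY", "CARTESIA_API_KEY")


def missing_keys(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or blank."""
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required API keys are set.")
```

Passing the environment as a parameter keeps the check easy to exercise with a plain dictionary.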

Run the Agent

Basic Translation

Translate text with voice generation:
python cookbook/01_showcase/01_agents/translation_agent/examples/basic_translation.py
Demonstrates:
  • Text translation
  • Voice selection
  • Audio file generation

Emotional Content

Handle emotionally nuanced text:
python cookbook/01_showcase/01_agents/translation_agent/examples/emotional_content.py
Demonstrates:
  • Emotion detection
  • Voice matching for emotional tone
  • Preserving sentiment in translation

Batch Translate

Process multiple translations:
python cookbook/01_showcase/01_agents/translation_agent/examples/batch_translate.py

Agent Configuration

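from agno.agent import Agent
from agno.models.openai import OpenAIResponses
from agno.tools.cartesia import CartesiaTools

# AGENT_INSTRUCTIONS (the step-by-step workflow prompt) is defined elsewhere in the example.
translation_agent = Agent(
    name="Translation Agent",
    description="Translates text and generates localized voice notes",
    instructions=AGENT_INSTRUCTIONS,
    model=OpenAIResponses(id="gpt-5.2"),
    tools=[CartesiaTools()],  # voice listing, localization, and TTS
    add_datetime_to_context=True,
    add_history_to_context=True,  # include earlier runs in the prompt
    num_history_runs=5,  # how many earlier runs to include
    enable_agentic_memory=True,
    markdown=True,
)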

| Parameter | Purpose |
| --- | --- |
| model | GPT-5.2 for translation and emotion analysis |
| instructions | Step-by-step workflow for translation |
| CartesiaTools | Voice listing, localization, and TTS |

How It Works

Translation Workflow

1. Identify text and target language
2. Translate text accurately
3. Analyze emotion of translated text
4. Get language code (fr, es, de, etc.)
5. List available Cartesia voices
6. Select voice matching language + emotion
7. Create localized voice clone
8. Generate audio with localized voice
9. Return translated text + audio

Emotion-Voice Mapping

| Emotion | Voice Characteristics |
| --- | --- |
| Neutral | Clear, professional, moderate pace |
| Happy | Upbeat, energetic, slightly faster |
| Sad | Slower, softer, lower energy |
| Angry | Stronger, more intense |
| Excited | High energy, dynamic, faster |
| Calm | Soothing, steady, relaxed |
| Professional | Formal, clear, authoritative |
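
The emotion-to-voice mapping above can be expressed as a simple lookup. This is a sketch only: the names `VOICE_CHARACTERISTICS` and `voice_characteristics` are hypothetical, and falling back to Neutral for an unrecognized emotion is an assumption rather than documented behavior.

```python
# Emotion labels and traits copied from the mapping above (keys lowercased).
VOICE_CHARACTERISTICS = {
    "neutral": "Clear, professional, moderate pace",
    "happy": "Upbeat, energetic, slightly faster",
    "sad": "Slower, softer, lower energy",
    "angry": "Stronger, more intense",
    "excited": "High energy, dynamic, faster",
    "calm": "Soothing, steady, relaxed",
    "professional": "Formal, clear, authoritative",
}


def voice_characteristics(emotion: str) -> str:
    """Look up voice traits for an emotion, ignoring case and surrounding spaces."""
    key = emotion.strip().lower()
    # Assumption: unknown emotions fall back to the neutral profile.
    return VOICE_CHARACTERISTICS.get(key, VOICE_CHARACTERISTICS["neutral"])
```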

Supported Languages

| Language | Code |
| --- | --- |
| French | fr |
| Spanish | es |
| German | de |
| Italian | it |
| Portuguese | pt |
| Japanese | ja |
| Chinese | zh |
| Korean | ko |
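
Step 4 of the workflow, mapping a language name to its 2-letter code, might look like the sketch below. The function name `language_code` is hypothetical, and raising `ValueError` for a language outside the list above is an assumed design choice.

```python
# Language names and codes copied from the list above (names lowercased).
LANGUAGE_CODES = {
    "french": "fr",
    "spanish": "es",
    "german": "de",
    "italian": "it",
    "portuguese": "pt",
    "japanese": "ja",
    "chinese": "zh",
    "korean": "ko",
}


def language_code(language: str) -> str:
    """Return the 2-letter code for a language name, or the code itself if already given."""
    key = language.strip().lower()
    if key in LANGUAGE_CODES:
        return LANGUAGE_CODES[key]
    if key in LANGUAGE_CODES.values():
        return key  # the caller already passed a code such as "fr"
    raise ValueError(f"Unsupported language: {language!r}")
```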

Troubleshooting

If Cartesia requests fail, verify that your API key is set:
echo $CARTESIA_API_KEY
Then check your Cartesia dashboard for usage limits.

If no voice exactly matches the target language and emotion, the agent selects the closest available voice and localizes it. Some language combinations may have limited voice options.

If no audio seems to have been generated, check the response object for audio content:
if response.audio:
    print(f"Audio bytes: {len(response.audio[0].content)}")
