Multi-Step Workflow
The agent follows a sequential workflow:
- Identify: Parse text and target language
- Translate: Convert text preserving meaning
- Analyze Emotion: Detect emotional tone
- Get Language Code: Map to 2-letter code
- List Voices: Get available Cartesia voices
- Select Voice: Choose voice matching language + emotion
- Localize Voice: Create language-specific clone
- Generate Audio: Create TTS output
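The sequence above can be sketched as a simple pipeline in which each step reads a shared context and adds its result. This is only an illustration of the ordering: `STEPS`, `run_workflow`, and the handler callables are hypothetical names, not the agent's actual tools, and no translation or Cartesia call is made here.

```python
from typing import Callable

# Step names mirror the workflow list; they are illustrative labels only.
STEPS = [
    "identify",
    "translate",
    "analyze_emotion",
    "get_language_code",
    "list_voices",
    "select_voice",
    "localize_voice",
    "generate_audio",
]

Handler = Callable[[dict], dict]


def run_workflow(handlers: dict[str, Handler], request: str) -> dict:
    """Run every step in order, merging each step's output into the context."""
    context: dict = {"request": request}
    for name in STEPS:
        # Each handler sees everything produced by the earlier steps.
        context.update(handlers[name](context))
    return context


if __name__ == "__main__":
    # Stub handlers that only record that the step ran.
    stubs = {name: (lambda ctx, n=name: {n: "done"}) for name in STEPS}
    print(list(run_workflow(stubs, "Translate 'hello' to French")))
```

Passing the accumulated context forward is what makes the order matter: voice selection, for example, needs both the language code and the detected emotion from earlier steps.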
Prerequisites
- Python 3.12+
- OpenAI API key
- Cartesia API key
Setup
1. Clone the repository
2. Create and activate virtual environment
3. Install dependencies
4. Get Cartesia API key
   Sign up at Cartesia to get an API key.
5. Set environment variables
Run the Agent
Basic Translation
Translate text with voice generation:
- Text translation
- Voice selection
- Audio file generation
Emotional Content
Handle emotionally nuanced text:
- Emotion detection
- Voice matching for emotional tone
- Preserving sentiment in translation
Batch Translate
Process multiple translations.
Agent Configuration
| Parameter | Purpose |
|---|---|
| model | GPT-5.2 for translation and emotion analysis |
| instructions | Step-by-step workflow for translation |
| CartesiaTools | Voice listing, localization, and TTS |
How It Works
Translation Workflow
Emotion-Voice Mapping
| Emotion | Voice Characteristics |
|---|---|
| Neutral | Clear, professional, moderate pace |
| Happy | Upbeat, energetic, slightly faster |
| Sad | Slower, softer, lower energy |
| Angry | Stronger, more intense |
| Excited | High energy, dynamic, faster |
| Calm | Soothing, steady, relaxed |
| Professional | Formal, clear, authoritative |
Supported Languages
| Language | Code |
|---|---|
| French | fr |
| Spanish | es |
| German | de |
| Italian | it |
| Portuguese | pt |
| Japanese | ja |
| Chinese | zh |
| Korean | ko |
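
The "Get Language Code" step amounts to a lookup against this table. A minimal sketch, assuming case-insensitive language names and an error for anything not listed; `get_language_code` is a hypothetical helper name:

```python
# Codes copied from the Supported Languages table.
LANGUAGE_CODES = {
    "french": "fr",
    "spanish": "es",
    "german": "de",
    "italian": "it",
    "portuguese": "pt",
    "japanese": "ja",
    "chinese": "zh",
    "korean": "ko",
}


def get_language_code(language: str) -> str:
    """Map a language name to its 2-letter code, ignoring case and whitespace."""
    key = language.strip().lower()
    if key not in LANGUAGE_CODES:
        raise ValueError(f"Unsupported language: {language!r}")
    return LANGUAGE_CODES[key]
```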
Troubleshooting
Cartesia API errors
Verify that your Cartesia API key is set correctly, then check your Cartesia dashboard for usage limits.
Voice not available for language
The agent will select the closest available voice and localize it. Some language combinations may have limited voice options.
Audio not generated
Check the response object for audio content.
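
One way to perform that check is sketched below. The shape of the response is an assumption: the code expects an `audio` attribute holding one item or a list of items, each with a `content` attribute that is either raw bytes or a base64 string. Inspect the actual response object in your environment and adapt the attribute names. `save_first_audio` is a hypothetical helper.

```python
import base64
from pathlib import Path


def save_first_audio(response, path: str) -> bool:
    """Write the first non-empty audio payload to `path`; return False if none."""
    # Assumed attribute name: `audio` (single item or list of items).
    audio_items = getattr(response, "audio", None) or []
    if not isinstance(audio_items, (list, tuple)):
        audio_items = [audio_items]

    for item in audio_items:
        # Assumed attribute name: `content` (bytes or base64-encoded string).
        content = getattr(item, "content", None)
        if not content:
            continue
        data = base64.b64decode(content) if isinstance(content, str) else bytes(content)
        Path(path).write_bytes(data)
        return True
    return False
```

A `False` return means the response carried no audio payload, which usually points to the TTS step not having run rather than to a file-writing problem.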