Generate Speech
Convert text to natural-sounding speech using OpenAI's TTS models with multiple voice options.
Common Properties
- Name - The custom name of the node.
- Color - The custom color of the node.
- Delay Before (sec) - Waits in seconds before executing the node.
- Delay After (sec) - Waits in seconds after executing the node.
- Continue On Error - Automation will continue regardless of any error. Default: false.
Info: If the Continue On Error property is true, no error is caught when the project is executed, even if a Catch node is used.
Inputs
- Connection Id - Connection identifier from the Connect node.
- Text - Text to convert to speech (max 4096 characters).
- Use Robomotion AI Credits - Use Robomotion credits instead of your own API key.
Options
Model Selection
- Model - Text-to-speech model:
- GPT-4o Mini TTS - Latest compact TTS model with voice control
- TTS-1 HD - High-definition audio quality (default)
- TTS-1 - Standard quality, faster
Voice Settings
- Voice - Voice persona to use:
- Alloy - Neutral, versatile (default)
- Ash - Clear, professional
- Ballad - Soft, expressive
- Coral - Warm, friendly
- Cedar - Deep, authoritative
- Echo - Resonant, distinctive
- Fable - Narrative, storytelling
- Marin - Bright, energetic
- Nova - Youthful, dynamic
- Onyx - Deep, mature
- Sage - Calm, measured
- Shimmer - Light, airy
- Verse - Articulate, clear
- Format - Audio output format:
- MP3 - Universal compatibility (default)
- Opus - Best compression for streaming
- AAC - Good quality, moderate size
- FLAC - Lossless quality, large files
- WAV - Uncompressed, largest files
- PCM - Raw audio data
- Speed - Speech speed (0.25 to 4.0). Default: 1.0 (normal speed).
- Instructions - Voice control instructions (GPT-4o Mini TTS only). Use them to guide pronunciation, emotion, and emphasis.
Advanced
- Timeout (seconds) - Request timeout. Default: 120.
- Include Raw Response - Include metadata in output. Default: false.
Outputs
- Audio - File path to the generated audio file.
- Raw Response - Metadata about the generated audio (when enabled).
How It Works
The node converts text to speech in the following steps:
- Validates connection and input text
- Configures voice, format, and speed
- Sends text to the TTS model
- Downloads generated audio
- Saves to temporary storage
- Returns file path
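The validation and configuration steps above can be sketched in Python. This is a minimal illustration, not the node's actual implementation: the helper name `build_speech_request` and its constants are hypothetical, and the field names (`model`, `input`, `voice`, `response_format`, `speed`, `instructions`) are assumed to follow the request body of OpenAI's speech endpoint. Rejecting instructions for other models is also an assumption; the node may handle that case differently.

```python
MAX_CHARS = 4096
MODELS = {"gpt-4o-mini-tts", "tts-1-hd", "tts-1"}
FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}


def build_speech_request(text, model="tts-1-hd", voice="alloy",
                         fmt="mp3", speed=1.0, instructions=None):
    """Validate inputs and return a request body as a dict (hypothetical helper)."""
    if not text:
        raise ValueError("Text cannot be empty")
    if len(text) > MAX_CHARS:
        raise ValueError("Text too long")
    if model not in MODELS:
        raise ValueError(f"Unknown model: {model}")
    if fmt not in FORMATS:
        raise ValueError(f"Unknown format: {fmt}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("Speed must be between 0.25 and 4.0")
    # Assumption: this sketch rejects instructions for models other than
    # gpt-4o-mini-tts, since the option is documented for that model only.
    if instructions and model != "gpt-4o-mini-tts":
        raise ValueError("Instructions require gpt-4o-mini-tts")

    body = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": fmt,
        "speed": speed,
    }
    if instructions:
        body["instructions"] = instructions
    return body
```

Calling `build_speech_request("Hello", voice="nova", fmt="opus", speed=1.2)` returns a dict that uses the default model `tts-1-hd` and omits the `instructions` key.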
Usage Examples
Example 1: Basic Speech Generation
Input:
- Text: "Hello, this is an automated message from your RPA system."
- Voice: alloy
- Model: tts-1-hd
Output:
- Audio: "/tmp/robomotion/audio/speech.mp3"
Example 2: Different Voice and Speed
Input:
- Text: "Welcome to our premium service. How may I assist you today?"
- Voice: nova
- Speed: 1.2
- Format: opus
Output:
- Audio: "/tmp/robomotion/audio/speech.opus"
Example 3: High-Quality Podcast
Input:
- Text: "In today's episode, we'll explore the fascinating world of automation..."
- Voice: fable
- Model: tts-1-hd
- Format: flac
- Speed: 0.95
Output:
- Audio: "/tmp/robomotion/audio/speech.flac"
Example 4: Voice Instructions (GPT-4o Mini TTS)
Input:
- Text: "The temperature is 72 degrees."
- Voice: sage
- Model: gpt-4o-mini-tts
- Instructions: "Speak slowly and emphasize 'seventy-two'. Pause slightly before 'degrees'."
Output:
- Audio: "/tmp/robomotion/audio/speech.mp3" (with custom pronunciation)
Requirements
- Connection Id from the Connect node
- Text between 1 and 4096 characters
- Write access to temporary storage
Tips for RPA Developers
- Voice Selection: Test different voices for your use case. Marin and Nova are energetic; Cedar and Onyx are deep.
- Format Choice: Use MP3 for compatibility, Opus for web streaming, FLAC for archival quality.
- Speed Adjustment: Values between 0.9 and 1.1 sound natural. Values below 0.75 or above 1.5 may sound unnatural.
- Text Length: Split long texts into chunks for better quality and faster processing.
- Instructions: Only available with GPT-4o Mini TTS. Great for controlling pronunciation, emphasis, and pacing.
- File Storage: Audio files are saved to a temporary directory. Move them to permanent storage if you need to keep them.
- Batch Processing: Process multiple texts in parallel for efficiency.
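For the File Storage tip, a minimal Python sketch of moving a generated file out of temporary storage might look like this. The helper name `keep_audio` and the destination folder are hypothetical; only standard-library calls are used, and the sketch overwrites nothing on purpose, so choose unique names if you archive many files.

```python
import shutil
from pathlib import Path


def keep_audio(temp_path, dest_dir, name=None):
    """Move an audio file out of temporary storage and return its new path."""
    src = Path(temp_path)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the target folder if missing
    target = dest / (name or src.name)       # keep the original name by default
    shutil.move(str(src), str(target))
    return target
```

For example, `keep_audio(audio_path, "archive", "greeting.mp3")` would move the file referenced by the Audio output into an `archive` folder under the name `greeting.mp3`.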
Common Errors
"Text cannot be empty"
- Provide text content to convert to speech
"Text too long"
- Split text into chunks of 4096 characters or fewer
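One way to stay under the 4096-character limit is to split long text at sentence boundaries before sending each chunk to the node. The sketch below is an illustrative assumption, not part of the node: `split_text` is a hypothetical helper, and it falls back to a hard cut only when a single sentence exceeds the limit.

```python
import re

MAX_CHARS = 4096


def split_text(text, limit=MAX_CHARS):
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    # Break after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any single sentence that is itself longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be passed as the Text input of a separate Generate Speech call, and the resulting audio files joined afterwards if a single recording is needed.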