Generate Speech

Convert text to natural-sounding speech using OpenAI's TTS models with multiple voice options.

Common Properties

Name - The custom name of the node.
Color - The custom color of the node.
Delay Before (sec) - Time to wait, in seconds, before executing the node.
Delay After (sec) - Time to wait, in seconds, after executing the node.
Continue On Error - Automation will continue regardless of any error. Default: false.

info

If the Continue On Error property is true, errors are not caught when the project is executed, even if a Catch node is used.

Inputs

Connection Id - Connection identifier from Connect node.
Text - Text to convert to speech (max 4096 characters).
Use Robomotion AI Credits - Use Robomotion credits instead of your own API key.

Options

Model Selection

Model - Text-to-speech model:
- GPT-4o Mini TTS - Latest compact TTS model with voice control
- TTS-1 HD - High-definition audio quality (default)
- TTS-1 - Standard quality, faster

Voice Settings

Voice - Voice persona to use:
- Alloy - Neutral, versatile (default)
- Ash - Clear, professional
- Ballad - Soft, expressive
- Coral - Warm, friendly
- Cedar - Deep, authoritative
- Echo - Resonant, distinctive
- Fable - Narrative, storytelling
- Marin - Bright, energetic
- Nova - Youthful, dynamic
- Onyx - Deep, mature
- Sage - Calm, measured
- Shimmer - Light, airy
- Verse - Articulate, clear
Format - Audio output format:
- MP3 - Universal compatibility (default)
- Opus - Best compression for streaming
- AAC - Good quality, moderate size
- FLAC - Lossless quality, large files
- WAV - Uncompressed, largest files
- PCM - Raw audio data
Speed - Speech speed (0.25 to 4.0). Default: 1.0 (normal speed).
Instructions - Voice control instructions (GPT-4o Mini TTS only). Guide pronunciation, emotion, emphasis.
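The constraints in this section (allowed formats, the 0.25 to 4.0 speed range, and Instructions being limited to GPT-4o Mini TTS) can be checked before the node runs. The sketch below is only an illustration: `check_voice_settings` is a hypothetical helper, not part of the node, and the lowercase identifiers are assumed to match the values shown in the usage examples.

```python
# Hypothetical pre-flight check; names and identifiers are assumptions.
ALLOWED_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
INSTRUCTION_MODELS = {"gpt-4o-mini-tts"}  # only model documented as accepting Instructions


def check_voice_settings(model, fmt, speed, instructions=""):
    """Return a list of problems; an empty list means the settings look valid."""
    problems = []
    if fmt not in ALLOWED_FORMATS:
        problems.append(f"unsupported format: {fmt}")
    if not 0.25 <= speed <= 4.0:
        problems.append(f"speed {speed} is outside 0.25 to 4.0")
    if instructions and model not in INSTRUCTION_MODELS:
        problems.append(f"instructions are not supported by {model}")
    return problems


print(check_voice_settings("tts-1-hd", "mp3", 1.0))
print(check_voice_settings("tts-1", "ogg", 5.0, "Speak slowly"))
```

Returning a list rather than raising on the first problem lets a flow report every misconfigured option at once.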

Advanced

Timeout (seconds) - Request timeout. Default: 120.
Include Raw Response - Include metadata in output. Default: false.

Outputs

Audio - File path to the generated audio file.
Raw Response - Metadata about the generated audio (when enabled).

How It Works

The node processes each request in the following order:

1. Validates the connection and the input text
2. Configures the voice, format, and speed
3. Sends the text to the TTS model
4. Downloads the generated audio
5. Saves the audio to temporary storage
6. Returns the file path
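The node's internals are not documented here, so the following is only a rough picture of the validation and configuration steps. It assumes the node maps its options onto the request fields of OpenAI's public speech endpoint (`model`, `input`, `voice`, `response_format`, `speed`, `instructions`); `build_speech_request` is a hypothetical name, and the defaults are taken from the option lists above.

```python
import json

MAX_CHARS = 4096  # documented limit for the Text input


def build_speech_request(text, voice="alloy", model="tts-1-hd",
                         fmt="mp3", speed=1.0, instructions=None):
    """Validate the text and assemble a request body (assumed field mapping)."""
    if not text or not text.strip():
        raise ValueError("Text cannot be empty")
    if len(text) > MAX_CHARS:
        raise ValueError("Text too long")
    body = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": fmt,
        "speed": speed,
    }
    # Instructions are only meaningful for GPT-4o Mini TTS, so send them only when set.
    if instructions:
        body["instructions"] = instructions
    return body


print(json.dumps(build_speech_request("Hello from your RPA system."), indent=2))
```

The actual download and file-saving steps are left out because they depend on the runtime the node uses.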

Usage Examples

Example 1: Basic Speech Generation

Input:
- Text: "Hello, this is an automated message from your RPA system."
- Voice: alloy
- Model: tts-1-hd

Output:
- Audio: "/tmp/robomotion/audio/speech.mp3"

Example 2: Different Voice and Speed

Input:
- Text: "Welcome to our premium service. How may I assist you today?"
- Voice: nova
- Speed: 1.2
- Format: opus

Output:
- Audio: "/tmp/robomotion/audio/speech.opus"

Example 3: High-Quality Podcast

Input:
- Text: "In today's episode, we'll explore the fascinating world of automation..."
- Voice: fable
- Model: tts-1-hd
- Format: flac
- Speed: 0.95

Output:
- Audio: "/tmp/robomotion/audio/speech.flac"

Example 4: Voice Instructions (GPT-4o Mini TTS)

Input:
- Text: "The temperature is 72 degrees."
- Voice: sage
- Model: gpt-4o-mini-tts
- Instructions: "Speak slowly and emphasize 'seventy-two'. Pause slightly before 'degrees'."

Output:
- Audio: "/tmp/robomotion/audio/speech.mp3" (with custom pronunciation)

Requirements

- A Connection Id from a Connect node
- Text between 1 and 4096 characters
- Write access to temporary storage

Tips for RPA Developers

Voice Selection: Test different voices for your use case. Nova and Shimmer are energetic; Onyx and Cedar are authoritative.
Format Choice: Use MP3 for compatibility, Opus for web streaming, FLAC for archival quality.
Speed Adjustment: 0.9-1.1 sounds natural. Below 0.75 or above 1.5 may sound unnatural.
Text Length: Split long texts into chunks for better quality and faster processing.
Instructions: Only available with GPT-4o Mini TTS. Great for controlling pronunciation, emphasis, and pacing.
File Storage: Audio files are saved to the temporary directory. Move them to permanent storage if you need to keep them.
Batch Processing: Process multiple texts in parallel for efficiency.
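For the batch processing tip, one possible pattern outside the flow designer is a small thread pool. This is a sketch under assumptions: `synthesize` stands for whatever callable produces one audio file from one text (it is not a Robomotion API), and `fake_synthesize` is a placeholder so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor


def synthesize_all(texts, synthesize, max_workers=4):
    """Run `synthesize` on each text concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(synthesize, texts))


def fake_synthesize(text):
    # Placeholder for a real TTS call; returns a made-up file name.
    return f"speech_{len(text)}.mp3"


paths = synthesize_all(["Hello.", "Welcome back."], fake_synthesize)
print(paths)
```

Threads suit this workload because each request spends most of its time waiting on the network. Keep `max_workers` modest to stay within provider rate limits.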

Common Errors

"Text cannot be empty"

Provide text content to convert to speech

"Text too long"

Split text into chunks of 4096 characters or less
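A minimal sketch of such chunking, assuming it is acceptable to break at sentence ends and to cut hard only when a single sentence exceeds the limit. `split_text` is a hypothetical helper, not something the node provides.

```python
import re


def split_text(text, limit=4096):
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            # Adding this sentence would overflow, so start a new chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(split_text("One. Two. Three.", limit=10))
```

Each chunk can then be passed to the node separately, and the resulting audio files joined afterwards if a single file is needed.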
