Generate Speech

Convert text to natural-sounding speech using OpenAI's TTS models with multiple voice options.

Common Properties

  • Name - The custom name of the node.
  • Color - The custom color of the node.
  • Delay Before (sec) - Waits in seconds before executing the node.
  • Delay After (sec) - Waits in seconds after executing the node.
  • Continue On Error - Automation will continue regardless of any error. Default: false.
info

If the ContinueOnError property is true, no error is caught when the project is executed, even if a Catch node is used.

Inputs

  • Connection Id - Connection identifier from Connect node.
  • Text - Text to convert to speech (max 4096 characters).
  • Use Robomotion AI Credits - Use Robomotion credits instead of your own API key.

Options

Model Selection

  • Model - Text-to-speech model:
    • GPT-4o Mini TTS - Compact TTS model that supports voice control through Instructions
    • TTS-1 HD - High-definition audio quality (default)
    • TTS-1 - Standard quality, faster

Voice Settings

  • Voice - Voice persona to use:
    • Alloy - Neutral, versatile (default)
    • Ash - Clear, professional
    • Ballad - Soft, expressive
    • Coral - Warm, friendly
    • Cedar - Deep, authoritative
    • Echo - Resonant, distinctive
    • Fable - Narrative, storytelling
    • Marin - Bright, energetic
    • Nova - Youthful, dynamic
    • Onyx - Deep, mature
    • Sage - Calm, measured
    • Shimmer - Light, airy
    • Verse - Articulate, clear
  • Format - Audio output format:
    • MP3 - Universal compatibility (default)
    • Opus - Best compression for streaming
    • AAC - Good quality, moderate size
    • FLAC - Lossless quality, large files
    • WAV - Uncompressed, largest files
    • PCM - Raw audio samples with no file header
  • Speed - Speech speed (0.25 to 4.0). Default: 1.0 (normal speed).
  • Instructions - Voice control instructions (GPT-4o Mini TTS only). Use them to guide pronunciation, emotion, and emphasis.
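The option constraints above (allowed models, voices, and formats, the 0.25 to 4.0 speed range, and Instructions being limited to GPT-4o Mini TTS) can be checked before a flow runs. The following is a minimal sketch; `validate_options` and its messages are hypothetical and are not part of Robomotion's API. It assumes the lowercase option values used in the examples below.

```python
# Hypothetical pre-flight check mirroring the documented option constraints.
# Not Robomotion's API: names and messages are illustrative only.

MODELS = {"gpt-4o-mini-tts", "tts-1-hd", "tts-1"}
VOICES = {
    "alloy", "ash", "ballad", "coral", "cedar", "echo", "fable",
    "marin", "nova", "onyx", "sage", "shimmer", "verse",
}
FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}


def validate_options(model="tts-1-hd", voice="alloy", fmt="mp3",
                     speed=1.0, instructions=""):
    """Return a list of problems; an empty list means the options look valid."""
    problems = []
    if model not in MODELS:
        problems.append(f"unknown model: {model}")
    if voice not in VOICES:
        problems.append(f"unknown voice: {voice}")
    if fmt not in FORMATS:
        problems.append(f"unknown format: {fmt}")
    # Speed must stay within the documented 0.25-4.0 range.
    if not 0.25 <= speed <= 4.0:
        problems.append(f"speed {speed} is outside 0.25-4.0")
    # Instructions are only honored by GPT-4o Mini TTS.
    if instructions and model != "gpt-4o-mini-tts":
        problems.append("instructions require gpt-4o-mini-tts")
    return problems
```

For example, `validate_options(model="tts-1", instructions="Speak slowly")` reports that instructions require `gpt-4o-mini-tts`, while the defaults produce an empty list.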

Advanced

  • Timeout (seconds) - Request timeout. Default: 120.
  • Include Raw Response - Include metadata in output. Default: false.

Outputs

  • Audio - File path to the generated audio file.
  • Raw Response - Metadata about the generated audio (when enabled).

How It Works

The node converts text to speech as follows:

  1. Validates connection and input text
  2. Configures voice, format, and speed
  3. Sends text to the TTS model
  4. Downloads generated audio
  5. Saves to temporary storage
  6. Returns file path
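Steps 2 to 6 can be sketched with the standard library, assuming the node forwards its settings to OpenAI's `POST /v1/audio/speech` endpoint (the node's internals are not documented here, and the Robomotion AI Credits option may route requests differently). `build_speech_request` and `save_speech` are hypothetical helpers; the JSON fields (`model`, `input`, `voice`, `response_format`, `speed`, `instructions`) are those of OpenAI's speech API.

```python
import json
import urllib.request

SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(api_key, text, model="tts-1-hd", voice="alloy",
                         fmt="mp3", speed=1.0, instructions=None):
    """Build (but do not send) a POST request for OpenAI's speech endpoint."""
    body = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": fmt,
        "speed": speed,
    }
    # Instructions only apply to gpt-4o-mini-tts, so leave them out otherwise.
    if instructions and model == "gpt-4o-mini-tts":
        body["instructions"] = instructions
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def save_speech(request, path, timeout=120):
    """Send the request and write the returned audio bytes to `path`."""
    with urllib.request.urlopen(request, timeout=timeout) as resp, \
            open(path, "wb") as out:
        out.write(resp.read())
    return path
```

The response body is the encoded audio itself, so saving it is a plain binary write; the 120-second timeout matches the node's default.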

Usage Examples

Example 1: Basic Speech Generation

Input:
- Text: "Hello, this is an automated message from your RPA system."
- Voice: alloy
- Model: tts-1-hd

Output:
- Audio: "/tmp/robomotion/audio/speech.mp3"

Example 2: Different Voice and Speed

Input:
- Text: "Welcome to our premium service. How may I assist you today?"
- Voice: nova
- Speed: 1.2
- Format: opus

Output:
- Audio: "/tmp/robomotion/audio/speech.opus"

Example 3: High-Quality Podcast

Input:
- Text: "In today's episode, we'll explore the fascinating world of automation..."
- Voice: fable
- Model: tts-1-hd
- Format: flac
- Speed: 0.95

Output:
- Audio: "/tmp/robomotion/audio/speech.flac"

Example 4: Voice Instructions (GPT-4o Mini TTS)

Input:
- Text: "The temperature is 72 degrees."
- Voice: sage
- Model: gpt-4o-mini-tts
- Instructions: "Speak slowly and emphasize 'seventy-two'. Pause slightly before 'degrees'."

Output:
- Audio: "/tmp/robomotion/audio/speech.mp3" (with custom pronunciation)
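The example outputs above show the file extension following the selected Format. The sketch below derives such a path; the directory layout, the `speech` base name, and the `.pcm` extension for headerless PCM are assumptions for illustration, and `temp_audio_path` is a hypothetical helper.

```python
import os
import tempfile

# Assumed extension per format; raw PCM has no container, so ".pcm" is a convention.
EXTENSIONS = {
    "mp3": ".mp3", "opus": ".opus", "aac": ".aac",
    "flac": ".flac", "wav": ".wav", "pcm": ".pcm",
}


def temp_audio_path(fmt="mp3", name="speech", base_dir=None):
    """Return a path like <tmp>/robomotion/audio/speech.<ext>, creating the folder."""
    base = base_dir or os.path.join(tempfile.gettempdir(), "robomotion", "audio")
    os.makedirs(base, exist_ok=True)
    return os.path.join(base, name + EXTENSIONS[fmt])
```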

Requirements

  • Connection Id from the Connect node
  • Text between 1 and 4096 characters
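A flow can check the text requirement before calling the node and surface the same messages listed under Common Errors. This is a sketch: `text_error` is a hypothetical helper, treating whitespace-only text as empty is an assumption, and Python's `len` counts Unicode code points, which may not match how the service counts characters.

```python
MAX_CHARS = 4096


def text_error(text):
    """Return the documented error message for invalid text, or None if valid."""
    # Assumption: whitespace-only input is treated the same as empty input.
    if not text or not text.strip():
        return "Text cannot be empty"
    if len(text) > MAX_CHARS:
        return "Text too long"
    return None
```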
  • Write access to temporary storage

Tips for RPA Developers

  • Voice Selection: Test different voices for your use case. Nova and Marin sound energetic; Onyx and Cedar sound deep and authoritative.
  • Format Choice: Use MP3 for compatibility, Opus for web streaming, FLAC for archival quality.
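  • Speed Adjustment: Values from 0.9 to 1.1 sound natural; values below 0.75 or above 1.5 may sound unnatural.
  • Text Length: Texts over 4096 characters must be split into chunks. Split at sentence boundaries to keep the speech natural; shorter chunks are also processed faster.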
  • Instructions: Only available with GPT-4o Mini TTS. Great for controlling pronunciation, emphasis, and pacing.
  • File Storage: Audio files are saved to a temporary directory. Move them to permanent storage if you need to keep them.
  • Batch Processing: Process multiple texts in parallel for efficiency.
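Parallel batch processing can be sketched with a thread pool. Here `synthesize` is a hypothetical stand-in for whatever performs one Generate Speech call and returns the audio path; keep `max_workers` modest, since OpenAI applies rate limits to API requests.

```python
from concurrent.futures import ThreadPoolExecutor


def synthesize_all(texts, synthesize, max_workers=4):
    """Run `synthesize` on each text concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as `texts`.
        return list(pool.map(synthesize, texts))
```

With a placeholder such as `lambda t: f"/tmp/{t}.mp3"`, `synthesize_all(["a", "bb"], ...)` returns the paths in input order, regardless of which call finishes first.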

Common Errors

"Text cannot be empty"

  • Provide text content to convert to speech

"Text too long"

  • Split text into chunks of 4096 characters or less
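One way to split long text is to pack whole sentences into chunks of at most 4096 characters and hard-cut only a single sentence that is longer than the limit. This is a sketch: `split_text` is a hypothetical helper, and its sentence detection (a regex on `.`, `!`, or `?` followed by whitespace) is a simplifying assumption that ignores abbreviations.

```python
import re

MAX_CHARS = 4096


def split_text(text, limit=MAX_CHARS):
    """Split text into chunks of at most `limit` characters.

    Prefers sentence boundaries; falls back to a hard cut for a single
    sentence that is longer than the limit.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-cut any sentence that cannot fit in one chunk on its own.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            # The sentence does not fit; close the current chunk and start a new one.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent to Generate Speech separately and the resulting audio files played or merged in order.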