Text to Speech

Converts text to natural-sounding speech using ElevenLabs AI voice synthesis and saves the result as an audio file.

Common Properties

Name - The custom name of the node.
Color - The custom color of the node.
Delay Before (sec) - Time to wait, in seconds, before executing the node.
Delay After (sec) - Time to wait, in seconds, after executing the node.
Continue On Error - Automation will continue regardless of any error. The default value is false.

Inputs

Connection Id (String) - Connection ID from the Connect node. Optional if you provide the API Key option directly.
Save Path (String) - File path where the generated audio will be saved (e.g., "output/speech.mp3").
Voice ID (String) - ID of the voice to use for speech synthesis. Get voice IDs using the Get Voices node.
Text (String) - The text content to convert to speech.

Options

Stability (String) - Voice stability, a decimal from 0.0 to 1.0 (default: 0.5). Higher values make the voice more consistent and predictable.
Similarity Boost (String) - Voice similarity boost, a decimal from 0.0 to 1.0 (default: 0.75). Higher values keep the output closer to the original voice samples.
Model - Select the speech synthesis model:
- Eleven Monolingual v1 - Optimized for English only
- Eleven Multilingual v1 - Supports multiple languages
API Key - Your ElevenLabs AI API key. Optional if you use a Connection Id.
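
As a rough illustration of how the Model, Stability, and Similarity Boost options might map onto the underlying request, the sketch below builds (but does not send) a request for the ElevenLabs text-to-speech streaming REST endpoint. It assumes the public REST field names `model_id`, `voice_settings`, `stability`, and `similarity_boost` and the model IDs `eleven_monolingual_v1` and `eleven_multilingual_v1`; the node's internal client may work differently, and `MODEL_IDS` and `build_tts_request` are hypothetical names.

```python
import json
import urllib.request

# Assumed mapping from the Model option labels to ElevenLabs model IDs.
MODEL_IDS = {
    "Eleven Monolingual v1": "eleven_monolingual_v1",
    "Eleven Multilingual v1": "eleven_multilingual_v1",
}

# Assumed ElevenLabs REST base URL for text-to-speech.
API_BASE = "https://api.elevenlabs.io/v1/text-to-speech"


def build_tts_request(api_key: str, voice_id: str, text: str, model: str,
                      stability: float = 0.5,
                      similarity_boost: float = 0.75) -> urllib.request.Request:
    """Build (but do not send) a streaming text-to-speech POST request."""
    body = {
        "text": text,
        "model_id": MODEL_IDS[model],
        "voice_settings": {
            "stability": stability,
            "similarity_boost": similarity_boost,
        },
    }
    return urllib.request.Request(
        url=f"{API_BASE}/{voice_id}/stream",
        data=json.dumps(body).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it; under these assumptions the response body is the generated audio.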

Output

This node has no outputs. The audio file is written to the path given in Save Path.

How It Works

The Text to Speech node generates audio from text using ElevenLabs AI. When executed, the node:

Validates the required inputs (Save Path, Voice ID, Text)
Checks that Stability and Similarity Boost are valid decimals between 0.0 and 1.0
Uses the client from the provided Connection Id, or creates a new client from the API Key
Calls the ElevenLabs text-to-speech API with the selected voice, model, and voice settings
Streams the generated audio in chunks and writes them to the Save Path file
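
The last step above can be pictured as a loop that reads the response body in fixed-size chunks and appends each chunk to the output file, so the whole clip never has to be held in memory. This is a stdlib-only sketch under that assumption, not the node's actual code; `save_audio_stream` and the 8 KiB chunk size are hypothetical choices. Any readable binary file-like object works as the source, for example the response returned by `urllib.request.urlopen`.

```python
from typing import BinaryIO


def save_audio_stream(source: BinaryIO, save_path: str, chunk_size: int = 8192) -> int:
    """Copy `source` to `save_path` chunk by chunk and return the bytes written."""
    total = 0
    with open(save_path, "wb") as out:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:  # an empty read means the stream is exhausted
                break
            out.write(chunk)
            total += len(chunk)
    return total
```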

Error Handling

The node returns the following errors:

Missing save path - "Save Path cannot be empty. Please specify a file path to save the audio."
Missing voice ID - "Voice ID cannot be empty. Please provide a valid ElevenLabs voice ID."
Missing text - "Text cannot be empty. Please provide the text to convert to speech."
Missing stability - "Stability cannot be empty. Please provide a value between 0.0 and 1.0."
Invalid stability format - "Stability must be a valid decimal number between 0.0 and 1.0."
Missing similarity boost - "Similarity Boost cannot be empty. Please provide a value between 0.0 and 1.0."
Invalid similarity boost format - "Similarity Boost must be a valid decimal number between 0.0 and 1.0."
Generation failure - "Failed to generate speech: [error details]"
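
The validation rules above can be mirrored in a small pre-check, for example to fail fast before the node runs. This sketch reproduces the documented messages; it assumes that out-of-range values (such as 1.5) produce the same "must be a valid decimal number" message as unparseable ones, which the list does not state explicitly. `validate_inputs` and `_parse_unit_interval` are hypothetical helpers.

```python
from typing import Optional, Tuple


def _parse_unit_interval(name: str, raw: Optional[str]) -> float:
    """Parse a 0.0-1.0 decimal option, raising the documented messages."""
    if raw is None or raw.strip() == "":
        raise ValueError(f"{name} cannot be empty. Please provide a value between 0.0 and 1.0.")
    try:
        value = float(raw)
    except ValueError:
        raise ValueError(f"{name} must be a valid decimal number between 0.0 and 1.0.") from None
    # Assumption: out-of-range values (and NaN, which fails any comparison)
    # get the same message as malformed input.
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"{name} must be a valid decimal number between 0.0 and 1.0.")
    return value


def validate_inputs(save_path: str, voice_id: str, text: str,
                    stability: str, similarity_boost: str) -> Tuple[float, float]:
    """Check required inputs first, then the two decimal options."""
    if not save_path:
        raise ValueError("Save Path cannot be empty. Please specify a file path to save the audio.")
    if not voice_id:
        raise ValueError("Voice ID cannot be empty. Please provide a valid ElevenLabs voice ID.")
    if not text:
        raise ValueError("Text cannot be empty. Please provide the text to convert to speech.")
    return (_parse_unit_interval("Stability", stability),
            _parse_unit_interval("Similarity Boost", similarity_boost))
```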

Usage Notes

Stability controls voice consistency: higher values (0.7 to 1.0) produce steadier, more predictable delivery, while lower values (0.0 to 0.3) add more variation and emotion.
Similarity Boost controls how closely the output matches the original voice; higher values stay closer to the voice's training samples.
The Eleven Monolingual v1 model is optimized for English content and is generally faster.
The Eleven Multilingual v1 model supports multiple languages but may be slightly slower.
Audio is streamed in chunks and written to disk as it arrives, so long text does not require holding the entire clip in memory.
The audio format is the ElevenLabs API default (typically MP3).
Make sure the directory in Save Path exists before running the node.
Voice IDs can be obtained from the Get Voices or Get Voice nodes.
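
Because the save directory must already exist, a preparatory step in the automation can create it. The sketch below uses Python's standard `os.makedirs` with `exist_ok=True`, so it is safe to run repeatedly; `ensure_save_dir` is a hypothetical helper, not part of the node.

```python
import os


def ensure_save_dir(save_path: str) -> str:
    """Create the parent directory of save_path if it is missing; return it."""
    directory = os.path.dirname(os.path.abspath(save_path))
    os.makedirs(directory, exist_ok=True)  # no error if it already exists
    return directory
```

For example, `ensure_save_dir("output/speech.mp3")` creates `output` relative to the current working directory.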