Speech to Speech

Converts speech from one voice to another using ElevenLabs AI's voice conversion technology. This node transforms the voice in an audio file while preserving the content and timing.

Common Properties

Name - The custom name of the node.
Color - The custom color of the node.
Delay Before (sec) - Number of seconds to wait before executing the node.
Delay After (sec) - Number of seconds to wait after executing the node.
Continue On Error - Automation will continue regardless of any error. The default value is false.

Inputs

Connection Id (String) - Connection ID from the Connect node. Optional if you provide API Key directly.
Source Audio Path (String) - Path to the input audio file containing the speech to convert.
Output Path (String) - Path where the converted audio file will be saved.
Target Voice ID (String) - ID of the target voice to convert the speech to. Get voice IDs using the Get Voices node.

Options

Stability (String) - Voice stability value from 0.0 to 1.0 (default: 0.5). Higher values make the voice more consistent and predictable.
Similarity Boost (String) - Voice similarity boost from 0.0 to 1.0 (default: 0.75). Higher values make the voice more similar to the original target voice sample.
API Key - Your ElevenLabs AI API key. Optional if using Connection ID.
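
Both Stability and Similarity Boost are entered as strings but represent decimals. The following is a minimal sketch of how such values might be converted and packed into the `stability` and `similarity_boost` fields used by ElevenLabs voice settings. The function name `build_voice_settings` is hypothetical, and whether the node serializes the options exactly this way is an assumption.

```python
import json


def build_voice_settings(stability: str = "0.5", similarity_boost: str = "0.75") -> str:
    """Turn the two string options into a JSON voice-settings payload.

    Hypothetical helper: the defaults mirror the documented defaults,
    but the node's real serialization is not shown in this page.
    """
    settings = {
        "stability": float(stability),
        "similarity_boost": float(similarity_boost),
    }
    return json.dumps(settings)


if __name__ == "__main__":
    # Documented defaults, then a more consistent, closer-matching voice.
    print(build_voice_settings())
    print(build_voice_settings("0.8", "0.9"))
```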

Outputs

This node does not have outputs. The converted audio file is saved to the specified output path.

How It Works

The Speech to Speech node converts audio from one voice to another. When executed, the node:

Validates all required inputs (source path, output path, target voice ID)
Checks that stability and similarity boost values are valid decimals between 0.0 and 1.0
Uses the provided connection, or creates a new client from the API key supplied in the options
Opens the source audio file
Calls the ElevenLabs speech-to-speech API with voice settings
Streams the converted audio chunks and saves them to the output path
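
The file-handling steps above (open the source, call the API, stream the chunks to disk) can be sketched as follows. This is an illustration under stated assumptions, not the node's implementation: `convert_file` is a hypothetical name, and the `convert` argument is a hypothetical stand-in for the ElevenLabs speech-to-speech call, modeled here as any callable that takes an open audio file and yields byte chunks.

```python
from pathlib import Path
from typing import BinaryIO, Callable, Iterable


def convert_file(
    source_path: str,
    output_path: str,
    convert: Callable[[BinaryIO], Iterable[bytes]],
) -> int:
    """Open the source audio, pass it to `convert`, and save the chunks.

    `convert` is a hypothetical placeholder for the API call.
    Returns the number of bytes written to `output_path`.
    """
    source = Path(source_path)
    if not source.is_file():
        raise FileNotFoundError(
            f"Source audio file not found at: {source_path}. "
            "Please verify the file path is correct."
        )

    written = 0
    # Keep the source open while chunks arrive, since a streaming
    # converter may read from it lazily.
    with source.open("rb") as audio, open(output_path, "wb") as out:
        for chunk in convert(audio):
            if chunk:  # ignore empty chunks
                out.write(chunk)
                written += len(chunk)
    return written
```

Passing the converter in as a callable keeps the sketch independent of any particular client library and makes the streaming loop easy to exercise with a fake converter.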

Requirements

Valid ElevenLabs API key (via Connect node or direct option)
Source audio file in a supported format
Valid target voice ID from ElevenLabs
Writable file path for saving converted audio
Stability and similarity boost must be decimal numbers between 0.0 and 1.0

Error Handling

The node will return specific errors in the following cases:

Missing source path - "Source Audio Path cannot be empty. Please provide the path to the input audio file."
Missing output path - "Output Path cannot be empty. Please specify where to save the converted audio."
Missing voice ID - "Target Voice ID cannot be empty. Please provide a valid ElevenLabs voice ID."
Missing stability - "Stability cannot be empty. Please provide a value between 0.0 and 1.0."
Invalid stability format - "Stability must be a valid decimal number between 0.0 and 1.0."
Missing similarity boost - "Similarity Boost cannot be empty. Please provide a value between 0.0 and 1.0."
Invalid similarity boost format - "Similarity Boost must be a valid decimal number between 0.0 and 1.0."
File not found - "Source audio file not found at: [path]. Please verify the file path is correct."
Conversion failure - "Failed to convert speech: [error details]"
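
To show how the stability and similarity boost messages relate to the input checks, here is a minimal validation sketch. The function name `parse_setting` is hypothetical. Reusing the "valid decimal number" message for out-of-range values is an assumption, since the list above does not give a separate out-of-range error.

```python
def parse_setting(name: str, raw: str) -> float:
    """Validate a string option that must be a decimal in [0.0, 1.0].

    `name` is the display name, e.g. "Stability" or "Similarity Boost".
    """
    if raw is None or raw.strip() == "":
        raise ValueError(
            f"{name} cannot be empty. Please provide a value between 0.0 and 1.0."
        )

    invalid = f"{name} must be a valid decimal number between 0.0 and 1.0."
    try:
        value = float(raw)
    except ValueError:
        raise ValueError(invalid) from None

    # Assumption: out-of-range values (and NaN) share the format message.
    if not 0.0 <= value <= 1.0:
        raise ValueError(invalid)
    return value


if __name__ == "__main__":
    for text in ("0.5", "", "abc", "1.5"):
        try:
            print(repr(text), "->", parse_setting("Stability", text))
        except ValueError as err:
            print(repr(text), "->", err)
```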

Usage Notes

Stability controls voice consistency - higher values (0.7-1.0) create more predictable outputs; lower values (0.0-0.3) add variation
Similarity Boost controls how closely the output matches the target voice - higher values stay closer to the voice samples
This is different from text-to-speech - it converts existing speech to a different voice while maintaining timing and prosody
The content, words, and timing from the source audio are preserved
Only the voice characteristics are changed to match the target voice
Works with various audio formats (MP3, WAV, etc.)
Audio quality of the source file affects the quality of the output
Make sure the output path directory exists before running the node
The converted audio format depends on the ElevenLabs API default (typically MP3)
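
Because the output directory must exist before the node runs, a preceding step can create it. A minimal sketch, assuming a Python step is available earlier in the automation; `ensure_output_dir` is a hypothetical helper name.

```python
from pathlib import Path


def ensure_output_dir(output_path: str) -> Path:
    """Create the parent directory of `output_path` if it is missing."""
    target = Path(output_path)
    # parents=True builds intermediate folders; exist_ok=True makes
    # the call safe to repeat when the directory is already there.
    target.parent.mkdir(parents=True, exist_ok=True)
    return target
```

The helper only creates the folder and never touches the output file itself, so it is safe to call on every run.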

Example Use Cases

Dubbing videos with different voices
Converting voice recordings to match brand voice guidelines
Creating consistent voice across multiple audio recordings
Anonymizing voice recordings while preserving content
Adapting audio content for different audiences or regions
Converting personal voice memos to professional-sounding narration
Voice matching for film and media production
Creating character voices from regular speech recordings
