
Speech to Speech

Converts speech from one voice to another using ElevenLabs AI's voice conversion technology. This node transforms the voice in an audio file while preserving the content and timing.

Common Properties

  • Name - The custom name of the node.
  • Color - The custom color of the node.
  • Delay Before (sec) - Waits in seconds before executing the node.
  • Delay After (sec) - Waits in seconds after executing the node.
  • Continue On Error - Automation will continue regardless of any error. The default value is false.

Inputs

  • Connection Id (String) - Connection ID from the Connect node. Optional if you provide the API Key option directly.
  • Source Audio Path (String) - Path to the input audio file containing the speech to convert.
  • Output Path (String) - Path where the converted audio file will be saved.
  • Target Voice ID (String) - ID of the target voice to convert the speech to. Get voice IDs using the Get Voices node.

Options

  • Stability (String) - Voice stability value from 0.0 to 1.0 (default: 0.5). Higher values make the voice more consistent and predictable.
  • Similarity Boost (String) - Voice similarity boost from 0.0 to 1.0 (default: 0.75). Higher values make the voice more similar to the original target voice sample.
  • API Key - Your ElevenLabs AI API key. Optional if using Connection ID.

Outputs

This node does not have outputs. The converted audio file is saved to the specified output path.

How It Works

The Speech to Speech node converts audio from one voice to another. When executed, the node:

  1. Validates all required inputs (source path, output path, target voice ID)
  2. Checks that stability and similarity boost values are valid decimals between 0.0 and 1.0
  3. Uses the provided connection or, if none is given, creates a new client from the API key supplied directly
  4. Opens the source audio file
  5. Calls the ElevenLabs speech-to-speech API with voice settings
  6. Streams the converted audio chunks and saves them to the output path
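
Steps 4 to 6 above can be sketched roughly as follows. This is a minimal illustration, not the node's actual implementation: the function name stream_conversion and the convert callable are hypothetical, and convert stands in for whatever client performs the ElevenLabs speech-to-speech call. It is assumed to accept the voice ID, the open source file, and the voice settings, and to yield the converted audio as byte chunks.

```python
import os
from typing import BinaryIO, Callable, Dict, Iterable

# Hypothetical signature for the API call: (voice_id, audio_file, settings) -> chunks.
Converter = Callable[[str, BinaryIO, Dict[str, float]], Iterable[bytes]]


def stream_conversion(
    source_path: str,
    output_path: str,
    voice_id: str,
    convert: Converter,
    settings: Dict[str, float],
) -> int:
    """Open the source file, run the conversion, and stream chunks to disk.

    Returns the number of bytes written to output_path.
    """
    # Check the source first so a missing file does not leave an empty output file.
    if not os.path.isfile(source_path):
        raise FileNotFoundError(f"Source audio file not found at: {source_path}")

    written = 0
    with open(source_path, "rb") as audio, open(output_path, "wb") as out:
        # The converter is consumed lazily, so each chunk is written as it arrives.
        for chunk in convert(voice_id, audio, settings):
            out.write(chunk)
            written += len(chunk)
    return written
```

Passing the converter in as an argument keeps the file handling separate from the API client, which also makes the flow easy to exercise with a fake converter that needs no API key.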

Requirements

  • Valid ElevenLabs API key (via Connect node or direct option)
  • Source audio file in a supported format
  • Valid target voice ID from ElevenLabs
  • Writable file path for saving converted audio
  • Stability and similarity boost must be decimal numbers between 0.0 and 1.0

Error Handling

The node will return specific errors in the following cases:

  • Missing source path - "Source Audio Path cannot be empty. Please provide the path to the input audio file."
  • Missing output path - "Output Path cannot be empty. Please specify where to save the converted audio."
  • Missing voice ID - "Target Voice ID cannot be empty. Please provide a valid ElevenLabs voice ID."
  • Missing stability - "Stability cannot be empty. Please provide a value between 0.0 and 1.0."
  • Invalid stability format - "Stability must be a valid decimal number between 0.0 and 1.0."
  • Missing similarity boost - "Similarity Boost cannot be empty. Please provide a value between 0.0 and 1.0."
  • Invalid similarity boost format - "Similarity Boost must be a valid decimal number between 0.0 and 1.0."
  • File not found - "Source audio file not found at: [path]. Please verify the file path is correct."
  • Conversion failure - "Failed to convert speech: [error details]"
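
The validation errors listed above could be pre-checked in a script before the node runs. The sketch below is illustrative only: validate_inputs is a hypothetical helper, the order of the checks is an assumption, and only the message texts are taken from the list above.

```python
import os
from typing import Optional


def _check_unit_interval(label: str, raw: str) -> Optional[str]:
    """Return an error message if raw is empty or not a decimal in [0.0, 1.0]."""
    if not raw.strip():
        return f"{label} cannot be empty. Please provide a value between 0.0 and 1.0."
    try:
        value = float(raw)
    except ValueError:
        value = float("nan")  # NaN fails the range comparison below
    if not 0.0 <= value <= 1.0:
        return f"{label} must be a valid decimal number between 0.0 and 1.0."
    return None


def validate_inputs(
    source_path: str,
    output_path: str,
    voice_id: str,
    stability: str = "0.5",
    similarity_boost: str = "0.75",
) -> Optional[str]:
    """Return the first applicable error message, or None if all checks pass."""
    if not source_path.strip():
        return ("Source Audio Path cannot be empty. "
                "Please provide the path to the input audio file.")
    if not output_path.strip():
        return ("Output Path cannot be empty. "
                "Please specify where to save the converted audio.")
    if not voice_id.strip():
        return ("Target Voice ID cannot be empty. "
                "Please provide a valid ElevenLabs voice ID.")
    for label, raw in (("Stability", stability),
                       ("Similarity Boost", similarity_boost)):
        error = _check_unit_interval(label, raw)
        if error:
            return error
    if not os.path.isfile(source_path):
        return (f"Source audio file not found at: {source_path}. "
                "Please verify the file path is correct.")
    return None
```

Stability and Similarity Boost are kept as strings here because the node declares them as String options, so the parsing step is part of the validation.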

Usage Notes

  • Stability controls voice consistency - higher values (0.7-1.0) create more predictable outputs; lower values (0.0-0.3) add variation
  • Similarity Boost controls how closely the output matches the target voice - higher values stay closer to the voice samples
  • This is different from text-to-speech - it converts existing speech to a different voice while maintaining timing and prosody
  • The content, words, and timing from the source audio are preserved
  • Only the voice characteristics are changed to match the target voice
  • Works with various audio formats (MP3, WAV, etc.)
  • Audio quality of the source file affects the quality of the output
  • Make sure the output path directory exists before running the node
  • The converted audio format depends on the ElevenLabs API default (typically MP3)
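
Since the output directory must exist before the node runs, a preceding script step can create it. A minimal sketch, where prepare_output_path is a hypothetical helper name and not part of the node:

```python
from pathlib import Path


def prepare_output_path(output_path: str) -> Path:
    """Create the parent directory of output_path if it does not exist yet."""
    path = Path(output_path)
    # parents=True creates intermediate folders; exist_ok=True makes reruns safe.
    path.parent.mkdir(parents=True, exist_ok=True)
    return path
```

The returned path can then be passed to the Output Path input as a string.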

Example Use Cases

  • Dubbing videos with different voices
  • Converting voice recordings to match brand voice guidelines
  • Creating consistent voice across multiple audio recordings
  • Anonymizing voice recordings while preserving content
  • Adapting audio content for different audiences or regions
  • Converting personal voice memos to professional-sounding narration
  • Voice matching for film and media production
  • Creating character voices from regular speech recordings