
Text To Speech

Converts text to speech audio using the Google Cloud Text-to-Speech API.

Common Properties

  • Name - The custom name of the node.
  • Color - The custom color of the node.
  • Delay Before (sec) - Waits in seconds before executing the node.
  • Delay After (sec) - Waits in seconds after executing the node.
  • Continue On Error - Automation will continue regardless of any error. The default value is false.
Info: If the Continue On Error property is true, no error is caught when the project is executed, even if a Catch node is used.

Inputs

  • Text - The text content to convert to speech. Supports both plain text and SSML (Speech Synthesis Markup Language) formats.
  • Path - The file path where the generated audio file will be saved. If empty, the file will be saved to a temporary path.

Output

  • Path - The file path where the generated audio file was saved.

Options

  • Audio Encoding - The audio file format for the generated speech:
    • Wav - Uncompressed WAV audio format
    • Mp3 - Compressed MP3 audio format
    • Ogg Opus - Compressed OGG Opus audio format
  • Language Code - The BCP-47 language code of the desired language for the speech output. Default is en-US. See Google Cloud Text-to-Speech documentation for supported languages.
  • Voice Name - The name of the voice to be used for speech synthesis. Default is en-US-Studio-M. See Google Cloud Text-to-Speech documentation for available voices.
  • Sample Rate - The sample rate (in Hertz) for the generated audio. Default is 16000.
  • Credentials - Google Cloud credentials used to authenticate with the Text-to-Speech API.
  • SSML Text - If enabled, indicates that the input text is in SSML format rather than plain text.
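
To illustrate what SSML input for the Text field might look like when the SSML Text option is enabled, here is a minimal Python sketch. The helper names `build_ssml` and `is_well_formed_ssml` are hypothetical and not part of the node. The sketch only checks that the markup is well-formed XML with a `<speak>` root, which is a rough pre-check and not full SSML validation.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def build_ssml(sentence: str, pause_ms: int = 500) -> str:
    """Wrap a sentence in <speak> and append a pause (hypothetical helper)."""
    # escape() protects characters such as & and < that would break the XML.
    return f'<speak>{escape(sentence)}<break time="{pause_ms}ms"/></speak>'


def is_well_formed_ssml(text: str) -> bool:
    """Return True if text parses as XML with a <speak> root element."""
    try:
        return ET.fromstring(text).tag == "speak"
    except ET.ParseError:
        return False


ssml = build_ssml("Tom & Jerry start at 5 < 6.")
print(ssml)
print(is_well_formed_ssml(ssml))
```

Plain text passed with the SSML Text option enabled would fail this kind of check, which is one reason to keep the option and the input format in sync.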

How It Works

The Text To Speech node converts text to audio using the Google Cloud Text-to-Speech API. When executed, the node:

  1. Validates the provided text input and file path
  2. Authenticates with the Google Cloud Text-to-Speech API using the provided credentials
  3. Configures the synthesis parameters (language, voice, encoding, sample rate, etc.)
  4. Processes the text using either plain text or SSML input
  5. Generates the audio file and saves it to the specified path
  6. Returns the path to the generated audio file
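
The steps above can be sketched in Python against the public REST method `text:synthesize`, whose request body has `input`, `voice`, and `audioConfig` fields and whose response carries base64-encoded audio in `audioContent`. This is an illustration under assumptions, not the node's actual implementation: the function names and the mapping from the node's Audio Encoding option to the API's enum values are hypothetical, and the authenticated HTTP call itself is omitted.

```python
import base64
import json
from pathlib import Path

# Assumed mapping from the node's option labels to the API's AudioEncoding enum.
ENCODINGS = {"Wav": "LINEAR16", "Mp3": "MP3", "Ogg Opus": "OGG_OPUS"}


def build_request(text, ssml=False, language_code="en-US",
                  voice_name="en-US-Studio-M", encoding="Mp3",
                  sample_rate=16000):
    """Build the JSON body for POST /v1/text:synthesize (steps 3 and 4)."""
    return {
        # The API takes either a "text" or an "ssml" field, never both.
        "input": {"ssml" if ssml else "text": text},
        "voice": {"languageCode": language_code, "name": voice_name},
        "audioConfig": {
            "audioEncoding": ENCODINGS[encoding],
            "sampleRateHertz": sample_rate,
        },
    }


def save_audio(response_json: str, path: str) -> str:
    """Decode audioContent from the response and write it to path (steps 5 and 6)."""
    audio = base64.b64decode(json.loads(response_json)["audioContent"])
    Path(path).write_bytes(audio)
    return path


body = build_request("Hello, world!")
print(json.dumps(body, indent=2))
```

In a real call, the body would be sent with valid credentials to `https://texttospeech.googleapis.com/v1/text:synthesize`, and the response text would be passed to `save_audio`.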

Requirements

  • Valid Google Cloud credentials with Text-to-Speech API enabled
  • Text content to convert to speech
  • Valid file path for saving the generated audio (or leave Path empty to save to a temporary file)

Error Handling

The node will return specific errors in the following cases:

  • Empty or invalid text input
  • Invalid or missing Google Cloud credentials
  • Google Cloud Text-to-Speech API authentication errors
  • Invalid language code or voice name
  • Invalid audio encoding format
  • Invalid file path
  • Google Cloud service errors
  • Network connectivity issues
  • File system errors when writing the audio file
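
Several of these cases can be caught before the node runs, for example in a script that prepares the inputs. The following Python sketch is only an illustration: `preflight_problems` is a hypothetical helper, the set of encoding labels is taken from the Options section, and the checks do not reproduce the node's own validation or error messages. Credential, voice, and network errors can only be detected by the API call itself.

```python
import os

# Encoding labels as listed under Options.
SUPPORTED_ENCODINGS = {"Wav", "Mp3", "Ogg Opus"}


def preflight_problems(text, path, encoding, sample_rate):
    """Return a list of human-readable problems found in the inputs."""
    problems = []
    if not text or not text.strip():
        problems.append("Text is empty")
    if encoding not in SUPPORTED_ENCODINGS:
        problems.append(f"Unsupported audio encoding: {encoding!r}")
    if not isinstance(sample_rate, int) or sample_rate <= 0:
        problems.append("Sample rate must be a positive integer")
    if path:
        # An empty path is allowed: the node then uses a temporary file.
        parent = os.path.dirname(os.path.abspath(path))
        if not os.path.isdir(parent):
            problems.append(f"Directory does not exist: {parent}")
    return problems


print(preflight_problems("Hello", "", "Mp3", 16000))
print(preflight_problems("   ", "", "Flac", 0))
```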

Usage Notes

  • The Text input supports both plain text and SSML formats for enhanced speech control
  • When using SSML, enable the "SSML Text" option to ensure proper processing
  • Different language codes support different voices and features
  • Voice names determine the specific voice characteristics (gender, tone, accent)
  • Audio encoding affects file size and quality:
    • WAV is uncompressed, so it preserves the most detail but produces the largest files
    • MP3 is lossy compressed, giving good quality at a much smaller file size
    • OGG Opus is lossy compressed and generally gives better quality than MP3 at a similar bitrate
  • A higher Sample Rate generally improves audio quality and increases file size
  • If no Path is specified, the audio file is saved to a temporary location
  • The output Path provides the location of the generated audio file for further processing
  • Supported languages and voices can be found in the Google Cloud Text-to-Speech documentation
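
The temporary-path fallback described in the notes above can be sketched in Python as follows. How the node actually names its temporary files is not documented here, so the function name `resolve_output_path` and the file extensions chosen per encoding are assumptions made for illustration.

```python
import os
import tempfile

# Assumed file extensions for each Audio Encoding option.
EXTENSIONS = {"Wav": ".wav", "Mp3": ".mp3", "Ogg Opus": ".ogg"}


def resolve_output_path(path: str, encoding: str) -> str:
    """Return path if given, otherwise create and return a temporary file path."""
    if path:
        return path
    # mkstemp creates the file and returns an open descriptor plus its path.
    fd, tmp_path = tempfile.mkstemp(suffix=EXTENSIONS[encoding])
    os.close(fd)  # the audio bytes would be written to tmp_path later
    return tmp_path


print(resolve_output_path("speech.mp3", "Mp3"))
print(resolve_output_path("", "Wav"))
```

A downstream step would then read the resolved location from the node's Path output instead of assuming a fixed file name.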