Text To Speech
Converts text to speech audio using the Google Cloud Text-to-Speech API.
Common Properties
- Name - The custom name of the node.
- Color - The custom color of the node.
- Delay Before (sec) - Waits in seconds before executing the node.
- Delay After (sec) - Waits in seconds after executing the node.
- Continue On Error - Automation will continue regardless of any error. The default value is false.
Info: If the ContinueOnError property is true, no error is caught when the project is executed, even if a Catch node is used.
Inputs
- Text - The text content to convert to speech. Supports both plain text and SSML (Speech Synthesis Markup Language) formats.
- Path - The file path where the generated audio file will be saved. If empty, the file will be saved to a temporary path.
Output
- Path - The file path where the generated audio file was saved.
Options
- Audio Encoding - The audio file format for the generated speech:
  - Wav - Uncompressed WAV audio format
  - Mp3 - Compressed MP3 audio format
  - Ogg Opus - Compressed OGG Opus audio format
- Language Code - The BCP-47 language code of the desired language for the speech output. Default is en-US. See Google Cloud Text-to-Speech documentation for supported languages.
- Voice Name - The name of the voice to be used for speech synthesis. Default is en-US-Studio-M. See Google Cloud Text-to-Speech documentation for available voices.
- Sample Rate - The sample rate (in Hertz) for the generated audio. Default is 16000.
- Credentials - Google Cloud credentials used to authenticate with the Text-to-Speech API.
- SSML Text - If enabled, indicates that the input text is in SSML format rather than plain text.
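As an illustration of the SSML input that the Text input and the SSML Text option refer to, the following Python sketch wraps plain sentences in a minimal SSML document. The helper name `to_ssml` and its `pause_ms` parameter are hypothetical and not part of the node; `<speak>` and `<break>` are standard SSML elements.

```python
from xml.sax.saxutils import escape


def to_ssml(sentences, pause_ms=500):
    """Join sentences into one SSML document with a pause between them.

    Hypothetical helper for illustration only; not part of the node.
    """
    # Escape &, < and > so the text cannot break the SSML markup.
    parts = [escape(sentence) for sentence in sentences]
    pause = f'<break time="{pause_ms}ms"/>'
    return "<speak>" + pause.join(parts) + "</speak>"


ssml = to_ssml(["Terms & conditions apply.", "Thank you."])
print(ssml)
```

A string produced this way would be passed as the Text input with the SSML Text option enabled.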
How It Works
The Text to Speech node converts text to audio using the Google Cloud Text-to-Speech API. When executed, the node:
- Validates the provided text input and file path
- Authenticates with the Google Cloud Text-to-Speech API using the provided credentials
- Configures the synthesis parameters (language, voice, encoding, sample rate, etc.)
- Processes the text using either plain text or SSML input
- Generates the audio file and saves it to the specified path
- Returns the path to the generated audio file
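The steps above can be sketched in Python as follows. This is not the node's implementation, which the documentation does not show; it only mirrors the request and response shape of the public Text-to-Speech REST method `text:synthesize`. The function names, the mapping from the node's Audio Encoding values to API enum names, the file extensions, and the `Mp3` default are all assumptions made for illustration. Authentication and the HTTP call are left out, so a fake response stands in for the API.

```python
import base64
import json
import os
import tempfile

# Assumed mapping from the node's Audio Encoding option to API enum names.
ENCODINGS = {"Wav": "LINEAR16", "Mp3": "MP3", "Ogg Opus": "OGG_OPUS"}
# Assumed file extensions used when no Path is given.
EXTENSIONS = {"Wav": ".wav", "Mp3": ".mp3", "Ogg Opus": ".ogg"}


def build_request(text, is_ssml=False, language_code="en-US",
                  voice_name="en-US-Studio-M", encoding="Mp3",
                  sample_rate=16000):
    """Validate the inputs and build a text:synthesize request body."""
    if not text or not text.strip():
        raise ValueError("Text must not be empty")
    if encoding not in ENCODINGS:
        raise ValueError(f"Unsupported audio encoding: {encoding}")
    return {
        # The API accepts either plain text or SSML, never both.
        "input": {"ssml" if is_ssml else "text": text},
        "voice": {"languageCode": language_code, "name": voice_name},
        "audioConfig": {
            "audioEncoding": ENCODINGS[encoding],
            "sampleRateHertz": sample_rate,
        },
    }


def save_audio(response, encoding="Mp3", path=""):
    """Decode the base64 audioContent field and write it to a file."""
    audio = base64.b64decode(response["audioContent"])
    if not path:
        # No Path given: fall back to a temporary file.
        fd, path = tempfile.mkstemp(suffix=EXTENSIONS[encoding])
        os.close(fd)
    with open(path, "wb") as f:
        f.write(audio)
    return path


# Demo with a fake response instead of a real API call.
print(json.dumps(build_request("Hello, world."), indent=2))
fake_response = {"audioContent": base64.b64encode(b"fake-audio").decode("ascii")}
saved_path = save_audio(fake_response)
print(saved_path)
os.remove(saved_path)
```

Keeping request building separate from file writing makes each step easy to check without credentials or network access.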
Requirements
- Valid Google Cloud credentials with the Text-to-Speech API enabled
- Text content to convert to speech
- Valid file path for saving the generated audio, or an empty Path to save to a temporary file
Error Handling
The node will return specific errors in the following cases:
- Empty or invalid text input
- Invalid or missing Google Cloud credentials
- Google Cloud Text-to-Speech API authentication errors
- Invalid language code or voice name
- Invalid audio encoding format
- Invalid file path
- Google Cloud service errors
- Network connectivity issues
- File system errors when writing the audio file
Usage Notes
- The Text input supports both plain text and SSML formats for enhanced speech control
- When using SSML, enable the "SSML Text" option to ensure proper processing
- Different language codes support different voices and features
- Voice names determine the specific voice characteristics (gender, tone, accent)
- Audio encoding affects file size and quality:
  - WAV provides the highest quality but the largest file size
  - MP3 provides good quality with a smaller file size
  - OGG Opus provides excellent quality with efficient compression
- A higher Sample Rate generally improves audio fidelity and increases file size
- If no Path is specified, the audio file is saved to a temporary location
- The output Path provides the location of the generated audio file for further processing
- Supported languages and voices can be found in the Google Cloud Text-to-Speech documentation
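To make the effect of Sample Rate on file size concrete, here is a rough estimate for uncompressed WAV output. It assumes 16-bit mono PCM with a 44-byte header, which this documentation does not state, so treat the numbers as approximations. The function name `wav_size_bytes` is hypothetical.

```python
def wav_size_bytes(seconds, sample_rate=16000, bytes_per_sample=2,
                   channels=1, header_bytes=44):
    """Estimate the size of an uncompressed PCM WAV file.

    Assumes 16-bit mono audio and a 44-byte header by default.
    """
    data_bytes = int(seconds * sample_rate * bytes_per_sample * channels)
    return header_bytes + data_bytes


print(wav_size_bytes(10))         # 320044 bytes at the default 16000 Hz
print(wav_size_bytes(10, 24000))  # 480044 bytes at 24000 Hz
```

Under these assumptions, ten seconds of speech grows from about 320 KB to about 480 KB when the sample rate rises from 16000 Hz to 24000 Hz. Compressed encodings such as MP3 and OGG Opus do not scale this simply.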