Speech to Text

Converts speech audio files to text using the Google Cloud Speech-to-Text API.

Common Properties

Name - The custom name of the node.
Color - The custom color of the node.
Delay Before (sec) - Waits in seconds before executing the node.
Delay After (sec) - Waits in seconds after executing the node.
Continue On Error - Automation will continue regardless of any error. The default value is false.

Note: If the Continue On Error property is true, errors from this node are not caught when the project is executed, even if a Catch node is used.

Google Cloud Storage (GCS) URI - The Google Cloud Storage URI of the WAV audio file to transcribe. Example: gs://your_bucket/your_folder/your_speech.wav
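Checking the URI before running the flow can catch a mistyped path early. The following is a minimal sketch, not the node's own validation logic: `parse_gcs_uri` is a hypothetical helper name, and the only rule it enforces is the gs://bucket/object form shown in the example above.

```python
from urllib.parse import urlparse


def parse_gcs_uri(uri: str) -> tuple[str, str]:
    """Split a gs:// URI into (bucket, object_name), or raise ValueError."""
    parsed = urlparse(uri)
    if parsed.scheme != "gs":
        raise ValueError(f"expected a gs:// URI, got: {uri!r}")
    bucket = parsed.netloc
    object_name = parsed.path.lstrip("/")
    if not bucket or not object_name:
        raise ValueError(f"URI must name both a bucket and an object: {uri!r}")
    return bucket, object_name


bucket, object_name = parse_gcs_uri("gs://your_bucket/your_folder/your_speech.wav")
print(bucket, object_name)
```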

Result - The transcription results as an object containing the recognized text and metadata.
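The exact shape of Result is not spelled out here. As a hedged sketch, the code below assumes it mirrors the JSON response of the Speech-to-Text API, where each entry in `results` carries a list of `alternatives` with `transcript` and `confidence` fields. `extract_transcript` is a hypothetical helper name; adjust the keys if the object you receive differs.

```python
def extract_transcript(result: dict) -> tuple[str, float]:
    """Join the top alternative of each result.

    Returns (full_text, lowest_confidence). Assumes the Speech-to-Text
    response layout: result["results"][i]["alternatives"][0].
    """
    pieces = []
    confidences = []
    for item in result.get("results", []):
        alternatives = item.get("alternatives", [])
        if not alternatives:
            continue
        top = alternatives[0]  # alternatives are ordered best-first by the API
        pieces.append(top.get("transcript", "").strip())
        if "confidence" in top:
            confidences.append(top["confidence"])
    text = " ".join(piece for piece in pieces if piece)
    lowest = min(confidences) if confidences else 0.0
    return text, lowest


# Illustrative data only, not real API output.
sample_result = {
    "results": [
        {"alternatives": [{"transcript": "hello world", "confidence": 0.9}]},
        {"alternatives": [{"transcript": " how are you", "confidence": 0.8}]},
    ]
}
text, lowest = extract_transcript(sample_result)
print(text, lowest)
```

Tracking the lowest confidence gives a single number to compare against a threshold when deciding whether a transcription needs review.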

Sample Rate - The sample rate (in Hertz) of the audio data. Default is 16000.
Language Code - The BCP-47 language code of the language spoken in the audio. Default is en-US.
Model - The speech recognition model to use:
- Default - Best for most scenarios
- Video - Best for speech from video recordings
- Phone Call - Best for audio that originated from a phone call
- Command And Search - Best for short queries such as voice commands or voice search
Credentials - Google Cloud credentials used to authenticate with the Speech-to-Text API.
Automatic Punctuation - If enabled, adds punctuation to the transcriptions automatically.
Enable Word Time Offsets - If enabled, provides word-level timestamps that can help in aligning the text with the audio.
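To show how these options relate to the underlying service, the sketch below builds a request body in the shape accepted by the Speech-to-Text v1 REST method `speech:longrunningrecognize`. How the node maps its options internally is not documented here, so `MODEL_IDS` and `build_request` are illustrative names, and the `False` defaults for the two toggles are assumptions.

```python
# Option labels from this page mapped to the API's model identifiers.
MODEL_IDS = {
    "Default": "default",
    "Video": "video",
    "Phone Call": "phone_call",
    "Command And Search": "command_and_search",
}


def build_request(
    gcs_uri: str,
    sample_rate: int = 16000,          # documented default
    language_code: str = "en-US",      # documented default
    model: str = "Default",
    automatic_punctuation: bool = False,  # assumed default
    word_time_offsets: bool = False,      # assumed default
) -> dict:
    """Return a JSON-serializable body for speech:longrunningrecognize."""
    return {
        "config": {
            # "encoding" is omitted: for WAV and FLAC the API can read it
            # from the file header.
            "sampleRateHertz": sample_rate,
            "languageCode": language_code,
            "model": MODEL_IDS[model],
            "enableAutomaticPunctuation": automatic_punctuation,
            "enableWordTimeOffsets": word_time_offsets,
        },
        "audio": {"uri": gcs_uri},
    }


request_body = build_request(
    "gs://your_bucket/your_folder/your_speech.wav",
    model="Phone Call",
    word_time_offsets=True,
)
print(request_body)
```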

The Speech to Text node converts audio files to text using the Google Cloud Speech-to-Text API. When executed, the node:

1. Validates the provided GCS URI input
2. Authenticates with the Google Cloud Speech-to-Text API using the provided credentials
3. Configures the recognition parameters (language, model, sample rate, etc.)
4. Processes the audio file using the long-running recognition method
5. Returns the transcription results, with word-level timing information if Enable Word Time Offsets is enabled
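Long-running recognition means the API first returns an operation that is then polled until it completes. The sketch below illustrates that pattern only; it is not the node's implementation. It assumes an operation dictionary with the standard `done`, `response`, and `error` fields of a Google long-running operation, and `fetch` is a hypothetical callable standing in for whatever retrieves the current operation state.

```python
import time
from typing import Callable


def wait_for_operation(
    fetch: Callable[[], dict],
    interval_sec: float = 5.0,
    timeout_sec: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll `fetch` until the operation is done, then return its response.

    Raises RuntimeError if the operation finished with an error and
    TimeoutError if it is still running after roughly `timeout_sec`.
    """
    waited = 0.0
    while True:
        operation = fetch()
        if operation.get("done"):
            if "error" in operation:
                message = operation["error"].get("message", "recognition failed")
                raise RuntimeError(message)
            return operation.get("response", {})
        if waited >= timeout_sec:
            raise TimeoutError(f"operation not done after {waited:.0f} seconds")
        sleep(interval_sec)
        waited += interval_sec


# Simulated operation states for demonstration; no network calls are made.
states = iter([
    {"done": False},
    {"done": False},
    {"done": True, "response": {"results": []}},
])
response = wait_for_operation(lambda: next(states), sleep=lambda seconds: None)
print(response)
```

Passing `sleep` as a parameter keeps the loop easy to test without real waiting.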

The node will return specific errors in the following cases:

Usage Notes

- The audio file must be stored in Google Cloud Storage and accessible via a gs:// URI
- Supported audio formats include WAV, FLAC, and other formats as specified by Google Cloud Speech-to-Text
- The Sample Rate should match the actual sample rate of your audio file
- Different language codes support different features and accuracy levels
- The Model selection should match your audio source for best results
- Enabling Automatic Punctuation can improve readability of transcriptions
- Word Time Offsets are useful for creating subtitles or aligning text with audio
- Large audio files are processed using the long-running recognition method
- The output Result contains detailed transcription data including confidence scores
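Because the Sample Rate option should match the file, it can help to read the rate from the WAV header before uploading the audio to Cloud Storage. This is a small sketch using Python's standard `wave` module; `wav_sample_rate` and `make_silent_wav` are hypothetical helper names, and the in-memory file exists only to keep the example self-contained.

```python
import io
import wave


def wav_sample_rate(source) -> int:
    """Return the sample rate (Hz) recorded in a WAV header.

    `source` may be a path string or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        return wav.getframerate()


def make_silent_wav(sample_rate: int, seconds: float = 0.01) -> io.BytesIO:
    """Build a tiny mono 16-bit WAV in memory, for demonstration only."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(sample_rate * seconds))
    buffer.seek(0)
    return buffer


configured_rate = 16000  # the node's documented default Sample Rate
actual_rate = wav_sample_rate(make_silent_wav(44100))
if actual_rate != configured_rate:
    print(f"Set Sample Rate to {actual_rate} instead of {configured_rate}")
```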