Speech to Text
Converts speech audio files to text using the Google Cloud Speech-to-Text API.
Common Properties
- Name - The custom name of the node.
- Color - The custom color of the node.
- Delay Before (sec) - Waits in seconds before executing the node.
- Delay After (sec) - Waits in seconds after executing the node.
- Continue On Error - Automation will continue regardless of any error. The default value is false.
Info: If the ContinueOnError property is true, no error is caught when the project is executed, even if a Catch node is used.
Inputs
- Google Cloud Storage (GCS) URI - The Google Cloud Storage URI of the WAV audio file to transcribe. Example: gs://your_bucket/your_folder/your_speech.wav
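A gs:// URI has the form gs://bucket/object-path, so it can be checked before the node runs. The following is a minimal sketch of such a check; the function name `parse_gcs_uri` is hypothetical and is not part of the node, and the node's own validation may differ.

```python
from typing import Tuple

GCS_PREFIX = "gs://"


def parse_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split a gs:// URI into (bucket, object_name).

    Hypothetical helper for illustration only; raises ValueError
    when the URI is empty or not of the form gs://bucket/object.
    """
    uri = (uri or "").strip()
    if not uri:
        raise ValueError("GCS URI is empty")
    if not uri.startswith(GCS_PREFIX):
        raise ValueError(f"Expected a gs:// URI, got: {uri!r}")
    # Everything up to the first slash is the bucket; the rest is the object path.
    bucket, _, object_name = uri[len(GCS_PREFIX):].partition("/")
    if not bucket or not object_name:
        raise ValueError(f"URI must include a bucket and an object path: {uri!r}")
    return bucket, object_name
```

For the example URI above, this returns the bucket `your_bucket` and the object path `your_folder/your_speech.wav`.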
Output
- Result - The transcription results as an object containing the recognized text and metadata.
Options
- Sample Rate - The sample rate (in Hertz) of the audio data. Default is 16000.
- Language Code - The BCP-47 language code of the language spoken in the audio. Default is en-US.
- Model - The speech recognition model to use:
  - Default - Best for most scenarios
  - Video - Best for speech from video recordings
  - Phone Call - Best for audio that originated from a phone call
  - Command And Search - Best for short queries such as voice commands or voice search
- Credentials - Google Cloud credentials used to authenticate with the Speech-to-Text API.
- Automatic Punctuation - If enabled, adds punctuation to the transcriptions automatically.
- Enable Word Time Offsets - If enabled, provides word-level timestamps that can help in aligning the text with the audio.
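To show how these options relate to the underlying service, here is a sketch that builds a JSON body in the shape accepted by the Speech-to-Text v1 REST method `speech:longrunningrecognize`. The field names follow the public REST API. The function name, the mapping from the node's display names to API model IDs, the `LINEAR16` encoding (assuming 16-bit PCM WAV), and the `False` defaults for the two toggles are assumptions made for illustration, not the node's confirmed internals.

```python
# Assumed mapping from the node's Model option labels to API model IDs.
MODEL_IDS = {
    "Default": "default",
    "Video": "video",
    "Phone Call": "phone_call",
    "Command And Search": "command_and_search",
}


def build_recognize_request(
    gcs_uri: str,
    sample_rate: int = 16000,
    language_code: str = "en-US",
    model: str = "Default",
    automatic_punctuation: bool = False,  # assumed default
    word_time_offsets: bool = False,      # assumed default
) -> dict:
    """Build a request body for the v1 speech:longrunningrecognize method."""
    if model not in MODEL_IDS:
        raise ValueError(f"Unknown model option: {model!r}")
    return {
        "config": {
            "encoding": "LINEAR16",  # assumption: 16-bit PCM WAV audio
            "sampleRateHertz": sample_rate,
            "languageCode": language_code,
            "model": MODEL_IDS[model],
            "enableAutomaticPunctuation": automatic_punctuation,
            "enableWordTimeOffsets": word_time_offsets,
        },
        "audio": {"uri": gcs_uri},
    }
```

Calling `build_recognize_request("gs://your_bucket/your_folder/your_speech.wav", model="Phone Call", word_time_offsets=True)` yields a body that selects the `phone_call` model and requests word-level timestamps.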
How It Works
The Speech to Text node converts audio files to text using the Google Cloud Speech-to-Text API. When executed, the node:
- Validates the provided GCS URI input
- Authenticates with the Google Cloud Speech-to-Text API using the provided credentials
- Configures the recognition parameters (language, model, sample rate, etc.)
- Processes the audio file using the long-running recognition method
- Returns the transcription results, including word-level timing information when Enable Word Time Offsets is enabled
Requirements
- A valid Google Cloud Storage URI pointing to a WAV audio file
- Valid Google Cloud credentials with Speech-to-Text API enabled
- Proper permissions to access the Google Cloud Storage bucket
- Audio file in WAV format (other formats may require conversion)
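Before uploading a file to Cloud Storage, it can help to confirm that it is a readable WAV file and to note its sample rate for the Sample Rate option. The sketch below uses Python's standard `wave` module on a local copy of the file; the function name `describe_wav` is hypothetical.

```python
import wave


def describe_wav(path: str) -> dict:
    """Read basic format details from a local WAV file.

    wave.open raises wave.Error if the file is not a WAV file
    that the standard library can parse.
    """
    with wave.open(path, "rb") as wav_file:
        return {
            "sample_rate": wav_file.getframerate(),
            "channels": wav_file.getnchannels(),
            "sample_width_bytes": wav_file.getsampwidth(),
        }
```

The `sample_rate` value is the number to enter in the Sample Rate option, and a `sample_width_bytes` of 2 indicates 16-bit samples.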
Error Handling
The node will return specific errors in the following cases:
- Empty or invalid GCS URI
- Invalid or missing Google Cloud credentials
- Google Cloud Speech-to-Text API authentication errors
- Insufficient permissions to access the audio file
- Invalid audio file format
- Google Cloud service errors
- Network connectivity issues
Usage Notes
- The audio file must be stored in Google Cloud Storage and accessible via a gs:// URI
- Supported audio formats include WAV, FLAC, and other formats as specified by Google Cloud Speech-to-Text
- The Sample Rate should match the actual sample rate of your audio file
- Feature availability (for example, some models and automatic punctuation) and accuracy vary by language code
- The Model selection should match your audio source for best results
- Enabling Automatic Punctuation can improve readability of transcriptions
- Word Time Offsets are useful for creating subtitles or aligning text with audio
- Large audio files are processed using the long-running recognition method
- The output Result contains detailed transcription data including confidence scores
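As an illustration of working with the output, the sketch below extracts the transcript and word timings from a result object. It assumes the Result mirrors the JSON response of the Speech-to-Text v1 REST API (`results` → `alternatives` → `transcript`, `confidence`, `words` with `startTime` and `endTime` strings such as `"1.300s"`). The exact shape of the node's Result is not stated here, so inspect the actual output before relying on these keys. The function names are hypothetical.

```python
from typing import List, Tuple


def parse_offset(value: str) -> float:
    """Convert a duration string such as '1.300s' to seconds."""
    return float(value.rstrip("s"))


def full_transcript(result: dict) -> str:
    """Join the top alternative of each result chunk into one string."""
    parts = []
    for chunk in result.get("results", []):
        alternatives = chunk.get("alternatives", [])
        if alternatives:
            parts.append(alternatives[0].get("transcript", "").strip())
    return " ".join(part for part in parts if part)


def extract_words(result: dict) -> List[Tuple[str, float, float]]:
    """Return (word, start_seconds, end_seconds) for the top alternatives.

    The 'words' list is only present when word time offsets were requested.
    """
    words = []
    for chunk in result.get("results", []):
        alternatives = chunk.get("alternatives", [])
        if not alternatives:
            continue
        for item in alternatives[0].get("words", []):
            words.append(
                (item["word"], parse_offset(item["startTime"]), parse_offset(item["endTime"]))
            )
    return words
```

The `(word, start, end)` tuples can be grouped into subtitle cues or used to seek to a given word in the audio.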