
Speech to Text

Converts speech audio files to text using the Google Cloud Speech-to-Text API.

Common Properties

  • Name - The custom name of the node.
  • Color - The custom color of the node.
  • Delay Before (sec) - Number of seconds to wait before executing the node.
  • Delay After (sec) - Number of seconds to wait after executing the node.
  • Continue On Error - Automation will continue regardless of any error. The default value is false.
Info: If the Continue On Error property is true, no error is caught when the project is executed, even if a Catch node is used.

Inputs

  • Google Cloud Storage (GCS) URI - The Google Cloud Storage URI of the WAV audio file to transcribe. Example: gs://your_bucket/your_folder/your_speech.wav
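As a rough illustration of the structural check behind an "Empty or invalid GCS URI" error, the following Python sketch splits a gs:// URI into its bucket and object name. The function name parse_gcs_uri is hypothetical, and the check is deliberately shallow: it does not apply Cloud Storage's bucket-naming rules or confirm that the object exists.

```python
def parse_gcs_uri(uri: str) -> tuple[str, str]:
    """Split a gs://bucket/object URI into (bucket, object_name).

    Raises ValueError for empty or malformed input. This is a structural
    check only; it does not confirm that the object exists.
    """
    prefix = "gs://"
    uri = (uri or "").strip()
    if not uri:
        raise ValueError("GCS URI is empty")
    if not uri.startswith(prefix):
        raise ValueError(f"expected a URI starting with gs://, got {uri!r}")
    # Everything up to the first "/" after the prefix is the bucket name.
    bucket, _, object_name = uri[len(prefix):].partition("/")
    if not bucket or not object_name:
        raise ValueError(f"URI must name both a bucket and an object: {uri!r}")
    return bucket, object_name


print(parse_gcs_uri("gs://your_bucket/your_folder/your_speech.wav"))
# ('your_bucket', 'your_folder/your_speech.wav')
```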

Output

  • Result - The transcription results as an object containing the recognized text and metadata.
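The exact shape of Result is not spelled out here. Assuming it mirrors the Speech-to-Text v1 response (a results list whose entries hold alternatives ranked from most to least likely, each with a transcript and confidence), a sketch like the following pulls out the plain text. The function name extract_transcript and the sample data are illustrative.

```python
from typing import Any


def extract_transcript(result: dict[str, Any]) -> str:
    """Join the top alternative of each result segment into one string.

    Assumes `result` follows the Speech-to-Text v1 response layout:
    {"results": [{"alternatives": [{"transcript": ..., "confidence": ...}]}]}
    """
    parts = []
    for segment in result.get("results", []):
        alternatives = segment.get("alternatives", [])
        if alternatives:
            # Alternatives are ordered by likelihood; the first is the best guess.
            parts.append(alternatives[0].get("transcript", "").strip())
    return " ".join(p for p in parts if p)


# Hypothetical Result value, shaped like a v1 response with two segments.
sample = {
    "results": [
        {"alternatives": [{"transcript": "hello world", "confidence": 0.92}]},
        {"alternatives": [{"transcript": " how are you", "confidence": 0.88}]},
    ]
}
print(extract_transcript(sample))  # hello world how are you
```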

Options

  • Sample Rate - The sample rate (in Hertz) of the audio data. Default is 16000.
  • Language Code - The BCP-47 language code of the language spoken in the audio. Default is en-US.
  • Model - The speech recognition model to use:
    • Default - Best for most scenarios
    • Video - Best for speech from video recordings
    • Phone Call - Best for audio that originated from a phone call
    • Command And Search - Best for short queries such as voice commands or voice search
  • Credentials - Google Cloud credentials used to authenticate with the Speech-to-Text API.
  • Automatic Punctuation - If enabled, automatically adds punctuation to the transcription.
  • Enable Word Time Offsets - If enabled, includes word-level start and end timestamps in the result, which help align the text with the audio.
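These options correspond closely to fields of the Speech-to-Text v1 RecognitionConfig. The following sketch builds a JSON request body for the REST speech:longrunningrecognize method from node-style option values. The helper name build_request, the LINEAR16 encoding (16-bit PCM WAV), and the False defaults for the two toggles are assumptions, not documented behavior of this node.

```python
import json

# Maps this node's Model option labels to Speech-to-Text v1 model identifiers.
MODEL_IDS = {
    "Default": "default",
    "Video": "video",
    "Phone Call": "phone_call",
    "Command And Search": "command_and_search",
}


def build_request(gcs_uri, sample_rate=16000, language_code="en-US",
                  model="Default", automatic_punctuation=False,
                  word_time_offsets=False):
    """Build a JSON body for the v1 speech:longrunningrecognize REST method."""
    if model not in MODEL_IDS:
        raise ValueError(f"unknown model option: {model!r}")
    return {
        "config": {
            "encoding": "LINEAR16",  # assumption: 16-bit PCM WAV audio
            "sampleRateHertz": sample_rate,
            "languageCode": language_code,
            "model": MODEL_IDS[model],
            "enableAutomaticPunctuation": automatic_punctuation,
            "enableWordTimeOffsets": word_time_offsets,
        },
        "audio": {"uri": gcs_uri},
    }


body = build_request("gs://your_bucket/your_folder/your_speech.wav",
                     model="Phone Call", word_time_offsets=True)
print(json.dumps(body, indent=2))
```

Sending this body requires an authenticated POST to the Speech-to-Text endpoint, which the sketch leaves out.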

How It Works

The Speech to Text node converts audio files to text using the Google Cloud Speech-to-Text API. When executed, the node:

  1. Validates the provided GCS URI input
  2. Authenticates with the Google Cloud Speech-to-Text API using the provided credentials
  3. Configures the recognition parameters (language, model, sample rate, etc.)
  4. Processes the audio file using the long-running recognition method
  5. Returns the transcription results, including word-level timing information when Enable Word Time Offsets is enabled
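With long-running recognition, the API first returns an operation, and the caller polls it until done is true, then reads either response or error. A generic polling loop might look like this. Here fetch_operation is a hypothetical callable standing in for whatever client call retrieves the operation, and the interval and timeout values are arbitrary.

```python
import time
from typing import Any, Callable


def wait_for_operation(fetch_operation: Callable[[], dict[str, Any]],
                       poll_interval: float = 5.0,
                       timeout: float = 600.0) -> dict[str, Any]:
    """Poll a long-running operation until it finishes, then return its response.

    `fetch_operation` is a hypothetical callable that returns the current
    operation resource as a dict (for example, a parsed operations.get reply).
    """
    deadline = time.monotonic() + timeout
    while True:
        op = fetch_operation()
        # proto3 JSON omits false booleans, so a missing "done" means still running.
        if op.get("done", False):
            if "error" in op:
                err = op["error"]
                raise RuntimeError(
                    f"recognition failed ({err.get('code')}): {err.get('message')}")
            return op.get("response", {})
        if time.monotonic() >= deadline:
            raise TimeoutError("recognition did not finish before the timeout")
        time.sleep(poll_interval)


# Fake operation states: still running, then finished.
states = iter([{"name": "op-1"},
               {"name": "op-1", "done": True, "response": {"results": []}}])
print(wait_for_operation(lambda: next(states), poll_interval=0.01))  # {'results': []}
```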

Requirements

  • A valid Google Cloud Storage URI pointing to a WAV audio file
  • Valid Google Cloud credentials for a project with the Speech-to-Text API enabled
  • Permission to read the audio file in the Google Cloud Storage bucket
  • Audio file in WAV format (other formats may require conversion)
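To confirm that a file is WAV, and to find the value the Sample Rate option should match, Python's standard wave module can read the file header. This sketch (hypothetical describe_wav helper) assumes an uncompressed PCM WAV file; wave raises wave.Error for formats it cannot read.

```python
import os
import tempfile
import wave


def describe_wav(path: str) -> dict:
    """Read basic format fields from a PCM WAV file header."""
    with wave.open(path, "rb") as wav:
        return {
            "sample_rate_hz": wav.getframerate(),
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "duration_sec": wav.getnframes() / wav.getframerate(),
        }


if __name__ == "__main__":
    # Write a one-second silent 16 kHz mono file so the example runs on its own.
    path = os.path.join(tempfile.mkdtemp(), "sample.wav")
    with wave.open(path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)  # 16-bit samples
        out.setframerate(16000)
        out.writeframes(b"\x00\x00" * 16000)
    print(describe_wav(path))
    # {'sample_rate_hz': 16000, 'channels': 1, 'sample_width_bytes': 2, 'duration_sec': 1.0}
```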

Error Handling

The node returns an error in the following cases:

  • Empty or invalid GCS URI
  • Invalid or missing Google Cloud credentials
  • Google Cloud Speech-to-Text API authentication errors
  • Insufficient permissions to access the audio file
  • Invalid audio file format
  • Google Cloud service errors
  • Network connectivity issues
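Many of these cases surface from Google Cloud as a JSON error body whose status field holds a value such as UNAUTHENTICATED, PERMISSION_DENIED, or INVALID_ARGUMENT. If you call the API yourself, a hypothetical helper like the one below can sort those into the categories above. The mapping and the choice to treat only UNAVAILABLE as worth retrying are assumptions; the node's own error messages may differ.

```python
# Hypothetical mapping from Google API error "status" values to the
# error categories listed above. The node's own messages may differ.
CATEGORIES = {
    "UNAUTHENTICATED": "invalid or missing credentials",
    "PERMISSION_DENIED": "insufficient permissions or API not enabled",
    "INVALID_ARGUMENT": "invalid request, such as a bad URI or audio format",
    "NOT_FOUND": "audio file or resource not found",
    "UNAVAILABLE": "Google Cloud service temporarily unavailable",
    "INTERNAL": "Google Cloud service error",
}

# Assumption: only UNAVAILABLE is treated as transient and worth retrying.
RETRYABLE = {"UNAVAILABLE"}


def classify_error(body: dict) -> tuple[str, bool]:
    """Return (category, retryable) for a Google API JSON error body."""
    status = body.get("error", {}).get("status", "UNKNOWN")
    return CATEGORIES.get(status, "unexpected error"), status in RETRYABLE


example = {"error": {"code": 403, "message": "Permission denied.",
                     "status": "PERMISSION_DENIED"}}
print(classify_error(example))
# ('insufficient permissions or API not enabled', False)
```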

Usage Notes

  • The audio file must be stored in Google Cloud Storage and accessible via a gs:// URI
  • Google Cloud Speech-to-Text accepts WAV, FLAC, and other formats listed in its documentation; this node's input is documented as WAV, so other formats may need conversion
  • The Sample Rate should match the actual sample rate of your audio file
  • Feature availability (for example, automatic punctuation) and recognition accuracy vary by language code
  • The Model selection should match your audio source for best results
  • Enabling Automatic Punctuation can improve readability of transcriptions
  • Word Time Offsets are useful for creating subtitles or aligning text with audio
  • Audio is processed with the long-running (asynchronous) recognition method, which supports files longer than the roughly one-minute limit of synchronous recognition
  • The output Result contains detailed transcription data including confidence scores
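For the subtitle use case, word entries can be grouped into SRT cues. This sketch assumes each word arrives as in the v1 JSON response, with startTime and endTime encoded as duration strings such as "1.300s"; the function names and the seven-words-per-cue grouping are illustrative choices.

```python
def parse_offset(value: str) -> float:
    """Convert a protobuf Duration JSON string such as "1.300s" to seconds."""
    return float(value.rstrip("s"))


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def words_to_srt(words: list[dict], words_per_cue: int = 7) -> str:
    """Group word entries into numbered SRT cues separated by blank lines."""
    cues = []
    for i in range(0, len(words), words_per_cue):
        chunk = words[i:i + words_per_cue]
        start = srt_timestamp(parse_offset(chunk[0]["startTime"]))
        end = srt_timestamp(parse_offset(chunk[-1]["endTime"]))
        text = " ".join(w["word"] for w in chunk)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)


# Hypothetical word entries, shaped like v1 WordInfo objects in JSON.
words = [
    {"startTime": "0s", "endTime": "0.400s", "word": "hello"},
    {"startTime": "0.400s", "endTime": "1.100s", "word": "world"},
]
print(words_to_srt(words))
```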