Speech to Text

Transcribes audio files to text using ElevenLabs AI speech recognition, powered by the OpenAI Whisper model.

Common Properties

  • Name - The custom name of the node.
  • Color - The custom color of the node.
  • Delay Before (sec) - Waits in seconds before executing the node.
  • Delay After (sec) - Waits in seconds after executing the node.
  • Continue On Error - The automation will continue regardless of any error. The default value is false.

Inputs

  • Connection Id (String) - Connection ID from the Connect node. Optional if you provide API Key directly.
  • Audio File Path (String) - Path to the audio file to transcribe (supports common formats like MP3, WAV, M4A, etc.).

Options

  • Model - Speech recognition model to use:
    • Whisper-1 - OpenAI Whisper v1 model for accurate speech recognition
  • Language (String) - Optional language code to guide transcription (e.g., "en" for English, "es" for Spanish, "fr" for French). Leave empty for automatic language detection.
  • Prompt (String) - Optional text prompt to guide the transcription style or vocabulary. Useful for technical terms or specific contexts.
  • Temperature (String) - Sampling temperature from 0.0 to 1.0 (default: 0). Higher values increase randomness, lower values make output more deterministic.
  • API Key - Your ElevenLabs AI API key. Optional if using Connection ID.

Outputs

  • Transcription (Object) - The transcribed text and metadata, including:
    • text - The transcribed text content
    • language - Detected or specified language (if available)
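For example, a successful run might produce an output shaped like this (illustrative values, not actual API output):

```python
# Hypothetical shape of the Transcription output object
transcription = {
    "text": "Welcome everyone, and thank you for joining the quarterly review.",
    "language": "en",  # detected or specified language; may be absent
}

print(transcription["text"])
```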

How It Works

The Speech to Text node transcribes audio to text using AI. When executed, the node:

  1. Validates that the audio file path is provided
  2. Validates temperature if specified (must be between 0.0 and 1.0)
  3. Either uses the provided connection or creates a new client with direct API key
  4. Opens the audio file and prepares transcription parameters
  5. Calls the ElevenLabs speech-to-text API with optional language, prompt, and temperature
  6. Returns the transcribed text and metadata
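As a rough sketch, steps 1, 2, and 4 above might look like the following (illustrative only: `build_transcription_params` and the parameter names are assumptions, not the node's actual implementation):

```python
def build_transcription_params(audio_file_path, language=None, prompt=None, temperature=None):
    """Validate inputs and assemble request parameters (steps 1, 2, and 4)."""
    # Step 1: the audio file path must be provided
    if not audio_file_path:
        raise ValueError("Audio File Path cannot be empty. "
                         "Please provide the path to the audio file.")
    # Step 2: temperature, when given, must parse as a number in 0.0-1.0
    if temperature is not None:
        try:
            temperature = float(temperature)
        except ValueError:
            raise ValueError("Temperature must be a valid number between 0.0 and 1.0.")
        if not 0.0 <= temperature <= 1.0:
            raise ValueError("Temperature must be between 0.0 and 1.0.")
    # Step 4 (in part): only non-empty optional fields are forwarded
    params = {"model": "whisper-1", "file": audio_file_path}
    if language:
        params["language"] = language
    if prompt:
        params["prompt"] = prompt
    if temperature is not None:
        params["temperature"] = temperature
    return params
```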

Requirements

  • Valid ElevenLabs API key (via Connect node or direct option)
  • Audio file in a supported format (MP3, WAV, M4A, FLAC, etc.)
  • Temperature value must be between 0.0 and 1.0 if specified
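A quick pre-flight check against these requirements could look like this (the extension list mirrors the formats named above and is not exhaustive):

```python
import os

# Formats named in this document; the node may accept others
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac"}

def looks_supported(path):
    """Cheap extension-based check before sending a file for transcription."""
    return os.path.splitext(path)[1].lower() in SUPPORTED_EXTENSIONS
```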

Error Handling

The node will return specific errors in the following cases:

  • Missing audio path - "Audio File Path cannot be empty. Please provide the path to the audio file."
  • Invalid temperature range - "Temperature must be between 0.0 and 1.0."
  • Invalid temperature format - "Temperature must be a valid number between 0.0 and 1.0."
  • File not found - "Audio file not found at: [path]. Please verify the file path is correct."
  • Transcription failure - "Failed to transcribe audio: [error details]"
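Putting the file checks and failure handling together, a wrapper might look like this (a sketch, not the node's real code; `call_api` stands in for the actual ElevenLabs request):

```python
import os

def transcribe(audio_file_path, call_api):
    """Apply the documented checks, then delegate to the API call."""
    if not audio_file_path:
        raise ValueError("Audio File Path cannot be empty. "
                         "Please provide the path to the audio file.")
    if not os.path.exists(audio_file_path):
        raise FileNotFoundError(
            f"Audio file not found at: {audio_file_path}. "
            "Please verify the file path is correct.")
    try:
        return call_api(audio_file_path)
    except Exception as exc:
        # Wrap any API failure in the documented error message
        raise RuntimeError(f"Failed to transcribe audio: {exc}") from exc
```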

Usage Notes

  • Language - Specifying the language can improve accuracy, but auto-detection works well in most cases
  • Prompt - Use prompts to guide transcription for:
    • Technical terminology
    • Industry-specific vocabulary
    • Proper nouns and names
    • Specific formatting preferences
  • Temperature - Usually best left at 0 for the most accurate, deterministic results. Increase it only if you need creative variation
  • The Whisper model supports many languages and can auto-detect them
  • Audio quality affects transcription accuracy - clear audio with minimal background noise works best
  • Longer audio files may take more time to process
  • The model handles various accents and speaking styles
  • Supports a variety of audio formats - common formats such as MP3 and WAV work well
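For instance, a vocabulary-steering prompt (a made-up example) might be:

```python
# Listing domain terms and proper nouns in the prompt nudges the model
# toward spelling them correctly in the transcript.
prompt = (
    "The call covers Kubernetes, Istio, kubectl, and OAuth2, "
    "with speakers Dr. Nguyen and Ms. Okafor."
)
```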

Example Use Cases

  • Transcribing meeting recordings or interviews
  • Converting voice memos to text
  • Creating subtitles from video audio tracks
  • Extracting text from podcast episodes
  • Transcribing phone calls or voicemails
  • Converting audio notes to searchable text
  • Accessibility features for audio content
  • Analyzing customer service calls