Speech to Text
Transcribes audio files to text using ElevenLabs AI's speech recognition, powered by the OpenAI Whisper model.
Common Properties
- Name - The custom name of the node.
- Color - The custom color of the node.
- Delay Before (sec) - The number of seconds to wait before the node executes.
- Delay After (sec) - The number of seconds to wait after the node executes.
- Continue On Error - When true, the automation continues even if the node fails. The default value is false.
Inputs
- Connection Id (String) - Connection ID from the Connect node. Optional if you provide API Key directly.
- Audio File Path (String) - Path to the audio file to transcribe (supports common formats like MP3, WAV, M4A, etc.).
Options
- Model - Speech recognition model to use:
- Whisper-1 - OpenAI Whisper v1 model for accurate speech recognition
- Language (String) - Optional language code to guide transcription (e.g., "en" for English, "es" for Spanish, "fr" for French). Leave empty for automatic language detection.
- Prompt (String) - Optional text prompt to guide the transcription style or vocabulary. Useful for technical terms or specific contexts.
- Temperature (String) - Sampling temperature from 0.0 to 1.0 (default: 0). Higher values increase randomness; lower values make the output more deterministic.
- API Key - Your ElevenLabs AI API key. Optional if using Connection ID.
Outputs
- Transcription (Object) - The transcribed text and metadata, including:
- text - The transcribed text content
- language - Detected or specified language (if available)
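The Transcription output can be pictured as a simple object with the two fields listed above. The snippet below is only an illustration of that shape; the sample text and values are made up, and downstream access patterns may differ in your automation tool.

```python
# Illustrative shape of the Transcription output (field names from the
# Outputs list above; the values here are invented for the example).
transcription = {
    "text": "Welcome to the quarterly planning meeting.",
    "language": "en",  # may be absent if the API does not report it
}

# A downstream node would typically read the fields like this:
transcribed_text = transcription["text"]
detected_language = transcription.get("language", "unknown")
```

Using `.get()` with a fallback for `language` avoids errors when the metadata is not available.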
How It Works
The Speech to Text node transcribes audio to text using AI. When executed, the node:
- Validates that the audio file path is provided
- Validates temperature if specified (must be between 0.0 and 1.0)
- Either uses the provided connection or creates a new client from the directly supplied API key
- Opens the audio file and prepares transcription parameters
- Calls the ElevenLabs speech-to-text API with optional language, prompt, and temperature
- Returns the transcribed text and metadata
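The steps above can be sketched as a single function. This is a hedged illustration, not the node's actual implementation: `client` stands in for whatever speech-to-text client the connection or API key produces, and `client.transcribe` is a hypothetical method name.

```python
import os

def transcribe(client, audio_path, language=None, prompt=None, temperature=0.0):
    """Sketch of the node's execution flow (client interface is hypothetical)."""
    # 1. Validate that the audio file path is provided
    if not audio_path:
        raise ValueError("Audio File Path cannot be empty. "
                         "Please provide the path to the audio file.")
    # 2. Validate temperature (must be between 0.0 and 1.0)
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("Temperature must be between 0.0 and 1.0.")
    # 3. Check the file exists before opening it
    if not os.path.isfile(audio_path):
        raise FileNotFoundError(f"Audio file not found at: {audio_path}. "
                                "Please verify the file path is correct.")
    # 4. Build optional parameters, omitting any that were left empty
    params = {"temperature": temperature}
    if language:
        params["language"] = language
    if prompt:
        params["prompt"] = prompt
    # 5. Call the speech-to-text API and return the text plus metadata
    with open(audio_path, "rb") as audio_file:
        return client.transcribe(audio_file, **params)
```

Note that validation happens before the file is opened, so configuration mistakes fail fast without touching the filesystem or the API.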
Requirements
- Valid ElevenLabs API key (via Connect node or direct option)
- Audio file in a supported format (MP3, WAV, M4A, FLAC, etc.)
- Temperature value must be between 0.0 and 1.0 if specified
Error Handling
The node will return specific errors in the following cases:
- Missing audio path - "Audio File Path cannot be empty. Please provide the path to the audio file."
- Invalid temperature range - "Temperature must be between 0.0 and 1.0."
- Invalid temperature format - "Temperature must be a valid number between 0.0 and 1.0."
- File not found - "Audio file not found at: [path]. Please verify the file path is correct."
- Transcription failure - "Failed to transcribe audio: [error details]"
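Because Temperature is accepted as a string, the node distinguishes a value that is not a number at all from one that is numeric but out of range. A minimal sketch of how that two-step validation could produce the two distinct error messages above (the helper name is hypothetical):

```python
def validate_temperature(value):
    """Parse a temperature string, raising the node's two temperature errors."""
    # First, the value must parse as a number at all
    try:
        temperature = float(value)
    except (TypeError, ValueError):
        raise ValueError("Temperature must be a valid number between 0.0 and 1.0.")
    # Second, the parsed number must fall within the allowed range
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("Temperature must be between 0.0 and 1.0.")
    return temperature
```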
Usage Notes
- Language - Specifying the language can improve accuracy, but auto-detection works well for most cases
- Prompt - Use prompts to guide transcription for:
- Technical terminology
- Industry-specific vocabulary
- Proper nouns and names
- Specific formatting preferences
- Temperature - Usually best left at 0 for the most accurate, deterministic results. Increase it only if you need creative variation
- The Whisper model supports many languages and can auto-detect them
- Audio quality affects transcription accuracy - clear audio with minimal background noise works best
- Longer audio files may take more time to process
- The model handles various accents and speaking styles
- Supports a wide range of audio formats - common formats such as MP3 and WAV work well
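Putting the notes above together, a typical node configuration fixes the language when it is known and seeds the prompt with domain vocabulary. The settings below are a hypothetical example; the terms in the prompt are placeholders for your own jargon and proper nouns.

```python
# Hypothetical Speech to Text node configuration illustrating the usage notes:
# a fixed language (skips auto-detection), a vocabulary-seeding prompt, and
# temperature 0 for deterministic output.
options = {
    "model": "whisper-1",
    "language": "en",  # known-English audio; leave empty to auto-detect
    "prompt": "Kubernetes, Terraform, CI/CD pipeline, Dr. Okonkwo",  # jargon and names
    "temperature": "0",  # passed as a string, per the Options section
}
```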
Example Use Cases
- Transcribing meeting recordings or interviews
- Converting voice memos to text
- Creating subtitles from video audio tracks
- Extracting text from podcast episodes
- Transcribing phone calls or voicemails
- Converting audio notes to searchable text
- Accessibility features for audio content
- Analyzing customer service calls