Transcribe Audio
Transcribe audio files to text using OpenAI Whisper and GPT-4o transcription models.
Common Properties
- Name - The custom name of the node.
- Color - The custom color of the node.
- Delay Before (sec) - Number of seconds to wait before executing the node.
- Delay After (sec) - Number of seconds to wait after executing the node.
- Continue On Error - The automation continues even if the node fails. Default: false.
Inputs
- Connection Id - Connection identifier from Connect node.
- Audio File - Path to audio file. Supported formats: mp3, mp4, mpeg, mpga, m4a, wav, webm, flac, ogg.
- Use Robomotion AI Credits - Use Robomotion credits instead of your own API key.
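The format check the node performs on the Audio File input can be sketched as a small helper. This is an illustrative sketch, not the node's internal code; the function name is an assumption:

```python
from pathlib import Path

# Formats accepted by the Transcribe Audio node
SUPPORTED_FORMATS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm", "flac", "ogg"}

def validate_audio_file(path: str) -> str:
    """Return the normalized file extension, raising the same errors the node reports."""
    if not path:
        raise ValueError("Audio File cannot be empty")
    ext = Path(path).suffix.lstrip(".").lower()
    if ext not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported file format: {ext or '(none)'}")
    return ext
```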
Options
Model Selection
- Model - Speech recognition model:
  - GPT-4o Transcribe - Latest high-accuracy model (default)
  - GPT-4o Mini Transcribe - Fast and efficient
  - GPT-4o Transcribe (Diarize) - Identifies different speakers
  - Whisper-1 - Original Whisper model
Transcription Settings
- Language - Audio language in ISO-639-1 format (e.g., "en", "es", "fr"). Optional - model auto-detects if not specified.
- Prompt - Optional text to guide transcription style, vocabulary, or context.
- Temperature - Sampling temperature (0-1). Higher values increase randomness. Default: 0.
Advanced
- Timeout (seconds) - Request timeout. Default: 120.
- Include Raw Response - Include timestamps and segments. Default: false.
Outputs
- Text - Transcribed text from the audio file.
- Raw Response - Full response with timestamps and segments (when enabled).
How It Works
Transcribes speech in audio files to text:
- Validates connection and audio file path
- Checks file format is supported
- Uploads audio to OpenAI
- Processes with selected model
- Returns transcribed text
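The validation and request-assembly steps above can be sketched as follows. This is a hedged illustration of the node's behavior, not its actual implementation; the function name and parameter dictionary are assumptions, and the upload to OpenAI itself is performed by the node:

```python
import os

def build_transcription_request(connection_id, audio_path, model="gpt-4o-transcribe",
                                language=None, prompt=None, temperature=0.0):
    """Mirror the node's validation steps, then assemble request parameters.
    Optional inputs (language, prompt) are omitted when not provided,
    letting the model auto-detect the language."""
    if not connection_id:
        raise ValueError("Connection Id is required")
    if not audio_path:
        raise ValueError("Audio File cannot be empty")
    if not os.path.exists(audio_path):
        raise FileNotFoundError("Audio file does not exist")
    params = {"model": model, "temperature": temperature}
    if language:
        params["language"] = language
    if prompt:
        params["prompt"] = prompt
    return params
```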
Usage Examples
Example 1: Basic Transcription
Input:
- Audio File: "C:/recordings/meeting.mp3"
- Model: gpt-4o-transcribe
Output:
- Text: "Good morning everyone. Let's begin today's meeting..."
Example 2: Multilingual Transcription
Input:
- Audio File: "C:/audio/spanish_call.wav"
- Language: "es"
- Model: gpt-4o-transcribe
Output:
- Text: "Buenos días, gracias por llamar..."
Example 3: Speaker Diarization
Input:
- Audio File: "C:/interviews/conversation.mp3"
- Model: gpt-4o-transcribe-diarize
Output (in Raw Response):
- Segments with speaker identification
- Speaker 1: "Hello, how are you?"
- Speaker 2: "I'm doing well, thank you."
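Downstream, the diarized segments can be flattened into a speaker-labelled transcript. The field names below (`segments`, `speaker`, `text`) are assumptions about the Raw Response shape, not a documented schema; inspect your own Raw Response output before relying on them:

```python
def format_diarized(raw_response: dict) -> str:
    """Join diarized segments into 'Speaker N: text' lines."""
    lines = []
    for seg in raw_response.get("segments", []):
        lines.append(f"{seg['speaker']}: {seg['text']}")
    return "\n".join(lines)

sample = {
    "segments": [
        {"speaker": "Speaker 1", "text": "Hello, how are you?"},
        {"speaker": "Speaker 2", "text": "I'm doing well, thank you."},
    ]
}
print(format_diarized(sample))
```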
Example 4: With Context Prompt
Input:
- Audio File: "C:/medical/consultation.wav"
- Prompt: "Medical consultation discussing diabetes, insulin, and glucose levels"
- Model: gpt-4o-transcribe
Output:
- Text: More accurate transcription of medical terminology
Requirements
- Connection Id from Connect node
- Valid audio file in supported format
- File must exist and be accessible
Tips for RPA Developers
- Model Selection: Use GPT-4o Transcribe for best accuracy, GPT-4o Mini for speed, Diarize for multi-speaker scenarios.
- Language Specification: Specify language for better accuracy, especially for non-English audio.
- Prompt Usage: Provide context or technical vocabulary to improve accuracy.
- File Formats: MP3 and M4A work well for most use cases. WAV is uncompressed and larger.
- Temperature: Keep at 0 for the most accurate, deterministic transcription. Higher values add variability and are rarely useful for transcription.
- File Size: OpenAI has file size limits (typically 25MB). Compress large files if needed.
- Timestamps: Enable "Include Raw Response" to get timestamps and segment details.
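A pre-flight size check like the one below can catch oversized files before the node runs. The 25 MB figure is OpenAI's commonly documented upload cap, but verify it against current limits; the constant and function names are illustrative:

```python
import os

MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # ~25 MB; confirm against OpenAI's current limit

def fits_upload_limit(path: str) -> bool:
    """True if the file is small enough to upload without compressing first."""
    return os.path.getsize(path) <= MAX_UPLOAD_BYTES
```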
Common Errors
"Audio File cannot be empty"
- Provide a valid path to an audio file
"Audio file does not exist"
- Check file path is correct and file exists
"Unsupported file format"
- Convert audio to supported format (mp3, wav, etc.)