Transcribe Audio

Transcribe audio files to text using OpenAI Whisper and GPT-4o transcription models.

Common Properties

  • Name - The custom name of the node.
  • Color - The custom color of the node.
  • Delay Before (sec) - Seconds to wait before executing the node.
  • Delay After (sec) - Seconds to wait after executing the node.
  • Continue On Error - The automation continues even if the node fails. Default: false.

Inputs

  • Connection Id - Connection identifier from Connect node.
  • Audio File - Path to audio file. Supported formats: mp3, mp4, mpeg, mpga, m4a, wav, webm, flac, ogg.
  • Use Robomotion AI Credits - Use Robomotion credits instead of your own API key.

Options

Model Selection

  • Model - Speech recognition model:
    • GPT-4o Transcribe - Latest high-accuracy model (default)
    • GPT-4o Mini Transcribe - Fast and efficient
    • GPT-4o Transcribe (Diarize) - Identifies different speakers
    • Whisper-1 - Original Whisper model

Transcription Settings

  • Language - Audio language in ISO-639-1 format (e.g., "en", "es", "fr"). Optional - model auto-detects if not specified.
  • Prompt - Optional text to guide transcription style, vocabulary, or context.
  • Temperature - Sampling temperature (0-1). Higher values increase randomness. Default: 0.
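These settings map onto parameters of OpenAI's transcription endpoint. A minimal sketch of that mapping (the parameter names match the public OpenAI API; how the node builds the request internally is an assumption):

```python
# Sketch: assemble the request parameters assumed to be sent to OpenAI's
# /v1/audio/transcriptions endpoint. Optional settings are omitted when
# unset so the model can auto-detect the language.
def build_transcription_params(model, language=None, prompt=None, temperature=0.0):
    params = {"model": model, "temperature": temperature}
    if language:
        params["language"] = language  # ISO-639-1 code, e.g. "es"
    if prompt:
        params["prompt"] = prompt      # vocabulary/context hint
    return params
```

For example, `build_transcription_params("gpt-4o-transcribe", language="es")` yields a request with an explicit language, while leaving Language blank omits the key entirely.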

Advanced

  • Timeout (seconds) - Request timeout. Default: 120.
  • Include Raw Response - Adds the full API response, with timestamps and segments, to the node output. Default: false.

Outputs

  • Text - Transcribed text from the audio file.
  • Raw Response - Full response with timestamps and segments (when enabled).

How It Works

The node converts speech in an audio file to text in five steps:

  1. Validates connection and audio file path
  2. Checks file format is supported
  3. Uploads audio to OpenAI
  4. Processes with selected model
  5. Returns transcribed text
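Steps 1 and 2 above amount to a pre-flight check before anything is uploaded. A sketch of that validation, using the supported-format list and error messages from this page:

```python
import os

# Formats accepted by the node (per the Inputs section).
SUPPORTED_FORMATS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm", "flac", "ogg"}

def validate_audio_file(path):
    """Mirror steps 1-2: path present, file exists, extension supported."""
    if not path:
        raise ValueError("Audio File cannot be empty")
    if not os.path.isfile(path):
        raise FileNotFoundError("Audio file does not exist")
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    if ext not in SUPPORTED_FORMATS:
        raise ValueError("Unsupported file format: " + ext)
    return ext
```

Each failure path corresponds to one of the messages listed under Common Errors below.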

Usage Examples

Example 1: Basic Transcription

Input:
- Audio File: "C:/recordings/meeting.mp3"
- Model: gpt-4o-transcribe

Output:
- Text: "Good morning everyone. Let's begin today's meeting..."

Example 2: Multilingual Transcription

Input:
- Audio File: "C:/audio/spanish_call.wav"
- Language: "es"
- Model: gpt-4o-transcribe

Output:
- Text: "Buenos días, gracias por llamar..."

Example 3: Speaker Diarization

Input:
- Audio File: "C:/interviews/conversation.mp3"
- Model: gpt-4o-transcribe-diarize

Output (in Raw Response):
- Segments with speaker identification
- Speaker 1: "Hello, how are you?"
- Speaker 2: "I'm doing well, thank you."
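When using the diarize model, the speaker-labeled lines above come from the segments in Raw Response. A sketch of turning those segments into a readable transcript (the segment field names "speaker" and "text" are an assumed shape; inspect the actual Raw Response in your flow):

```python
# Sketch: join diarized segments into a speaker-labeled transcript.
# The "speaker" and "text" keys are assumptions about the segment
# schema, not a documented contract.
def format_diarized(segments):
    lines = []
    for seg in segments:
        lines.append("%s: %s" % (seg["speaker"], seg["text"]))
    return "\n".join(lines)
```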

Example 4: With Context Prompt

Input:
- Audio File: "C:/medical/consultation.wav"
- Prompt: "Medical consultation discussing diabetes, insulin, and glucose levels"
- Model: gpt-4o-transcribe

Output:
- Text: More accurate medical terminology transcription

Requirements

  • Connection Id from Connect node
  • Valid audio file in supported format
  • File must exist and be accessible

Tips for RPA Developers

  • Model Selection: Use GPT-4o Transcribe for best accuracy, GPT-4o Mini for speed, Diarize for multi-speaker scenarios.
  • Language Specification: Specify language for better accuracy, especially for non-English audio.
  • Prompt Usage: Provide context or technical vocabulary to improve accuracy.
  • File Formats: MP3 and M4A work well for most use cases. WAV is uncompressed and larger.
  • Temperature: Keep at 0 for deterministic, most accurate transcription. Higher values add randomness and are rarely useful for transcription.
  • File Size: OpenAI has file size limits (typically 25MB). Compress large files if needed.
  • Timestamps: Enable "Include Raw Response" to get word-level timestamps.
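The file-size tip is easy to automate: check the size before running the node and compress (e.g. re-encode to MP3) only when needed. A sketch, assuming the commonly documented 25 MB upload limit:

```python
import os

MAX_BYTES = 25 * 1024 * 1024  # assumed ~25 MB OpenAI upload limit

def fits_upload_limit(path):
    """Return True if the audio file is small enough to upload directly."""
    return os.path.getsize(path) <= MAX_BYTES
```

If this returns False, re-encode the audio to a compressed format such as MP3 before passing it to the node.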

Common Errors

"Audio File cannot be empty"

  • Provide a valid path to an audio file

"Audio file does not exist"

  • Check that the file path is correct and the file exists

"Unsupported file format"

  • Convert audio to supported format (mp3, wav, etc.)