Create Transcription
Converts audio files to text transcriptions using OpenAI's Whisper model.
Common Properties
- Name - The custom name of the node.
- Color - The custom color of the node.
- Delay Before (sec) - The number of seconds to wait before executing the node.
- Delay After (sec) - The number of seconds to wait after executing the node.
- Continue On Error - Automation will continue regardless of any error. The default value is false.
Info: If the Continue On Error property is true, no error is caught when the project is executed, even if a Catch node is used.
Inputs
- Connection Id - The connection ID for the OpenAI service.
- Audio File Path - The audio file path to transcribe, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.
Options
- Language - The language of the input audio. Supplying the input language in ISO-639-1 format will improve accuracy and latency. Default is "en".
- Model - The OpenAI model to use for transcription. Currently only "whisper-1" is supported.
- Prompt (Optional) - Text that guides the model's style or continues a previous audio segment. The prompt should match the audio language.
- Response Format - The format of the transcription response. Options include:
  - json
  - text
  - srt
  - verbose_json
  - vtt
- Temperature - The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit.
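The option constraints above can be sketched as a small validation helper. This is a minimal illustration, not the node's actual implementation: the function name `build_options` is hypothetical, the default temperature of 0 is an assumption (this page does not state one), and the language check only verifies the two-letter shape of an ISO-639-1 code.

```python
import re

# Response formats listed under Options above.
RESPONSE_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}


def build_options(language="en", model="whisper-1", prompt=None,
                  response_format="json", temperature=0.0):
    """Hypothetical helper: validate the options and return them as a dict.

    The temperature default of 0.0 is an assumption for this sketch.
    """
    # ISO-639-1 codes are two lowercase letters, e.g. "en" or "de".
    # Only the shape is checked, not whether the code is assigned.
    if not re.fullmatch(r"[a-z]{2}", language):
        raise ValueError(f"language must be a two-letter ISO-639-1 code, got {language!r}")
    if model != "whisper-1":
        raise ValueError(f"unsupported model: {model!r}")
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unsupported response format: {response_format!r}")
    if not 0 <= temperature <= 1:
        raise ValueError(f"temperature must be between 0 and 1, got {temperature}")

    options = {
        "language": language,
        "model": model,
        "response_format": response_format,
        "temperature": temperature,
    }
    # The prompt is optional, so it is only included when provided.
    if prompt:
        options["prompt"] = prompt
    return options
```

For example, `build_options(language="de", temperature=0.2)` returns a dict with those two values plus the defaults, while `build_options(temperature=1.5)` raises `ValueError`.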
Output
- Transcribed Text - The transcribed text from the audio file.
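How the node derives Transcribed Text from each response format is not specified on this page. As a rough sketch of the differences between the formats, the hypothetical helper below pulls plain text out of a raw response body. It relies on the OpenAI `json` and `verbose_json` responses carrying a `text` field, and it treats `srt` and `vtt` with a simplified filter that drops cue numbers and timing lines.

```python
import json


def extract_text(body: str, response_format: str) -> str:
    """Hypothetical helper: return plain transcript text from a raw response body."""
    if response_format in ("json", "verbose_json"):
        # Both JSON formats include the full transcript under "text".
        return json.loads(body)["text"]
    if response_format == "text":
        return body.strip()
    if response_format in ("srt", "vtt"):
        kept = []
        for line in body.splitlines():
            line = line.strip()
            # Simplification: skip blank lines, the WEBVTT header, numeric
            # cue indexes, and timing lines. A caption consisting only of
            # digits would also be dropped by this filter.
            if not line or line == "WEBVTT" or line.isdigit() or "-->" in line:
                continue
            kept.append(line)
        return " ".join(kept)
    raise ValueError(f"unsupported response format: {response_format!r}")
```

If only the transcript is needed downstream, `json` or `text` avoids this post-processing entirely; `srt` and `vtt` are mainly useful when the timing information matters.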
How It Works
The Create Transcription node uses OpenAI's Whisper model to convert speech in audio files to text. When executed, the node:
- Validates the provided Connection Id and audio file path
- Prepares the transcription request with the specified options
- Sends the audio file to the Whisper model for processing
- Receives the transcription result and returns it as output
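The four steps above can be sketched as follows. This is an illustration under stated assumptions rather than the node's source: `create_transcription` and the `send` callable are hypothetical names, and the network call is injected so the flow can be followed (and tested) without contacting OpenAI.

```python
import os
from typing import Callable


def create_transcription(connection_id: str, audio_path: str, options: dict,
                         send: Callable[[str, bytes, dict], str]) -> str:
    """Hypothetical outline of the node's flow; `send` performs the API call."""
    # 1. Validate the Connection Id and the audio file path.
    if not connection_id:
        raise ValueError("Connection Id is empty")
    if not audio_path or not os.path.isfile(audio_path):
        raise FileNotFoundError(f"audio file not found: {audio_path!r}")

    # 2. Prepare the request, letting caller options override the defaults.
    request = {"model": "whisper-1", "response_format": "json", **options}

    # 3. Send the audio bytes to the transcription service.
    with open(audio_path, "rb") as audio_file:
        audio = audio_file.read()
    result = send(connection_id, audio, request)

    # 4. Return the transcription result as the node output.
    return result
```

In a standalone script using the official `openai` Python package, `send` could wrap `client.audio.transcriptions.create(...)`, which accepts `model`, `file`, `language`, `prompt`, `response_format`, and `temperature`. Whether the node itself uses that client is not stated here.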
Requirements
- A valid OpenAI API key (Robomotion Credits cannot be used with this node)
- An active OpenAI connection
- An audio file in a supported format
- Read access to the specified audio file
Error Handling
The node will return specific errors in the following cases:
- Empty or invalid Connection Id
- Empty or invalid Audio File Path
- Invalid Temperature value
- OpenAI API errors
- File access errors
- Unsupported audio file format
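Several of these cases can be caught before the node runs, for example in a preceding script step. The sketch below is a hypothetical pre-flight check: the function name and the error messages are illustrative and do not reproduce the node's actual error text. It covers the input-related cases only; API errors can only surface at execution time.

```python
import os

# Formats listed under Inputs above.
SUPPORTED_FORMATS = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}


def check_inputs(connection_id: str, audio_path: str) -> list[str]:
    """Hypothetical pre-flight check: return a list of problems (empty if none)."""
    errors = []
    if not connection_id or not connection_id.strip():
        errors.append("Connection Id is empty")
    if not audio_path or not audio_path.strip():
        errors.append("Audio File Path is empty")
        return errors  # the remaining checks need a path

    # Compare the file extension case-insensitively against the supported list.
    ext = os.path.splitext(audio_path)[1].lstrip(".").lower()
    if ext not in SUPPORTED_FORMATS:
        errors.append(f"Unsupported audio file format: {ext or '(none)'}")

    # Distinguish a missing file from one that exists but cannot be read.
    if not os.path.isfile(audio_path):
        errors.append("Audio file not found")
    elif not os.access(audio_path, os.R_OK):
        errors.append("Audio file is not readable")
    return errors
```

Returning a list instead of raising on the first problem lets a flow report every issue at once, which can be convenient when Continue On Error is false and a single failure would stop the automation.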
Usage Notes
- This node does not support Robomotion Credits, only direct OpenAI API keys
- Specifying the correct language can significantly improve transcription accuracy
- Supported audio formats include: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, and webm
- The default model is "whisper-1", which is optimized for general-purpose transcription
- The default response format is "json"
- Using a prompt can help guide the transcription style or continue from a previous segment
- Temperature controls the randomness of the transcription; lower values are more deterministic