Create Transcription

Converts audio files to text transcriptions using OpenAI's Whisper model.

Common Properties

  • Name - The custom name of the node.
  • Color - The custom color of the node.
  • Delay Before (sec) - Waits in seconds before executing the node.
  • Delay After (sec) - Waits in seconds after executing the node.
  • Continue On Error - The automation continues regardless of any error. The default value is false.
info

If the ContinueOnError property is true, no error is caught when the project is executed, even if a Catch node is used.

Inputs

  • Connection Id - The connection ID for the OpenAI service.
  • Audio File Path - The audio file path to transcribe, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.
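A quick way to check an audio file path against the supported formats before running the node is a simple extension test. This is a minimal sketch, not the node's own validation logic; the function name is illustrative:

```python
import os

# Audio formats accepted by the Create Transcription node
SUPPORTED_FORMATS = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}

def is_supported_audio(path: str) -> bool:
    """Return True if the file extension is one the node accepts."""
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    return ext in SUPPORTED_FORMATS
```

For example, `is_supported_audio("meeting.mp3")` is true, while a `.txt` file would be rejected.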

Options

  • Language - The language of the input audio. Supplying the input language in ISO-639-1 format will improve accuracy and latency. Default is "en".
  • Model - The OpenAI model to use for transcription. Currently only "whisper-1" is supported.
  • Prompt (Optional) - An optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language.
  • Response Format - The format of the transcription response. Options include:
    • json
    • text
    • srt
    • verbose_json
    • vtt
  • Temperature - The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit.
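The options above map onto the parameters of OpenAI's audio transcription endpoint. The sketch below assembles them into a request payload with the node's defaults; the field names mirror OpenAI's API, but the exact wire format the node uses internally is an assumption:

```python
def build_transcription_request(audio_path, language="en", model="whisper-1",
                                prompt=None, response_format="json",
                                temperature=0.0):
    """Assemble the node's options into a transcription request payload.

    Defaults match the node's documented defaults: language "en",
    model "whisper-1", response format "json".
    """
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("Temperature must be between 0 and 1")
    payload = {
        "file": audio_path,
        "model": model,
        "language": language,
        "response_format": response_format,
        "temperature": temperature,
    }
    # Prompt is optional; include it only when supplied
    if prompt:
        payload["prompt"] = prompt
    return payload
```

Calling it with only a file path yields the default payload; an out-of-range temperature is rejected up front rather than sent to the API.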

Output

  • Transcribed Text - The transcribed text from the audio file.

How It Works

The Create Transcription node uses OpenAI's Whisper model to convert speech in audio files to text. When executed, the node:

  1. Validates the provided Connection Id and audio file path
  2. Prepares the transcription request with the specified options
  3. Sends the audio file to the Whisper model for processing
  4. Receives the transcription result and returns it as output
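The four steps above can be sketched as a single function. Here `send` is a hypothetical callable standing in for the actual OpenAI API call made through the connection; it and the function name are assumptions for illustration, not the node's real internals:

```python
import os

def create_transcription(connection_id, audio_path, send, **options):
    """Validate inputs, prepare the request, and hand it to `send`.

    `send` is a stand-in for the call the node makes to the Whisper
    model through the OpenAI connection.
    """
    # 1. Validate the provided Connection Id and audio file path
    if not connection_id:
        raise ValueError("Empty Connection Id")
    if not audio_path or not os.path.isfile(audio_path):
        raise ValueError("Empty or invalid Audio File Path")
    # 2. Prepare the transcription request with the specified options
    request = {"file": audio_path, "model": "whisper-1", **options}
    # 3-4. Send the audio to the model and return the transcription result
    return send(connection_id, request)
```

Separating validation from transport this way also makes the error cases listed below easy to exercise without a live API key.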

Requirements

  • A valid OpenAI API key (Robomotion Credits cannot be used with this node)
  • An active OpenAI connection
  • An audio file in a supported format
  • Read access to the specified audio file

Error Handling

The node will return specific errors in the following cases:

  • Empty or invalid Connection Id
  • Empty or invalid Audio File Path
  • Invalid Temperature value
  • OpenAI API errors
  • File access errors
  • Unsupported audio file format

Usage Notes

  • This node does not support Robomotion Credits, only direct OpenAI API keys
  • Specifying the correct language can significantly improve transcription accuracy
  • Supported audio formats include: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, and webm
  • The default model is "whisper-1" which is optimized for general purpose transcription
  • The default response format is "json"
  • Using a prompt can help guide the transcription style or continue from a previous segment
  • Temperature controls the randomness of the transcription; lower values are more deterministic