Create Transcription

Converts audio files to text transcriptions using OpenAI's Whisper model.

Common Properties

  • Name - The custom name of the node.
  • Color - The custom color of the node.
  • Delay Before (sec) - Waits in seconds before executing the node.
  • Delay After (sec) - Waits in seconds after executing the node.
  • Continue On Error - The automation continues regardless of any error. The default value is false.
info

If the ContinueOnError property is true, no error is caught when the project is executed, even if a Catch node is used.

Inputs

  • Connection Id - The connection ID for the OpenAI service.
  • Audio File Path - The audio file path to transcribe, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.
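A quick way to check an audio file path against the supported formats before running the node is a simple extension test. This is a minimal sketch, not the node's own validation logic; the function name is illustrative:

```python
import os

# Audio formats accepted by the Create Transcription node
SUPPORTED_FORMATS = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}

def is_supported_audio(path: str) -> bool:
    """Return True if the file extension is one the node accepts."""
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    return ext in SUPPORTED_FORMATS
```

For example, `is_supported_audio("meeting.mp3")` is true, while a `.txt` file would be rejected.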

Options

  • Language - The language of the input audio. Supplying the input language in ISO-639-1 format will improve accuracy and latency. Default is "en".
  • Model - The OpenAI model to use for transcription. Currently only "whisper-1" is supported.
  • Prompt (Optional) - An optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language.
  • Response Format - The format of the transcription response. Options include:
    • json
    • text
    • srt
    • verbose_json
    • vtt
  • Temperature - The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit.
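The options above map onto the parameters of OpenAI's audio transcription endpoint. The sketch below assembles them into a request payload with the node's defaults; the field names mirror OpenAI's API, but the exact wire format the node uses internally is an assumption:

```python
def build_transcription_request(audio_path, language="en", model="whisper-1",
                                prompt=None, response_format="json",
                                temperature=0.0):
    """Assemble the node's options into a transcription request payload.

    Defaults match the node's documented defaults: language "en",
    model "whisper-1", response format "json".
    """
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("Temperature must be between 0 and 1")
    payload = {
        "file": audio_path,
        "model": model,
        "language": language,
        "response_format": response_format,
        "temperature": temperature,
    }
    # Prompt is optional; include it only when supplied
    if prompt:
        payload["prompt"] = prompt
    return payload
```

Calling it with only a file path yields the default payload; an out-of-range temperature is rejected up front rather than sent to the API.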

Output

  • Transcribed Text - The transcribed text from the audio file.

How It Works

The Create Transcription node uses OpenAI's Whisper model to convert speech in audio files to text. When executed, the node:

  1. Validates the provided Connection Id and audio file path
  2. Prepares the transcription request with the specified options
  3. Sends the audio file to the Whisper model for processing
  4. Receives the transcription result and returns it as output
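The four steps above can be sketched as a single function. Here `send` is a hypothetical callable standing in for the actual OpenAI API call made through the connection; it and the function name are assumptions for illustration, not the node's real internals:

```python
import os

def create_transcription(connection_id, audio_path, send, **options):
    """Validate inputs, prepare the request, and hand it to `send`.

    `send` is a stand-in for the call the node makes to the Whisper
    model through the OpenAI connection.
    """
    # 1. Validate the provided Connection Id and audio file path
    if not connection_id:
        raise ValueError("Empty Connection Id")
    if not audio_path or not os.path.isfile(audio_path):
        raise ValueError("Empty or invalid Audio File Path")
    # 2. Prepare the transcription request with the specified options
    request = {"file": audio_path, "model": "whisper-1", **options}
    # 3-4. Send the audio to the model and return the transcription result
    return send(connection_id, request)
```

Separating validation from transport this way also makes the error cases listed below easy to exercise without a live API key.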

Requirements

  • A valid OpenAI API key (Robomotion Credits cannot be used with this node)
  • An active OpenAI connection
  • An audio file in a supported format
  • Read access to the specified audio file

Error Handling

The node will return specific errors in the following cases:

  • Empty or invalid Connection Id
  • Empty or invalid Audio File Path
  • Invalid Temperature value
  • OpenAI API errors
  • File access errors
  • Unsupported audio file format

Usage Notes

  • This node does not support Robomotion Credits, only direct OpenAI API keys
  • Specifying the correct language can significantly improve transcription accuracy
  • Supported audio formats include: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, and webm
  • The default model is "whisper-1" which is optimized for general purpose transcription
  • The default response format is "json"
  • Using a prompt can help guide the transcription style or continue from a previous segment
  • Temperature controls the randomness of the transcription; lower values are more deterministic