Create Batch Embeddings

Generates vector embeddings for large amounts of text data and saves them to a CSV file.

Common Properties

  • Name - The custom name of the node.
  • Color - The custom color of the node.
  • Delay Before (sec) - Waits in seconds before executing the node.
  • Delay After (sec) - Waits in seconds after executing the node.
  • Continue On Error - Automation will continue regardless of any error. The default value is false.
Info

If the ContinueOnError property is true, no error is caught when the project is executed, even if a Catch node is used.

Inputs

  • Connection Id - The connection ID for the OpenAI service.
  • Input Data - The text data to generate embeddings for, or a file path to the text data.
  • Out Embeddings CSV Path - The file path where the embeddings CSV file will be saved.

Options

  • Input Data Type - The type of input data. Options are:
    • Text - Direct text input
    • File Path - Path to a text file
  • Chunk Limit - Maximum token limit for each text chunk. Default is 2048.
  • Request Per Minute - Rate limit for API requests per minute. Default is 60.
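
To make the Request Per Minute option concrete, the short Python sketch below converts a per-minute limit into a spacing between requests and gives a rough lower bound on run time. It is only an illustration: the function names are hypothetical, and it assumes one API request per text chunk, which this page does not confirm.

```python
import math


def seconds_between_requests(requests_per_minute: int) -> float:
    """Return the minimum spacing, in seconds, between consecutive requests."""
    if requests_per_minute <= 0:
        raise ValueError("requests_per_minute must be a positive integer")
    return 60.0 / requests_per_minute


def estimated_minutes(total_tokens: int,
                      chunk_limit: int = 2048,
                      requests_per_minute: int = 60) -> float:
    """Rough lower bound on run time in minutes.

    Assumption (not confirmed by the node documentation): every chunk
    costs exactly one API request.
    """
    if chunk_limit <= 0:
        raise ValueError("chunk_limit must be a positive integer")
    chunks = math.ceil(total_tokens / chunk_limit)
    return chunks / requests_per_minute


if __name__ == "__main__":
    # With the default of 60 requests per minute, requests are one second apart.
    print(seconds_between_requests(60))
    # 120 full chunks at the default settings need at least two minutes.
    print(estimated_minutes(2048 * 120))
```

Under these assumptions, lowering Request Per Minute lengthens the run proportionally, which helps when the OpenAI account has a tighter quota than the default.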

Output

This node does not have any output variables. The embeddings are saved directly to the specified CSV file.

How It Works

The Create Batch Embeddings node processes large amounts of text data by:

  1. Validating the provided Connection Id and file paths
  2. Reading the input text data (either directly or from a file)
  3. Chunking the text into smaller pieces based on the chunk limit
  4. Generating embeddings for each text chunk using OpenAI's embedding model
  5. Saving all embeddings with their corresponding text to a CSV file
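
The steps above can be sketched in Python as follows. This is a simplified illustration, not the node's actual implementation: whitespace-separated words stand in for tokens (the node's tokenizer is not documented here), `fake_embed` is a hypothetical placeholder for the OpenAI request, and the CSV is written to memory instead of a file. The column names are the ones listed under Usage Notes.

```python
import csv
import io
import json
from typing import Callable, Iterator, List


def chunk_words(text: str, chunk_limit: int) -> Iterator[str]:
    """Yield pieces of text holding at most chunk_limit words.

    Assumption: words approximate tokens; the real node counts model tokens.
    """
    if chunk_limit <= 0:
        raise ValueError("chunk_limit must be positive")
    words = text.split()
    for start in range(0, len(words), chunk_limit):
        yield " ".join(words[start:start + chunk_limit])


def fake_embed(chunk: str) -> List[float]:
    """Hypothetical stand-in for the embedding request; returns a made-up vector."""
    return [float(len(chunk)), float(len(chunk.split()))]


def build_embeddings_csv(text: str,
                         chunk_limit: int,
                         embed: Callable[[str], List[float]] = fake_embed) -> str:
    """Chunk the text, embed each chunk, and return the result as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["index", "content", "content_length",
                     "content_tokens", "embedding"])
    for index, chunk in enumerate(chunk_words(text, chunk_limit)):
        writer.writerow([
            index,
            chunk,
            len(chunk),                # character count of the chunk
            len(chunk.split()),        # word count, standing in for tokens
            json.dumps(embed(chunk)),  # vector serialized as a JSON array
        ])
    return buffer.getvalue()


if __name__ == "__main__":
    print(build_embeddings_csv("a b c d e", chunk_limit=2))
```

A real run would replace `fake_embed` with a call to the embedding service and pause between calls according to the Request Per Minute option.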

Requirements

  • A valid OpenAI API key (Robomotion Credits cannot be used with this node)
  • An active OpenAI connection
  • Input text data or a path to a text file
  • Write access to the specified output CSV file path

Error Handling

The node will return specific errors in the following cases:

  • Empty or invalid Connection Id
  • Empty or invalid Input Data
  • Empty or invalid Out Embeddings CSV Path
  • Invalid input file path (the file does not exist)
  • Invalid option values (Chunk Limit, Request Per Minute)
  • OpenAI API errors
  • File system errors when writing the CSV file

Usage Notes

  • This node does not support Robomotion Credits, only direct OpenAI API keys
  • The node uses the text-embedding-ada-002 model for generating embeddings
  • Text is automatically chunked into smaller pieces if it exceeds the chunk limit
  • The output CSV file contains columns: index, content, content_length, content_tokens, and embedding
  • The rate limit can be adjusted to control the number of API requests per minute
  • For large datasets, processing may take some time because requests are throttled to the configured Request Per Minute value
  • Each embedding is stored in JSON format in the embedding column of the CSV file
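
As an illustration of consuming the output in a later step, the Python sketch below reads a CSV with the columns listed above and decodes the embedding column. It assumes the embedding cell holds a JSON array of numbers and that index and content_tokens are integers; the sample row and the `load_embeddings` helper are made up for the example.

```python
import csv
import io
import json
from typing import Dict, List, TextIO


def load_embeddings(handle: TextIO) -> List[Dict[str, object]]:
    """Parse an embeddings CSV and decode each embedding cell from JSON.

    Assumption: the embedding column contains a JSON array of numbers.
    """
    rows = []
    for row in csv.DictReader(handle):
        rows.append({
            "index": int(row["index"]),
            "content": row["content"],
            "content_tokens": int(row["content_tokens"]),
            "embedding": json.loads(row["embedding"]),
        })
    return rows


# Made-up sample standing in for the file written by the node.
SAMPLE = (
    "index,content,content_length,content_tokens,embedding\n"
    '0,hello world,11,2,"[0.1, -0.2, 0.3]"\n'
)

rows = load_embeddings(io.StringIO(SAMPLE))
print(rows[0]["content"], len(rows[0]["embedding"]))
```

For a real file, pass `open(path, newline="", encoding="utf-8")` instead of the in-memory sample, where `path` is the value given in Out Embeddings CSV Path.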