
Compare Embeddings

Compares embeddings of texts to find similarities using various similarity metrics.

Common Properties

  • Name - The custom name of the node.
  • Color - The custom color of the node.
  • Delay Before (sec) - Waits in seconds before executing the node.
  • Delay After (sec) - Waits in seconds after executing the node.
  • Continue On Error - Automation will continue regardless of any error. The default value is false.

Info: If the ContinueOnError property is true, no error is caught when the project is executed, even if a Catch node is used.

Inputs

  • Connection Id - The connection ID obtained from the Connect node.
  • Source Text - The primary text that all comparison texts are compared against.
  • Comparison Texts - Array of texts to compare with the source text.
  • Source Embedding File - Path to JSON file with pre-computed source embedding (optional).
  • Target Embeddings File - Path to JSON file with pre-computed target embeddings array (optional).
  • Comparison Texts File - Path to text file with comparison texts (one per line, optional).

Options

  • Embedding Model - The model to use for generating embeddings. Options include:
    • Text Embedding 004
    • Text Embedding 005
    • Multilingual Embedding 002
    • Custom Model
  • Custom Model - Custom model name when "Custom Model" is selected for Embedding Model.
  • Similarity Metric - Method for calculating similarity. Options include:
    • Cosine Similarity
    • Dot Product
    • Euclidean Distance
  • Similarity Threshold - Minimum similarity score, from 0.0 to 1.0. Results scoring below this value are filtered out.
  • Max Results - Maximum number of results to return.
  • Sort Order - Sort results by similarity score. Options are:
    • Highest First
    • Lowest First
  • Timeout (seconds) - Request timeout in seconds (default: 60).
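
The three similarity metrics listed above can be illustrated with a minimal sketch in plain Python. This is not the node's implementation; the function names are hypothetical and chosen only for illustration. Note that Euclidean distance is a distance, so smaller values mean more similar vectors; how the node maps it onto a score is not stated here.

```python
import math
from typing import Sequence


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of element-wise products of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product divided by the product of the vector lengths (range -1 to 1)."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot_product(a, b) / (norm_a * norm_b)


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two vectors (0 means identical)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Cosine similarity ignores vector length and compares direction only, which is one reason it is a common choice for text embeddings.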

Output

  • Similarities - The similarity results as structured data with scores and metadata.
  • Results File Path - Path to the file containing detailed results in JSON format.

How It Works

The Compare Embeddings node calculates similarity scores between a source text and multiple comparison texts using Google's Gemini API. When executed, the node:

  1. Validates the provided connection ID and input texts or files
  2. Configures the embedding model based on the selected options
  3. Loads or generates embeddings for the source text and comparison texts
  4. Calculates similarity scores using the specified metric (cosine, dot product, or Euclidean)
  5. Filters results based on the similarity threshold if provided
  6. Sorts results according to the specified sort order
  7. Limits the number of results if a maximum is specified
  8. Saves detailed results to a JSON file and returns a summary
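
Steps 5 to 7 above (filter, sort, limit) can be sketched as follows. This is an illustrative approximation, not the node's actual code; `rank_results` and its parameters are hypothetical names, and the input is assumed to be a list of (text, score) pairs where a higher score means more similar.

```python
from typing import List, Optional, Tuple


def rank_results(
    scored: List[Tuple[str, float]],
    threshold: Optional[float] = None,
    max_results: Optional[int] = None,
    highest_first: bool = True,
) -> List[Tuple[str, float]]:
    """Filter by threshold, sort by score, then cap the number of results."""
    # Step 5: drop results below the threshold, if one was given.
    kept = [(text, score) for text, score in scored
            if threshold is None or score >= threshold]
    # Step 6: sort by score ("Highest First" or "Lowest First").
    kept.sort(key=lambda pair: pair[1], reverse=highest_first)
    # Step 7: keep at most max_results entries, if a limit was given.
    if max_results is not None:
        kept = kept[:max_results]
    return kept


scores = [("a", 0.9), ("b", 0.2), ("c", 0.6)]
print(rank_results(scores, threshold=0.5, max_results=1))  # [('a', 0.9)]
```

Filtering before sorting and limiting means Max Results applies only to results that already passed the threshold.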

Requirements

  • A valid Google Gemini API key
  • Connection ID from a successful Connect node execution
  • Either source text or a source embedding file
  • Either comparison texts (provided directly or through a comparison texts file) or a target embeddings file

Error Handling

The node will return specific errors in the following cases:

  • Empty or invalid Connection ID
  • Missing source text or embedding file
  • Missing comparison texts or embeddings file
  • Invalid similarity threshold value (must be between 0.0 and 1.0)
  • Invalid max results value (must be at least 1)
  • Invalid custom model name
  • File I/O errors when reading embedding files
  • API errors from Google's Gemini service
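
If you want to catch some of these problems before running the node, a pre-check along the following lines may help. It is only a sketch: `validate_options` and the error messages are hypothetical and do not reproduce the node's actual error text; only the value ranges come from the list above.

```python
from typing import List, Optional


def validate_options(
    connection_id: str,
    similarity_threshold: Optional[float] = None,
    max_results: Optional[int] = None,
) -> List[str]:
    """Return a list of problems found; an empty list means the options look valid."""
    errors: List[str] = []
    if not connection_id or not connection_id.strip():
        errors.append("Connection ID is empty or invalid")
    # The threshold is optional, but when set it must lie in [0.0, 1.0].
    if similarity_threshold is not None and not (0.0 <= similarity_threshold <= 1.0):
        errors.append("Similarity threshold must be between 0.0 and 1.0")
    # Max results is optional, but when set it must be at least 1.
    if max_results is not None and max_results < 1:
        errors.append("Max results must be at least 1")
    return errors
```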

Usage Notes

  • You can provide texts directly or load pre-computed embeddings from JSON files
  • Comparison texts can also be loaded from a text file (one text per line)
  • Cosine similarity is the default and most commonly used metric
  • Results are automatically saved to a JSON file in a temporary directory
  • The node supports timeout configuration for long-running operations
  • For large datasets, consider using pre-computed embedding files to improve performance
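
To prepare or inspect the input files mentioned above, a small helper pair like the one below can be used. The exact JSON schema the node expects is not documented here, so this sketch assumes the target embeddings file is a JSON array of numeric arrays; the function names are hypothetical, and skipping blank lines in the text file is also an assumption.

```python
import json
from typing import List


def load_comparison_texts(path: str) -> List[str]:
    """Read one comparison text per line, skipping blank lines (assumption)."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def load_embeddings(path: str) -> List[List[float]]:
    """Load embeddings, assuming the file holds a JSON array of numeric arrays."""
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    if not isinstance(data, list) or not all(isinstance(vec, list) for vec in data):
        raise ValueError("expected a JSON array of embedding arrays")
    return [[float(value) for value in vec] for vec in data]
```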