
Similarity

Calculates similarity between text embeddings using cosine similarity and returns the most similar matches.

Common Properties

  • Name - The custom name of the node.
  • Color - The custom color of the node.
  • Delay Before (sec) - Waits in seconds before executing the node.
  • Delay After (sec) - Waits in seconds after executing the node.
  • Continue On Error - Automation will continue regardless of any error. The default value is false.
Info: If the Continue On Error property is true, no error is caught when the project is executed, even if a Catch node is used.

Inputs

  • Connection Id - The connection ID for the OpenAI service.
  • Search Embeddings CSV - File path to the CSV containing search embeddings.
  • Content Embeddings CSV - File path to the CSV containing content embeddings to compare against.

Options

  • Matches - Number of top matches to return. Default is 5.

Output

  • Similarity - The similarity results as a JSON object containing the most similar content items and their similarity scores.

How It Works

The Similarity node calculates cosine similarity between text embeddings to find the most similar matches:

  1. Validates the provided Connection Id and file paths
  2. Reads the search embedding from the first row of the search embeddings CSV
  3. Reads all content embeddings from the content embeddings CSV
  4. Calculates cosine similarity between the search embedding and each content embedding
  5. Returns the top N matches (based on the Matches option) sorted by similarity score
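The steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the node's actual implementation: the helper names `cosine_similarity`, `read_rows`, and `top_matches`, the `text` column, and the shape of the returned list are all hypothetical. Only the `embedding` column holding a JSON-formatted vector comes from this page. Connection Id validation (step 1) is left out because it depends on the platform.

```python
import csv
import json
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: a zero vector is treated as not similar
    return dot / (norm_a * norm_b)


def read_rows(path):
    """Read a CSV file into a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def top_matches(search_csv, content_csv, matches=5):
    # Step 2: the query embedding is taken from the first row.
    query = json.loads(read_rows(search_csv)[0]["embedding"])

    # Steps 3 and 4: score every content embedding against the query.
    scored = []
    for row in read_rows(content_csv):
        vector = json.loads(row["embedding"])
        scored.append({
            "text": row.get("text", ""),  # hypothetical column name
            "similarity": cosine_similarity(query, vector),
        })

    # Step 5: keep the highest-scoring entries.
    scored.sort(key=lambda m: m["similarity"], reverse=True)
    return scored[:matches]
```

Calling `top_matches("search.csv", "content.csv", matches=3)` would return at most three entries, with the highest similarity first. The file names here are placeholders.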

Requirements

  • A valid OpenAI API key or Robomotion Credits
  • An active OpenAI connection
  • Valid paths to both search and content embeddings CSV files
  • Read access to the specified CSV files

Error Handling

The node returns an error in the following cases:

  • Empty or invalid Connection Id
  • Empty or invalid Search Embeddings CSV path
  • Empty or invalid Content Embeddings CSV path
  • CSV file not found at the specified path
  • Invalid CSV file format
  • Missing embedding data in CSV files
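The file-related cases above could be checked before running the node with a sketch like the following. The helper name `load_embeddings`, the exception types, and the messages are assumptions made for illustration; the node's own error codes and messages are not documented on this page.

```python
import csv
import json
import os


def load_embeddings(path):
    """Load all vectors from the 'embedding' column, failing early on bad input."""
    if not path or not path.strip():
        raise ValueError("CSV path is empty")
    if not os.path.isfile(path):
        raise FileNotFoundError(f"CSV file not found: {path}")

    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        # fieldnames is None for an empty file
        if not reader.fieldnames or "embedding" not in reader.fieldnames:
            raise ValueError("CSV has no 'embedding' column")
        rows = list(reader)

    if not rows:
        raise ValueError("CSV contains no embedding rows")

    try:
        return [json.loads(row["embedding"]) for row in rows]
    except (TypeError, json.JSONDecodeError) as exc:
        # TypeError covers a short row where the cell is missing entirely
        raise ValueError("embedding cell is not valid JSON") from exc
```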

Usage Notes

  • The node uses cosine similarity to measure similarity between embeddings
  • Both CSV files should contain an "embedding" column with JSON-formatted embeddings
  • The search embeddings CSV should contain at least one row with the query embedding
  • The content embeddings CSV should contain all embeddings to compare against
  • The default number of matches returned is 5
  • The output contains both the content text and similarity scores for each match
  • Higher similarity scores indicate more similar content
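A CSV file in the layout described above could be produced with a short Python sketch like this one. The helper name `write_embeddings_csv` and the `text` column are assumptions; this page only specifies an "embedding" column containing a JSON-formatted vector. The vectors themselves would come from whichever embedding step precedes this node.

```python
import csv
import json


def write_embeddings_csv(path, items):
    """Write (text, vector) pairs to a CSV with a JSON-encoded 'embedding' column."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["text", "embedding"])  # 'text' is a hypothetical column
        for text, vector in items:
            # json.dumps output contains commas; csv.writer quotes the cell
            writer.writerow([text, json.dumps(vector)])
```

For example, `write_embeddings_csv("content.csv", [("first passage", [0.12, -0.03, 0.88])])` writes one header row and one data row. The three-element vector is only for illustration; real embeddings are much longer.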