Similarity
Calculates similarity between text embeddings using cosine similarity and returns the most similar matches.
Common Properties
- Name - The custom name of the node.
- Color - The custom color of the node.
- Delay Before (sec) - Waits in seconds before executing the node.
- Delay After (sec) - Waits in seconds after executing the node.
- Continue On Error - Automation will continue regardless of any error. The default value is false.
Info: If the ContinueOnError property is true, no error is caught when the project is executed, even if a Catch node is used.
Inputs
- Connection Id - The connection ID for the OpenAI service.
- Search Embeddings CSV - File path to the CSV containing search embeddings.
- Content Embeddings CSV - File path to the CSV containing content embeddings to compare against.
Options
- Matches - Number of top matches to return. Default is 5.
Output
- Similarity - The similarity results as a JSON object containing the most similar content items and their similarity scores.
How It Works
The Similarity node calculates cosine similarity between text embeddings to find the most similar matches:
- Validates the provided Connection Id and file paths
- Reads the search embedding from the first row of the search embeddings CSV
- Reads all content embeddings from the content embeddings CSV
- Calculates cosine similarity between the search embedding and each content embedding
- Returns the top N matches (N is set by the Matches option), sorted by similarity score with the most similar first
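The steps above can be sketched in Python to show roughly what the comparison involves. This is a minimal illustration under stated assumptions, not the node's actual implementation: the function names `cosine_similarity` and `top_matches` are hypothetical, the "text" column name is assumed, and a zero-length vector is treated as having a similarity of 0.

```python
import csv
import json
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: zero vectors count as "no similarity"
    return dot / (norm_a * norm_b)


def top_matches(search_csv, content_csv, matches=5):
    """Hypothetical helper: rank content rows against the first search row."""
    # The query embedding is taken from the first data row of the search CSV.
    with open(search_csv, newline="", encoding="utf-8") as f:
        first_row = next(csv.DictReader(f))
    query = json.loads(first_row["embedding"])

    # Score every row of the content CSV against the query embedding.
    scored = []
    with open(content_csv, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            vector = json.loads(row["embedding"])
            scored.append({
                "text": row.get("text", ""),  # assumed column name
                "similarity": cosine_similarity(query, vector),
            })

    # Highest score first, then keep only the requested number of matches.
    scored.sort(key=lambda item: item["similarity"], reverse=True)
    return scored[:matches]
```

Calling `top_matches("search.csv", "content.csv", matches=3)` on files in this layout would return a list of three dictionaries, each holding a content text and its score.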
Requirements
- A valid OpenAI API key or Robomotion Credits
- An active OpenAI connection
- Valid paths to both search and content embeddings CSV files
- Read access to the specified CSV files
Error Handling
The node will return specific errors in the following cases:
- Empty or invalid Connection Id
- Empty or invalid Search Embeddings CSV path
- Empty or invalid Content Embeddings CSV path
- A CSV file not found at the specified path
- Invalid CSV file format
- Missing embedding data in CSV files
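To avoid hitting these errors at run time, the inputs can be checked before the node executes. The following is a rough sketch of such a pre-check, assuming the "embedding" column described in the Usage Notes; the function name `validate_inputs` and all error messages are hypothetical and do not reproduce the node's own error text.

```python
import csv
import os


def validate_inputs(connection_id, search_csv, content_csv):
    """Hypothetical pre-check mirroring the error cases listed above."""
    if not connection_id or not connection_id.strip():
        raise ValueError("Connection Id cannot be empty")

    inputs = (
        ("Search Embeddings CSV", search_csv),
        ("Content Embeddings CSV", content_csv),
    )
    for label, path in inputs:
        if not path or not path.strip():
            raise ValueError(f"{label} path cannot be empty")
        if not os.path.isfile(path):
            raise FileNotFoundError(f"{label} not found: {path}")
        with open(path, newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            # The header row must expose an "embedding" column.
            if "embedding" not in (reader.fieldnames or []):
                raise ValueError(f"{label} has no 'embedding' column")
            # At least one data row is needed to have any embedding at all.
            if next(reader, None) is None:
                raise ValueError(f"{label} contains no embedding rows")
```

The function returns nothing when every check passes and raises on the first problem it finds.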
Usage Notes
- The node uses cosine similarity to measure similarity between embeddings
- Both CSV files should contain an "embedding" column with JSON-formatted embeddings
- The search embeddings CSV should contain at least one row with the query embedding
- The content embeddings CSV should contain all embeddings to compare against
- The default number of matches returned is 5
- The output contains both the content text and similarity scores for each match
- Higher similarity scores indicate more similar content
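As an illustration of the CSV layout these notes describe, the sketch below writes a file with an "embedding" column holding JSON-formatted vectors. It is an assumption-laden example rather than the format produced by any specific node: the helper name `write_embeddings_csv` and the "text" column name are hypothetical, and the vectors in the usage comment are made-up placeholders, not real embeddings.

```python
import csv
import json


def write_embeddings_csv(path, rows):
    """Hypothetical helper: write (text, vector) pairs as an embeddings CSV."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        # "text" is an assumed column name; "embedding" is the documented one.
        writer = csv.DictWriter(f, fieldnames=["text", "embedding"])
        writer.writeheader()
        for text, vector in rows:
            # Each vector is stored as a JSON array string in a single cell.
            writer.writerow({"text": text, "embedding": json.dumps(vector)})


# Example usage with placeholder vectors:
# write_embeddings_csv("content_embeddings.csv", [
#     ("Refund policy", [0.12, -0.03, 0.88]),
#     ("Shipping times", [0.45, 0.10, -0.27]),
# ])
```

Because a JSON array contains commas, the csv module quotes the embedding cell automatically, so each vector stays in one column when the file is read back.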