Similarity
Calculates cosine similarity between a query embedding and a set of content embeddings to find the content closest in meaning to the query.
Common Properties
- Name - The custom name of the node.
- Color - The custom color of the node.
- Delay Before (sec) - Time to wait, in seconds, before executing the node.
- Delay After (sec) - Time to wait, in seconds, after executing the node.
- Continue On Error - If enabled, the automation continues even if this node fails. Default: false.
Inputs
- Connection Id - Connection identifier (required for consistency).
- Search Embeddings CSV - Path to the CSV file containing the search query embedding. Only the first row is used.
- Content Embeddings CSV - Path to the CSV file containing the content embeddings to search.
- Use Robomotion AI Credits - If enabled, the node uses Robomotion AI credits.
Options
- Matches - Number of top similarity matches to return. Default: 5.
Outputs
- Similarity - Object with two properties:
  - content: Map of index to content text
  - similarity: Map of index to similarity score (higher means more similar; typically between 0 and 1 for text embeddings)
How It Works
Finds the most similar content using cosine similarity:
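- Reads the search query embedding from the first row of the Search Embeddings CSV
- Reads all content embeddings from the Content Embeddings CSV
- Calculates the cosine similarity between the query and each content embedding
- Sorts the results by similarity (highest first)
- Returns the top N matches, where N is the Matches option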
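The steps above can be sketched in plain Python. Cosine similarity is the dot product of two vectors divided by the product of their lengths: (a · b) / (|a| * |b|). This is a minimal illustration of the technique, not Robomotion's implementation; the function names and the choice to score zero-length vectors as 0 are assumptions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: a zero vector is treated as unrelated
    return dot / (norm_a * norm_b)


def top_matches(query, contents, n=5):
    """Score each (index, text, embedding) tuple against the query.

    Returns the n best as (index, text, score), highest score first.
    """
    scored = [
        (index, text, cosine_similarity(query, embedding))
        for index, text, embedding in contents
    ]
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:n]


if __name__ == "__main__":
    query = [1.0, 0.0]
    contents = [
        (0, "same direction", [2.0, 0.0]),
        (1, "orthogonal", [0.0, 1.0]),
        (2, "45 degrees", [1.0, 1.0]),
    ]
    for index, text, score in top_matches(query, contents, n=2):
        print(index, text, round(score, 3))
```

Because cosine similarity ignores vector length, the vector [2.0, 0.0] scores 1.0 against [1.0, 0.0] even though the two are not equal. Every content embedding is scored on each run, which is why the cost grows linearly with the size of the content CSV.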
CSV Format
Expected CSV structure:
index,content,tokens,model,embedding
0,"First document","150","text-embedding-3-small","[0.023,-0.015,...]"
1,"Second document","200","text-embedding-3-small","[0.012,-0.034,...]"
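If you prepare or inspect these files from a script, the format above can be read with Python's standard csv module. This sketch assumes the embedding column holds a JSON-style array such as "[0.023,-0.015]"; the helper name load_embeddings is hypothetical, and the index is kept as a string to match the string keys in the node's output.

```python
import csv
import io
import json


def load_embeddings(source):
    """Read rows of index,content,tokens,model,embedding.

    Assumes the embedding column is a JSON-style array of numbers.
    Returns a list of (index, content, embedding) tuples.
    """
    rows = []
    for row in csv.DictReader(source):
        embedding = [float(x) for x in json.loads(row["embedding"])]
        rows.append((row["index"], row["content"], embedding))
    return rows


if __name__ == "__main__":
    sample = io.StringIO(
        "index,content,tokens,model,embedding\n"
        '0,"First document","150","text-embedding-3-small","[0.023,-0.015]"\n'
        '1,"Second document","200","text-embedding-3-small","[0.012,-0.034]"\n'
    )
    rows = load_embeddings(sample)
    # Mirrors the node's behavior for the search CSV: only the first row is used.
    query_index, query_text, query_embedding = rows[0]
    print(len(rows), query_text, query_embedding)
```

For a real file, pass an open handle instead of the in-memory sample, for example `with open(path, newline="", encoding="utf-8") as f: rows = load_embeddings(f)`. Opening with `newline=""` is what the csv module documentation recommends so that quoted fields are parsed correctly.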
Usage Example
Input:
- Search Embeddings CSV: "/tmp/query_embedding.csv"
- Content Embeddings CSV: "/tmp/knowledge_base.csv"
- Matches: 3
Output:
- Similarity: {
    content: {
      "42": "Document about password reset...",
      "17": "Guide to account recovery...",
      "89": "Security best practices..."
    },
    similarity: {
      "42": 0.92,
      "17": 0.87,
      "89": 0.83
    }
  }
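The two maps share the same index keys, so each match is reassembled by looking up one index in both. Sorting by score explicitly is safer than relying on key order: in JavaScript, for example, integer-like keys such as "17" and "42" are iterated in ascending numeric order, not in score order. This sketch assumes the output has already been loaded into a Python dict (for example from JSON); ranked_results is a hypothetical helper.

```python
def ranked_results(similarity_output):
    """Pair the content and similarity maps by index, best match first."""
    content = similarity_output["content"]
    scores = similarity_output["similarity"]
    pairs = [(index, content[index], scores[index]) for index in scores]
    return sorted(pairs, key=lambda item: item[2], reverse=True)


if __name__ == "__main__":
    output = {
        "content": {
            "42": "Document about password reset...",
            "17": "Guide to account recovery...",
            "89": "Security best practices...",
        },
        "similarity": {"42": 0.92, "17": 0.87, "89": 0.83},
    }
    for index, text, score in ranked_results(output):
        print(f"{score:.2f}  [{index}] {text}")
```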
Use Cases
- Semantic Search: Find relevant documents based on meaning
- FAQ Matching: Match user questions to similar FAQs
- Content Recommendation: Suggest similar articles or products
- Duplicate Detection: Find similar or duplicate content
Tips for RPA Developers
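- CSV Generation: Generate embeddings and save them to CSV first, then use this node for the similarity search
- Performance: Suitable for thousands of embeddings, since the query is compared with every stored embedding. For millions, use a specialized vector database
- Similarity Scores: Higher scores mean closer meaning, and 1 means the two vectors point in the same direction. For text embeddings, scores usually fall between 0 and 1, although cosine similarity can in principle be as low as -1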
- Matches: Start with 5-10 matches for most use cases
Common Errors
"Search Embeddings CSV cannot be empty"
- Provide the path to the CSV file containing the query embedding
"Content Embeddings CSV does not exist"
- Verify that the file path is correct and that the file exists
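If the CSV paths are produced by an earlier script, a quick pre-check can catch both errors before the node runs. This check_inputs helper is hypothetical and not part of Robomotion; apart from the two messages quoted above, its wording is this sketch's own.

```python
from pathlib import Path


def check_inputs(search_csv: str, content_csv: str) -> list[str]:
    """Return a list of problems with the two CSV paths (empty if none)."""
    problems = []
    if not search_csv.strip():
        problems.append("Search Embeddings CSV cannot be empty")
    elif not Path(search_csv).is_file():
        problems.append("Search Embeddings CSV does not exist")  # sketch's wording
    if not content_csv.strip():
        problems.append("Content Embeddings CSV cannot be empty")  # sketch's wording
    elif not Path(content_csv).is_file():
        problems.append("Content Embeddings CSV does not exist")
    return problems


if __name__ == "__main__":
    for problem in check_inputs("", "/tmp/knowledge_base.csv"):
        print(problem)
```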