
Similarity

Calculate cosine similarity between embeddings to find the most relevant content based on semantic meaning.

Common Properties

  • Name - The custom name of the node.
  • Color - The custom color of the node.
  • Delay Before (sec) - Waits in seconds before executing the node.
  • Delay After (sec) - Waits in seconds after executing the node.
  • Continue On Error - Automation will continue regardless of any error. Default: false.

Inputs

  • Connection Id - Connection identifier (required for consistency).
  • Search Embeddings CSV - Path to the CSV file containing the search query embedding. Only the first row is used.
  • Content Embeddings CSV - Path to CSV file with content embeddings to search against.
  • Use Robomotion AI Credits - When enabled, the node uses Robomotion AI credits.

Options

  • Matches - Number of top similarity matches to return. Default: 5.

Outputs

  • Similarity - Object with two properties:
    • content: Map of index to content text
    • similarity: Map of index to similarity score (0-1)

How It Works

Finds the most similar content using cosine similarity:

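  1. Reads the search query embedding from the first row of the Search Embeddings CSV
  2. Reads all content embeddings from the Content Embeddings CSV
  3. Calculates the cosine similarity between the query embedding and each content embedding
  4. Sorts the results by similarity (highest first)
  5. Returns the top N matches, where N is the Matches option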
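
The steps above can be sketched in plain Python. This is only an illustration of the technique, not the node's actual implementation: the function names cosine_similarity and top_matches are hypothetical, and treating a zero-length vector as a score of 0 is an assumption.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: a zero vector is treated as unrelated
    return dot / (norm_a * norm_b)

def top_matches(query, contents, n=5):
    """Score every content embedding against the query and keep the n best."""
    scored = [(idx, cosine_similarity(query, emb)) for idx, emb in contents.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest score first
    return scored[:n]

# Toy two-dimensional vectors; real embeddings have many more dimensions.
query = [1.0, 0.0]
contents = {"0": [1.0, 0.0], "1": [0.0, 1.0], "2": [1.0, 1.0]}
result = top_matches(query, contents, n=2)
```

Here result holds the two best (index, score) pairs, with index "0" ranked first because its vector points in the same direction as the query.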

CSV Format

Expected CSV structure:

index,content,tokens,model,embedding
0,"First document","150","text-embedding-3-small","[0.023,-0.015,...]"
1,"Second document","200","text-embedding-3-small","[0.012,-0.034,...]"
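
A file in this layout can be read with Python's standard csv and json modules. The sketch below assumes the embedding column holds a JSON array stored as a quoted string, as the sample rows suggest. The two-value vectors are shortened placeholders for illustration, and load_embeddings is a hypothetical helper name.

```python
import csv
import io
import json

# Shortened sample data; real embedding vectors are much longer.
sample = '''index,content,tokens,model,embedding
0,"First document","150","text-embedding-3-small","[0.023,-0.015]"
1,"Second document","200","text-embedding-3-small","[0.012,-0.034]"
'''

def load_embeddings(handle):
    """Map each row's index to its content text and parsed embedding vector."""
    rows = {}
    for row in csv.DictReader(handle):
        rows[row["index"]] = {
            "content": row["content"],
            # The embedding is a quoted JSON array, so decode it into a list of floats.
            "embedding": json.loads(row["embedding"]),
        }
    return rows

rows = load_embeddings(io.StringIO(sample))
```

To read from disk instead of the in-memory sample, pass a file opened with open(path, newline="", encoding="utf-8").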

Usage Example

Input:
- Search Embeddings CSV: "/tmp/query_embedding.csv"
- Content Embeddings CSV: "/tmp/knowledge_base.csv"
- Matches: 3

Output:
- Similarity: {
    content: {
      "42": "Document about password reset...",
      "17": "Guide to account recovery...",
      "89": "Security best practices..."
    },
    similarity: {
      "42": 0.92,
      "17": 0.87,
      "89": 0.83
    }
  }
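
Because content and similarity are two separate maps keyed by the same index, downstream steps usually need to join them. The following sketch shows one way to do that, assuming the output has been made available as a Python dictionary. The ranked helper and the min_score cutoff are hypothetical and not part of the node.

```python
# The output from the example above, written as a Python dictionary.
similarity = {
    "content": {
        "42": "Document about password reset...",
        "17": "Guide to account recovery...",
        "89": "Security best practices...",
    },
    "similarity": {"42": 0.92, "17": 0.87, "89": 0.83},
}

def ranked(output, min_score=0.0):
    """Join both maps into (index, score, text) tuples, best score first."""
    scores = output["similarity"]
    texts = output["content"]
    ordered = sorted(scores, key=scores.get, reverse=True)
    return [(idx, scores[idx], texts[idx]) for idx in ordered if scores[idx] >= min_score]

# Keep only matches scoring at least 0.85 (an arbitrary illustrative cutoff).
best = ranked(similarity, min_score=0.85)
```

With the cutoff of 0.85, best contains the entries for indexes "42" and "17" in that order.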

Use Cases

  • Semantic Search: Find relevant documents based on meaning
  • FAQ Matching: Match user questions to similar FAQs
  • Content Recommendation: Suggest similar articles or products
  • Duplicate Detection: Find similar or duplicate content

Tips for RPA Developers

  • CSV Generation: First generate embeddings and save to CSV, then use this node for similarity search
  • Performance: Suitable for thousands of embeddings. For millions, use specialized vector databases
  • Similarity Scores: Scores range from 0 (unrelated) to 1 (identical)
  • Matches: Start with 5-10 matches for most use cases

Common Errors

"Search Embeddings CSV cannot be empty"

  • Provide the path to a CSV file containing the query embedding

"Content Embeddings CSV does not exist"

  • Verify the file path is correct
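
Both errors can be caught before the node runs by checking the paths up front. The sketch below is a hypothetical pre-check written in Python, not the node's own validation. The check_inputs name is an assumption, and its messages follow the wording of the errors listed above for readability only.

```python
import os

def check_inputs(search_csv, content_csv):
    """Return a list of problems found; an empty list means both paths look usable."""
    problems = []
    for label, path in (("Search Embeddings CSV", search_csv),
                        ("Content Embeddings CSV", content_csv)):
        if not path:
            problems.append(f"{label} cannot be empty")
        elif not os.path.isfile(path):
            problems.append(f"{label} does not exist")
    return problems

# An empty search path and a content path that is not present on disk.
problems = check_inputs("", "/nonexistent/knowledge_base.csv")
```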