Similarity

Calculate cosine similarity between embeddings to find the most relevant content based on semantic meaning.

Common Properties

Name - The custom name of the node.
Color - The custom color of the node.
Delay Before (sec) - Time to wait, in seconds, before executing the node.
Delay After (sec) - Time to wait, in seconds, after executing the node.
Continue On Error - Automation will continue regardless of any error. Default: false.

Inputs

Connection Id - Connection identifier (required for consistency).
Search Embeddings CSV - Path to the CSV file containing the search query embedding (only the first row is used).
Content Embeddings CSV - Path to the CSV file containing the content embeddings to search against.
Use Robomotion AI Credits - Use Robomotion credits.

Options

Matches - Number of top similarity matches to return. Default: 5.

Outputs

Similarity - Object with two properties:
- content: Map of index to content text
- similarity: Map of index to similarity score (0-1)

How It Works

Finds the most similar content using cosine similarity:

1. Reads the search query embedding from the first CSV
2. Reads all content embeddings from the second CSV
3. Calculates the cosine similarity between the query and each content embedding
4. Sorts the results by similarity (highest first)
5. Returns the top N matches
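
The ranking steps above can be sketched in a few lines of Python. This is a minimal illustration, not the node's actual implementation: it assumes the embeddings are already loaded as lists of floats, and the names `cosine_similarity` and `top_matches` are hypothetical helpers introduced here.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: treat a zero vector as unrelated instead of dividing by zero.
        return 0.0
    return dot / (norm_a * norm_b)

def top_matches(query, contents, n=5):
    """Score every content embedding against the query and keep the best n.

    contents maps an index (string) to its embedding (list of floats).
    Returns a list of (index, score) pairs, highest score first.
    """
    scored = [(idx, cosine_similarity(query, emb)) for idx, emb in contents.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]

# Tiny two-dimensional vectors, for illustration only.
query = [1.0, 0.0]
contents = {"0": [1.0, 0.0], "1": [0.0, 1.0], "2": [1.0, 1.0]}
result = top_matches(query, contents, n=2)
```

Here `result` lists index "0" first (same direction as the query), followed by index "2". Scoring every row on each call is what keeps this approach practical for thousands of embeddings rather than millions.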

CSV Format

Expected CSV structure:

index,content,tokens,model,embedding
0,"First document","150","text-embedding-3-small","[0.023,-0.015,...]"
1,"Second document","200","text-embedding-3-small","[0.012,-0.034,...]"
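
To check a file against this layout before passing it to the node, a short Python sketch like the following may help. It assumes the embedding column holds a JSON array serialized as a string, as the sample rows suggest. The helper name `load_embeddings` is hypothetical, and the sample vectors are shortened to two made-up dimensions so that they parse.

```python
import csv
import io
import json

def load_embeddings(csv_text):
    """Parse CSV text in the layout shown above into a list of dicts."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        rows.append({
            "index": row["index"],
            "content": row["content"],
            # Assumption: the embedding cell is a JSON array stored as a string.
            "embedding": json.loads(row["embedding"]),
        })
    return rows

# Illustrative sample; real embeddings have many more dimensions.
sample = '''index,content,tokens,model,embedding
0,"First document","150","text-embedding-3-small","[0.023,-0.015]"
1,"Second document","200","text-embedding-3-small","[0.012,-0.034]"
'''
rows = load_embeddings(sample)
```

The embedding cell must be quoted in the CSV because the vector itself contains commas. Without the quotes, each number would be read as a separate column.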

Usage Example

Input:
- Search Embeddings CSV: "/tmp/query_embedding.csv"
- Content Embeddings CSV: "/tmp/knowledge_base.csv"
- Matches: 3

Output:
- Similarity: {
    content: {
      "42": "Document about password reset...",
      "17": "Guide to account recovery...",
      "89": "Security best practices..."
    },
    similarity: {
      "42": 0.92,
      "17": 0.87,
      "89": 0.83
    }
  }
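
Because content and similarity are two separate maps keyed by the same index, downstream steps usually need to join them. The sketch below shows one way to do that in Python, assuming the output has exactly the shape shown above. The helper `ranked_matches` and its `min_score` cutoff are hypothetical, not options of the node.

```python
def ranked_matches(similarity_output, min_score=0.0):
    """Join the two maps into (index, score, text) tuples, best score first."""
    content = similarity_output["content"]
    scores = similarity_output["similarity"]
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return [(idx, score, content[idx]) for idx, score in ranked if score >= min_score]

# Same values as the example output above.
output = {
    "content": {
        "42": "Document about password reset...",
        "17": "Guide to account recovery...",
        "89": "Security best practices...",
    },
    "similarity": {"42": 0.92, "17": 0.87, "89": 0.83},
}
strong = ranked_matches(output, min_score=0.85)
```

With a cutoff of 0.85, `strong` keeps the entries for indexes "42" and "17" and drops "89". The same join could be written in whatever scripting step follows the node in a flow.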

Use Cases

- Semantic Search: Find relevant documents based on meaning
- FAQ Matching: Match user questions to similar FAQs
- Content Recommendation: Suggest similar articles or products
- Duplicate Detection: Find similar or duplicate content

Tips for RPA Developers

- CSV Generation: Generate the embeddings and save them to CSV first, then use this node for the similarity search.
- Performance: Suitable for thousands of embeddings. For millions, use a specialized vector database.
- Similarity Scores: Scores range from 0 (unrelated) to 1 (identical).
- Matches: Start with 5-10 matches for most use cases.

Common Errors

"Search Embeddings CSV cannot be empty"

Provide the path to a CSV file that contains the query embedding.

"Content Embeddings CSV does not exist"

Verify that the file path is correct and that the file exists at that location.
