Cluster Keywords

Groups similar keywords together using various distance metrics and clustering algorithms. This node helps organize large keyword lists into meaningful groups, identify keyword themes, and reduce redundancy in keyword research.

Common Properties

  • Name - The custom name of the node.
  • Color - The custom color of the node.
  • Delay Before (sec) - Waits the specified number of seconds before executing the node.
  • Delay After (sec) - Waits the specified number of seconds after executing the node.
  • Continue On Error - The automation continues regardless of any error. The default value is false.
info

If the Continue On Error property is true, no error is caught when the project is executed, even if a Catch node is used.

Inputs

  • Keywords - Array of keywords to cluster based on similarity. Example:
    ["seo optimization", "search engine optimization", "content marketing", "content strategy", "digital marketing", "marketing automation"]

Options

  • Distance Method - Method to calculate distances between keywords. Options:

    • Normalized Edit Distance - Measures string similarity based on character-level differences (Levenshtein distance). Good for finding typos and variations.
    • Jaccard Similarity - Measures overlap between word sets. Good for finding keywords sharing common words.
    • Cosine Similarity - Measures angular similarity between keyword vectors. Good for semantic similarity.

    Default: Cosine Similarity.

  • Distance Vectorizer - Method to vectorize keywords when using Jaccard or Cosine similarity. Options:

    • Counter - Uses raw word counts
    • TF-IDF - Uses TF-IDF weighted vectors (better for longer phrases)

    Default: Counter.

  • Clustering Method - Clustering algorithm to use. Options:

    • Agglomerative Clustering - Hierarchical clustering that merges similar keywords. Works with Distance Threshold parameter.
    • K-Means Clustering - Partitions keywords into K clusters. Requires Number of Clusters parameter.
    • DBSCAN Clustering - Density-based clustering that can find arbitrary shaped clusters. Works with Epsilon parameter.

    Default: K-Means Clustering.

  • Distance Threshold - Threshold for Agglomerative Clustering. Lower values result in more clusters. Keywords with distance below this threshold are grouped together. Default: 0.5.

  • Number of Clusters for K-Means - Number of clusters to create when using K-Means clustering. Default: 5.

  • Epsilon for DBSCAN - Epsilon parameter for DBSCAN clustering. Maximum distance between keywords to be considered in the same cluster. Default: 0.3.

  • Ordered Keys Only - When enabled, returns only the root keywords (cluster representatives) as an array ordered by their original position. When disabled, returns full cluster structure with all keywords. Default: false.
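The three distance methods can be sketched as plain functions. These are assumed formulas based on the standard definitions (Levenshtein distance normalized by the longer string, Jaccard distance over word sets, cosine distance over raw word counts as with the Counter vectorizer); the node's internal normalization details may differ.

```javascript
// Levenshtein distance normalized by the longer string's length.
function normalizedEditDistance(a, b) {
  const m = a.length, n = b.length;
  const d = Array.from({ length: m + 1 }, (_, i) =>
    Array.from({ length: n + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  );
  for (let i = 1; i <= m; i++) {
    for (let j = 1; j <= n; j++) {
      d[i][j] = Math.min(
        d[i - 1][j] + 1,                                   // deletion
        d[i][j - 1] + 1,                                   // insertion
        d[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)  // substitution
      );
    }
  }
  return m === 0 && n === 0 ? 0 : d[m][n] / Math.max(m, n);
}

// Jaccard distance over word sets: 1 - |A ∩ B| / |A ∪ B|.
function jaccardDistance(a, b) {
  const setA = new Set(a.split(/\s+/));
  const setB = new Set(b.split(/\s+/));
  const inter = [...setA].filter(w => setB.has(w)).length;
  const union = new Set([...setA, ...setB]).size;
  return 1 - inter / union;
}

// Cosine distance with raw word counts (the "Counter" vectorizer).
function cosineDistance(a, b) {
  const count = s => s.split(/\s+/).reduce((m, w) => ((m[w] = (m[w] || 0) + 1), m), {});
  const va = count(a), vb = count(b);
  const words = new Set([...Object.keys(va), ...Object.keys(vb)]);
  let dot = 0, na = 0, nb = 0;
  for (const w of words) {
    const x = va[w] || 0, y = vb[w] || 0;
    dot += x * y; na += x * x; nb += y * y;
  }
  return 1 - dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// "digital marketing" and "marketing automation" share one of three distinct
// words, so their Jaccard distance is moderate rather than maximal.
console.log(jaccardDistance("digital marketing", "marketing automation"));
```

Note how edit distance compares characters while Jaccard and cosine compare words, which is why edit distance catches spelling variants and the other two catch shared vocabulary.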

Output

  • Clusters - Clustering results in one of two formats:

    When Ordered Keys Only = false (default):

    {
      "seo optimization": ["search engine optimization", "seo tips"],
      "content marketing": ["content strategy", "content creation"],
      "digital marketing": ["marketing automation", "online marketing"]
    }

    Each key is the root keyword (first occurrence), and the value is an array of similar keywords in that cluster.

    When Ordered Keys Only = true:

    ["seo optimization", "content marketing", "digital marketing"]

    Returns only root keywords ordered by their position in the input array.

How It Works

The Cluster Keywords node combines the selected distance metric and clustering algorithm in four steps:

  1. Distance Calculation

    • Computes similarity/distance between all keyword pairs
    • Creates a distance matrix based on the selected method
  2. Clustering

    • Applies the selected clustering algorithm
    • Groups keywords based on their distances
  3. Cluster Formation

    • Identifies root keywords (earliest occurrence in each cluster)
    • Groups remaining keywords under their respective roots
  4. Result Formatting

    • Returns either full clusters or just root keywords
    • Maintains original keyword order for consistency
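The steps above can be sketched with a simple greedy threshold pass. This is a stand-in for the real algorithms (hypothetical helper, not the node's implementation), shown only to illustrate the root-keyword output shape:

```javascript
// Assign each keyword to the first existing root within the threshold,
// otherwise make it a new root. Roots keep their original input order.
function clusterKeywords(keywords, distance, threshold) {
  const clusters = {}; // root keyword -> array of similar keywords
  for (const kw of keywords) {
    const root = Object.keys(clusters).find(r => distance(r, kw) <= threshold);
    if (root) clusters[root].push(kw);
    else clusters[kw] = []; // earliest occurrence becomes the root
  }
  return clusters;
}

// Toy distance: 0 if the keywords share a word, 1 otherwise.
const shareWord = (a, b) => {
  const B = new Set(b.split(" "));
  return a.split(" ").some(w => B.has(w)) ? 0 : 1;
};

const clusters = clusterKeywords(
  ["seo optimization", "search engine optimization", "content marketing", "marketing automation"],
  shareWord,
  0.5
);
// { "seo optimization": ["search engine optimization"],
//   "content marketing": ["marketing automation"] }
```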

Practical Examples

Example 1: Basic Keyword Grouping

Organize a list of related keywords:

// Extract keywords from TF-IDF or Frequency Analysis
msg.keywords = [
  "machine learning",
  "artificial intelligence",
  "deep learning",
  "neural networks",
  "ai technology",
  "ml algorithms",
  "data science",
  "predictive analytics"
];

// Configure Cluster Keywords node:
// Distance Method: Cosine Similarity
// Clustering Method: K-Means
// Number of Clusters: 3

// After Cluster Keywords
const clusters = msg.clusters;

console.log("Keyword Clusters:");
for (let rootKeyword in clusters) {
  console.log(`\n${rootKeyword}:`);
  clusters[rootKeyword].forEach(kw => {
    console.log(`  - ${kw}`);
  });
}

Example 2: Deduplicating Similar Keywords

Remove redundant keywords from a large list:

// Large keyword list from research
msg.keywords = [
  "seo optimization",
  "search engine optimization",
  "seo optimisation", // British spelling
  "optimize for seo",
  "content marketing",
  "marketing content",
  "content marketing strategy"
];

// Configure Cluster Keywords:
// Distance Method: Normalized Edit Distance (to catch spelling variations)
// Clustering Method: Agglomerative
// Distance Threshold: 0.3 (stricter grouping)

// After Cluster Keywords
const clusters = msg.clusters;

// Extract only root keywords (de-duplicated list)
msg.uniqueKeywords = Object.keys(clusters);

console.log("Unique Keywords:", msg.uniqueKeywords);

// Show what was grouped
for (let root in clusters) {
  if (clusters[root].length > 0) {
    console.log(`"${root}" represents: ${clusters[root].join(", ")}`);
  }
}

Example 3: Topic Identification

Identify main topics from a large keyword set:

// Keywords extracted from multiple blog posts
msg.keywords = [
  "python programming", "javascript coding", "web development",
  "react framework", "vue.js", "frontend development",
  "data analysis", "pandas library", "data visualization",
  "machine learning", "scikit-learn", "tensorflow",
  "api development", "rest api", "backend development"
];

// Configure Cluster Keywords:
// Distance Method: Cosine Similarity
// Clustering Method: K-Means
// Number of Clusters: 4
// Ordered Keys Only: false

// After Cluster Keywords
const clusters = msg.clusters;

// Analyze clusters to identify topics
msg.topics = [];
for (let rootKeyword in clusters) {
  const clusterSize = clusters[rootKeyword].length + 1; // +1 for root
  const allKeywords = [rootKeyword, ...clusters[rootKeyword]];

  msg.topics.push({
    topic: rootKeyword,
    relatedTerms: allKeywords,
    termCount: clusterSize
  });
}

// Sort by cluster size to identify major topics
msg.topics.sort((a, b) => b.termCount - a.termCount);

console.log("Main Topics:");
msg.topics.forEach((topic, index) => {
  console.log(`\n${index + 1}. ${topic.topic} (${topic.termCount} related terms)`);
  console.log(`   Related: ${topic.relatedTerms.slice(1).join(", ")}`);
});

Example 4: Content Planning from Clustered Keywords

Create content topics from keyword clusters:

// Keywords from keyword research
msg.keywords = msg.researchedKeywords; // Large array from previous analysis

// Configure Cluster Keywords:
// Distance Method: Cosine Similarity
// Clustering Method: Agglomerative
// Distance Threshold: 0.4

// After Cluster Keywords
const clusters = msg.clusters;

// Generate content ideas
msg.contentPlan = [];

for (let mainKeyword in clusters) {
  const relatedKeywords = clusters[mainKeyword];
  const totalKeywords = [mainKeyword, ...relatedKeywords];

  // Create content plan item
  msg.contentPlan.push({
    title: `How to ${mainKeyword}`,
    primaryKeyword: mainKeyword,
    secondaryKeywords: relatedKeywords.slice(0, 5), // Top 5 related
    estimatedLength: totalKeywords.length * 150, // 150 words per keyword
    priority: totalKeywords.length > 5 ? "High" : "Medium"
  });
}

// Sort by priority (High before Medium)
msg.contentPlan.sort((a, b) => {
  const priority = { High: 2, Medium: 1 };
  return priority[b.priority] - priority[a.priority];
});

console.log("Content Plan:");
msg.contentPlan.forEach((item, index) => {
  console.log(`\n${index + 1}. ${item.title}`);
  console.log(`   Primary: ${item.primaryKeyword}`);
  console.log(`   Related: ${item.secondaryKeywords.join(", ")}`);
  console.log(`   Priority: ${item.priority}`);
});

Example 5: Comparing Clustering Methods

Find the best clustering method for your data:

msg.keywords = msg.myKeywordList;

// Candidate configurations to compare
const methods = [
  { distance: "cosine", clustering: "kmeans", params: { clusters: 5 } },
  { distance: "cosine", clustering: "agglomerative", params: { threshold: 0.5 } },
  { distance: "normalized_edit", clustering: "dbscan", params: { epsilon: 0.3 } }
];

msg.methodResults = [];

for (let method of methods) {
  // Configure and run Cluster Keywords with each method
  // (in practice, use separate nodes or manual reconfiguration between runs)

  // After each Cluster Keywords run
  const clusters = msg.clusters;
  const numClusters = Object.keys(clusters).length;
  const avgClusterSize = Object.values(clusters).reduce(
    (sum, cluster) => sum + cluster.length + 1, 0
  ) / numClusters;

  msg.methodResults.push({
    method: `${method.distance} + ${method.clustering}`,
    clusterCount: numClusters,
    avgSize: avgClusterSize.toFixed(2),
    results: clusters
  });
}

console.log("Clustering Method Comparison:");
msg.methodResults.forEach(result => {
  console.log(`\n${result.method}:`);
  console.log(`  Clusters: ${result.clusterCount}`);
  console.log(`  Avg cluster size: ${result.avgSize}`);
});

Example 6: Creating Keyword Hierarchies

Build a hierarchical structure from keywords:

msg.keywords = msg.allKeywords;

// First pass: Create broad clusters
// Distance Method: Cosine Similarity
// Clustering Method: K-Means
// Number of Clusters: 3
// Ordered Keys Only: false

// After first Cluster Keywords
const broadClusters = msg.clusters;

msg.hierarchy = [];

// Second pass: Cluster within each broad cluster
for (let mainTopic in broadClusters) {
  const subKeywords = [mainTopic, ...broadClusters[mainTopic]];

  if (subKeywords.length > 5) {
    // Re-cluster large groups
    msg.keywords = subKeywords;

    // Run another Cluster Keywords with different settings
    // Distance Method: Normalized Edit Distance
    // Clustering Method: Agglomerative
    // Distance Threshold: 0.3

    // After second Cluster Keywords
    const subClusters = msg.clusters;

    msg.hierarchy.push({
      mainTopic: mainTopic,
      subTopics: subClusters
    });
  } else {
    msg.hierarchy.push({
      mainTopic: mainTopic,
      subTopics: { [mainTopic]: subKeywords.slice(1) }
    });
  }
}

console.log("Keyword Hierarchy:");
msg.hierarchy.forEach(item => {
  console.log(`\n${item.mainTopic}`);
  for (let subTopic in item.subTopics) {
    console.log(`  └─ ${subTopic}`);
    item.subTopics[subTopic].forEach(keyword => {
      console.log(`     ├─ ${keyword}`);
    });
  }
});

Tips for Effective Use

  1. Choosing Distance Method

    • Normalized Edit Distance: Best for finding typos, spelling variations, similar strings
    • Jaccard Similarity: Good for keywords sharing common words (e.g., "digital marketing" and "marketing automation")
    • Cosine Similarity: Best for semantic similarity and general-purpose clustering
  2. Choosing Clustering Method

    • Agglomerative: Good when you want control via threshold; produces hierarchical clusters
    • K-Means: Best when you know how many clusters you want; fast and reliable
    • DBSCAN: Good for finding natural groupings; can identify outliers (noise keywords)
  3. Parameter Tuning

    • Start with default values and adjust based on results
    • Lower Distance Threshold = more, smaller clusters
    • More K-Means clusters = smaller, more specific groups
    • Lower DBSCAN Epsilon = stricter clustering, more noise
  4. Input Preparation

    • Clean keywords (remove duplicates, excess whitespace)
    • Use consistent formatting (all lowercase recommended)
    • More keywords = better clustering (minimum 20 recommended)
  5. Result Interpretation

    • Check cluster sizes - very large clusters may need re-clustering
    • Single-keyword clusters might be outliers or unique topics
    • Root keywords represent the "canonical" form of the cluster
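The input-preparation tips can be sketched as a small helper (a hypothetical function, not part of the node) that trims, lowercases, collapses whitespace, and drops exact duplicates before clustering:

```javascript
// Clean a keyword list before passing it to Cluster Keywords.
function prepareKeywords(keywords) {
  const seen = new Set();
  const cleaned = [];
  for (const kw of keywords) {
    // Trim, lowercase, and collapse runs of whitespace.
    const norm = kw.trim().toLowerCase().replace(/\s+/g, " ");
    if (norm && !seen.has(norm)) {
      seen.add(norm); // drop exact duplicates, keep first occurrence
      cleaned.push(norm);
    }
  }
  return cleaned;
}

const prepared = prepareKeywords([" SEO  Optimization", "seo optimization", "Content Marketing"]);
// ["seo optimization", "content marketing"]
// Assign the result to msg.keywords before the Cluster Keywords node.
```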

Common Errors and Solutions

Issue: All Keywords in One Cluster

Cause: Clustering parameters are too loose; keywords are too similar.

Solution:

  • Lower Distance Threshold (for Agglomerative)
  • Increase Number of Clusters (for K-Means)
  • Lower Epsilon (for DBSCAN)
  • Try a different Distance Method

Issue: Every Keyword is Its Own Cluster

Cause: Clustering parameters are too strict; keywords are too different.

Solution:

  • Increase Distance Threshold (for Agglomerative)
  • Decrease Number of Clusters (for K-Means)
  • Increase Epsilon (for DBSCAN)
  • Switch to Cosine Similarity for more flexible matching

Issue: Poor Clustering Quality

Cause: Wrong distance method or vectorizer for your keyword type.

Solution:

  • For short keywords (1-2 words): Use Normalized Edit Distance or Jaccard + Counter
  • For longer phrases (3+ words): Use Cosine Similarity + TF-IDF
  • Experiment with different combinations

Issue: DBSCAN Creates Too Much Noise

Cause: Epsilon value is too low; keywords are too diverse.

Solution:

  • Increase Epsilon parameter gradually
  • Or switch to K-Means or Agglomerative clustering
  • Consider filtering keywords before clustering

Understanding Clustering Algorithms

Agglomerative Clustering

  • How it works: Starts with each keyword as its own cluster, then merges similar clusters
  • Best for: When you want precise control over similarity threshold
  • Parameters: Distance Threshold (lower = more clusters)

K-Means Clustering

  • How it works: Partitions keywords into K groups by minimizing within-cluster variance
  • Best for: When you know how many topic groups you need
  • Parameters: Number of Clusters (must specify upfront)

DBSCAN Clustering

  • How it works: Groups keywords based on density; identifies core keywords and noise
  • Best for: Finding natural groupings without specifying cluster count
  • Parameters: Epsilon (maximum distance for same cluster)
  • Note: Can label some keywords as noise (cluster label -1)
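To make the noise labeling concrete, here is a toy DBSCAN over a precomputed distance function, with a minimum cluster size of 2. This is a simplified sketch, not the node's implementation; items farther than epsilon from every neighbor end up labeled -1:

```javascript
// Toy DBSCAN: labels[i] is a cluster index, or -1 for noise.
function dbscan(items, distance, epsilon, minPts = 2) {
  const labels = new Array(items.length).fill(undefined);
  let cluster = 0;
  const neighbors = i =>
    items.map((_, j) => j).filter(j => j !== i && distance(items[i], items[j]) <= epsilon);
  for (let i = 0; i < items.length; i++) {
    if (labels[i] !== undefined) continue;
    const nbrs = neighbors(i);
    if (nbrs.length + 1 < minPts) { labels[i] = -1; continue; } // noise (for now)
    labels[i] = cluster;
    const queue = [...nbrs];
    while (queue.length) {
      const j = queue.shift();
      if (labels[j] === -1) labels[j] = cluster; // noise reclaimed as border point
      if (labels[j] !== undefined) continue;
      labels[j] = cluster;
      const jn = neighbors(j);
      if (jn.length + 1 >= minPts) queue.push(...jn); // core points keep expanding
    }
    cluster++;
  }
  return labels;
}

// Two nearby items form cluster 0; the distant item is noise.
const labels = dbscan([0, 0.1, 1], (a, b) => Math.abs(a - b), 0.2);
// [0, 0, -1]
```

In the node, a keyword's distance function would be one of the Distance Method options above rather than a numeric gap.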

Performance Considerations

  • Processing time increases with:
    • Number of keywords (O(n²) for distance calculation)
    • Complexity of distance method
    • Number of clusters (for K-Means)
  • Recommended limits:
    • Under 1000 keywords for real-time processing
    • Use batching for larger sets
  • Cosine Similarity with TF-IDF is slower but more accurate
  • Normalized Edit Distance is fastest but less semantically aware

Related Nodes

  • TF-IDF Analysis - Extract keywords to cluster
  • Frequency Analysis - Extract keywords to cluster
  • Normalize Text - Preprocess keywords before clustering for better results
  • Count Occurrences - Verify keyword usage in clustered groups