Frequency Analysis

Performs frequency analysis on documents to extract and rank keywords based on occurrence counts. This node provides a simpler alternative to TF-IDF analysis, focusing on raw frequency counts to identify commonly used terms across documents.

Common Properties

  • Name - The custom name of the node.
  • Color - The custom color of the node.
  • Delay Before (sec) - Waits the specified number of seconds before executing the node.
  • Delay After (sec) - Waits the specified number of seconds after executing the node.
  • Continue On Error - The automation continues regardless of any error. The default value is false.
info

If the ContinueOnError property is true, no error is caught when the project is executed, even if a Catch node is used.

Inputs

  • Documents - An array of JSON objects, each containing name and text fields. Example format:

    [
      {"name": "Document 1", "text": "Your document text here"},
      {"name": "Document 2", "text": "Another document text"}
    ]
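Because a malformed entry can cause the analysis to fail or silently skip a document, it can help to validate the array before passing it to the node. The sketch below is illustrative and not part of the node itself; the helper name `validateDocuments` is hypothetical.

```javascript
// Hypothetical helper (not part of the node): check that every entry
// is an object with non-empty string `name` and `text` fields.
function validateDocuments(documents) {
  if (!Array.isArray(documents)) return false;
  return documents.every(
    (doc) =>
      doc !== null &&
      typeof doc === "object" &&
      typeof doc.name === "string" &&
      typeof doc.text === "string" &&
      doc.text.trim().length > 0
  );
}
```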

Options

  • Top N - Specifies the number of top terms to retrieve from the frequency analysis. Default: 25.
  • Min N-gram - Sets the lower boundary of the range of n-values for different n-gram sizes to be extracted. For instance, min_ngram=1 means the analysis will include unigrams (single words). Default: 1.
  • Max N-gram - Sets the upper boundary for the range of n-values for different n-gram sizes. A max_ngram=3 would mean the analysis includes trigrams (phrases of three words) at most. Default: 3.
  • Min DF - Minimum document frequency threshold as a percentage. Terms appearing in fewer documents than this percentage will be ignored. A value of 0.01 (1%) means ignoring terms that appear in less than 1% of your documents. Default: 0.01.
  • Max DF - Maximum document frequency threshold as a percentage. Terms appearing in more documents than this percentage will be filtered out. A value of 0.5 (50%) filters out terms that appear in more than half of your documents. Default: 0.5.
  • Stop Words Language - Language for stop word filtering. Stop words are ignored during the process of vectorizing text, as these words usually carry less meaningful information for analysis. Supported languages: Arabic, Azerbaijani, Basque, Bengali, Catalan, Chinese, Danish, Dutch, English, Finnish, French, German, Greek, Hebrew, Hinglish, Hungarian, Indonesian, Italian, Kazakh, Nepali, Norwegian, Portuguese, Romanian, Russian, Slovene, Spanish, Swedish, Tajik, Turkish. Default: English.
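Conceptually, Min DF and Max DF compare each term's document frequency (the fraction of documents containing it) against the two thresholds. This sketch shows that behavior in simplified form; the node's internal implementation may differ, and the function name is an assumption for illustration.

```javascript
// Simplified sketch of Min DF / Max DF filtering.
// `termDocCounts` maps each term to the number of documents containing it.
function filterByDocumentFrequency(termDocCounts, totalDocs, minDf, maxDf) {
  const kept = {};
  for (const [term, docCount] of Object.entries(termDocCounts)) {
    const df = docCount / totalDocs; // fraction of documents with this term
    if (df >= minDf && df <= maxDf) kept[term] = docCount;
  }
  return kept;
}
```

With the defaults (Min DF 0.01, Max DF 0.5) over 100 documents, a term must appear in at least 1 and at most 50 documents to survive.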

Output

  • Frequency Results - An object containing the analysis results with the following structure:

    {
      "columns": ["Keyword", "Document 1", "Document 2", "N-Gram", "Document Count", "Total Count", "Total Score"],
      "rows": [
        {
          "Keyword": "keyword phrase",
          "Document 1": 5,
          "Document 2": 3,
          "N-Gram": 2,
          "Document Count": 2,
          "Total Count": 8,
          "Total Score": 32
        }
      ]
    }

How It Works

Frequency Analysis extracts keywords based on how often they appear across documents:

  1. Tokenizes documents into n-grams (single words, two-word phrases, three-word phrases based on your settings)
  2. Filters stop words based on the selected language
  3. Counts occurrences of each term in each document
  4. Applies document frequency filtering to remove very common and very rare terms
  5. Validates n-grams by ensuring they exist exactly in at least one document
  6. Computes a Total Score by multiplying N-Gram length, Document Count, and Total Count
  7. Ranks keywords by Total Score to identify the most frequently used terms
  8. Returns Top N keywords with their counts across all documents
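The scoring in steps 6-7 can be sketched directly from the formula above (Total Score = N-Gram length × Document Count × Total Count). This is a minimal illustration of the ranking, not the node's actual code:

```javascript
// Sketch of steps 6-7: compute Total Score for each row and rank descending.
function rankByTotalScore(rows) {
  return rows
    .map((row) => ({
      ...row,
      "Total Score": row["N-Gram"] * row["Document Count"] * row["Total Count"],
    }))
    .sort((a, b) => b["Total Score"] - a["Total Score"]);
}
```

For the sample row in the Output section, the score works out to 2 × 2 × 8 = 32, matching the example.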

Frequency Analysis vs TF-IDF Analysis

Feature     | Frequency Analysis                   | TF-IDF Analysis
------------|--------------------------------------|-----------------------------------------------
Metric      | Raw counts                           | Weighted importance scores
Best For    | Finding common terms, basic analysis | Finding distinctive terms, advanced analysis
Speed       | Faster                               | Slightly slower
Complexity  | Simpler to understand                | More sophisticated
Use Case    | General keyword extraction           | Competitive analysis, distinguishing documents

When to use Frequency Analysis:

  • Quick keyword extraction from similar documents
  • Finding most commonly discussed topics
  • Simpler analysis needs
  • When you want actual occurrence counts

When to use TF-IDF Analysis:

  • Comparing diverse documents
  • Finding unique keywords per document
  • More sophisticated SEO analysis
  • When you need importance scores

Practical Examples

Example 1: Basic Keyword Extraction

Extract most common keywords from blog posts:

// Prepare documents
msg.documents = [
  {
    name: "Blog Post 1",
    text: "Content marketing is essential for digital success. Good content attracts customers."
  },
  {
    name: "Blog Post 2",
    text: "Digital marketing strategies include content creation and social media marketing."
  },
  {
    name: "Blog Post 3",
    text: "Successful content marketing requires planning and consistent content delivery."
  }
];

// Configure Frequency Analysis node
// Top N: 10
// Min N-gram: 1
// Max N-gram: 2
// Stop Words Language: English

// After Frequency Analysis
const results = msg.results.rows;

console.log("Top Keywords by Frequency:");
results.forEach((keyword, index) => {
  console.log(`${index + 1}. "${keyword.Keyword}" - Total: ${keyword['Total Count']}, Documents: ${keyword['Document Count']}`);
});

// Extract just the keywords
msg.topKeywords = results.map(r => r.Keyword);

Example 2: Finding Common Themes Across Documents

Identify recurring themes in customer feedback:

// Customer feedback from surveys
msg.documents = msg.surveyResponses.map((response, index) => ({
  name: `Response ${index + 1}`,
  text: response.feedback
}));

// Configure Frequency Analysis:
// Top N: 15
// Min N-gram: 1
// Max N-gram: 3
// Min DF: 0.1 (appear in at least 10% of responses)
// Max DF: 0.8 (filter very common terms)

// After Frequency Analysis
const keywords = msg.results.rows;

// Group by theme based on Total Score
msg.themes = {
  high_priority: [],
  medium_priority: [],
  low_priority: []
};

keywords.forEach(keyword => {
  if (keyword['Total Score'] > 50) {
    msg.themes.high_priority.push(keyword.Keyword);
  } else if (keyword['Total Score'] > 20) {
    msg.themes.medium_priority.push(keyword.Keyword);
  } else {
    msg.themes.low_priority.push(keyword.Keyword);
  }
});

console.log("Customer Feedback Themes:");
console.log("High Priority:", msg.themes.high_priority.join(", "));
console.log("Medium Priority:", msg.themes.medium_priority.join(", "));

Example 3: Content Audit

Analyze existing content to understand keyword focus:

// Collect all published articles
const articles = msg.publishedArticles; // From database or CMS

msg.documents = articles.map(article => ({
  name: article.title,
  text: article.content
}));

// Configure Frequency Analysis:
// Top N: 30
// Min N-gram: 2
// Max N-gram: 3
// Stop Words Language: English

// After Frequency Analysis
const results = msg.results;

// Create audit report
msg.contentAudit = {
  totalArticles: articles.length,
  topKeywords: results.rows.slice(0, 10).map(r => ({
    keyword: r.Keyword,
    totalCount: r['Total Count'],
    appearsIn: r['Document Count'],
    coverage: (r['Document Count'] / articles.length * 100).toFixed(1) + '%'
  })),
  recommendations: []
};

// Identify gaps
results.rows.forEach(keyword => {
  const coverage = keyword['Document Count'] / articles.length;
  if (coverage < 0.3 && keyword['Total Count'] > 5) {
    msg.contentAudit.recommendations.push(
      `Consider creating more content about "${keyword.Keyword}" (only in ${(coverage * 100).toFixed(0)}% of articles)`
    );
  }
});

console.log("Content Audit Report:");
console.log(JSON.stringify(msg.contentAudit, null, 2));

Example 4: Competitive Keyword Analysis

Compare keyword usage across competitor websites:

// Scrape competitor pages
const competitors = [
  { name: "Competitor A", pages: msg.competitorAPages },
  { name: "Competitor B", pages: msg.competitorBPages },
  { name: "Competitor C", pages: msg.competitorCPages }
];

msg.competitorAnalysis = [];

for (let competitor of competitors) {
  // Prepare documents for this competitor
  msg.documents = competitor.pages.map((page, index) => ({
    name: `${competitor.name} Page ${index + 1}`,
    text: page.content
  }));

  // Run Frequency Analysis for each competitor
  // After Frequency Analysis
  msg.competitorAnalysis.push({
    name: competitor.name,
    topKeywords: msg.results.rows.slice(0, 10),
    totalPages: competitor.pages.length
  });
}

// Compare competitors
console.log("Competitor Keyword Comparison:");
msg.competitorAnalysis.forEach(comp => {
  console.log(`\n${comp.name} (${comp.totalPages} pages):`);
  comp.topKeywords.forEach((kw, index) => {
    console.log(`  ${index + 1}. ${kw.Keyword}: ${kw['Total Count']} times`);
  });
});

// Find unique keywords per competitor
msg.uniqueToCompetitor = {};
msg.competitorAnalysis.forEach(comp => {
  const allOtherKeywords = msg.competitorAnalysis
    .filter(c => c.name !== comp.name)
    .flatMap(c => c.topKeywords.map(k => k.Keyword));

  const unique = comp.topKeywords
    .filter(kw => !allOtherKeywords.includes(kw.Keyword))
    .map(kw => kw.Keyword);

  if (unique.length > 0) {
    msg.uniqueToCompetitor[comp.name] = unique;
  }
});

Example 5: Multi-language Content Analysis

Analyze content in different languages:

// Turkish content analysis
msg.documents = [
  {
    name: "Türkçe Makale 1",
    text: "SEO optimizasyonu dijital pazarlama için çok önemlidir. İçerik pazarlama stratejileri başarılı olmalıdır."
  },
  {
    name: "Türkçe Makale 2",
    text: "Dijital pazarlama ve içerik üretimi modern işletmeler için gereklidir."
  }
];

// Configure Frequency Analysis:
// Stop Words Language: Turkish
// Top N: 15
// Min N-gram: 1
// Max N-gram: 2

// After Frequency Analysis
const turkishKeywords = msg.results.rows;

console.log("Türkçe İçerik Anahtar Kelimeler:");
turkishKeywords.forEach(kw => {
  console.log(`${kw.Keyword}: ${kw['Total Count']}`);
});

Example 6: Trend Analysis Over Time

Track keyword frequency changes over time:

// Organize articles by time period
const periods = [
  { name: "Q1 2024", articles: msg.q1Articles },
  { name: "Q2 2024", articles: msg.q2Articles },
  { name: "Q3 2024", articles: msg.q3Articles },
  { name: "Q4 2024", articles: msg.q4Articles }
];

msg.trendAnalysis = [];
const trackKeywords = ["ai", "machine learning", "automation", "cloud computing"];

for (let period of periods) {
  msg.documents = period.articles.map((article, index) => ({
    name: `Article ${index + 1}`,
    text: article.content
  }));

  // Run Frequency Analysis
  // After Frequency Analysis
  const results = msg.results.rows;

  // Extract frequency for tracked keywords
  const periodData = {
    period: period.name,
    keywords: {}
  };

  trackKeywords.forEach(keyword => {
    const found = results.find(r => r.Keyword.toLowerCase() === keyword.toLowerCase());
    periodData.keywords[keyword] = found ? found['Total Count'] : 0;
  });

  msg.trendAnalysis.push(periodData);
}

// Display trends
console.log("Keyword Trends Over Time:");
trackKeywords.forEach(keyword => {
  console.log(`\n${keyword}:`);
  msg.trendAnalysis.forEach(period => {
    console.log(`  ${period.period}: ${period.keywords[keyword]}`);
  });
});

Tips for Effective Use

  1. Document Preparation

    • Clean text data (remove HTML, excessive whitespace)
    • Ensure documents are in the same language
    • Combine very short documents for better results
  2. N-gram Settings

    • Use 1-2 for general topics and quick analysis
    • Use 1-3 for detailed phrase extraction
    • Higher max n-gram for industry-specific terminology
  3. Document Frequency Tuning

    • Adjust Min DF based on corpus size
    • Lower Max DF (0.3-0.4) to filter common terms
    • Higher Min DF for large document sets
  4. Stop Words

    • Always select the correct language
    • Stop word filtering significantly improves results
    • Consider domain-specific terms that act like stop words
  5. Result Interpretation

    • Higher Total Score = more important keyword
    • Document Count shows keyword reach
    • N-Gram value indicates phrase length
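The document-preparation tip (strip HTML, collapse excessive whitespace) can be sketched as a small cleaning helper. The regex-based tag stripping below is a simplification, adequate for plain article text but not a full HTML parser; the function name is an assumption:

```javascript
// Illustrative cleanup before analysis: drop HTML tags, collapse whitespace.
function cleanText(raw) {
  return raw
    .replace(/<[^>]*>/g, " ") // replace HTML tags with spaces
    .replace(/\s+/g, " ")     // collapse runs of whitespace (incl. newlines)
    .trim();
}
```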

Common Errors and Solutions

Issue: Too Many Common Words

Cause: Max DF threshold is too high or stop words not filtering correctly.

Solution:

  • Lower Max DF to 0.3 or 0.4
  • Verify correct language is selected
  • Check that documents are properly cleaned

Issue: No Keywords Found

Cause: Min DF too high for document set size or documents too short.

Solution:

  • Lower Min DF value (try 0.01)
  • Ensure you have at least 10 documents
  • Verify documents contain sufficient text

Issue: Results Show Single Characters or Fragments

Cause: Text may contain special characters or poor tokenization.

Solution:

  • Clean text before analysis
  • Remove special characters and numbers
  • Use Min N-gram = 1, Max N-gram = 2 for better word extraction

Issue: Missing Expected Keywords

Cause: Keywords might be stop words or filtered by frequency thresholds.

Solution:

  • Verify keyword isn't a stop word in selected language
  • Lower Min DF if keyword appears in few documents
  • Increase Max DF if keyword appears in many documents

Performance Considerations

  • Faster than TF-IDF Analysis
  • Processing time scales with:
    • Number of documents
    • Document length
    • Max N-gram value
    • Top N value
  • Recommended limits:
    • 100+ documents: increase Min DF
    • Very long documents: process in batches
    • Large Top N values: may impact memory
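For the batching recommendation above, a simple helper can split a large document set into fixed-size chunks, each of which is fed through the node separately. This is a generic sketch; `batchDocuments` is a hypothetical name:

```javascript
// Split a large document array into batches of at most `batchSize`
// so each Frequency Analysis run stays within reasonable limits.
function batchDocuments(documents, batchSize) {
  const batches = [];
  for (let i = 0; i < documents.length; i += batchSize) {
    batches.push(documents.slice(i, i + batchSize));
  }
  return batches;
}
```

Note that batching changes document-frequency statistics: Min DF and Max DF are evaluated per batch, so results across batches are not directly comparable without normalization.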

Related Nodes

  • TF-IDF Analysis - More sophisticated keyword importance analysis
  • Count Occurrences - Count specific keywords in text
  • Cluster Keywords - Group extracted keywords by similarity
  • Normalize Text - Preprocess text before frequency analysis
  • Gap Analysis - Compare keyword frequencies across texts