Frequency Analysis
Performs frequency analysis on documents to extract and rank keywords based on occurrence counts. This node provides a simpler alternative to TF-IDF analysis, focusing on raw frequency counts to identify commonly used terms across documents.
Common Properties
- Name - The custom name of the node.
- Color - The custom color of the node.
- Delay Before (sec) - Waits in seconds before executing the node.
- Delay After (sec) - Waits in seconds after executing the node.
- Continue On Error - Automation will continue regardless of any error. The default value is false.
If the Continue On Error property is true, errors are not caught when the project is executed, even if a Catch node is used.
Inputs
- Documents - An array of JSON objects, each containing name and text fields. Example format:
[
  {"name": "Document 1", "text": "Your document text here"},
  {"name": "Document 2", "text": "Another document text"}
]
Options
- Top N - Specifies the number of top terms to retrieve from the frequency analysis. Default: 25.
- Min N-gram - Sets the lower boundary of the range of n-values for different n-gram sizes to be extracted. For instance, min_ngram=1 means the analysis will include unigrams (single words). Default: 1.
- Max N-gram - Sets the upper boundary for the range of n-values for different n-gram sizes. A max_ngram=3 would mean the analysis includes trigrams (phrases of three words) at most. Default: 3.
- Min DF - Minimum document frequency threshold as a percentage. Terms appearing in fewer documents than this percentage will be ignored. A value of 0.01 (1%) means ignoring terms that appear in less than 1% of your documents. Default: 0.01.
- Max DF - Maximum document frequency threshold as a percentage. Terms appearing in more documents than this percentage will be filtered out. A value of 0.5 (50%) filters out terms that appear in more than half of your documents. Default: 0.5.
- Stop Words Language - Language for stop word filtering. Stop words are ignored during the process of vectorizing text, as these words usually carry less meaningful information for analysis. Supported languages: Arabic, Azerbaijani, Basque, Bengali, Catalan, Chinese, Danish, Dutch, English, Finnish, French, German, Greek, Hebrew, Hinglish, Hungarian, Indonesian, Italian, Kazakh, Nepali, Norwegian, Portuguese, Romanian, Russian, Slovene, Spanish, Swedish, Tajik, Turkish. Default: English.
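As a rough illustration of how the Min DF and Max DF options interact, the filter amounts to a range check on each term's document frequency. The sketch below is a simplified model for illustration only, not the node's internal implementation:

```javascript
// Simplified sketch of document-frequency filtering.
// docFrequency: fraction of documents containing the term (0..1).
function passesDfFilter(docFrequency, minDf = 0.01, maxDf = 0.5) {
  return docFrequency >= minDf && docFrequency <= maxDf;
}

// A term appearing in exactly 1% of documents survives the defaults...
passesDfFilter(0.01); // true
// ...but a term appearing in 60% of documents is filtered as too common.
passesDfFilter(0.6);  // false
```

Terms below Min DF are treated as too rare to matter; terms above Max DF are treated as too generic to be useful keywords.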
Output
- Frequency Results - An object containing the analysis results with the following structure:
{
  "columns": ["Keyword", "Document 1", "Document 2", "N-Gram", "Document Count", "Total Count", "Total Score"],
  "rows": [
    {
      "Keyword": "keyword phrase",
      "Document 1": 5,
      "Document 2": 3,
      "N-Gram": 2,
      "Document Count": 2,
      "Total Count": 8,
      "Total Score": 32
    }
  ]
}
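Because the columns array lists every field present in the rows, the result object is easy to flatten for export. The helper below is illustrative (resultsToCsv is a hypothetical name, written against the structure shown above):

```javascript
// Illustrative helper: flatten a Frequency Results object into CSV text,
// using the columns array to fix the field order.
function resultsToCsv(results) {
  const header = results.columns.join(',');
  const lines = results.rows.map(row =>
    results.columns.map(col => JSON.stringify(row[col] ?? '')).join(',')
  );
  return [header, ...lines].join('\n');
}

// Minimal sample in the documented shape:
const sample = {
  columns: ['Keyword', 'Total Count', 'Total Score'],
  rows: [{ Keyword: 'keyword phrase', 'Total Count': 8, 'Total Score': 32 }]
};
const csv = resultsToCsv(sample);
// First line: Keyword,Total Count,Total Score
```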
How It Works
Frequency Analysis extracts keywords based on how often they appear across documents:
- Tokenizes documents into n-grams (single words, two-word phrases, or longer phrases, depending on your Min/Max N-gram settings)
- Filters stop words based on the selected language
- Counts occurrences of each term in each document
- Applies document frequency filtering to remove very common and very rare terms
- Validates n-grams by ensuring they exist exactly in at least one document
- Computes a Total Score by multiplying N-Gram length, Document Count, and Total Count
- Ranks keywords by Total Score to identify the most frequently used terms
- Returns Top N keywords with their counts across all documents
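The scoring step above can be sketched as follows. This is a simplified model of the described Total Score computation, assuming per-document counts have already been gathered; it is not the node's exact implementation:

```javascript
// Simplified sketch: build one result row for a keyword.
// counts: occurrences of the keyword in each document, in document order.
function scoreKeyword(keyword, counts) {
  const nGram = keyword.split(/\s+/).length;               // phrase length
  const documentCount = counts.filter(c => c > 0).length;  // reach across documents
  const totalCount = counts.reduce((a, b) => a + b, 0);    // raw frequency
  return {
    Keyword: keyword,
    'N-Gram': nGram,
    'Document Count': documentCount,
    'Total Count': totalCount,
    // Total Score = N-Gram length × Document Count × Total Count
    'Total Score': nGram * documentCount * totalCount
  };
}

const row = scoreKeyword('keyword phrase', [5, 3]);
// 2 (n-gram) × 2 (documents) × 8 (occurrences) = 32, matching the output example
```

This is why longer phrases that recur across many documents outrank single words with the same raw count.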
Frequency Analysis vs TF-IDF Analysis
| Feature | Frequency Analysis | TF-IDF Analysis |
|---|---|---|
| Metric | Raw counts | Weighted importance scores |
| Best For | Finding common terms, basic analysis | Finding distinctive terms, advanced analysis |
| Speed | Faster | Slightly slower |
| Complexity | Simpler to understand | More sophisticated |
| Use Case | General keyword extraction | Competitive analysis, distinguishing documents |
When to use Frequency Analysis:
- Quick keyword extraction from similar documents
- Finding most commonly discussed topics
- Simpler analysis needs
- When you want actual occurrence counts
When to use TF-IDF Analysis:
- Comparing diverse documents
- Finding unique keywords per document
- More sophisticated SEO analysis
- When you need importance scores
Practical Examples
Example 1: Basic Keyword Extraction
Extract most common keywords from blog posts:
// Prepare documents
msg.documents = [
{
name: "Blog Post 1",
text: "Content marketing is essential for digital success. Good content attracts customers."
},
{
name: "Blog Post 2",
text: "Digital marketing strategies include content creation and social media marketing."
},
{
name: "Blog Post 3",
text: "Successful content marketing requires planning and consistent content delivery."
}
];
// Configure Frequency Analysis node
// Top N: 10
// Min N-gram: 1
// Max N-gram: 2
// Stop Words Language: English
// After Frequency Analysis
const results = msg.results.rows;
console.log("Top Keywords by Frequency:");
results.forEach((keyword, index) => {
console.log(`${index + 1}. "${keyword.Keyword}" - Total: ${keyword['Total Count']}, Documents: ${keyword['Document Count']}`);
});
// Extract just the keywords
msg.topKeywords = results.map(r => r.Keyword);
Example 2: Finding Common Themes Across Documents
Identify recurring themes in customer feedback:
// Customer feedback from surveys
msg.documents = msg.surveyResponses.map((response, index) => ({
name: `Response ${index + 1}`,
text: response.feedback
}));
// Configure Frequency Analysis:
// Top N: 15
// Min N-gram: 1
// Max N-gram: 3
// Min DF: 0.1 (appear in at least 10% of responses)
// Max DF: 0.8 (filter very common terms)
// After Frequency Analysis
const keywords = msg.results.rows;
// Group by theme based on Total Score
msg.themes = {
high_priority: [],
medium_priority: [],
low_priority: []
};
keywords.forEach(keyword => {
if (keyword['Total Score'] > 50) {
msg.themes.high_priority.push(keyword.Keyword);
} else if (keyword['Total Score'] > 20) {
msg.themes.medium_priority.push(keyword.Keyword);
} else {
msg.themes.low_priority.push(keyword.Keyword);
}
});
console.log("Customer Feedback Themes:");
console.log("High Priority:", msg.themes.high_priority.join(", "));
console.log("Medium Priority:", msg.themes.medium_priority.join(", "));
Example 3: Content Audit
Analyze existing content to understand keyword focus:
// Collect all published articles
const articles = msg.publishedArticles; // From database or CMS
msg.documents = articles.map(article => ({
name: article.title,
text: article.content
}));
// Configure Frequency Analysis:
// Top N: 30
// Min N-gram: 2
// Max N-gram: 3
// Stop Words Language: English
// After Frequency Analysis
const results = msg.results;
// Create audit report
msg.contentAudit = {
totalArticles: articles.length,
topKeywords: results.rows.slice(0, 10).map(r => ({
keyword: r.Keyword,
totalCount: r['Total Count'],
appearsIn: r['Document Count'],
coverage: (r['Document Count'] / articles.length * 100).toFixed(1) + '%'
})),
recommendations: []
};
// Identify gaps
results.rows.forEach(keyword => {
const coverage = keyword['Document Count'] / articles.length;
if (coverage < 0.3 && keyword['Total Count'] > 5) {
msg.contentAudit.recommendations.push(
`Consider creating more content about "${keyword.Keyword}" (only in ${(coverage * 100).toFixed(0)}% of articles)`
);
}
});
console.log("Content Audit Report:");
console.log(JSON.stringify(msg.contentAudit, null, 2));
Example 4: Competitive Keyword Analysis
Compare keyword usage across competitor websites:
// Scrape competitor pages
const competitors = [
{ name: "Competitor A", pages: msg.competitorAPages },
{ name: "Competitor B", pages: msg.competitorBPages },
{ name: "Competitor C", pages: msg.competitorCPages }
];
msg.competitorAnalysis = [];
for (let competitor of competitors) {
// Prepare documents for this competitor
msg.documents = competitor.pages.map((page, index) => ({
name: `${competitor.name} Page ${index + 1}`,
text: page.content
}));
// Run Frequency Analysis for each competitor
// After Frequency Analysis
msg.competitorAnalysis.push({
name: competitor.name,
topKeywords: msg.results.rows.slice(0, 10),
totalPages: competitor.pages.length
});
}
// Compare competitors
console.log("Competitor Keyword Comparison:");
msg.competitorAnalysis.forEach(comp => {
console.log(`\n${comp.name} (${comp.totalPages} pages):`);
comp.topKeywords.forEach((kw, index) => {
console.log(` ${index + 1}. ${kw.Keyword}: ${kw['Total Count']} times`);
});
});
// Find unique keywords per competitor
msg.uniqueToCompetitor = {};
msg.competitorAnalysis.forEach(comp => {
const allOtherKeywords = msg.competitorAnalysis
.filter(c => c.name !== comp.name)
.flatMap(c => c.topKeywords.map(k => k.Keyword));
const unique = comp.topKeywords
.filter(kw => !allOtherKeywords.includes(kw.Keyword))
.map(kw => kw.Keyword);
if (unique.length > 0) {
msg.uniqueToCompetitor[comp.name] = unique;
}
});
Example 5: Multi-language Content Analysis
Analyze content in different languages:
// Turkish content analysis
msg.documents = [
{
name: "Türkçe Makale 1",
text: "SEO optimizasyonu dijital pazarlama için çok önemlidir. İçerik pazarlama stratejileri başarılı olmalıdır."
},
{
name: "Türkçe Makale 2",
text: "Dijital pazarlama ve içerik üretimi modern işletmeler için gereklidir."
}
];
// Configure Frequency Analysis:
// Stop Words Language: Turkish
// Top N: 15
// Min N-gram: 1
// Max N-gram: 2
// After Frequency Analysis
const turkishKeywords = msg.results.rows;
console.log("Türkçe İçerik Anahtar Kelimeler:");
turkishKeywords.forEach(kw => {
console.log(`${kw.Keyword}: ${kw['Total Count']}`);
});
Example 6: Trend Analysis Over Time
Track keyword frequency changes over time:
// Organize articles by time period
const periods = [
{ name: "Q1 2024", articles: msg.q1Articles },
{ name: "Q2 2024", articles: msg.q2Articles },
{ name: "Q3 2024", articles: msg.q3Articles },
{ name: "Q4 2024", articles: msg.q4Articles }
];
msg.trendAnalysis = [];
const trackKeywords = ["ai", "machine learning", "automation", "cloud computing"];
for (let period of periods) {
msg.documents = period.articles.map((article, index) => ({
name: `Article ${index + 1}`,
text: article.content
}));
// Run Frequency Analysis
// After Frequency Analysis
const results = msg.results.rows;
// Extract frequency for tracked keywords
const periodData = {
period: period.name,
keywords: {}
};
trackKeywords.forEach(keyword => {
const found = results.find(r => r.Keyword.toLowerCase() === keyword.toLowerCase());
periodData.keywords[keyword] = found ? found['Total Count'] : 0;
});
msg.trendAnalysis.push(periodData);
}
// Display trends
console.log("Keyword Trends Over Time:");
trackKeywords.forEach(keyword => {
console.log(`\n${keyword}:`);
msg.trendAnalysis.forEach(period => {
console.log(` ${period.period}: ${period.keywords[keyword]}`);
});
});
Tips for Effective Use
- Document Preparation
  - Clean text data (remove HTML, excessive whitespace)
  - Ensure documents are in the same language
  - Combine very short documents for better results
- N-gram Settings
  - Use 1-2 for general topics and quick analysis
  - Use 1-3 for detailed phrase extraction
  - Higher max n-gram for industry-specific terminology
- Document Frequency Tuning
  - Adjust Min DF based on corpus size
  - Lower Max DF (0.3-0.4) to filter common terms
  - Higher Min DF for large document sets
- Stop Words
  - Always select the correct language
  - Stop word filtering significantly improves results
  - Consider domain-specific terms that act like stop words
- Result Interpretation
  - Higher Total Score = more important keyword
  - Document Count shows keyword reach
  - N-Gram value indicates phrase length
Common Errors and Solutions
Issue: Too Many Common Words
Cause: The Max DF threshold is too high, or stop words are not being filtered correctly.
Solution:
- Lower Max DF to 0.3 or 0.4
- Verify correct language is selected
- Check that documents are properly cleaned
Issue: No Keywords Found
Cause: Min DF is too high for the size of the document set, or the documents are too short.
Solution:
- Lower Min DF value (try 0.01)
- Ensure you have at least 10 documents
- Verify documents contain sufficient text
Issue: Results Show Single Characters or Fragments
Cause: The text may contain special characters, or tokenization is producing fragments.
Solution:
- Clean text before analysis
- Remove special characters and numbers
- Use Min N-gram = 1, Max N-gram = 2 for better word extraction
Issue: Missing Expected Keywords
Cause: The keywords may be stop words, or they may be filtered out by the document frequency thresholds.
Solution:
- Verify keyword isn't a stop word in selected language
- Lower Min DF if keyword appears in few documents
- Increase Max DF if keyword appears in many documents
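To check whether a frequency threshold is the culprit, you can compute a keyword's document frequency yourself and compare it against your Min DF / Max DF settings. The sketch below uses simple case-insensitive substring matching, which is not the node's tokenizer, so treat it as a rough diagnostic:

```javascript
// Illustrative diagnostic: what fraction of documents mention the keyword?
function keywordDocFrequency(documents, keyword) {
  const needle = keyword.toLowerCase();
  const hits = documents.filter(doc => doc.text.toLowerCase().includes(needle));
  return hits.length / documents.length;
}

const docs = [
  { name: 'Doc 1', text: 'Content marketing is essential.' },
  { name: 'Doc 2', text: 'Social media posts drive traffic.' }
];
const df = keywordDocFrequency(docs, 'content marketing'); // 0.5
// If df < Min DF the keyword is dropped as too rare;
// if df > Max DF it is dropped as too common.
```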
Performance Considerations
- Faster than TF-IDF Analysis
- Processing time scales with:
  - Number of documents
  - Document length
  - Max N-gram value
  - Top N value
- Recommendations:
  - 100+ documents: increase Min DF
  - Very long documents: process in batches
  - Large Top N values: may impact memory
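For very large corpora, one way to keep runs manageable is to split the document list into batches and run Frequency Analysis once per batch. The helper below is an illustrative sketch (batchDocuments is a hypothetical name; the batch size of 100 is only an example):

```javascript
// Illustrative: split a document array into batches of a fixed size
// so each Frequency Analysis run stays small.
function batchDocuments(documents, batchSize) {
  const batches = [];
  for (let i = 0; i < documents.length; i += batchSize) {
    batches.push(documents.slice(i, i + batchSize));
  }
  return batches;
}

const docs = Array.from({ length: 250 }, (_, i) => ({
  name: `Doc ${i + 1}`,
  text: '...'
}));
const batches = batchDocuments(docs, 100); // 3 batches: 100, 100, 50 documents
```

Each batch can then be assigned to msg.documents in turn, as in Example 4's per-competitor loop.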
Related Nodes
- TF-IDF Analysis - More sophisticated keyword importance analysis
- Count Occurrences - Count specific keywords in text
- Cluster Keywords - Group extracted keywords by similarity
- Normalize Text - Preprocess text before frequency analysis
- Gap Analysis - Compare keyword frequencies across texts