Classify Document

Classifies documents using a trained ABBYY classification model.

Common Properties

Name - The custom name of the node.
Color - The custom color of the node.
Delay Before (sec) - Time to wait, in seconds, before executing the node.
Delay After (sec) - Time to wait, in seconds, after executing the node.
Continue On Error - If enabled, the automation continues even if this node fails. The default value is false.

Inputs

Model Path - Path to the trained classification model file (a .mdc file created by the Train Model node).
File Path - Path to the document file to classify.

Options

Correct Orientation - Whether to automatically detect and correct document orientation before classification (default: false).

Outputs

Label - Classification result label assigned to the document based on the trained model.

How It Works

The Classify Document node categorizes documents using machine learning. When executed, the node:

1. Loads the trained classification model from the file.
2. Determines whether text recognition is needed, based on the model type.
3. Creates an ABBYY FRDocument and loads the input file.
4. Optionally preprocesses the document to correct its orientation.
5. Analyzes and recognizes text, if the model requires it.
6. Classifies the document using the loaded model.
7. Returns the category label with the highest confidence.
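
The order of these steps can be sketched in Python. This is only an illustration of the control flow, not the node's implementation: `ModelInfo`, the model type names ("image", "text", "combined"), and the step names are all hypothetical, and the sketch assumes that only image-only models skip text recognition.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ModelInfo:
    """Hypothetical stand-in for a loaded .mdc model."""
    path: str
    model_type: str  # assumed values: "image", "text", or "combined"


def needs_text_recognition(model: ModelInfo) -> bool:
    # Assumption: text-based and combined models need OCR, image-only models do not.
    return model.model_type in ("text", "combined")


def plan_steps(model: ModelInfo, correct_orientation: bool) -> list[str]:
    """Return the ordered steps that would run for this configuration."""
    steps = ["load_model", "load_document"]
    if correct_orientation:
        steps.append("correct_orientation")
    if needs_text_recognition(model):
        steps.append("recognize_text")
    steps += ["classify", "return_label"]
    return steps
```

For an image-only model with Correct Orientation disabled, the plan reduces to loading, classifying, and returning the label.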

Requirements

Valid ABBYY FineReader Engine installation with Classification Engine
Valid ABBYY license
Trained classification model file (.mdc)
Input document file must exist
Model must be trained on similar document types

Error Handling

The node returns an error if:

The model file is not found or is invalid
The input document file is not found
The classification engine fails to initialize
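
The first two conditions can be caught before the node runs. The following is a minimal pre-flight check in Python, assuming it runs in a script step ahead of the node; `validate_inputs` is a hypothetical helper, and it only checks that the files exist and that the model has a .mdc extension, not that the model is actually valid.

```python
from __future__ import annotations

from pathlib import Path


def validate_inputs(model_path: str, file_path: str) -> list[str]:
    """Return a list of problems found; an empty list means the inputs look usable."""
    problems = []
    model = Path(model_path)
    if not model.is_file():
        problems.append(f"Model file not found: {model_path}")
    elif model.suffix.lower() != ".mdc":
        # The extension is only a hint; a .mdc file can still be invalid.
        problems.append(f"Model file is not a .mdc file: {model_path}")
    if not Path(file_path).is_file():
        problems.append(f"Input document not found: {file_path}")
    return problems
```

If the returned list is not empty, the document can be skipped or sent to a review step without calling the node at all.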

Usage Example

Scenario: Classify incoming invoices by vendor

1. Train Model (one-time setup):
   - Create model for vendors: VendorA, VendorB, VendorC

2. Classify Document node:
   - Model Path: "C:/models/vendor_classifier.mdc"
   - File Path: "C:/inbox/invoice_001.pdf"
   - Correct Orientation: true
   - Output: {{ $.label }} (e.g., "VendorA")

3. Route document based on label

Scenario: Classify documents by type

Classify Document node:
- Model Path: "C:/models/document_types.mdc"
- File Path: "C:/documents/unknown_doc.pdf"
- Correct Orientation: true
- Output: {{ $.label }} (e.g., "Invoice", "Contract", "Receipt")

Scenario: Automated document sorting

Loop through documents:
1. Classify Document:
   - Model Path: "C:/models/sorter.mdc"
   - File Path: {{ $.current_doc }}

2. Decision based on {{ $.label }}:
   - "Invoice" → Move to invoices folder
   - "Contract" → Move to contracts folder
   - "Receipt" → Move to receipts folder
   - "Not classified" → Move to review folder
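
The decision step in this scenario can be sketched as a lookup table in Python. The folder names and the `destination_for` helper are hypothetical; the labels are the ones used in the scenario. Any label that is not in the table, including "Not classified", falls through to the review folder.

```python
from pathlib import Path

# Hypothetical folder layout; the labels mirror the scenario above.
ROUTES = {
    "Invoice": "invoices",
    "Contract": "contracts",
    "Receipt": "receipts",
}
REVIEW_FOLDER = "review"


def destination_for(label: str, base_dir: str, filename: str) -> Path:
    """Map a classification label to the target path for a document."""
    folder = ROUTES.get(label, REVIEW_FOLDER)
    return Path(base_dir) / folder / filename
```

Defaulting to the review folder means a new or misspelled label never causes a document to be dropped silently.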

Common Use Cases

Document Routing - Automatically route documents to appropriate departments
Invoice Processing - Classify invoices by vendor or type
Email Attachment Sorting - Classify and organize email attachments
Archive Organization - Categorize documents for archival
Quality Control - Identify document types for validation
Compliance Checking - Classify documents by regulatory category
Automated Workflows - Trigger different workflows based on document type
Content Management - Auto-tag documents in content management systems

Classification Types

Image-Based Classification

Uses visual features of the document
Fast classification without OCR
Good for documents with consistent layouts
Examples: Forms, templates, letterheads

Text-Based Classification

Uses OCR to extract and analyze text content
More accurate for text-heavy documents
Slower due to OCR requirement
Examples: Contracts, reports, letters

Combined Classification

Uses both image and text features
Best accuracy for mixed documents
Balances speed and accuracy
Recommended for general use

Classification Results

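The output label is determined as follows:

Category Name - The label is the name of a category from the training data.
Highest Confidence - When several categories match, the one with the highest match score is returned.
"Not classified" - Returned when the document doesn't match any trained category well.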

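The selection rule above can be illustrated with a short Python sketch. It is not the engine's logic: the node outputs only the label, so the per-category scores, the 0 to 1 score scale, and the `min_score` threshold are all assumptions made for the example.

```python
from __future__ import annotations

NOT_CLASSIFIED = "Not classified"


def pick_label(scores: dict[str, float], min_score: float = 0.5) -> str:
    """Return the best-scoring category, or "Not classified" if nothing matches well."""
    if not scores:
        return NOT_CLASSIFIED
    best = max(scores, key=scores.get)
    # Assumed rule: a best score below the threshold counts as a poor match.
    return best if scores[best] >= min_score else NOT_CLASSIFIED
```

Under these assumptions, a document scoring 0.9 for "Invoice" and 0.3 for "Contract" is labeled "Invoice", while one scoring below the threshold for every category is labeled "Not classified".
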
Tips and Best Practices

Model Selection:
- Use a model trained on similar document types
- Retrain the model if classification accuracy is poor
- Models are specific to the document types they were trained on (don't reuse them across different domains)
- Keep models updated with new document variations

Orientation Correction:
- Enable if documents may be rotated
- Adds minimal processing time
- Critical for scanned document batches
- Not needed for digital documents

Model Training:
- Train with representative samples
- Include variations of each category
- More training samples generally improve accuracy
- Test the model before production use

Classification Accuracy:
- Depends on model quality
- Depends on how similar the document is to the training data
- Check confidence scores if available
- Implement manual review for low confidence

Performance:
- Image-based: Very fast (< 1 second)
- Text-based: Slower (OCR required, 3-5 seconds)
- Combined: Medium speed (2-4 seconds)
- Processing time depends on model type

Error Handling:
- Enable Continue On Error for batch processing
- Handle "Not classified" results
- Log classification results for monitoring
- Flag low-confidence classifications for review

Workflow Integration:
- Classify before detailed processing
- Use the label to route to the appropriate workflow
- Combine with other nodes based on type
- Track classification accuracy over time

Batch Processing:
- Classify multiple documents in a loop
- Track success rates by category
- Identify documents needing manual review
- Update the model based on misclassifications

Quality Assurance:
- Spot-check classifications regularly
- Monitor "Not classified" frequency
- Review misclassified documents
- Retrain the model when accuracy drops
- Keep training data updated

Common Issues:
- Poor classification → Retrain the model with more samples
- "Not classified" → Document type is not in the training data
- Wrong category → Add similar documents to the training data
- Slow performance → Consider an image-based model

Best Practices:
- Start with a small category set (3-5 types)
- Expand categories gradually
- Test thoroughly before production
- Monitor and log all classifications
- Maintain model version control
- Document category definitions clearly
- Implement a feedback loop for improvements

Document Preparation:
- Consistent image quality helps
- Standard resolution (300 DPI)
- Clean scans without artifacts
- Correct orientation before classification

Model Management:
- Store models in version control
- Document model training parameters
- Track model performance metrics
- Plan for model updates
- Test new models before deployment

Advanced Usage:
- Combine multiple classifiers (hierarchical)
- Use confidence thresholds for routing
- Implement multi-label classification
- Track classification statistics
- Build classification dashboards
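
Several of the tips above (continuing past errors, tracking results by category, flagging documents for review) can be combined in one batch loop. The sketch below is a minimal Python illustration; `classify_batch` is a hypothetical helper, and `classify` is a placeholder for whatever actually runs the Classify Document node and returns its label.

```python
from collections import Counter
from typing import Callable, Iterable


def classify_batch(paths: Iterable[str], classify: Callable[[str], str]):
    """Classify each path, continuing past failures, and tally the outcomes."""
    counts = Counter()
    needs_review = []
    for path in paths:
        try:
            label = classify(path)
        except Exception as exc:
            # Mirrors Continue On Error: record the failure and move on.
            counts["error"] += 1
            needs_review.append((path, f"error: {exc}"))
            continue
        counts[label] += 1
        if label == "Not classified":
            needs_review.append((path, label))
    return counts, needs_review
```

The share of "Not classified" results in a batch is then `counts["Not classified"] / sum(counts.values())`, which is one simple figure to monitor over time.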
