Process Document

Performs OCR on documents and images using ABBYY FineReader Engine and exports the result to formats such as PDF, DOCX, and TXT.

Common Properties

Name - The custom name of the node.
Color - The custom color of the node.
Delay Before (sec) - Number of seconds to wait before executing the node.
Delay After (sec) - Number of seconds to wait after executing the node.
Continue On Error - If true, the automation continues even if this node fails. The default value is false.

Inputs

Path - Path to the input document or image file to process.
Out Path - Path where the processed document will be exported.

Options

Language - Language for OCR text recognition (default: English). Supports 200+ languages including all major European languages, Chinese, Japanese, Korean, Arabic, and many more.
Profile - Processing profile that defines quality and speed tradeoffs (default: DocumentConversion_Accuracy). Options:
- DocumentConversion_Accuracy - Highest quality, best for editable documents
- DocumentConversion_Speed - Faster processing with good quality
- DocumentArchiving_Accuracy - Optimized for searchable archives
- DocumentArchiving_Speed - Fast archival processing
- TextExtraction_Accuracy - Best for text extraction only
- TextExtraction_Speed - Fast text extraction
Text Type - Type of text in the document (default: Normal). Options:
- Normal - Regular printed text
- Typewriter - Typewriter or monospaced fonts
- Matrix - Dot matrix printer text
- Handprinted - Hand-printed text
- Gothic - Gothic or blackletter fonts
- OCR-A - OCR-A font
Export Format - Output file format (default: Searchable PDF). Options:
- Text Document - Plain TXT file
- Rich Text Format - RTF with formatting
- MS Word Document - DOCX format
- MS Excel Document - XLSX spreadsheet
- MS PowerPoint Document - PPTX presentation
- Searchable PDF - PDF with searchable text layer
- PDF with Text and Images - Standard PDF format
- PDF/A - Archival PDF format
- XML - Structured XML output
Correct Skew - Whether to automatically correct skewed text lines (default: Auto). Options:
- False - No correction
- True - Always correct
- Auto - Automatically decide
Correct Orientation - Whether to automatically detect and correct page orientation (default: false).
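The Export Format option determines what kind of file is written, but Out Path supplies the file name. A mismatch (for example, Export Format set to MS Word Document with an Out Path ending in .pdf) produces a confusingly named file. The sketch below assumes the conventional extension for each listed format. The helper `out_path_matches_format` and the `EXPORT_EXTENSIONS` mapping are hypothetical and are not part of the node.

```python
from pathlib import Path

# Hypothetical helper: the usual file extension for each Export Format option.
# The node itself may not enforce or use this mapping.
EXPORT_EXTENSIONS = {
    "Text Document": ".txt",
    "Rich Text Format": ".rtf",
    "MS Word Document": ".docx",
    "MS Excel Document": ".xlsx",
    "MS PowerPoint Document": ".pptx",
    "Searchable PDF": ".pdf",
    "PDF with Text and Images": ".pdf",
    "PDF/A": ".pdf",
    "XML": ".xml",
}


def out_path_matches_format(out_path: str, export_format: str) -> bool:
    """Return True if out_path ends with the extension expected for export_format."""
    if export_format not in EXPORT_EXTENSIONS:
        raise ValueError(f"unknown export format: {export_format!r}")
    # Case-insensitive comparison, so "REPORT.DOCX" also matches "MS Word Document".
    return Path(out_path).suffix.lower() == EXPORT_EXTENSIONS[export_format]


if __name__ == "__main__":
    print(out_path_matches_format("C:/processed/invoice_2024_001.pdf", "Searchable PDF"))  # True
    print(out_path_matches_format("C:/editable/report.pdf", "MS Word Document"))  # False
```

Three formats share the .pdf extension, so this check cannot tell a Searchable PDF from a PDF/A file. It only catches an extension that clearly contradicts the selected format.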

Outputs

This node produces an output file at the specified Out Path location.

How It Works

The Process Document node runs a full OCR pipeline on the input file. When executed, the node:

- Validates that the input file exists
- Loads the selected predefined processing profile
- Creates an ABBYY FRDocument and adds the input file to it
- Preprocesses the image (orientation and skew correction, as set by Correct Orientation and Correct Skew)
- Recognizes text using the specified language and text type
- Exports the result in the selected format
- Saves the output to Out Path
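The order of these steps can be sketched as follows. This is an illustration of the sequence only, assuming a hypothetical `OcrEngine` interface. The method names (`load_profile`, `add_image`, `preprocess`, `recognize`, `export`) are invented for the example and are not the ABBYY FineReader Engine API. `RecordingEngine` is a stand-in that records which steps ran, so the flow can be exercised without ABBYY installed.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import List, Protocol


class OcrEngine(Protocol):
    """Hypothetical engine interface; method names are illustrative, not ABBYY's API."""

    def load_profile(self, profile: str) -> None: ...
    def add_image(self, path: str) -> None: ...
    def preprocess(self, correct_orientation: bool, correct_skew: str) -> None: ...
    def recognize(self, language: str, text_type: str) -> None: ...
    def export(self, out_path: str, export_format: str) -> None: ...


@dataclass
class Options:
    # Defaults mirror the documented option defaults.
    language: str = "English"
    profile: str = "DocumentConversion_Accuracy"
    text_type: str = "Normal"
    export_format: str = "Searchable PDF"
    correct_skew: str = "Auto"
    correct_orientation: bool = False


def process_document(engine: OcrEngine, path: str, out_path: str, opts: Options) -> None:
    # Step 1: fail fast if the input does not exist (the node reports ErrNotFound).
    if not Path(path).is_file():
        raise FileNotFoundError(f"input document not found: {path}")
    engine.load_profile(opts.profile)                                # Step 2
    engine.add_image(path)                                           # Step 3
    engine.preprocess(opts.correct_orientation, opts.correct_skew)   # Step 4
    engine.recognize(opts.language, opts.text_type)                  # Step 5
    engine.export(out_path, opts.export_format)                      # Steps 6-7


@dataclass
class RecordingEngine:
    """Stand-in engine that only records which steps ran, in order."""
    calls: List[str] = field(default_factory=list)

    def load_profile(self, profile): self.calls.append(f"profile:{profile}")
    def add_image(self, path): self.calls.append("add_image")
    def preprocess(self, correct_orientation, correct_skew): self.calls.append("preprocess")
    def recognize(self, language, text_type): self.calls.append(f"recognize:{language}")
    def export(self, out_path, export_format): self.calls.append(f"export:{export_format}")
```

Validating the path before touching the engine mirrors the first step above: a missing file is reported before any OCR work or licensed engine calls happen.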

Requirements

Valid ABBYY FineReader Engine installation
Valid ABBYY license
Input file must exist and be readable
Supported input formats: JPG, PNG, BMP, TIFF, PDF
Output directory must exist and be writable
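The file-system requirements can be checked before the node runs, which helps in batch automations. This is a minimal sketch assuming the input format is judged by file extension. `preflight` and `SUPPORTED_INPUT_EXTENSIONS` are hypothetical names, and the .jpeg/.tif spellings are an assumption beyond the list above. The ABBYY installation and license requirements cannot be verified this way.

```python
import os
from pathlib import Path
from typing import List

# Input formats from the Requirements list; the .jpeg/.tif variants are assumed.
SUPPORTED_INPUT_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff", ".pdf"}


def preflight(path: str, out_path: str) -> List[str]:
    """Return human-readable problems; an empty list means the file checks passed."""
    problems: List[str] = []
    src = Path(path)
    if not src.is_file():
        problems.append(f"input file not found: {path}")
    elif not os.access(src, os.R_OK):
        problems.append(f"input file not readable: {path}")
    if src.suffix.lower() not in SUPPORTED_INPUT_EXTENSIONS:
        problems.append(f"unsupported input format: {src.suffix or '(none)'}")
    # The output directory must already exist and be writable.
    out_dir = Path(out_path).parent
    if not out_dir.is_dir():
        problems.append(f"output directory does not exist: {out_dir}")
    elif not os.access(out_dir, os.W_OK):
        problems.append(f"output directory not writable: {out_dir}")
    return problems
```

Returning a list of problems, rather than stopping at the first one, lets a workflow log every issue with a file in a single pass.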

Error Handling

The node returns the following error:

ErrNotFound - The input document file was not found. The error message includes the file path to help locate the issue.
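When many files are processed, a missing input is usually something to record rather than a reason to stop. The sketch below models that pattern in Python. `ErrNotFound` here is a hypothetical stand-in class that carries the path (mirroring the node's error message), and `run_batch` loosely imitates the effect of enabling Continue On Error. Neither is an actual API of the node.

```python
from pathlib import Path
from typing import Callable, Dict, List


class ErrNotFound(Exception):
    """Hypothetical Python stand-in for the node's ErrNotFound error."""

    def __init__(self, path: str):
        super().__init__(f"input document not found: {path}")
        self.path = path  # keep the path so callers can report it


def check_input(path: str) -> None:
    if not Path(path).is_file():
        raise ErrNotFound(path)


def run_batch(paths: List[str], process: Callable[[str], None]) -> Dict[str, List[str]]:
    """Process each path, collecting missing files instead of stopping."""
    result: Dict[str, List[str]] = {"done": [], "missing": []}
    for p in paths:
        try:
            check_input(p)
            process(p)
            result["done"].append(p)
        except ErrNotFound as err:
            result["missing"].append(err.path)
    return result
```

Only ErrNotFound is caught, so other failures still surface instead of being silently skipped.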

Usage Examples

Scenario: Convert a scanned invoice to a searchable PDF

Process Document node:
- Path: "C:/invoices/invoice_2024_001.jpg"
- Out Path: "C:/processed/invoice_2024_001.pdf"
- Language: English
- Profile: DocumentConversion_Accuracy
- Text Type: Normal
- Export Format: Searchable PDF
- Correct Skew: Auto
- Correct Orientation: true

Scenario: Extract text from a typewriter document

Process Document node:
- Path: "C:/old_docs/contract_1985.tiff"
- Out Path: "C:/text/contract_1985.txt"
- Language: English
- Profile: TextExtraction_Accuracy
- Text Type: Typewriter
- Export Format: Text Document
- Correct Skew: True
- Correct Orientation: false

Scenario: Create an editable Word document from a scan

Process Document node:
- Path: "C:/scans/report.pdf"
- Out Path: "C:/editable/report.docx"
- Language: English
- Profile: DocumentConversion_Accuracy
- Text Type: Normal
- Export Format: MS Word Document
- Correct Skew: Auto
- Correct Orientation: true

Common Use Cases

Document Digitization - Convert paper documents to searchable digital formats
Archive Creation - Build searchable document archives in PDF/A format
Text Extraction - Extract plain text from scanned documents
Format Conversion - Convert between different document formats
OCR for Editing - Create editable documents (DOCX) from scans
Data Extraction - Extract text for further processing
Legal Documents - Process contracts and legal papers with high accuracy
Historical Documents - OCR old typewriter or printed documents

Tips and Best Practices

Profile Selection:
- Use "Accuracy" profiles for important documents
- Use "Speed" profiles for large batches with time constraints
- Use "DocumentConversion" profiles for editable output
- Use "TextExtraction" profiles when only the text is needed
- Use "DocumentArchiving" profiles for long-term storage
Language Selection:
- Always specify the correct document language
- Supports multiple languages in one document
- Choosing the wrong language significantly reduces accuracy
- Use language codes for non-English documents
Text Type:
- Match text type to document characteristics
- "Normal" works for most modern documents
- "Typewriter" for old typed documents
- "Matrix" for dot matrix printouts
- "Handprinted" for hand-filled forms
Export Format:
- Use PDF for distribution and archival
- Use DOCX for editing and reformatting
- Use TXT for simple text extraction
- Use XLSX for tabular data
- Use XML for programmatic processing
Image Preprocessing:
- Enable "Correct Orientation" for mixed orientations
- Use "Auto" for skew correction in most cases
- Enable both for unknown document conditions
- Corrections add minimal processing time
Input Quality:
- Use 300 DPI or higher for scanning
- Ensure good lighting and contrast
- Clean documents before scanning
- Avoid shadows and glare
Performance:
- Processing time depends on document size and complexity
- Accuracy profiles take longer than speed profiles
- Pages of multi-page documents are processed sequentially
- Consider scheduling large batches to run overnight
Output Path:
- Ensure output directory exists
- Use the file extension that matches the selected Export Format
- Overwriting existing files is supported
- Check disk space for large documents
Error Handling:
- Enable Continue On Error for batch processing
- Validate input files before processing
- Check that the output file was created successfully
- Log errors for troubleshooting
Quality Validation:
- Spot-check OCR results for accuracy
- Compare with original for critical documents
- Review uncertain characters
- Consider manual review for important documents
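The Profile Selection tips combine two independent choices: the purpose (conversion, archiving, or text extraction) and the priority (accuracy or speed). A hypothetical helper that applies them might look like this. The decision rules are read off the tips above, `choose_profile` is an invented name, and the six profile names are the ones listed under Options.

```python
# The six profile names listed under Options.
VALID_PROFILES = {
    "DocumentConversion_Accuracy", "DocumentConversion_Speed",
    "DocumentArchiving_Accuracy", "DocumentArchiving_Speed",
    "TextExtraction_Accuracy", "TextExtraction_Speed",
}


def choose_profile(*, editable: bool, archive: bool, large_batch: bool) -> str:
    """Hypothetical helper that applies the Profile Selection tips.

    editable    -> DocumentConversion (editable output such as DOCX)
    archive     -> DocumentArchiving (long-term storage)
    otherwise   -> TextExtraction (text only)
    large_batch -> a Speed profile instead of an Accuracy profile
    """
    if editable:
        purpose = "DocumentConversion"
    elif archive:
        purpose = "DocumentArchiving"
    else:
        purpose = "TextExtraction"
    priority = "Speed" if large_batch else "Accuracy"
    profile = f"{purpose}_{priority}"
    # Guard against typos if the rules above are edited later.
    if profile not in VALID_PROFILES:
        raise ValueError(f"not a predefined profile: {profile}")
    return profile
```

For the typewriter scenario in the usage examples (text only, single document), `choose_profile(editable=False, archive=False, large_batch=False)` gives "TextExtraction_Accuracy", the profile used there.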
