Split Document

Splits multi-page documents into individual page image files using ABBYY FineReader Engine.

Common Properties

Name - The custom name of the node.
Color - The custom color of the node.
Delay Before (sec) - Time to wait, in seconds, before executing the node.
Delay After (sec) - Time to wait, in seconds, after executing the node.
Continue On Error - Automation will continue regardless of any error. The default value is false.

Inputs

Path - Path to the multi-page document or image file to split.
Out Path - Base path for output files. The first page is saved under this name as-is; page numbers are appended automatically from the second page onward.

Options

This node has no configurable options.

Outputs

This node produces one output image file for each page in the document. Files are named from Out Path: the first page uses the name as-is, and later pages get the suffixes _2, _3, and so on.

How It Works

The Split Document node separates multi-page files into individual images. When executed, the node:

Validates that the input file exists
Creates an ABBYY FRDocument and loads all pages
Iterates through each page in the document
Extracts the image from each page
Saves each page as a separate image file
Appends page numbers to the base filename
Closes handles and cleans up resources

Requirements

Valid ABBYY FineReader Engine installation
Valid ABBYY license
Input file must exist (a single-page file is accepted and produces one output file)
Supported input formats: PDF and TIFF (including multi-page TIFF)
Output directory must exist and be writable

Error Handling

The node will return specific errors in the following cases:

ErrNotFound - Input document file not found. Error message includes the path.

Usage Example

Scenario: Split a multi-page PDF into individual JPG files

Split Document node:
- Path: "C:/documents/contract.pdf"
- Out Path: "C:/pages/contract_page.jpg"

Output files:
- C:/pages/contract_page.jpg (page 1)
- C:/pages/contract_page_2.jpg (page 2)
- C:/pages/contract_page_3.jpg (page 3)
- ...

Scenario: Split a scanned multi-page TIFF

Split Document node:
- Path: "C:/scans/report.tiff"
- Out Path: "C:/individual/page.png"

Output files:
- C:/individual/page.png (page 1)
- C:/individual/page_2.png (page 2)
- C:/individual/page_3.png (page 3)
- ...

Scenario: Process each page separately after splitting

1. Split Document node:
   - Path: "C:/docs/book.pdf"
   - Out Path: "C:/pages/page.jpg"

2. Get list of created files

3. Loop through pages:
   - Process each page individually
   - Apply different processing based on page number
   - Extract specific data from each page
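Step 2 ("Get list of created files") can be sketched outside the node as follows. This is a minimal Python sketch, assuming the output files follow the naming rule described under Output File Naming; the helper names `page_number` and `list_split_pages` are hypothetical and are not part of the node or of ABBYY FineReader Engine.

```python
import os
import re


def page_number(file_name: str, base_name: str, ext: str):
    """Return the page number encoded in file_name, or None if it is not a split output."""
    if file_name == base_name + ext:
        return 1  # the first page keeps the base name unchanged
    match = re.fullmatch(re.escape(base_name) + r"_(\d+)" + re.escape(ext), file_name)
    return int(match.group(1)) if match else None


def list_split_pages(out_path: str) -> list[tuple[int, str]]:
    """List (page number, file path) pairs for files created from out_path, in page order."""
    directory, file_name = os.path.split(out_path)
    base_name, ext = os.path.splitext(file_name)
    pages = []
    for name in os.listdir(directory or "."):
        number = page_number(name, base_name, ext)
        if number is not None:
            pages.append((number, os.path.join(directory, name)))
    # Sort numerically so that page_10 comes after page_2, not before it.
    return sorted(pages)
```

Calling `list_split_pages("C:/pages/page.jpg")` after the split would give the pages in order, ready for the loop in step 3. Sorting by the parsed number rather than by file name matters once a document has ten or more pages. The match is case-sensitive, which is a simplification on Windows file systems.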

Common Use Cases

Page-by-Page Processing - Process each page with different settings
Selective Page Extraction - Extract and process only certain pages
Parallel Processing - Process multiple pages in parallel
Page Classification - Classify or route pages individually
Multi-Document Splitting - Separate combined scans into individual documents
PDF Conversion - Convert PDF pages to image format
Archive Preparation - Prepare pages for different archive systems
Batch Processing - Process document collections page-by-page

Output File Naming

First Page

Uses the base filename as-is
Example: contract_page.jpg

Subsequent Pages

Appends page number with underscore
Example: contract_page_2.jpg, contract_page_3.jpg
Numbering starts at 2 for the second page
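The naming rule above can be reproduced in a few lines, which is useful when a later step needs to know the output paths in advance. This is an illustrative Python sketch of the rule as documented here, not the node's actual implementation; `expected_output_paths` is a hypothetical helper name.

```python
import os


def expected_output_paths(out_path: str, page_count: int) -> list[str]:
    """Return the file names the documented naming rule yields for page_count pages."""
    base, ext = os.path.splitext(out_path)
    paths = []
    for page in range(1, page_count + 1):
        if page == 1:
            paths.append(out_path)  # first page: base filename as-is
        else:
            paths.append(f"{base}_{page}{ext}")  # later pages: _2, _3, ...
    return paths


for path in expected_output_paths("C:/pages/contract_page.jpg", 3):
    print(path)
```

For a three-page document this prints the same three paths listed in the first usage scenario.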

File Extension

Determined by the Out Path extension
Supported output formats: JPG, PNG, BMP, TIFF
Choose the extension that matches the format you need

Tips and Best Practices

Input Formats:
- Works with multi-page PDF files
- Supports multi-page TIFF images
- Single-page documents produce one output file
- Verify input format is supported
Output Format:
- Use JPG for photographs and general use
- Use PNG for screenshots and graphics
- Use TIFF for archival quality
- Match extension in Out Path
File Naming:
- Choose descriptive base names
- Include document identifier
- Consider sequential numbering in name
- Avoid special characters in paths
Disk Space:
- Each page becomes a separate file
- Image files can be large (especially PNG/TIFF)
- Estimate: ~500KB - 5MB per page
- Ensure sufficient disk space
Performance:
- Splitting is relatively fast
- Time depends on page count and size
- No OCR performed (images only)
- Sequential processing (one page at a time)
Memory Usage:
- Processes one page at a time
- Memory efficient for large documents
- Handles are closed and released after processing
- Suitable for hundreds of pages
Workflow Integration:
- Split first, then process pages individually
- Enables parallel processing of pages
- Allows selective page processing
- Combine results after individual processing
Page Count:
- No limit on page count
- Works with 1 to 1000+ pages
- Consider processing time for large documents
- Monitor output directory size
Quality:
- Output preserves original image quality
- No quality loss with PNG or TIFF output
- JPG uses default compression
- Original resolution maintained
Use Cases by Industry:
- Legal: Split contracts and agreements
- Healthcare: Separate medical record pages
- Finance: Split statements and reports
- HR: Separate application packages
- Education: Split scanned textbooks
After Splitting:
- Track which pages came from which document
- Store original document reference
- Consider page reassembly strategy
- Implement cleanup for temporary files
Error Handling:
- Enable Continue On Error for batch jobs
- Verify all pages were created
- Check that the output file count matches the expected page count
- Handle incomplete splits
Automation Patterns:
- Split → Classify → Route pages
- Split → Process individually → Combine results
- Split → Extract data → Database insert
- Split → OCR → Index → Archive
Best Practices:
- Clean up split files after processing
- Log page count and file sizes
- Verify split completed successfully
- Implement error recovery
- Consider temporary directory for splits
Limitations:
- Extracts images only (no OCR)
- Sequential processing (not parallel)
- All pages use same output format
- Page numbering is automatic (not customizable)
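
Two of the tips above lend themselves to a quick calculation: the Disk Space estimate and the Error Handling advice to confirm that every page was created. The following Python sketch assumes the rough 500KB to 5MB per-page range quoted above and assumes the expected page count is known from another source; both function names are hypothetical.

```python
def estimate_output_size_mb(page_count: int, low_mb: float = 0.5, high_mb: float = 5.0):
    """Return a (low, high) estimate in MB, using the rough per-page range from the tips."""
    return page_count * low_mb, page_count * high_mb


def missing_pages(found_pages, expected_count: int) -> list[int]:
    """Return the page numbers in 1..expected_count that have no output file."""
    found = set(found_pages)
    return [page for page in range(1, expected_count + 1) if page not in found]


low, high = estimate_output_size_mb(200)
print(f"200 pages: roughly {low:.0f} MB to {high:.0f} MB")

# Pages 3 and 5 were not written in this made-up example.
print(missing_pages([1, 2, 4], expected_count=5))
```

An empty list from `missing_pages` means the split produced a file for every expected page; anything else is an incomplete split that the workflow should handle.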
