Process Document

Processes a document using ABBYY OCR with comprehensive language, profile, and format options for full document conversion.

Common Properties

Name - The custom name of the node.
Color - The custom color of the node.
Delay Before (sec) - Waits in seconds before executing the node.
Delay After (sec) - Waits in seconds after executing the node.
Continue On Error - Automation will continue regardless of any error. The default value is false.
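The Python sketch below shows one plausible reading of how Delay Before, Delay After and Continue On Error wrap a node's work. It is not Robomotion's implementation. `run_with_common_properties` and the `action` callback are hypothetical names, and whether Delay After also runs after an unhandled error is an assumption.

```python
import time
from typing import Any, Callable, Optional


def run_with_common_properties(
    action: Callable[[], Any],
    delay_before: float = 0.0,
    delay_after: float = 0.0,
    continue_on_error: bool = False,
) -> Optional[Any]:
    """Hypothetical wrapper applying the common node properties to `action`."""
    time.sleep(delay_before)  # Delay Before (sec)
    try:
        result = action()
    except Exception:
        if not continue_on_error:
            raise  # default (false): the error stops the flow here
        result = None  # Continue On Error: swallow the error and carry on
    # Assumption: Delay After is only reached when the flow continues.
    time.sleep(delay_after)  # Delay After (sec)
    return result


print(run_with_common_properties(lambda: "ok"))  # prints: ok
```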

Inputs

Task ID - ID of the task that holds images previously submitted with the Submit Image node. Because several pages can be submitted under the same task, this allows processing of multi-page documents.

Options

Language - Language of the document text (default: English). Supports over 200 languages.
Profile - Processing profile to use (default: documentConversion). Options:
- Document Conversion - Optimized for converting documents to editable formats
- Document Archiving - Optimized for creating searchable archives
- Text Extraction - Optimized for extracting plain text only
- Barcode Recognition - Optimized for recognizing barcodes in documents
Text Type - Type of text in the document (default: normal). Options include Normal, Typewriter, Matrix, Index, OCR-A, OCR-B, E13B, CMC7, Gothic, Handprinted.
Image Source - Source of the image (default: auto). Options:
- Auto - Automatically detect source
- Photo - Image from camera or photo
- Scanner - Image from scanner
Export Format - Output format (default: txt). Supports txt, rtf, docx, xlsx, pptx, pdfSearchable, pdfTextAndImages, pdfa, xml, and more.
Write Tags - Whether to write structural tags in output (default: Auto). Options: Auto, Write, Don't Write.
Description - Optional description for the processing task.
Correct Orientation - Automatically correct page orientation (default: false).
Correct Skew - Automatically correct page skew (default: false).
Read Barcodes - Also read barcodes in the document (default: false).
Write Formatting - Preserve text formatting in output (default: false).
Write Recognition Variants - Include recognition variants for uncertain characters (default: false).
Paragraph as One Line - Write each paragraph as a single line of text in the output (default: false).
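As a sketch of how these options and their defaults fit together, the Python dataclass below collects them in one place and flattens them into string parameters. The class, its field names, and the camelCase keys produced by `to_params` are hypothetical. They are modelled loosely on ABBYY Cloud OCR SDK naming and should be checked against the ABBYY API reference before being sent to any endpoint.

```python
from dataclasses import asdict, dataclass
from typing import Dict


def _camel(name: str) -> str:
    """snake_case -> camelCase, e.g. 'correct_skew' -> 'correctSkew'."""
    head, *tail = name.split("_")
    return head + "".join(part.capitalize() for part in tail)


@dataclass
class ProcessDocumentOptions:
    # Defaults mirror the ones listed under Options.
    language: str = "English"
    profile: str = "documentConversion"
    text_type: str = "normal"
    image_source: str = "auto"
    export_format: str = "txt"
    write_tags: str = "auto"
    description: str = ""
    correct_orientation: bool = False
    correct_skew: bool = False
    read_barcodes: bool = False
    write_formatting: bool = False
    write_recognition_variants: bool = False
    paragraph_as_one_line: bool = False

    def to_params(self) -> Dict[str, str]:
        """Flatten to string parameters (hypothetical camelCase keys)."""
        params: Dict[str, str] = {}
        for key, value in asdict(self).items():
            if key == "description" and not value:
                continue  # Description is optional; omit it when empty
            if isinstance(value, bool):
                value = "true" if value else "false"
            params[_camel(key)] = value
        return params


opts = ProcessDocumentOptions(export_format="pdfSearchable", correct_skew=True)
print(opts.to_params())
```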

Outputs

Task - ABBYY task object containing document processing results.

How It Works

The Process Document node performs full document OCR on previously submitted images. When executed, the node:

Retrieves the submitted images using the task ID
Applies the selected processing profile for optimal results
Performs orientation and skew correction if enabled
Recognizes text using the specified language and text type
Optionally reads barcodes within the document
Exports results in the selected format with formatting options
Returns a task object with download URLs and status
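To make explicit which of these steps depend on options, the sketch below lists the stages that would run for a given combination of Correct Orientation, Correct Skew and Read Barcodes. The function name and step labels are hypothetical; it models only the ordering described in the list above, not the node's internals.

```python
from typing import List


def planned_steps(
    correct_orientation: bool = False,
    correct_skew: bool = False,
    read_barcodes: bool = False,
) -> List[str]:
    """Hypothetical: list the stages in the order described in How It Works."""
    steps = ["retrieve submitted images", "apply profile"]
    if correct_orientation:
        steps.append("correct orientation")
    if correct_skew:
        steps.append("correct skew")
    steps.append("recognize text")  # uses Language and Text Type
    if read_barcodes:
        steps.append("read barcodes")
    steps += ["export results", "return task object"]
    return steps


print(planned_steps(correct_skew=True))
```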

Requirements

Valid ABBYY Cloud credentials
Task ID from a previously submitted image (using Submit Image node)
Valid language, profile, and export format selections

Error Handling

The node will return specific errors in the following cases:

Robomotion.ABBYYCloud.ErrTaskID - Task ID is invalid or empty
Robomotion.ABBYYCloud.ErrOption - Invalid option parameters selected
Robomotion.ABBYYCloud.ErrDescription - Invalid description format
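A minimal validation sketch that maps bad input onto the three error codes above. The checks are assumptions, since the node's exact rules are not documented here: an empty or whitespace-only task ID is treated as ErrTaskID, an unrecognised profile, image source or Write Tags value as ErrOption, and a non-string description as ErrDescription. `NodeError` and `validate_inputs` are hypothetical names, and the Write Tags spellings are assumed.

```python
from typing import Any

PROFILES = {"documentConversion", "documentArchiving", "textExtraction", "barcodeRecognition"}
IMAGE_SOURCES = {"auto", "photo", "scanner"}
WRITE_TAGS = {"auto", "write", "dontWrite"}  # assumed spellings of Auto / Write / Don't Write


class NodeError(Exception):
    """Hypothetical error type carrying a node error code."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code


def validate_inputs(
    task_id: Any,
    profile: str = "documentConversion",
    image_source: str = "auto",
    write_tags: str = "auto",
    description: Any = "",
) -> None:
    # Assumption: only emptiness is checked; the real node may also reject malformed IDs.
    if not isinstance(task_id, str) or not task_id.strip():
        raise NodeError("Robomotion.ABBYYCloud.ErrTaskID", "task ID is empty or not text")
    if profile not in PROFILES or image_source not in IMAGE_SOURCES or write_tags not in WRITE_TAGS:
        raise NodeError("Robomotion.ABBYYCloud.ErrOption", "unsupported option value")
    # Assumption: "invalid format" is read here as "not a string".
    if not isinstance(description, str):
        raise NodeError("Robomotion.ABBYYCloud.ErrDescription", "description must be text")
```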

Usage Example

Scenario: Convert a multi-page scanned document to searchable PDF

1. Process Image node (first page):
   - Image Path: "C:/scans/page_001.jpg"
   - Language: English
   - Export Format: pdfSearchable
   - Output: task object

2. Submit Image node (additional pages):
   - Image Path: "C:/scans/page_002.jpg"
   - Task ID: {{ $.task.id }}
   - (Repeat for each page)

3. Process Document node:
   - Task ID: {{ $.task.id }}
   - Language: English
   - Profile: Document Conversion
   - Text Type: Normal
   - Image Source: Scanner
   - Export Format: pdfSearchable
   - Correct Orientation: true
   - Correct Skew: true
   - Write Formatting: true

4. Wait Task node:
   - Task: {{ $.task }}
   - Timeout: 120 seconds
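Step 4 waits for ABBYY to finish. The polling loop below sketches what a wait with a timeout might look like. It assumes a caller-supplied `get_status` callback (hypothetical) and ABBYY Cloud OCR SDK task status names such as `Queued`, `InProgress`, `Completed` and `ProcessingFailed`; the Wait Task node itself may use different names, polling intervals or retry rules.

```python
import time
from typing import Callable

# Assumed ABBYY Cloud OCR SDK task states; verify against the API reference.
DONE = "Completed"
FAILED = {"ProcessingFailed", "Deleted", "NotEnoughCredits"}


def wait_for_task(
    get_status: Callable[[], str],
    timeout: float = 120.0,
    interval: float = 2.0,
) -> str:
    """Poll get_status() until the task completes, fails, or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status == DONE:
            return status
        if status in FAILED:
            raise RuntimeError(f"task ended with status {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still {status} after {timeout} s")
        time.sleep(interval)


# Usage with a fake status source standing in for the ABBYY service.
fake = iter(["Queued", "InProgress", "Completed"])
print(wait_for_task(lambda: next(fake), interval=0))  # prints: Completed
```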

Common Use Cases

Document Digitization - Convert paper documents to searchable digital formats
PDF Creation - Create searchable PDFs from scanned images
Multi-Page Processing - Process books, contracts, or reports with multiple pages
Archive Creation - Build searchable document archives from scans
Format Conversion - Convert between different document formats
Text Extraction - Extract plain text from complex documents
Mixed Content - Process documents containing both text and barcodes

Tips and Best Practices

Profile Selection:
- Use "Document Conversion" for editable output (DOCX, XLSX)
- Use "Document Archiving" for long-term storage (PDF/A)
- Use "Text Extraction" when only text content is needed
- Use "Barcode Recognition" for documents with many barcodes
Image Source:
- Set to "Scanner" for images captured with a scanner
- Set to "Photo" for camera images (enables processing tuned for photographs)
- Use "Auto" when source is mixed or unknown
Correction Options:
- Enable "Correct Orientation" for mixed page orientations
- Enable "Correct Skew" for crooked scans
- Both can improve accuracy but increase processing time
Format Options:
- Use PDF formats for archival and distribution
- Use DOCX for editing and reformatting
- Use TXT for simple text extraction
- Use XML for programmatic processing
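The profile and format advice above can be condensed into a small lookup. The goal keys (`edit`, `archive`, and so on) and the `recommend` function are hypothetical; where the tips name no profile or format for a goal, the sketch falls back to the node defaults (documentConversion and txt).

```python
from typing import Dict, Tuple

# Hypothetical goal keys; values follow the Profile Selection and Format Options tips.
PROFILE_BY_GOAL: Dict[str, str] = {
    "edit": "documentConversion",
    "archive": "documentArchiving",
    "text": "textExtraction",
    "barcodes": "barcodeRecognition",
}
FORMAT_BY_GOAL: Dict[str, str] = {
    "edit": "docx",
    "archive": "pdfa",
    "distribute": "pdfSearchable",
    "text": "txt",
    "programmatic": "xml",
}


def recommend(goal: str) -> Tuple[str, str]:
    """Return (profile, export format), falling back to the node defaults."""
    if goal not in PROFILE_BY_GOAL and goal not in FORMAT_BY_GOAL:
        known = sorted(set(PROFILE_BY_GOAL) | set(FORMAT_BY_GOAL))
        raise ValueError(f"unknown goal {goal!r}; expected one of {known}")
    return PROFILE_BY_GOAL.get(goal, "documentConversion"), FORMAT_BY_GOAL.get(goal, "txt")


print(recommend("archive"))  # prints: ('documentArchiving', 'pdfa')
```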
Multi-Page Workflow:
- Start with Process Image for the first page
- Use Submit Image to add remaining pages
- Finish with Process Document to process all pages
- This ensures all pages are included in the final output
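The multi-page workflow can be expressed as an ordered list of node configurations. This sketch only builds the plan as data, following the Usage Example earlier on this page; `multipage_plan` is a hypothetical helper, and the step dictionaries are an illustrative shape, not a Robomotion flow format.

```python
from typing import Dict, List


def multipage_plan(pages: List[str], export_format: str = "pdfSearchable") -> List[Dict[str, str]]:
    """Hypothetical: build the node sequence for a multi-page document."""
    if not pages:
        raise ValueError("at least one page is required")
    # First page: Process Image produces the task object later nodes refer to.
    plan = [{"node": "Process Image", "Image Path": pages[0], "Export Format": export_format}]
    # Remaining pages are added to the same task.
    for path in pages[1:]:
        plan.append({"node": "Submit Image", "Image Path": path, "Task ID": "{{ $.task.id }}"})
    # Process every submitted page, then wait for the result.
    plan.append({"node": "Process Document", "Task ID": "{{ $.task.id }}", "Export Format": export_format})
    plan.append({"node": "Wait Task", "Task": "{{ $.task }}"})
    return plan


for step in multipage_plan(["C:/scans/page_001.jpg", "C:/scans/page_002.jpg"]):
    print(step["node"])
```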
Performance:
- Larger documents take longer to process
- Adjust Wait Task timeout based on page count
- Consider processing in batches for very large documents
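For very large documents, one hedged approach is to split the page list into batches and scale the Wait Task timeout with page count. Both helpers below are hypothetical, and the timeout constants (60 s base plus 30 s per page) are placeholder values to tune against observed processing times, not ABBYY or Robomotion figures.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def batches(pages: Sequence[T], size: int) -> List[List[T]]:
    """Split pages into consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    return [list(pages[i:i + size]) for i in range(0, len(pages), size)]


def wait_timeout(page_count: int, base: float = 60.0, per_page: float = 30.0) -> float:
    """Placeholder heuristic: fixed overhead plus a per-page allowance, in seconds."""
    return base + per_page * page_count


for chunk in batches([f"page_{n:03d}.jpg" for n in range(1, 8)], size=3):
    print(len(chunk), wait_timeout(len(chunk)))
```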
