Process Text

Processes text fields from an image with advanced OCR settings for text type, marking, and writing style.

Common Properties

  • Name - The custom name of the node.
  • Color - The custom color of the node.
  • Delay Before (sec) - Waits in seconds before executing the node.
  • Delay After (sec) - Waits in seconds after executing the node.
  • Continue On Error - Automation will continue regardless of any error. The default value is false.

Inputs

  • Image Path - Path to the image file containing text to be recognized.

Options

  • Text Type - Type of text in the image (default: normal). Options:
    • Normal - Regular printed text
    • Typewriter - Typewriter or monospaced fonts
    • Matrix - Dot matrix printer text
    • Index - Index or subscript text
    • OCR-A - OCR-A font
    • OCR-B - OCR-B font
    • E13B - MICR E13B font (banking)
    • CMC7 - MICR CMC7 font (banking)
    • Gothic - Gothic or blackletter fonts
    • Handprinted - Hand-printed text
  • Language - Language of the text (default: English). Supports over 200 languages.
  • Marking Type - Type of text marking/formatting (default: simpleText). Options:
    • Simple Text - Unmarked text
    • Underlined Text - Underlined characters
    • Text In Frame - Text within a border
    • Grey Boxes - Text on grey background
    • Char Box Series - Each character in a box
    • Simple Comb - Comb-like structure
    • Comb In Frame - Comb structure with border
    • Partitioned Frame - Divided frame structure
  • Writing Style - Regional writing style for recognition (default: default). Options include American, German, Russian, Polish, Thai, Japanese, Arabic, and many more regional styles.
  • Placeholders Count - Number of placeholders in the text field (default: 1).
  • Region - Optional region coordinates to limit recognition area (format: "left,top,right,bottom").
  • Letter Set - Optional allowed character set to restrict recognition.
  • Regular Expression - Optional regex pattern that the recognized text should match.
  • Description - Optional task description for reference.
  • PDF Password - Password for encrypted PDF files.
  • One Text Line - Treat the entire field as a single line of text (default: false).
  • One Word per Text Line - Expect one word per line (default: false).
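
The Region option expects a "left,top,right,bottom" string. As a minimal sketch, the format can be checked before it is handed to the node. The `Region` type and `parse_region` helper are hypothetical names introduced here, and the rules that coordinates are non-negative integers with right > left and bottom > top are assumptions, not documented behavior of the node.

```python
from typing import NamedTuple


class Region(NamedTuple):
    left: int
    top: int
    right: int
    bottom: int


def parse_region(text: str) -> Region:
    """Parse a "left,top,right,bottom" string (hypothetical helper)."""
    parts = [p.strip() for p in text.split(",")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 comma-separated values, got {len(parts)}")
    try:
        left, top, right, bottom = (int(p) for p in parts)
    except ValueError:
        raise ValueError(f"region values must be integers: {text!r}") from None
    # Assumption: pixel coordinates, non-negative, describing a non-empty box.
    if min(left, top) < 0 or right <= left or bottom <= top:
        raise ValueError(f"region is empty or inverted: {text!r}")
    return Region(left, top, right, bottom)
```

Calling `parse_region("100,200,400,250")` yields a box 300 pixels wide and 50 pixels high, which is the region used in the usage example further down.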

Outputs

  • Task - ABBYY task object containing text recognition results.

How It Works

The Process Text node recognizes text fields with advanced settings for specialized text types and formats. When executed, the node:

  1. Reads the image file from the specified path
  2. Applies text type, marking type, and writing style settings
  3. Optionally restricts recognition to a specific region
  4. Applies character set or regex constraints if specified
  5. Uploads the image and settings to ABBYY Cloud
  6. Returns a task object with recognition results
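
Steps 2 to 4 amount to collecting the options into one set of request parameters. The sketch below shows one way that could look. The `TextFieldOptions` class, the `build_params` function, the parameter keys, and option values such as "handprinted" and "charBoxSeries" are all assumptions made for illustration; this page does not document what the node actually sends to ABBYY Cloud.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TextFieldOptions:
    # Defaults mirror the ones listed under Options.
    text_type: str = "normal"
    language: str = "English"
    marking_type: str = "simpleText"
    writing_style: str = "default"
    placeholders_count: int = 1
    region: Optional[str] = None
    letter_set: Optional[str] = None
    reg_exp: Optional[str] = None
    one_text_line: bool = False
    one_word_per_text_line: bool = False


def build_params(opts: TextFieldOptions) -> Dict[str, str]:
    """Flatten options into string parameters (hypothetical key names)."""
    params = {
        "textType": opts.text_type,
        "language": opts.language,
        "markingType": opts.marking_type,
        "writingStyle": opts.writing_style,
        "placeholdersCount": str(opts.placeholders_count),
        "oneTextLine": str(opts.one_text_line).lower(),
        "oneWordPerTextLine": str(opts.one_word_per_text_line).lower(),
    }
    # Optional constraints are only included when the user set them.
    optional = {
        "region": opts.region,
        "letterSet": opts.letter_set,
        "regExp": opts.reg_exp,
    }
    params.update({k: v for k, v in optional.items() if v is not None})
    return params
```

Leaving unset optional values out of the result, rather than sending empty strings, keeps "not specified" distinct from "specified as empty".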

Requirements

  • Valid ABBYY Cloud credentials
  • Valid image file at the specified path
  • Correct text type and marking type selections for your use case
  • Optional: region coordinates, letter set, or regex pattern

Error Handling

The node returns the following errors:

  • Robomotion.ABBYYCloud.ErrImagePath - Image path is invalid or file not found
  • Robomotion.ABBYYCloud.ErrImageData - Cannot read image file
  • Robomotion.ABBYYCloud.ErrOption - Invalid option parameters
  • Robomotion.ABBYYCloud.ErrRegion - Invalid region format
  • Robomotion.ABBYYCloud.ErrLetterSet - Invalid letter set
  • Robomotion.ABBYYCloud.ErrRegExp - Invalid regular expression
  • Robomotion.ABBYYCloud.ErrDescription - Invalid description
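
Some of these conditions can be caught before the flow runs. The following sketch is a pre-flight check that reports the matching error codes from the list above. The `preflight` function is hypothetical, and the exact conditions under which the node raises each error are assumptions. In particular, the expression is compiled with Python's `re` module, whose syntax may differ from the dialect the OCR service accepts.

```python
import os
import re
from typing import List, Optional

ERR_IMAGE_PATH = "Robomotion.ABBYYCloud.ErrImagePath"
ERR_LETTER_SET = "Robomotion.ABBYYCloud.ErrLetterSet"
ERR_REGEXP = "Robomotion.ABBYYCloud.ErrRegExp"


def preflight(
    image_path: str,
    letter_set: Optional[str] = None,
    reg_exp: Optional[str] = None,
) -> List[str]:
    """Return the error codes that the given inputs would likely trigger."""
    errors: List[str] = []
    # Assumption: a missing or empty path maps to ErrImagePath.
    if not image_path or not os.path.isfile(image_path):
        errors.append(ERR_IMAGE_PATH)
    # Assumption: a letter set that was supplied but is empty is invalid.
    if letter_set is not None and letter_set == "":
        errors.append(ERR_LETTER_SET)
    if reg_exp is not None:
        try:
            re.compile(reg_exp)
        except re.error:
            errors.append(ERR_REGEXP)
    return errors
```

An empty list means none of the checked conditions were hit; it does not guarantee that the node itself will succeed.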

Usage Example

Scenario: Recognize a hand-printed form field with specific constraints

1. Process Text node:
- Image Path: "C:/forms/application_001.jpg"
- Text Type: Handprinted
- Language: English
- Marking Type: Char Box Series
- Letter Set: "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
- Region: "100,200,400,250"
- One Text Line: true

2. Wait Task node:
- Task: {{ $.task }}
- Timeout: 60 seconds

Common Use Cases

  • Form Field Recognition - Extract text from specific form fields with constraints
  • Handwritten Forms - Recognize hand-printed text in form boxes
  • Comb Fields - Extract text from comb-style input fields (passports, IDs)
  • Banking Documents - Recognize MICR fonts (E13B, CMC7) on checks
  • Restricted Input - Recognize text with known character sets or patterns
  • Multi-Style Documents - Handle different text styles in the same document
  • Region-Specific OCR - Extract text from specific areas of an image

Tips and Best Practices

  • Text Type Selection: Choose the correct text type for best accuracy
    • Use "Handprinted" for hand-filled forms
    • Use "Matrix" for old dot matrix printouts
    • Use "E13B" or "CMC7" for bank checks
  • Marking Type: Match the marking type to your form design
    • Use "Char Box Series" for individual character boxes
    • Use "Comb In Frame" for passport-style comb fields
  • Letter Set: Restrict to known characters for better accuracy
    • Numbers only: "0123456789"
    • Uppercase only: "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    • Alphanumeric: Combine letters and numbers
  • Regular Expression: Validate format during recognition
    • Phone: "\d{3}-\d{3}-\d{4}"
    • Date: "\d{2}/\d{2}/\d{4}"
    • ID: "[A-Z]{2}\d{6}"
  • Region Coordinates: Use image editing tools to determine exact coordinates
  • Writing Style: Select regional style for locale-specific formatting
  • One Text Line: Enable for single-line fields to prevent line breaks
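
The letter set and format tips can be tried locally against sample strings before they are configured on the node. This is a minimal sketch using Python's `re` module, with the three format examples written with explicit `{n}` repetition counts. The `PATTERNS` table and both helper functions are hypothetical names, and Python's regex dialect is an assumption that may not match what the OCR service accepts.

```python
import re

# Format examples from the tips, with explicit repetition counts.
PATTERNS = {
    "phone": r"\d{3}-\d{3}-\d{4}",
    "date": r"\d{2}/\d{2}/\d{4}",
    "id": r"[A-Z]{2}\d{6}",
}

ALPHANUMERIC = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"


def within_letter_set(text: str, letter_set: str) -> bool:
    """True if every character of text appears in letter_set."""
    return all(ch in letter_set for ch in text)


def matches_format(kind: str, text: str) -> bool:
    """True if the whole string matches the named pattern."""
    return re.fullmatch(PATTERNS[kind], text) is not None


if __name__ == "__main__":
    print(matches_format("phone", "555-123-4567"))
    print(matches_format("id", "ab123456"))
    print(within_letter_set("A1B2", ALPHANUMERIC))
```

`re.fullmatch` is used rather than `re.search` so that extra characters before or after the expected format count as a mismatch.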