ABBYY FineReader SDK

The ABBYY FineReader SDK package provides desktop-based OCR and document processing through the ABBYY FineReader Engine. Use it to run OCR on documents, extract MRZ data, enhance camera images, and classify documents on a local machine.

Prerequisites

Before using ABBYY FineReader SDK nodes, you need to:

  1. Install ABBYY FineReader Engine on your Windows machine
  2. Obtain a valid ABBYY FineReader Engine license
  3. Configure the license in the config.yaml file
  4. Ensure the FineReader Engine DLLs are accessible

Note: ABBYY FineReader SDK is currently available only on the Windows platform.

System Requirements

  • Platform: Windows (64-bit)
  • ABBYY FineReader Engine: Version 12 or higher
  • .NET Framework: 4.7.2 or higher
  • License: Valid ABBYY FineReader Engine license
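
The requirements above can be checked before a workflow starts. The following is a minimal sketch, not part of the package: the function name preflight and the engine_dir argument are hypothetical, and the bitness check looks at the running Python process rather than the operating system. It does not verify the engine version, the .NET Framework version, or the license.

```python
import platform
import struct
from pathlib import Path


def preflight(engine_dir, system=None, pointer_bits=None):
    """Return a list of human-readable problems; an empty list means OK.

    engine_dir is a hypothetical folder where the caller expects the
    FineReader Engine DLLs; the real location depends on the installation.
    """
    system = system or platform.system()
    # Pointer size of the current Python process, in bits.
    pointer_bits = pointer_bits or struct.calcsize("P") * 8

    problems = []
    if system != "Windows":
        problems.append(f"unsupported platform: {system}")
    if pointer_bits != 64:
        problems.append(f"expected a 64-bit process, got {pointer_bits}-bit")

    folder = Path(engine_dir)
    if not folder.is_dir() or not any(folder.glob("*.dll")):
        problems.append(f"no DLLs found in {folder}")
    return problems
```

Calling preflight with only the folder path uses the values detected on the current machine; the system and pointer_bits arguments exist so the check can be exercised on other platforms.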

Available Nodes

Document Processing

  • Process Document - Perform OCR on documents with advanced options and export to various formats
  • Process MRZ - Extract Machine Readable Zone data from passports and ID cards
  • Split Document - Split multi-page documents into individual page image files

Image Enhancement

  • Camera OCR - Process camera photos with image enhancement and preprocessing

Document Classification

  • Classify Document - Classify documents using trained ABBYY classification models
  • Train Model - Train document classification models using labeled training data

Key Features

Advanced OCR

  • Multi-language support (200+ languages)
  • Multiple text types (normal, typewriter, handprinted, etc.)
  • High accuracy recognition with ABBYY's enterprise engine

Image Processing

  • Automatic orientation correction
  • Skew correction
  • Noise removal with multiple models
  • Geometric distortion correction
  • Motion blur removal

Export Formats

  • Documents: DOCX, XLSX, PPTX, RTF
  • PDF: Searchable PDF, PDF with text and images, PDF/A
  • Text: TXT (structured and unstructured)
  • Structured: XML

Document Classification

  • Train custom classification models
  • Classify documents by type
  • Support for image and text-based classification
  • Cross-validation for accuracy assessment

Common Workflow Patterns

Simple Document OCR

1. Process Document
- Input: Scanned document image
- Output: Searchable PDF or DOCX

Camera Image Enhancement

1. Camera OCR
- Input: Photo taken with smartphone
- Enhancement: Deskew, remove blur, correct orientation
- Output: Clean image + OCR statistics

Passport/ID Processing

1. Process MRZ
- Input: Passport or ID card image
- Output: Structured MRZ data (XML/JSON)
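
Extracted MRZ fields can be sanity-checked downstream with the check-digit scheme defined in ICAO Doc 9303 (weights 7, 3, 1 repeated; digits count as themselves, A to Z as 10 to 35, and the filler "<" as 0). The sketch below assumes you have already pulled a field and its check digit out of the node's output; how those values are named in the XML/JSON is not covered here.

```python
def mrz_check_digit(field: str) -> int:
    """Compute the ICAO Doc 9303 check digit for one MRZ field."""
    weights = (7, 3, 1)
    total = 0
    for position, ch in enumerate(field):
        if "0" <= ch <= "9":
            value = int(ch)
        elif "A" <= ch <= "Z":
            value = ord(ch) - ord("A") + 10
        elif ch == "<":
            value = 0  # filler character
        else:
            raise ValueError(f"invalid MRZ character: {ch!r}")
        total += value * weights[position % 3]
    return total % 10


# Values from the ICAO specimen passport: document number, then birth date.
print(mrz_check_digit("L898902C3"))  # 6
print(mrz_check_digit("740812"))     # 2
```

A mismatch between the computed digit and the digit read from the image usually points to a misrecognized character, which is a reasonable trigger for a retry with a better capture.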

Document Classification

1. Train Model
- Input: Labeled training documents
- Output: Classification model file

2. Classify Document
- Input: Document + trained model
- Output: Document category/label

Multi-Page Processing

1. Split Document
- Input: Multi-page PDF or TIFF
- Output: Individual page images

2. Process each page as needed

Comparison: Cloud vs FineReader SDK

Feature       ABBYY Cloud                FineReader SDK
Deployment    Cloud-based API            On-premise desktop
Platform      Cross-platform             Windows only
Internet      Required                   Not required
Cost Model    Pay per use (credits)      License-based
Processing    Server-side                Local machine
Privacy       Data sent to cloud         Data stays local
Performance   Depends on network         Local processing speed
Setup         Minimal (credentials)      Install engine + license
Use Case      Scalable cloud workflows   Desktop automation, offline

Configuration

config.yaml

# ABBYY FineReader Engine configuration
# License and engine settings are configured here

Engine Profiles

The package supports multiple processing profiles:

  • DocumentConversion_Accuracy - Best quality, slower processing
  • DocumentConversion_Speed - Faster processing, good quality
  • DocumentArchiving - Optimized for archival
  • TextExtraction - Extract text only
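
When the profile is chosen programmatically, a small lookup keeps the profile names in one place. This is only an illustrative sketch: the profile names come from the list above, while the goal keywords ("accuracy", "speed", "archive", "text") and the function pick_profile are hypothetical.

```python
# Profile names as listed above; the keys are hypothetical shorthand.
PROFILES = {
    "accuracy": "DocumentConversion_Accuracy",
    "speed": "DocumentConversion_Speed",
    "archive": "DocumentArchiving",
    "text": "TextExtraction",
}


def pick_profile(goal: str) -> str:
    """Map a processing goal to an engine profile name."""
    try:
        return PROFILES[goal]
    except KeyError:
        valid = ", ".join(sorted(PROFILES))
        raise ValueError(f"unknown goal {goal!r}; expected one of: {valid}") from None
```

For example, pick_profile("speed") returns "DocumentConversion_Speed", and an unrecognized goal fails early instead of reaching the engine.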

Best Practices

  1. License Management:

    • Ensure valid license before processing
    • Monitor license expiration
    • Handle license errors gracefully
  2. Image Quality:

    • Use high-resolution images (300+ DPI)
    • Ensure good lighting and contrast
    • Preprocess images if needed
  3. Language Selection:

    • Always specify the document's language explicitly
    • Choose the language that matches the document content
    • Specify several languages at once for multilingual documents
  4. Performance:

    • Local processing is fast for single documents
    • Large documents may require significant time
    • Consider parallel processing for batches
  5. Error Handling:

    • Enable Continue On Error for batch processing
    • Validate input files before processing
    • Check output file generation
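
The performance and error-handling points above can be combined when a batch is driven from a script. The sketch below is an assumption-laden illustration: process is a hypothetical callable that stands in for whatever runs OCR on one file and returns the output path, and it assumes your engine setup and license permit concurrent processing.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_batch(paths, process, max_workers=4):
    """Process files in parallel and collect failures instead of aborting.

    process is a hypothetical callable: it takes an input Path and returns
    the path of the file it produced.
    """

    def safe(path):
        path = Path(path)
        # Validate the input before handing it to the engine.
        if not path.is_file():
            raise FileNotFoundError(path)
        output = Path(process(path))
        # Check that an output file was actually generated.
        if not output.is_file() or output.stat().st_size == 0:
            raise RuntimeError(f"no output produced for {path}")
        return output

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {str(p): pool.submit(safe, p) for p in paths}

    results, errors = {}, {}
    for key, future in futures.items():
        try:
            results[key] = future.result()
        except Exception as exc:  # continue on error: record and move on
            errors[key] = exc
    return results, errors
```

The function returns two dictionaries keyed by input path, so a failed document never hides the successful ones and the errors can be reported or retried afterwards.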

Error Handling

Common errors:

  • ErrNotFound - Input file not found or MRZ not detected
  • ErrInvalidArg - Invalid option or parameter
  • Engine initialization errors - License or installation issues
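
A workflow can route these errors to different follow-up actions. The sketch below assumes the error code appears somewhere in the error text that reaches your script; that assumption, the function name triage, and the returned action labels are all hypothetical.

```python
def triage(error_text: str) -> str:
    """Suggest a follow-up action for an error message.

    Assumes the error code (for example "ErrNotFound") is present in the
    message text; the returned labels are illustrative only.
    """
    if "ErrNotFound" in error_text:
        # Missing input file, or no MRZ detected in the image.
        return "check-input"
    if "ErrInvalidArg" in error_text:
        # An option or parameter value was rejected.
        return "fix-options"
    # Anything else is treated as a license or installation problem.
    return "check-engine"
```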

Supported Languages

ABBYY FineReader Engine supports 200+ languages including:

  • All major European languages
  • Chinese (Simplified and Traditional)
  • Japanese, Korean
  • Arabic, Hebrew, Thai, Vietnamese
  • And many more...

Tips for Best Results

Document OCR

  • Use the profile that fits your use case
  • Enable corrections (orientation, skew) for scanned documents
  • Match text type to document characteristics

Camera Images

  • Enable all enhancement options for smartphone photos
  • Use noise removal for low-light images
  • Correct geometric distortions for angled shots

MRZ Processing

  • Ensure the entire MRZ is visible
  • Use high resolution for small text
  • Keep document flat during capture
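
Whether a capture is "high resolution" can be estimated from how many pixels the document spans and its physical size. The sketch below is plain arithmetic and not part of the package; the function name effective_dpi is hypothetical, and the 85.6 mm width used in the usage note is the ID-1 card width from ISO/IEC 7810.

```python
MM_PER_INCH = 25.4


def effective_dpi(pixels: int, physical_mm: float) -> float:
    """Estimate the capture resolution along one edge of the document.

    pixels:      number of image pixels the document edge spans
    physical_mm: real length of that edge in millimetres
    """
    if physical_mm <= 0:
        raise ValueError("physical_mm must be positive")
    return pixels / (physical_mm / MM_PER_INCH)
```

For example, an ID card that spans 1011 pixels across its 85.6 mm width works out to roughly 300 DPI, which matches the 300+ DPI guideline in Best Practices; a card spanning only 500 pixels would fall well below it.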

Classification

  • Provide diverse training samples
  • Use cross-validation to assess accuracy
  • Train with representative documents
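
To illustrate what cross-validation does with the training set, the sketch below splits sample indices into k folds, using each fold once for testing while the rest is used for training. It shows the general technique only; how the Train Model node partitions data internally is not documented here, and the function name k_fold_indices is hypothetical.

```python
def k_fold_indices(n_samples: int, k: int):
    """Yield (train, test) index lists for k roughly equal folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [
        n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)
    ]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size
```

The folds are taken in order, so shuffle the samples first if they are grouped by category; otherwise a fold may contain only one document type and give a misleading accuracy estimate.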