ABBYY FineReader SDK
The ABBYY FineReader SDK package provides desktop-based OCR and document processing built on the ABBYY FineReader Engine. Use it to process documents, extract MRZ data, enhance camera images, and classify documents with enterprise-grade accuracy.
Prerequisites
Before using ABBYY FineReader SDK nodes, you need to:
- Install ABBYY FineReader Engine on your Windows machine
- Obtain a valid ABBYY FineReader Engine license
- Configure the license in the config.yaml file
- Ensure the FineReader Engine DLLs are accessible
Note: ABBYY FineReader SDK is currently available only on Windows.
System Requirements
- Platform: Windows (64-bit)
- ABBYY FineReader Engine: Version 12 or higher
- .NET Framework: 4.7.2 or higher
- License: Valid ABBYY FineReader Engine license
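The platform requirement can be checked before a workflow runs. The following is a minimal sketch, not part of the package: `check_platform` is a hypothetical helper, and it assumes "64-bit" means an x86-64 build of Windows. It does not verify the engine version, the .NET Framework version, or the license.

```python
import platform


def check_platform(system: str, machine: str) -> list[str]:
    """Return a list of problems; an empty list means the platform looks OK."""
    problems = []
    if system != "Windows":
        problems.append(f"unsupported OS: {system or 'unknown'} (Windows required)")
    # Assumption: the 64-bit requirement refers to x86-64 (reported as AMD64 on Windows).
    if machine.upper() not in {"AMD64", "X86_64"}:
        problems.append(f"unsupported architecture: {machine or 'unknown'} (64-bit required)")
    return problems


if __name__ == "__main__":
    issues = check_platform(platform.system(), platform.machine())
    print("OK" if not issues else "\n".join(issues))
```

Passing the values in as arguments keeps the check easy to test on any machine.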
Available Nodes
Document Processing
- Process Document - Perform OCR on documents with advanced options and export to various formats
- Process MRZ - Extract Machine Readable Zone data from passports and ID cards
- Split Document - Split multi-page documents into individual page image files
Image Enhancement
- Camera OCR - Process camera photos with image enhancement and preprocessing
Document Classification
- Classify Document - Classify documents using trained ABBYY classification models
- Train Model - Train document classification models using labeled training data
Key Features
Advanced OCR
- Multi-language support (200+ languages)
- Multiple text types (normal, typewriter, handprinted, etc.)
- High accuracy recognition with ABBYY's enterprise engine
Image Processing
- Automatic orientation correction
- Skew correction
- Noise removal with multiple models
- Geometric distortion correction
- Motion blur removal
Export Formats
- Documents: DOCX, XLSX, PPTX, RTF
- PDF: Searchable PDF, PDF with text and images, PDF/A
- Text: TXT (structured and unstructured)
- Structured: XML
Document Classification
- Train custom classification models
- Classify documents by type
- Support for image and text-based classification
- Cross-validation for accuracy assessment
Common Workflow Patterns
Simple Document OCR
1. Process Document
- Input: Scanned document image
- Output: Searchable PDF or DOCX
Camera Image Enhancement
1. Camera OCR
- Input: Photo taken with smartphone
- Enhancement: Deskew, remove blur, correct orientation
- Output: Clean image + OCR statistics
Passport/ID Processing
1. Process MRZ
- Input: Passport or ID card image
- Output: Structured MRZ data (XML/JSON)
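Extracted MRZ fields can be sanity-checked downstream with the check-digit rule from ICAO Doc 9303 (weights 7, 3, 1; letters map to 10 through 35; the filler `<` counts as 0). The sketch below is independent of the Process MRZ node: this page does not describe the node's output field names or whether it validates check digits itself, so the function names here are hypothetical.

```python
DIGITS = "0123456789"


def mrz_char_value(ch: str) -> int:
    """Map one MRZ character to its numeric value per ICAO Doc 9303."""
    if ch in DIGITS:
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    if ch == "<":
        return 0
    raise ValueError(f"invalid MRZ character: {ch!r}")


def mrz_check_digit(field: str) -> int:
    """Compute the check digit for a field using the repeating weights 7, 3, 1."""
    weights = (7, 3, 1)
    total = sum(mrz_char_value(c) * weights[i % 3] for i, c in enumerate(field))
    return total % 10


def field_is_valid(field: str, check: str) -> bool:
    """Compare the computed check digit with the one read from the document."""
    return len(check) == 1 and check in DIGITS and mrz_check_digit(field) == int(check)


# Document number from the ICAO specimen passport.
print(mrz_check_digit("L898902C3"))  # → 6
```

A failed check usually points to a misread character, which is a good signal to recapture the image at higher resolution.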
Document Classification
1. Train Model
- Input: Labeled training documents
- Output: Classification model file
2. Classify Document
- Input: Document + trained model
- Output: Document category/label
Multi-Page Processing
1. Split Document
- Input: Multi-page PDF or TIFF
- Output: Individual page images
2. Process each page as needed
Comparison: Cloud vs FineReader SDK
| Feature | ABBYY Cloud | FineReader SDK |
|---|---|---|
| Deployment | Cloud-based API | On-premise desktop |
| Platform | Cross-platform | Windows only |
| Internet | Required | Not required |
| Cost Model | Pay per use (credits) | License-based |
| Processing | Server-side | Local machine |
| Privacy | Data sent to cloud | Data stays local |
| Performance | Depends on network | Local processing speed |
| Setup | Minimal (credentials) | Install engine + license |
| Use Case | Scalable cloud workflows | Desktop automation, offline |
Configuration
config.yaml
# ABBYY FineReader Engine configuration
# License and engine settings are configured here
Engine Profiles
The package supports multiple processing profiles:
- DocumentConversion_Accuracy - Best quality, slower processing
- DocumentConversion_Speed - Faster processing, good quality
- DocumentArchiving - Optimized for archival
- TextExtraction - Extract text only
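Choosing a profile can be reduced to a lookup from the goal of the workflow to one of the profile names above. This is an illustrative sketch only: the goal keywords and `pick_profile` are hypothetical, and how the profile name is passed to a node is not covered here.

```python
# Profile names are taken from the list above; the goal keys are illustrative.
PROFILES = {
    "accuracy": "DocumentConversion_Accuracy",
    "speed": "DocumentConversion_Speed",
    "archive": "DocumentArchiving",
    "text": "TextExtraction",
}


def pick_profile(goal: str) -> str:
    """Return the engine profile name for a workflow goal."""
    try:
        return PROFILES[goal.lower()]
    except KeyError:
        valid = ", ".join(sorted(PROFILES))
        raise ValueError(f"unknown goal {goal!r}; expected one of: {valid}") from None


print(pick_profile("speed"))  # → DocumentConversion_Speed
```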
Best Practices
- License Management:
  - Ensure valid license before processing
  - Monitor license expiration
  - Handle license errors gracefully
- Image Quality:
  - Use high-resolution images (300+ DPI)
  - Ensure good lighting and contrast
  - Preprocess images if needed
- Language Selection:
  - Always specify correct language
  - Use language matching document content
  - Supports multiple languages simultaneously
- Performance:
  - Local processing is fast for single documents
  - Large documents may require significant time
  - Consider parallel processing for batches
- Error Handling:
  - Enable Continue On Error for batch processing
  - Validate input files before processing
  - Check output file generation
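The batch-related practices above (validate inputs, continue on error, process in parallel, check outputs) can be sketched in plain Python. This is an assumption-laden illustration, not the package's mechanism: `run_batch` is hypothetical, and `process_one` stands in for whatever performs the OCR and returns the output file path. Whether the engine and its license allow concurrent processing is not stated on this page, so treat `max_workers` as something to confirm before use.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterable


def run_batch(
    paths: Iterable[str],
    process_one: Callable[[str], str],
    max_workers: int = 4,
) -> tuple[dict[str, str], dict[str, Exception]]:
    """Process files in parallel; collect failures instead of stopping."""

    def guarded(path: str):
        # Validate the input before handing it to the processor.
        if not Path(path).is_file():
            return path, None, FileNotFoundError(path)
        try:
            output = process_one(path)
            # Check that the expected output file was actually written.
            if not Path(output).is_file():
                return path, None, RuntimeError(f"no output produced for {path}")
            return path, output, None
        except Exception as exc:  # continue-on-error: record and move on
            return path, None, exc

    succeeded: dict[str, str] = {}
    failed: dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for path, output, error in pool.map(guarded, paths):
            if error is None:
                succeeded[path] = output
            else:
                failed[path] = error
    return succeeded, failed
```

Returning the failures alongside the successes makes it straightforward to log or retry the problem files after the batch completes.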
Error Handling
Common errors:
- ErrNotFound - Input file not found or MRZ not detected
- ErrInvalidArg - Invalid option or parameter
- Engine initialization errors - License or installation issues
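One way to act on these errors is to map each code to a follow-up action. The sketch below assumes the error code is available as a plain string; how a flow actually receives the code is not described here, and the `Action` enum and `decide` function are hypothetical. Anything unrecognized is treated conservatively as an engine-level problem.

```python
from enum import Enum


class Action(Enum):
    SKIP_FILE = "skip this input and continue"
    FIX_OPTIONS = "correct the node options before retrying"
    STOP = "stop and check the license and installation"


def decide(error_code: str) -> Action:
    """Map an error code to a follow-up action."""
    if error_code == "ErrNotFound":
        # Missing input file or no MRZ detected: the rest of the batch can proceed.
        return Action.SKIP_FILE
    if error_code == "ErrInvalidArg":
        # A bad option will fail again on every file until it is corrected.
        return Action.FIX_OPTIONS
    # Assumption: unknown codes are handled like engine initialization errors.
    return Action.STOP
```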
Supported Languages
ABBYY FineReader Engine supports 200+ languages including:
- All major European languages
- Chinese (Simplified and Traditional)
- Japanese, Korean
- Arabic, Hebrew, Thai, Vietnamese
- And many more...
Tips for Best Results
Document OCR
- Use appropriate profile for your use case
- Enable corrections (orientation, skew) for scanned documents
- Match text type to document characteristics
Camera Images
- Enable all enhancement options for smartphone photos
- Use noise removal for low-light images
- Correct geometric distortions for angled shots
MRZ Processing
- Ensure entire MRZ is visible
- Use high resolution for small text
- Keep document flat during capture
Classification
- Provide diverse training samples
- Use cross-validation to assess accuracy
- Train with representative documents
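To illustrate what cross-validation does with labeled training samples, the sketch below splits sample indices into k folds while keeping each label spread evenly across them. It is a generic stratified k-fold split written for illustration; it does not describe how the Train Model node performs cross-validation internally, and `stratified_folds` is a hypothetical name.

```python
import random
from collections import defaultdict


def stratified_folds(labels: list[str], k: int, seed: int = 0) -> list[list[int]]:
    """Split sample indices into k folds with each label spread across folds."""
    by_label: dict[str, list[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    folds: list[list[int]] = [[] for _ in range(k)]
    slot = 0  # running counter so fold sizes stay balanced across labels
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        for idx in indices:
            folds[slot % k].append(idx)
            slot += 1
    return folds


labels = ["invoice"] * 6 + ["contract"] * 4
for i, held_out in enumerate(stratified_folds(labels, k=5)):
    # Train on every other fold, then measure accuracy on the held-out one.
    print(f"fold {i}: hold out samples {sorted(held_out)}")
```

If accuracy varies widely between folds, the training set is probably too small or not diverse enough for some categories.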
- Camera OCR - Robomotion.FineReader.CameraOCR
- Classify Document - Robomotion.FineReader.Classify
- Process Document - Robomotion.FineReader.ProcessDocument
- Process MRZ - Robomotion.FineReader.ProcessMRZ
- Split Document - Robomotion.FineReader.SplitDocument
- Train Model - Robomotion.FineReader.Train