PDFBox
Manipulate PDF documents: extract text and images, merge, split, encrypt, decrypt, and convert PDFs.
Overview
The PDFBox package provides PDF manipulation nodes built on Apache PDFBox. Use it when you need to extract text or images from PDFs, merge or split documents, add or remove password protection, or convert PDF pages to images and text to PDF.
Key Features
- Text Extraction - Extract text content from PDFs
- Image Extraction - Extract embedded images from PDFs
- Merge/Split - Combine or separate PDF documents
- Encryption - Add or remove PDF password protection
- Conversion - Convert PDF pages to images, or create a PDF from text
Available Nodes
- Extract Text - Extract all text content from a PDF
- Extract Images - Extract all images from a PDF
- Merge - Combine multiple PDFs into one
- Split - Split a PDF into separate pages
- Encrypt - Add password protection to a PDF
- Decrypt - Remove password protection from a PDF
- PDF to Image - Convert PDF pages to images
- Text to PDF - Create a PDF from text content
When to Use This Package
- Document Processing: Extract data from PDF documents
- Document Assembly: Merge multiple PDFs into one
- Archival: Split large PDFs into smaller files
- Security: Protect sensitive PDFs with passwords
- Format Conversion: Convert PDFs for other processing
Typical Workflow
- Extract Text to get content for processing
- Alternatively, Extract Images when the data you need is image-based
- Merge to combine related documents
- Split to separate multi-page documents
- Encrypt to protect before sharing
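Split and Merge can be combined when a document should be broken into multi-page chunks rather than single pages. The sketch below shows only the page-grouping arithmetic, assuming Split yields one file per page as described in Available Nodes. `chunk_page_ranges` is a hypothetical helper for illustration and is not part of the package.

```python
def chunk_page_ranges(total_pages: int, pages_per_chunk: int) -> list[tuple[int, int]]:
    """Return 1-based inclusive (first, last) page ranges covering a document.

    Hypothetical helper: each range names the single-page files that a
    later Merge step would combine into one chunk.
    """
    if total_pages < 0:
        raise ValueError("total_pages must not be negative")
    if pages_per_chunk < 1:
        raise ValueError("pages_per_chunk must be at least 1")

    ranges = []
    for first in range(1, total_pages + 1, pages_per_chunk):
        # The final chunk may be shorter than pages_per_chunk.
        last = min(first + pages_per_chunk - 1, total_pages)
        ranges.append((first, last))
    return ranges


# A 10-page document in chunks of 4 pages.
print(chunk_page_ranges(10, 4))
```

Each resulting range could then drive one Merge call over the corresponding single-page files.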
Use Cases
- Extract invoice data from PDF documents
- Merge daily reports into monthly compilations
- Split large PDFs for email attachments
- Add password protection to sensitive documents
- Convert PDFs to images for OCR processing
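For the invoice use case, the work after Extract Text is ordinary string processing on the returned text. The following is a minimal sketch under an assumed invoice layout. The regular expressions, the `parse_invoice` function, and the sample text are all hypothetical and would need adjusting to real documents.

```python
import re

# Hypothetical patterns for an assumed layout such as "Invoice No: INV-2024-001".
# "Number" is listed before "No" so the longer label is matched first.
INVOICE_NO = re.compile(
    r"Invoice\s*(?:Number|No\.?|#)\s*:?\s*([A-Z0-9-]+)", re.IGNORECASE
)
# \b keeps "Subtotal" from being mistaken for "Total".
TOTAL = re.compile(
    r"\bTotal\s*(?:Due)?\s*:?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE
)


def parse_invoice(text: str) -> dict:
    """Pull an invoice number and total out of already-extracted PDF text."""
    number = INVOICE_NO.search(text)
    total = TOTAL.search(text)
    return {
        "invoice_number": number.group(1) if number else None,
        # Strip thousands separators before converting to a number.
        "total": float(total.group(1).replace(",", "")) if total else None,
    }


# Sample text standing in for the output of Extract Text.
sample = "ACME Corp\nInvoice No: INV-2024-001\nSubtotal: $1,200.00\nTotal: $1,234.50\n"
print(parse_invoice(sample))
```

Fields that are not found come back as `None`, so a flow can route unparsed documents to manual review instead of failing.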
Node Reference
- Decrypt - Robomotion.PDFBox.Decrypt
- Encrypt - Robomotion.PDFBox.Encrypt
- Extract Images - Robomotion.PDFBox.ExtractImages
- Extract Text - Robomotion.PDFBox.ExtractText
- Merge - Robomotion.PDFBox.Merge
- PDF to Image - Robomotion.PDFBox.PdfToImage
- Split - Robomotion.PDFBox.Split
- Text to PDF - Robomotion.PDFBox.TextToPdf