PDFBox

Manipulate PDF documents - extract text and images, merge, split, encrypt, and convert PDFs.

Overview

The PDFBox package provides PDF manipulation nodes built on the Apache PDFBox library. Use it when you need to extract text or images from PDFs, merge or split documents, add or remove password protection, or convert between PDF and other formats (PDF pages to images, plain text to PDF).
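This page does not say how the nodes call into PDFBox. The sketch below covers the Text To PDF and Extract Text operations and assumes Apache PDFBox 3.0.x is on the classpath. The class and method names (`PdfTextRoundTrip`, `textToPdf`, `extractText`) are hypothetical and are not the package's API. In PDFBox 2.x, `Loader.loadPDF` becomes `PDDocument.load` and the font becomes `PDType1Font.HELVETICA`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.List;

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.apache.pdfbox.pdmodel.font.Standard14Fonts;
import org.apache.pdfbox.text.PDFTextStripper;

/** Hypothetical helpers approximating the Text To PDF and Extract Text nodes (PDFBox 3.0.x assumed). */
final class PdfTextRoundTrip {

    /** Draws each line on a single US Letter page in the built-in Helvetica font; returns the PDF bytes. */
    static byte[] textToPdf(List<String> lines) throws IOException {
        try (PDDocument doc = new PDDocument()) {
            PDPage page = new PDPage(); // US Letter by default
            doc.addPage(page);
            try (PDPageContentStream cs = new PDPageContentStream(doc, page)) {
                cs.beginText();
                cs.setFont(new PDType1Font(Standard14Fonts.FontName.HELVETICA), 12);
                cs.setLeading(14.5f);          // line spacing used by newLine()
                cs.newLineAtOffset(72, 720);   // one inch from the left edge, near the top
                for (String line : lines) {
                    // showText rejects characters the font cannot encode, including '\n',
                    // so each line is drawn separately.
                    cs.showText(line);
                    cs.newLine();
                }
                cs.endText();
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            doc.save(out);
            return out.toByteArray();
        }
    }

    /** Extracts the text of every page using PDFTextStripper's default settings. */
    static String extractText(byte[] pdf) throws IOException {
        try (PDDocument doc = Loader.loadPDF(pdf)) {
            return new PDFTextStripper().getText(doc);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] pdf = textToPdf(List.of("Invoice 1042", "Total: 99.50 EUR"));
        System.out.println(extractText(pdf));
    }
}
```

The sketch does not wrap or paginate long text. The standard Helvetica font also covers only WinAnsi characters, so other scripts need an embedded font, for example one loaded with `PDType0Font.load`.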

Key Features

  • Text Extraction - Extract text content from PDFs
  • Image Extraction - Extract embedded images from PDFs
  • Merge/Split - Combine or separate PDF documents
  • Encryption - Add or remove PDF password protection
  • Conversion - Convert PDF to images or text to PDF
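In PDFBox, image extraction usually means walking each page's resource dictionary and collecting the image XObjects it references. The sketch below assumes PDFBox 3.0.x, and `PdfImageExtractor` and its methods are hypothetical names. It only finds images referenced directly from page resources. Images nested inside form XObjects, inline images, and annotation appearances are skipped, and an image shared by several pages is collected once per page.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.image.LosslessFactory;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

/** Hypothetical helper sketching the Extract Images node (PDFBox 3.0.x assumed). */
final class PdfImageExtractor {

    /** Returns every image XObject referenced directly by a page's resources, decoded. */
    static List<BufferedImage> extractImages(byte[] pdf) throws IOException {
        List<BufferedImage> images = new ArrayList<>();
        try (PDDocument doc = Loader.loadPDF(pdf)) {
            for (PDPage page : doc.getPages()) {
                PDResources resources = page.getResources();
                if (resources == null) {
                    continue; // page without a resource dictionary
                }
                for (COSName name : resources.getXObjectNames()) {
                    PDXObject xobject = resources.getXObject(name);
                    if (xobject instanceof PDImageXObject) {
                        images.add(((PDImageXObject) xobject).getImage());
                    }
                }
            }
        }
        return images;
    }

    /** Builds a one-page PDF containing a single 20x10 image, for trying out extractImages. */
    static byte[] samplePdfWithImage() throws IOException {
        try (PDDocument doc = new PDDocument()) {
            PDPage page = new PDPage();
            doc.addPage(page);
            BufferedImage picture = new BufferedImage(20, 10, BufferedImage.TYPE_INT_RGB);
            PDImageXObject xobject = LosslessFactory.createFromImage(doc, picture);
            try (PDPageContentStream cs = new PDPageContentStream(doc, page)) {
                cs.drawImage(xobject, 100, 600);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            doc.save(out);
            return out.toByteArray();
        }
    }
}
```

To save the results, pass each `BufferedImage` to `ImageIO.write(image, "png", file)`. Because the images are decoded first, the saved format can differ from how the image was stored in the PDF. For example, a JPEG-compressed image would be re-encoded as PNG.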

Available Nodes

  • Extract Text - Extract all text content from a PDF
  • Extract Images - Extract all images from a PDF
  • Merge - Combine multiple PDFs into one
  • Split - Split a PDF into separate pages
  • Encrypt - Add password protection to a PDF
  • Decrypt - Remove password protection from a PDF
  • PDF To Image - Convert PDF pages to images
  • Text To PDF - Create a PDF from text content
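The Merge and Split nodes map naturally onto the `PDFMergerUtility` and `Splitter` classes in PDFBox's `org.apache.pdfbox.multipdf` package. Below is a minimal in-memory sketch assuming PDFBox 3.0.x, with hypothetical helper names. The source documents stay open until the output has been saved, because the merged or split documents may still reference their objects.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.multipdf.PDFMergerUtility;
import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

/** Hypothetical helpers sketching the Merge and Split nodes (PDFBox 3.0.x assumed). */
final class PdfMergeSplit {

    static byte[] toBytes(PDDocument doc) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        doc.save(out);
        return out.toByteArray();
    }

    /** Appends the input PDFs, in list order, into one new document. */
    static byte[] merge(List<byte[]> inputs) throws IOException {
        PDFMergerUtility merger = new PDFMergerUtility();
        List<PDDocument> sources = new ArrayList<>();
        try (PDDocument merged = new PDDocument()) {
            for (byte[] input : inputs) {
                PDDocument source = Loader.loadPDF(input);
                sources.add(source);
                merger.appendDocument(merged, source);
            }
            return toBytes(merged); // saved while the sources are still open
        } finally {
            for (PDDocument source : sources) {
                source.close();
            }
        }
    }

    /** Splits a PDF into single-page PDFs (Splitter's default is one page per part). */
    static List<byte[]> splitIntoPages(byte[] pdf) throws IOException {
        List<byte[]> result = new ArrayList<>();
        try (PDDocument doc = Loader.loadPDF(pdf)) {
            List<PDDocument> parts = new Splitter().split(doc);
            try {
                for (PDDocument part : parts) {
                    result.add(toBytes(part)); // save each part before the source is closed
                }
            } finally {
                for (PDDocument part : parts) {
                    part.close();
                }
            }
        }
        return result;
    }

    /** Test helper: a document made of n blank US Letter pages. */
    static byte[] blankPdf(int pageCount) throws IOException {
        try (PDDocument doc = new PDDocument()) {
            for (int i = 0; i < pageCount; i++) {
                doc.addPage(new PDPage());
            }
            return toBytes(doc);
        }
    }

    static int pageCount(byte[] pdf) throws IOException {
        try (PDDocument doc = Loader.loadPDF(pdf)) {
            return doc.getNumberOfPages();
        }
    }
}
```

Calling `setSplitAtPage(n)` on the `Splitter` produces parts of up to `n` pages each. That is closer to the "split large PDFs for email attachments" use case than one file per page.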

When to Use This Package

  • Document Processing: Extract data from PDF documents
  • Document Assembly: Merge multiple PDFs into one
  • Archival: Split large PDFs into smaller files
  • Security: Protect sensitive PDFs with passwords
  • Format Conversion: Render PDF pages as images for downstream processing such as OCR, or turn plain text into a PDF
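For the Encrypt and Decrypt nodes, PDFBox configures its standard security handler through `StandardProtectionPolicy` and `AccessPermission`. The sketch below assumes PDFBox 3.0.x. The class and method names, the AES-256 key length, and the choice to disallow modification are illustrative assumptions, not the package's documented defaults. As a precaution, the decrypt helper refuses to strip security unless the owner password was supplied.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.encryption.AccessPermission;
import org.apache.pdfbox.pdmodel.encryption.StandardProtectionPolicy;

/** Hypothetical helpers sketching the Encrypt and Decrypt nodes (PDFBox 3.0.x assumed). */
final class PdfSecurity {

    private static byte[] toBytes(PDDocument doc) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        doc.save(out);
        return out.toByteArray();
    }

    /** Encrypts with AES-256: the user password opens the file, the owner password lifts restrictions. */
    static byte[] encrypt(byte[] pdf, String ownerPassword, String userPassword) throws IOException {
        try (PDDocument doc = Loader.loadPDF(pdf)) {
            AccessPermission permissions = new AccessPermission(); // starts with everything allowed
            permissions.setCanModify(false); // example restriction for user-password readers
            StandardProtectionPolicy policy =
                    new StandardProtectionPolicy(ownerPassword, userPassword, permissions);
            policy.setEncryptionKeyLength(256);
            doc.protect(policy);
            return toBytes(doc);
        }
    }

    /** Removes all security, but only when opened with full (owner) permissions. */
    static byte[] decrypt(byte[] pdf, String ownerPassword) throws IOException {
        try (PDDocument doc = Loader.loadPDF(pdf, ownerPassword)) {
            if (!doc.getCurrentAccessPermission().isOwnerPermission()) {
                throw new IOException("Decrypting requires the owner password");
            }
            doc.setAllSecurityToBeRemoved(true);
            return toBytes(doc);
        }
    }

    /** Reports whether the saved document still carries encryption. */
    static boolean isEncrypted(byte[] pdf, String password) throws IOException {
        try (PDDocument doc = Loader.loadPDF(pdf, password)) {
            return doc.isEncrypted();
        }
    }
}
```

Opening an encrypted file without the correct user password makes `Loader.loadPDF` throw `InvalidPasswordException`, which a node would typically surface as an error.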

Typical Workflow

  1. Extract Text to get content for processing, or Extract Images for image-based data
  2. Merge to combine related documents
  3. Split to separate multi-page documents
  4. Encrypt to protect the result before sharing

Use Cases

  • Extract invoice data from PDF documents
  • Merge daily reports into monthly compilations
  • Split large PDFs for email attachments
  • Add password protection to sensitive documents
  • Convert PDFs to images for OCR processing
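For the OCR use case, PDFBox renders pages with `PDFRenderer`. The sketch below assumes PDFBox 3.0.x. The class name, the grayscale image type, and the 300 DPI value in `main` are assumptions: 300 DPI is a common OCR input resolution, not a documented node default. Each page becomes an image of roughly (page size in points) × dpi / 72 pixels.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import javax.imageio.ImageIO;

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;

/** Hypothetical helper sketching the PDF To Image node (PDFBox 3.0.x assumed). */
final class PdfToImages {

    /** Renders every page to a grayscale image at the given resolution. */
    static List<BufferedImage> renderPages(byte[] pdf, float dpi) throws IOException {
        List<BufferedImage> images = new ArrayList<>();
        try (PDDocument doc = Loader.loadPDF(pdf)) {
            PDFRenderer renderer = new PDFRenderer(doc);
            for (int i = 0; i < doc.getNumberOfPages(); i++) {
                images.add(renderer.renderImageWithDPI(i, dpi, ImageType.GRAY));
            }
        }
        return images;
    }

    /** Writes the images as page-1.png, page-2.png, ... into outDir. */
    static void writePngs(List<BufferedImage> images, File outDir) throws IOException {
        for (int i = 0; i < images.size(); i++) {
            ImageIO.write(images.get(i), "png", new File(outDir, "page-" + (i + 1) + ".png"));
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] pdf = Files.readAllBytes(new File(args[0]).toPath());
        writePngs(renderPages(pdf, 300), new File("."));
    }
}
```

Holding every rendered page in memory is fine for short documents. For large PDFs at OCR resolutions, rendering and writing one page at a time keeps memory bounded.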