Extract Images

Extracts images from a PDF file, either from every page or from a selected page range, and saves them to a specified directory. Useful for pulling visual content out of documents.

Common Properties

Name - The custom name of the node.
Color - The custom color of the node.
Delay Before (sec) - Waits in seconds before executing the node.
Delay After (sec) - Waits in seconds after executing the node.
Continue On Error - Automation will continue regardless of any error. The default value is false.

info

If the Continue On Error property is true, errors are not caught when the project is executed, even if a Catch node is used.

Inputs

PDF Path - Path to the PDF file from which to extract images.
Directory to Extract Images - Directory path where extracted images will be saved.

Options

From All Pages - When enabled, extracts images from all pages in the PDF. Default is false.
From Selected Pages - Specifies a page range to extract images from (e.g., "2-5" or "1,3,5"). Required when "From All Pages" is disabled.

note

You must either enable From All Pages or provide a value for From Selected Pages.

Output

This node does not produce any output variables. Extracted images are saved to the specified directory.

How It Works

The Extract Images node retrieves the images embedded in a PDF file, from all pages or from the selected pages only. When executed, the node:

Validates the PDF path and output directory
Determines which pages to process based on the options
Scans the specified pages for embedded images
Extracts each image in its original format
Saves the images to the output directory with auto-generated filenames

Use Cases

Content Extraction: Extract images from reports or presentations
Data Processing: Extract charts, graphs, or diagrams for analysis
Asset Recovery: Retrieve images from old or archived PDF documents
Image Cataloging: Build an image library from PDF collections
OCR Preparation: Extract images for separate OCR processing
Quality Assurance: Extract and verify images in automated PDF validation

Page Range Format

When using From Selected Pages, you can specify:

Single page: "3"
Page range: "2-5" (extracts from pages 2, 3, 4, and 5)
Multiple ranges: "1-3,7-9" (extracts from pages 1, 2, 3, 7, 8, and 9)
Mixed format: "1,3,5-7,10" (extracts from pages 1, 3, 5, 6, 7, and 10)
Open-ended range: "5-" (extracts from page 5 to the end)
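
The formats above can be checked before running the node. The following is a minimal Python sketch of how such a page-range string could be expanded into page numbers. The function name `parse_page_range` and the `page_count` parameter are hypothetical, and the node's own parser may behave differently (for example, in how it treats whitespace or duplicate pages).

```python
def parse_page_range(spec: str, page_count: int) -> list[int]:
    """Expand a page-range string such as "1,3,5-7,10" into sorted page numbers.

    Illustrative only: this is not the node's actual parser.
    """
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty entry in page range: {spec!r}")
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start = int(start_text)  # raises ValueError for input like "-5"
            # An open-ended range such as "5-" runs to the last page.
            end = int(end_text) if end_text.strip() else page_count
        else:
            start = end = int(part)
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        if end > page_count:
            raise ValueError(f"page {end} exceeds page count {page_count}")
        pages.update(range(start, end + 1))
    return sorted(pages)


print(parse_page_range("1,3,5-7,10", page_count=12))  # [1, 3, 5, 6, 7, 10]
```

Validating the string this way in advance surfaces an invalid format or an out-of-range page before the node is executed.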

Extracted File Naming

Extracted images are automatically named by the PDF processor, typically including:

Page number reference
Image sequence number on that page
Original image format extension (jpg, png, etc.)

Error Handling

The node will return specific errors in the following cases:

Empty or invalid PDF path
PDF file not found at the specified path
Empty or invalid output directory path
Output directory does not exist or is not writable
"From Selected Pages" is empty when "From All Pages" is disabled
Invalid page range format
Specified page numbers exceed the PDF page count
PDF file is encrypted or password-protected
Insufficient permissions to read the PDF file or write to the output directory
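
Several of these conditions can be checked before the node runs. The sketch below is a hypothetical pre-flight helper in Python (`preflight_errors` and its parameters are illustrative names, not part of the node). It covers only the path, permission, and option checks; detecting encryption or an out-of-range page number requires opening the PDF and is not attempted here.

```python
import os
from pathlib import Path


def preflight_errors(pdf_path: str, out_dir: str,
                     from_all_pages: bool = False,
                     selected_pages: str = "") -> list[str]:
    """Return a list of problems found before extraction (empty if none)."""
    errors: list[str] = []

    # PDF path: must be non-empty, exist as a file, and be readable.
    if not pdf_path.strip():
        errors.append("PDF path is empty")
    elif not Path(pdf_path).is_file():
        errors.append(f"PDF file not found: {pdf_path}")
    elif not os.access(pdf_path, os.R_OK):
        errors.append(f"PDF file is not readable: {pdf_path}")

    # Output directory: must be non-empty, exist, and be writable.
    if not out_dir.strip():
        errors.append("output directory path is empty")
    elif not Path(out_dir).is_dir():
        errors.append(f"output directory does not exist: {out_dir}")
    elif not os.access(out_dir, os.W_OK):
        errors.append(f"output directory is not writable: {out_dir}")

    # One of the two page options has to be supplied.
    if not from_all_pages and not selected_pages.strip():
        errors.append("From Selected Pages is empty while From All Pages is disabled")

    return errors
```

Returning a list rather than raising on the first problem lets a workflow report every issue in a single pass.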

Usage Notes

The output directory must exist before running the node
Images are extracted in their original format and quality
Not all PDFs contain extractable images
Some graphics are drawn as vector content rather than embedded as raster images, and these are not extracted
For scanned PDFs, where each page is stored as a single image, the node extracts the full-page images
Multiple images on the same page are extracted separately
Image quality depends on how the images were embedded in the PDF
For encrypted PDFs, decrypt the file first using the Decrypt node

Tips for Effective Use

Create Output Directory First: Use File System nodes to ensure the output directory exists
Test Page Range: Start with a small page range to verify extraction works correctly
Check Results: Verify extracted images before processing large batches
Handle Empty Results: Not all PDF pages contain images; check that the output directory contains files before further processing
Naming Strategy: Process extracted images immediately or rename them systematically
Large Documents: For PDFs with many pages, consider processing in batches
Format Preservation: Images maintain their original format (JPEG, PNG, etc.)
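
The first and fourth tips can also be handled in a script step. The sketch below, in Python with hypothetical helper names (`ensure_output_dir`, `list_extracted`), creates the output directory if it is missing and lists whatever files are in it afterwards. It assumes the directory is used only for extracted images.

```python
from pathlib import Path


def ensure_output_dir(path: str) -> Path:
    """Create the output directory (and any missing parents) if needed."""
    out = Path(path)
    out.mkdir(parents=True, exist_ok=True)
    return out


def list_extracted(out_dir: Path) -> list[Path]:
    """Return the files currently in the output directory, sorted by name."""
    return sorted(p for p in out_dir.iterdir() if p.is_file())
```

An empty list from `list_extracted` after the node has run indicates that the processed pages contained no extractable images.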

Example Workflows

Extract All Images

Use Extract Images with "From All Pages" enabled
Process all extracted images in the output directory
Optionally rename or move images to organized folders
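
For the optional last step, one possible way to organize the results is by file format. The following Python sketch moves each file in the output directory into a subfolder named after its extension. The function name `organize_by_extension` is hypothetical, and the approach relies only on file extensions, not on any particular naming scheme.

```python
import shutil
from pathlib import Path


def organize_by_extension(out_dir: str) -> dict[str, int]:
    """Move files into subfolders such as jpg/ and png/; return counts per folder."""
    root = Path(out_dir)
    counts: dict[str, int] = {}
    # sorted() snapshots the listing, so moving files does not disturb iteration.
    for item in sorted(root.iterdir()):
        if not item.is_file():
            continue
        # Normalize ".JPG" to "jpg"; files without an extension go to "unknown".
        ext = item.suffix.lower().lstrip(".") or "unknown"
        target = root / ext
        target.mkdir(exist_ok=True)
        shutil.move(str(item), str(target / item.name))
        counts[ext] = counts.get(ext, 0) + 1
    return counts
```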

Selective Extraction

Identify pages with relevant images
Use Extract Images with a specific page range such as "5-10"
Process only the extracted images from those pages

Batch Processing

Loop through multiple PDF files
Extract images from each to separate directories
Catalog or process extracted images
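
Because the output directory must exist before the node runs, a batch workflow needs one directory per PDF prepared up front. The Python sketch below pairs each PDF with its own output folder. `plan_batch` is a hypothetical helper, and naming each folder after the PDF's file stem is an assumption made for illustration. The `*.pdf` pattern is matched case-sensitively on most Linux file systems.

```python
from pathlib import Path


def plan_batch(pdf_dir: str, output_root: str) -> list[tuple[Path, Path]]:
    """Pair every PDF in pdf_dir with a dedicated, already-created output folder."""
    jobs: list[tuple[Path, Path]] = []
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        # report.pdf -> <output_root>/report
        out = Path(output_root) / pdf.stem
        out.mkdir(parents=True, exist_ok=True)
        jobs.append((pdf, out))
    return jobs
```

Each resulting pair can then be passed to the loop as the PDF Path and Directory to Extract Images inputs.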

Related Nodes

Split: Split PDFs before extracting images from specific sections
Decrypt: Decrypt password-protected PDFs before extraction
