Extract Images
Extracts the images embedded in a PDF file and saves them to a specified directory. Use it to pull visual content out of documents.
Common Properties
- Name - The custom name of the node.
- Color - The custom color of the node.
- Delay Before (sec) - Waits in seconds before executing the node.
- Delay After (sec) - Waits in seconds after executing the node.
- Continue On Error - Automation will continue regardless of any error. The default value is false.
info
If the ContinueOnError property is true, no error is caught when the project is executed, even if a Catch node is used.
Inputs
- PDF Path - Path to the PDF file from which to extract images.
- Directory to Extract Images - Directory path where extracted images will be saved.
Options
- From All Pages - When enabled, extracts images from all pages in the PDF. Default is false.
- From Selected Pages - Specifies a page range to extract images from (e.g., "2-5" or "1,3,5"). Required when "From All Pages" is disabled.
note
You must either enable From All Pages or provide a value for From Selected Pages.
Output
This node does not produce any output variables. Extracted images are saved to the specified directory.
How It Works
The Extract Images node retrieves the images embedded in the selected pages of a PDF file. When executed, the node:
- Validates the PDF path and output directory
- Determines which pages to process based on the options
- Scans the specified pages for embedded images
- Extracts each image in its original format
- Saves the images to the output directory with auto-generated filenames
Use Cases
- Content Extraction: Extract images from reports or presentations
- Data Processing: Extract charts, graphs, or diagrams for analysis
- Asset Recovery: Retrieve images from old or archived PDF documents
- Image Cataloging: Build an image library from PDF collections
- OCR Preparation: Extract images for separate OCR processing
- Quality Assurance: Extract and verify images in automated PDF validation
Page Range Format
When using From Selected Pages, you can specify:
- Single page: "3"
- Page range: "2-5" (extracts from pages 2, 3, 4, and 5)
- Multiple ranges: "1-3,7-9" (extracts from pages 1, 2, 3, 7, 8, and 9)
- Mixed format: "1,3,5-7,10" (extracts from pages 1, 3, 5, 6, 7, and 10)
- Open-ended range: "5-" (extracts from page 5 to the end)
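To make the formats above concrete, here is a minimal Python sketch of how such a string could be expanded into page numbers. It is not the node's actual parser; the function name `parse_page_range` and the `page_count` parameter are assumptions made for illustration.

```python
def parse_page_range(spec: str, page_count: int) -> list[int]:
    """Expand a page-range string such as "1,3,5-7" into sorted 1-based page numbers.

    Hypothetical helper: illustrates the documented formats, not the node's own code.
    """
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty entry in page range {spec!r}")
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start = int(start_text)  # raises ValueError for inputs like "-5"
            # An empty end ("5-") means "through the last page".
            end = int(end_text) if end_text.strip() else page_count
        else:
            start = end = int(part)
        if start < 1 or end < start or end > page_count:
            raise ValueError(
                f"page range {part!r} is invalid for a {page_count}-page PDF"
            )
        pages.update(range(start, end + 1))
    return sorted(pages)
```

For example, `parse_page_range("1,3,5-7,10", 12)` yields pages 1, 3, 5, 6, 7, and 10, matching the mixed-format case above. Validating a range string this way before the node runs can surface format problems earlier in a flow.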
Extracted File Naming
Extracted images are automatically named by the PDF processor, typically including:
- Page number reference
- Image sequence number on that page
- Original image format extension (jpg, png, etc.)
Error Handling
The node will return specific errors in the following cases:
- Empty or invalid PDF path
- PDF file not found at the specified path
- Empty or invalid output directory path
- Output directory does not exist or is not writable
- "From Selected Pages" is empty when "From All Pages" is disabled
- Invalid page range format
- Specified page numbers exceed the PDF page count
- PDF file is encrypted or password-protected
- Insufficient permissions to read the PDF file or write to the output directory
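Several of the path-related conditions above can be checked before the node runs. The following Python sketch is one way to do that under stated assumptions: `preflight_errors` is a hypothetical helper, its messages are not the node's actual error text, and it does not detect encryption or page-range problems.

```python
import os


def preflight_errors(pdf_path: str, output_dir: str) -> list[str]:
    """Return the problems found; an empty list means the basic path checks passed.

    Hypothetical helper: the messages are illustrative, not the node's error text.
    """
    errors: list[str] = []

    if not pdf_path:
        errors.append("PDF path is empty")
    elif not os.path.isfile(pdf_path):
        errors.append(f"PDF file not found: {pdf_path}")
    elif not os.access(pdf_path, os.R_OK):
        errors.append(f"PDF file is not readable: {pdf_path}")

    if not output_dir:
        errors.append("Output directory path is empty")
    elif not os.path.isdir(output_dir):
        errors.append(f"Output directory does not exist: {output_dir}")
    elif not os.access(output_dir, os.W_OK):
        errors.append(f"Output directory is not writable: {output_dir}")

    return errors
```

Collecting every problem in a list, rather than stopping at the first one, lets a flow report all path issues in a single message.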
Usage Notes
- The output directory must exist before running the node
- Images are extracted in their original format and quality
- Not all PDFs contain extractable images
- Some graphics are stored as vector drawings rather than embedded images and will not be extracted
- In scanned PDFs, where each page is stored as a single image, the node extracts the full-page images
- Multiple images on the same page are extracted separately
- Image quality depends on how the images were embedded in the PDF
- For encrypted PDFs, decrypt the file first using the Decrypt node
Tips for Effective Use
- Create Output Directory First: Use File System nodes to ensure the output directory exists
- Test Page Range: Start with a small page range to verify extraction works correctly
- Check Results: Verify extracted images before processing large batches
- Handle Empty Results: Not all PDF pages contain images; check that the output directory contains files before processing further
- Naming Strategy: Process extracted images immediately or rename them systematically
- Large Documents: For PDFs with many pages, consider processing in batches
- Format Preservation: Images maintain their original format (JPEG, PNG, etc.)
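The first and fourth tips (create the output directory, handle empty results) can be sketched in Python as follows. The helper names are hypothetical, and the set of file extensions is an assumption, since the exact formats the node writes depend on the PDF.

```python
from pathlib import Path

# Assumed list of extensions; adjust it to the formats your PDFs actually contain.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".bmp", ".gif"}


def ensure_output_dir(path: str) -> Path:
    """Create the output directory (and any missing parents) if it does not exist yet."""
    out = Path(path)
    out.mkdir(parents=True, exist_ok=True)
    return out


def list_extracted_images(output_dir: str) -> list[Path]:
    """Return the image files found directly in output_dir, sorted by path."""
    return sorted(
        p
        for p in Path(output_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```

Call `ensure_output_dir` before the node runs and `list_extracted_images` afterwards; an empty list means the selected pages contained no extractable images.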
Example Workflows
Extract All Images
- Use Extract Images with "From All Pages" enabled
- Process all extracted images in the output directory
- Optionally rename or move images to organized folders
Selective Extraction
- Identify pages with relevant images
- Use Extract Images with specific page range like "5-10"
- Process only the extracted images from those pages
Batch Processing
- Loop through multiple PDF files
- Extract images from each to separate directories
- Catalog or process extracted images
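As a rough illustration of the batch workflow, this Python sketch pairs each PDF in a folder with its own output directory. The function name `plan_batch` and the choice to name each directory after the PDF's file stem are assumptions; the extraction itself is still performed by the Extract Images node for each pair.

```python
from pathlib import Path


def plan_batch(pdf_dir: str, output_root: str) -> list[tuple[Path, Path]]:
    """Pair every PDF in pdf_dir with a dedicated output directory under output_root.

    Hypothetical helper: directories are named after the PDF's file stem and are
    created up front, because the node expects the output directory to exist.
    """
    jobs: list[tuple[Path, Path]] = []
    # Note: this pattern matches lowercase ".pdf" only on case-sensitive file systems.
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        out_dir = Path(output_root) / pdf.stem
        out_dir.mkdir(parents=True, exist_ok=True)
        jobs.append((pdf, out_dir))
    return jobs
```

Each returned pair supplies the PDF Path and Directory to Extract Images inputs for one loop iteration. Two PDFs with the same stem would share a directory, so rename the files or adjust the naming if that can happen in your data.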
Related Nodes
- Split: Split PDFs before extracting images from specific sections
- Decrypt: Decrypt password-protected PDFs before extraction