Get Document Info

Get metadata and information about a document without fully parsing its content. This node provides a quick way to inspect document properties, check file support, and validate documents before processing.

Common Properties

Name - The custom name of the node.
Color - The custom color of the node.
Delay Before (sec) - Waits in seconds before executing the node.
Delay After (sec) - Waits in seconds after executing the node.
Continue On Error - Automation will continue regardless of any error. The default value is false.

Note: If the ContinueOnError property is true, no error is caught when the project is executed, even if a Catch node is used.

Inputs

File Path - Path to the document file to inspect.

Output

Document Info - Object containing file metadata and properties:
- filename - Name of the file
- extension - File extension (e.g., ".pdf", ".docx")
- size - File size in bytes
- sizeFormatted - Human-readable file size (e.g., "1.5 MB")
- mimeType - MIME type of the file
- fileType - Descriptive file type (e.g., "PDF Document")
- isSupported - Whether the format is supported by the package
- (PDF only) pageCount - Number of pages
- (PDF only) title - Document title from metadata
- (PDF only) author - Document author from metadata
- (PDF only) subject - Document subject from metadata
- (PDF only) creator - PDF creator application
- (PDF only) isEncrypted - Whether the PDF is password-protected
- (DOCX only) paragraphCount - Number of paragraphs
- (DOCX only) tableCount - Number of tables
Is Supported - Boolean indicating whether the file format is supported for processing.
File Type - String describing the detected file type.
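
The PDF-only and DOCX-only fields are absent for other formats, so downstream code should read them defensively. The following is a minimal sketch, assuming the output is a plain object shaped like the list above; summarizeDocumentInfo is a hypothetical helper and not part of the package.

```javascript
// Hypothetical helper: build a one-line summary from a Document Info object.
// Format-specific fields (pageCount, paragraphCount, tableCount) may be undefined.
function summarizeDocumentInfo(info) {
  const parts = [`${info.filename} (${info.fileType}, ${info.sizeFormatted})`];

  if (info.pageCount !== undefined) {
    parts.push(`${info.pageCount} pages`);
  }
  if (info.paragraphCount !== undefined) {
    parts.push(`${info.paragraphCount} paragraphs`);
  }
  if (info.tableCount !== undefined) {
    parts.push(`${info.tableCount} tables`);
  }
  if (info.isEncrypted) {
    parts.push("encrypted");
  }
  if (!info.isSupported) {
    parts.push("unsupported format");
  }
  return parts.join(", ");
}

// Sample object using the field names documented above.
const pdfInfo = {
  filename: "report.pdf",
  extension: ".pdf",
  size: 1048576,
  sizeFormatted: "1.0 MB",
  fileType: "PDF Document",
  isSupported: true,
  pageCount: 15,
  isEncrypted: false,
};

console.log(summarizeDocumentInfo(pdfInfo));
// prints "report.pdf (PDF Document, 1.0 MB), 15 pages"
```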

Supported File Types

The node recognizes the following file types:

Extension	File Type
.pdf	PDF Document
.docx	Word Document
.doc	Word Document (Legacy)
.pptx	PowerPoint Presentation
.ppt	PowerPoint Presentation (Legacy)
.xlsx	Excel Spreadsheet
.xls	Excel Spreadsheet (Legacy)
.txt	Plain Text
.md	Markdown
.html, .htm	HTML Document
.xml	XML Document
.json	JSON File
.csv	CSV File
.tsv	TSV File
.rtf	Rich Text Format
.odt	OpenDocument Text
.epub	EPUB Book
.eml	Email Message
.msg	Outlook Message

How It Works

The Get Document Info node provides lightweight document inspection. When executed, the node:

Validates the provided file path
Retrieves basic file system information (size, name, extension)
Determines MIME type and file format
Checks if the format is supported by the Document Processor package
For PDFs, extracts metadata including page count, title, author, and encryption status
For DOCX files, extracts paragraph and table counts
Returns a comprehensive document information object
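
The first four steps above can be approximated in plain Node.js. This is an illustrative sketch only, not the node's actual implementation: inspectFile, formatSize, and the partial FILE_TYPES map are hypothetical names, the size formatting and the fallback values for unknown extensions are assumptions, and the PDF and DOCX steps are omitted because they require a parsing library.

```javascript
const fs = require("fs");
const path = require("path");

// Partial extension map for illustration; the documented list is longer.
const FILE_TYPES = {
  ".pdf": { fileType: "PDF Document", mimeType: "application/pdf" },
  ".txt": { fileType: "Plain Text", mimeType: "text/plain" },
};

// Assumed formatting: 1024-based units, one decimal place above bytes.
function formatSize(bytes) {
  const units = ["B", "KB", "MB", "GB"];
  let value = bytes;
  let i = 0;
  while (value >= 1024 && i < units.length - 1) {
    value /= 1024;
    i += 1;
  }
  return i === 0 ? `${bytes} B` : `${value.toFixed(1)} ${units[i]}`;
}

function inspectFile(filePath) {
  // Step 1: validate the path.
  if (!filePath) throw new Error("File path is required");
  if (!fs.existsSync(filePath)) throw new Error("File does not exist");

  // Step 2: basic file system information.
  const stats = fs.statSync(filePath);
  const extension = path.extname(filePath).toLowerCase();

  // Steps 3 and 4: look up type and support by extension.
  const known = FILE_TYPES[extension];

  return {
    filename: path.basename(filePath),
    extension,
    size: stats.size,
    sizeFormatted: formatSize(stats.size),
    // Fallback values for unknown extensions are assumptions.
    mimeType: known ? known.mimeType : "application/octet-stream",
    fileType: known ? known.fileType : "Unknown",
    isSupported: Boolean(known),
  };
}
```

Note that an unrecognized extension produces isSupported: false rather than an exception, which mirrors the behavior described under Error Handling.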

Requirements

Valid file path to any file (doesn't need to be a supported document)
Read access to the file

Error Handling

The node will return specific errors in the following cases:

Empty or missing file path
File not found at the specified path
Insufficient permissions to read the file

Note: The node will NOT error on unsupported file types; it will simply return isSupported: false.
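
Because unsupported formats are reported through isSupported rather than raised as errors, a flow needs to handle the two cases separately. A minimal sketch, assuming the caught error (if any) and the Document Info object are both available to a Function node; routeOutcome and the route names are hypothetical.

```javascript
// Hypothetical helper: turn the node's outcome into a routing decision.
// `error` is the caught error (or null); `info` is the Document Info object.
function routeOutcome(error, info) {
  if (error) {
    // Missing path, missing file, or permission problems arrive as errors.
    return { route: "error", reason: error.message };
  }
  if (!info.isSupported) {
    // Unsupported formats are reported in the output, not thrown.
    return { route: "skip", reason: `Unsupported file type: ${info.fileType}` };
  }
  return { route: "process", reason: null };
}

console.log(routeOutcome(new Error("File does not exist"), null).route);
console.log(routeOutcome(null, { isSupported: false, fileType: "Unknown" }).route);
console.log(routeOutcome(null, { isSupported: true, fileType: "PDF Document" }).route);
```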

Usage Examples

Example 1: Validate Document Before Processing

// Input
msg.filePath = "/documents/report.pdf"

// Get Document Info output
msg.documentInfo = {
  "filename": "report.pdf",
  "extension": ".pdf",
  "size": 1048576,
  "sizeFormatted": "1.0 MB",
  "mimeType": "application/pdf",
  "fileType": "PDF Document",
  "isSupported": true,
  "pageCount": 15,
  "title": "Annual Report 2024",
  "author": "John Doe",
  "isEncrypted": false
}
msg.isSupported = true
msg.fileType = "PDF Document"

// Use with Switch node to route based on support

Example 2: Filter Large Documents

// After Get Document Info

// Check file size before processing
if (msg.documentInfo.size > 10 * 1024 * 1024) { // 10 MB
  msg.isLargeFile = true;
  msg.warning = `Large file detected: ${msg.documentInfo.sizeFormatted}`;
}

// Use Switch node to handle large files differently

Example 3: Document Validation Pipeline

Get Document Info → Switch (check isSupported)
                      ├─ Yes → Read Document → Process
                      └─ No → Log Error → Skip

Example 4: Batch Document Analysis

// In a Loop over file paths
const fileList = [
  "/docs/file1.pdf",
  "/docs/file2.docx",
  "/docs/file3.txt"
];

// Get Document Info for each file
if (!msg.documentStats) msg.documentStats = [];

msg.documentStats.push({
  filename: msg.documentInfo.filename,
  type: msg.documentInfo.fileType,
  size: msg.documentInfo.sizeFormatted,
  pageCount: msg.documentInfo.pageCount || 'N/A',
  supported: msg.documentInfo.isSupported
});

// Result: Array of document statistics

Example 5: Check for Encrypted PDFs

// After Get Document Info

if (msg.documentInfo.extension === '.pdf' && msg.documentInfo.isEncrypted) {
  msg.error = `Cannot process encrypted PDF: ${msg.documentInfo.filename}`;
  msg.requiresPassword = true;
  // Route to error handler or password input flow
}

Example 6: Estimate Processing Time

// After Get Document Info

const sizeInMB = msg.documentInfo.size / (1024 * 1024);
const pageCount = msg.documentInfo.pageCount || 1;

// Rough estimation
let estimatedSeconds = 0;
if (msg.documentInfo.extension === '.pdf') {
  estimatedSeconds = pageCount * 2; // ~2 seconds per page
} else if (msg.documentInfo.extension === '.docx') {
  estimatedSeconds = (msg.documentInfo.paragraphCount || 100) * 0.01;
}

msg.estimatedProcessingTime = `~${Math.ceil(estimatedSeconds)} seconds`;

Example 7: Document Type Routing

// After Get Document Info

const extension = msg.documentInfo.extension;

if (extension === '.pdf') {
  msg.processingRoute = 'pdf-pipeline';
} else if (['.docx', '.doc'].includes(extension)) {
  msg.processingRoute = 'word-pipeline';
} else if (['.csv', '.xlsx'].includes(extension)) {
  msg.processingRoute = 'spreadsheet-pipeline';
} else if (msg.documentInfo.isSupported) {
  msg.processingRoute = 'generic-pipeline';
} else {
  msg.processingRoute = 'unsupported';
}

// Use Switch node with msg.processingRoute

Tips for Effective Use

Pre-Processing Validation:
- Always check isSupported before attempting to process documents
- Validate file size to prevent memory issues with very large files
- Check for encrypted PDFs which require password handling
Performance Optimization:
- Get Document Info is much faster than full document parsing
- Use it to filter or prioritize documents before heavy processing
- Check page count to estimate processing time
Metadata Extraction:
- PDF metadata (title, author, subject) can be useful for categorization
- Not all PDFs have metadata filled in
- DOCX metadata is more consistently available
File Type Detection:
- The node detects the file type from the file extension and MIME type
- Extension is the primary detection method
- Renamed files may be misdetected if extension doesn't match content
Error Recovery:
- Use this node in Try-Catch blocks when processing user-uploaded files
- Check isSupported to provide user-friendly error messages
- Validate before heavy processing to save resources
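
The pre-processing checks in these tips can be combined into a single gate. The sketch below is one possible arrangement; validateDocument is a hypothetical helper, and the 10 MB default limit is an assumption to adjust for your environment.

```javascript
// Hypothetical pre-processing gate built on the Document Info fields.
// The default size limit (10 MB) is an assumption, not a package setting.
function validateDocument(info, maxBytes = 10 * 1024 * 1024) {
  const problems = [];

  if (!info.isSupported) {
    problems.push(`Unsupported format: ${info.extension}`);
  }
  if (info.size > maxBytes) {
    problems.push(`File too large: ${info.sizeFormatted}`);
  }
  // isEncrypted is only present for PDFs; undefined is treated as "not encrypted".
  if (info.isEncrypted) {
    problems.push("PDF is password-protected");
  }

  return { ok: problems.length === 0, problems };
}

const result = validateDocument({
  extension: ".pdf",
  size: 20 * 1024 * 1024,
  sizeFormatted: "20.0 MB",
  isSupported: true,
  isEncrypted: true,
});
console.log(result.ok, result.problems);
```

Collecting every problem instead of stopping at the first one makes it easier to show a complete, user-friendly message for uploaded files.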

Common Errors and Solutions

Error: "File path is required"

Cause: No file path was provided to the node. Solution: Ensure the File Path input is connected or set with a valid path.

Error: "File does not exist"

Cause: The specified file path is invalid or the file doesn't exist. Solution:

Verify the file path is correct
Check that the file exists at that location
Ensure the file hasn't been moved or deleted
Use absolute paths to avoid ambiguity

Warning: isSupported is false

Cause: The file format is not supported by the Document Processor package. Solution:

Check the Supported File Types table above
Convert the file to a supported format
Use alternative processing methods for unsupported formats

Issue: Missing PDF Metadata

Cause: The PDF doesn't have embedded metadata or PyMuPDF is not available. Solution:

This is normal for some PDFs (metadata is optional)
The node will still provide basic file information
pageCount may still be available through alternative methods

Issue: pageCount not available

Cause: pageCount is only returned for PDF files (see the Output section). Solution:

This is expected for other formats such as DOCX, TXT, MD, and HTML
Use Extract Text node and estimate pages from character count if needed
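
A character-based estimate could look like the sketch below. The figure of 3000 characters per page is an assumption, not a value defined by the package, and estimatePageCount is a hypothetical helper; the text is assumed to come from the Extract Text node.

```javascript
// Rough page estimate from extracted text.
// Assumption: about 3000 characters per page; tune this for your documents.
function estimatePageCount(text, charsPerPage = 3000) {
  if (!text) return 0;
  return Math.max(1, Math.ceil(text.length / charsPerPage));
}

console.log(estimatePageCount("a".repeat(7500))); // prints 3
```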

Integration with Other Nodes

Document Validation Flow

Get Document Info → Switch (isSupported)
                      ├─ True → Switch (check size) → Read Document
                      └─ False → Log Warning → Skip Document

Batch Processing with Filtering

Loop (files) → Get Document Info → Switch (filter criteria) → Process Document → Collect Results
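
For the Collect Results step, the Document Info objects gathered in the loop can be reduced to an inventory. A minimal sketch, assuming the objects have been accumulated in an array; summarizeBatch is a hypothetical helper.

```javascript
// Hypothetical aggregation over an array of Document Info objects.
function summarizeBatch(infos) {
  const summary = { total: infos.length, supported: 0, totalBytes: 0, byType: {} };

  for (const info of infos) {
    if (info.isSupported) summary.supported += 1;
    summary.totalBytes += info.size;
    // Count documents per descriptive file type.
    summary.byType[info.fileType] = (summary.byType[info.fileType] || 0) + 1;
  }
  return summary;
}

const batch = [
  { fileType: "PDF Document", size: 2048, isSupported: true },
  { fileType: "PDF Document", size: 1024, isSupported: true },
  { fileType: "Unknown", size: 512, isSupported: false },
];
console.log(summarizeBatch(batch));
```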

Document Classification

Get Document Info → Function (analyze metadata) → Route to Processor → Database Insert

Quality Check Pipeline

Upload File → Get Document Info → Validate (size, type, encryption) → Accept/Reject

Performance Considerations

Speed: Very fast - only reads file metadata, not full content
Memory: Minimal memory usage
Use Case: Ideal for validation before heavy processing
Batch Operations: Can quickly scan hundreds of files

Comparison with Other Nodes

Feature	Get Document Info	Read Document	Extract Text
Speed	Very Fast	Slow	Medium
Memory	Minimal	High	Medium
Full Parse	No	Yes	Yes
Metadata	Yes	Limited	No
Content	No	Structured elements	Plain text
Best For	Validation, filtering	Structure-aware processing	Text extraction

Best Practices

Always Validate: Use this node before processing unknown documents
Check File Size: Prevent memory issues by checking size first
Handle Unsupported: Provide clear feedback when isSupported is false
Leverage Metadata: Use PDF/DOCX metadata for document classification
Batch Scanning: Quickly scan large document collections for inventory
Error Prevention: Catch encrypted PDFs and unsupported formats early
