Analyze Document

Analyzes documents with Claude AI through the Beta Files API, so Claude can read and understand a range of document formats, including PDFs and Word documents.

Common Properties

  • Name - The custom name of the node.
  • Color - The custom color of the node.
  • Delay Before (sec) - Waits in seconds before executing the node.
  • Delay After (sec) - Waits in seconds after executing the node.
  • Continue On Error - The automation continues even if this node raises an error. The default value is false.

Info: If the ContinueOnError property is true, no error is caught when the project is executed, even if a Catch node is used.

Inputs

  • Connection Id - The Claude client session identifier from the Connect node (optional if an API Key is provided directly).
  • System Prompt - System instructions to guide Claude's behavior when analyzing documents.
  • User Prompt - Question or instruction about what you want to know about the document(s). For example: "Summarize this document" or "Extract all dates mentioned".
  • File Paths - Paths to document files (PDF, Word, etc.) you want Claude to analyze. You can add multiple files.
  • Custom File Paths - Array of file paths from message scope (e.g., msg.file_paths) for batch processing of documents.

Options

Authentication

  • API Key - Claude API key (optional if using a Connection Id).
  • Use Robomotion AI Credits - Not supported for document analysis. This feature requires direct API mode with your own API key.

Model Selection

  • Model - Select which Claude model to use. Options include:
    • Claude Opus 4.5 - Best for complex document analysis
    • Claude Opus 4 - Highly capable for detailed analysis
    • Claude Sonnet 4.5 - Latest balanced model
    • Claude Sonnet 4 - Balanced performance and speed (default)
    • Claude 3.7 Sonnet - Latest 3.x generation Sonnet
    • Claude 3.5 Sonnet - Previous generation Sonnet
    • Claude 3.5 Haiku - Fastest model for simple documents
    • Custom Model - Specify your own model name
  • Custom Model - The model name to use when Custom Model is selected in the Model list.

Generation Settings

  • Max Tokens - Maximum tokens for the analysis response (default: 4096). Increase for longer, more detailed analyses.
  • Temperature - Controls randomness (0.0-1.0). Use lower values (0.1-0.3) for factual document analysis.
  • Top P - Nucleus sampling parameter (0.0-1.0).
  • Top K - Top-k sampling parameter (1-100).

Extended Thinking

  • Thinking Mode - Extended thinking allows Claude to reason through complex documents:
    • Off - No extended thinking (default)
    • Auto (Budget: 10240) - Automatic thinking with default token budget
    • Custom Budget - Specify your own thinking token budget
  • Thinking Budget - Custom thinking token budget (1024-128000). Only used when Thinking Mode is Custom Budget.
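
The mapping from Thinking Mode to a token budget can be pictured with a short Python sketch. It is illustrative only and not the node's implementation: the function name and the lowercase mode strings are hypothetical, while the default budget and the allowed range come from the options above.

```python
from typing import Optional

DEFAULT_AUTO_BUDGET = 10240            # budget quoted for the Auto mode above
MIN_BUDGET, MAX_BUDGET = 1024, 128000  # range quoted for Thinking Budget above


def resolve_thinking_budget(mode: str, custom_budget: Optional[int] = None) -> Optional[int]:
    """Return the thinking token budget for a mode, or None when thinking is off.

    The mode strings ("off", "auto", "custom") are hypothetical labels for this sketch.
    """
    if mode == "off":
        return None
    if mode == "auto":
        return DEFAULT_AUTO_BUDGET
    if mode == "custom":
        if custom_budget is None or not MIN_BUDGET <= custom_budget <= MAX_BUDGET:
            raise ValueError(
                f"Thinking budget must be between {MIN_BUDGET} and {MAX_BUDGET}"
            )
        return custom_budget
    raise ValueError(f"Unknown thinking mode: {mode!r}")
```

In this sketch the Thinking Budget value is ignored unless the mode is the custom one, which mirrors the option description.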

Advanced

  • Timeout (seconds) - Request timeout in seconds (default: 300 for file upload). Increase for large documents.
  • Include Raw Response - Include full API response in output (default: false).
  • Keep File After Analysis - Whether to keep the uploaded file on Claude's servers after analysis (default: false).

Outputs

  • Text - Claude's analysis of the document(s).
  • Thinking - Extended thinking output when thinking mode is enabled.
  • File IDs - Uploaded file IDs that can be reused in subsequent requests (useful for multiple analyses of the same document).
  • Raw Response - Complete API response object (when Include Raw Response is enabled).

How It Works

The Analyze Document node uses Claude's Beta Files API to analyze documents. When executed, the node:

  1. Validates the connection (requires direct API mode, not Robomotion credits)
  2. Collects all file paths from both individual and array inputs
  3. For each file:
    • Opens and reads the file
    • Detects the MIME type based on file extension
    • Uploads the file to Claude's Files API
    • Receives a file ID for the uploaded file
  4. Creates a message with the user prompt and uploaded documents
  5. Sends the request to Claude's Beta Messages API
  6. Extracts Claude's analysis from the response
  7. Returns the analysis, file IDs, and optional thinking output
  8. Deletes the uploaded files unless Keep File After Analysis is enabled (Note: Claude automatically cleans up files after some time)
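
Steps 2 and 4 above can be sketched in Python as two small helpers. This is a rough illustration, not the node's source code: the function names are hypothetical, and the shape of the document block is an assumption modelled on the file-based document blocks described in Anthropic's Files API documentation. The upload and the API call themselves are left out.

```python
from typing import Iterable, List, Optional


def collect_file_paths(file_paths: Optional[Iterable[str]],
                       custom_file_paths: Optional[Iterable[str]]) -> List[str]:
    """Merge the File Paths and Custom File Paths inputs, skipping blank entries."""
    merged = [p.strip() for p in list(file_paths or []) + list(custom_file_paths or [])]
    merged = [p for p in merged if p]
    if not merged:
        raise ValueError("At least one file path must be provided")
    return merged


def build_user_content(user_prompt: str, file_ids: List[str]) -> List[dict]:
    """Build the user message content: one block per uploaded file, then the prompt.

    The block layout is an assumption for this sketch, not taken from the node.
    """
    if not user_prompt.strip():
        raise ValueError("User Prompt must not be empty")
    blocks = [
        {"type": "document", "source": {"type": "file", "file_id": file_id}}
        for file_id in file_ids
    ]
    blocks.append({"type": "text", "text": user_prompt})
    return blocks
```

Placing the documents before the prompt text is a choice made for this sketch; the node may order the blocks differently.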

Requirements

  • Direct API mode - Robomotion AI Credits are not supported for document analysis
  • Valid Connection Id or direct API Key credentials
  • At least one file path must be provided
  • Non-empty User Prompt
  • Files in a supported format (detected from the file extension)

Supported File Formats

The node automatically detects the MIME type based on file extension. Supported formats include:

Documents:

  • PDF (.pdf)
  • Text files (.txt, .md, .csv)
  • HTML/Web (.html, .css, .js, .ts)
  • Code files (.py, .json, .xml)

Images:

  • PNG (.png), JPEG (.jpg, .jpeg), WebP (.webp), GIF (.gif)
  • HEIC (.heic), HEIF (.heif)
  • BMP (.bmp), TIFF (.tiff), SVG (.svg)

Audio:

  • WAV (.wav), MP3 (.mp3), AIFF (.aiff), AAC (.aac)
  • OGG (.ogg), FLAC (.flac)

Video:

  • MP4 (.mp4), MPEG (.mpeg, .mpg), MOV (.mov)
  • AVI (.avi), MKV (.mkv), WebM (.webm)
  • WMV (.wmv), 3GP (.3gp)
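
Extension-based MIME detection of the kind described above can be sketched in Python as a simple lookup. The table covers only a subset of the listed formats, and both the function name and the fallback value for unknown extensions are assumptions made for this illustration; the node's actual mapping may differ.

```python
import os

# Partial extension-to-MIME table; extend it for the other formats listed above.
MIME_BY_EXTENSION = {
    ".pdf": "application/pdf",
    ".txt": "text/plain",
    ".md": "text/markdown",
    ".csv": "text/csv",
    ".html": "text/html",
    ".json": "application/json",
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
    ".gif": "image/gif",
    ".wav": "audio/wav",
    ".mp3": "audio/mpeg",
    ".mp4": "video/mp4",
}


def detect_mime_type(path: str, default: str = "application/octet-stream") -> str:
    """Look up the MIME type from the file extension, ignoring letter case."""
    extension = os.path.splitext(path)[1].lower()
    return MIME_BY_EXTENSION.get(extension, default)
```

Because detection relies on the extension alone, a file with a wrong or missing extension would be classified incorrectly, so it is worth checking file names before a run.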

Error Handling

The node will return specific errors in the following cases:

  • Using Robomotion AI Credits mode - "File upload is not supported with Robomotion AI Credits"
  • Empty or missing User Prompt
  • No file paths provided
  • Invalid Connection Id
  • Failed to open or read file
  • File upload failure
  • Empty Custom Model name when Custom Model is selected
  • Temperature out of range (must be 0.0-1.0)
  • Top P out of range (must be 0.0-1.0)
  • Top K less than 1
  • Thinking budget out of range (must be 1024-128000)
  • API authentication errors (401)
  • API rate limit errors (429)
  • API service errors (500, 503)
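
Several of these errors can be caught before the node runs by checking the inputs in an upstream step. The Python sketch below is one possible pre-flight check, assuming the ranges stated above; the function name and the error messages are hypothetical and do not reproduce the node's own messages.

```python
from typing import List, Optional, Sequence


def validate_settings(user_prompt: str,
                      file_paths: Sequence[str],
                      temperature: Optional[float] = None,
                      top_p: Optional[float] = None,
                      top_k: Optional[int] = None) -> List[str]:
    """Return a list of problems found; an empty list means the inputs look valid."""
    problems: List[str] = []
    if not user_prompt or not user_prompt.strip():
        problems.append("User Prompt is empty")
    if not file_paths:
        problems.append("No file paths provided")
    if temperature is not None and not 0.0 <= temperature <= 1.0:
        problems.append("Temperature must be between 0.0 and 1.0")
    if top_p is not None and not 0.0 <= top_p <= 1.0:
        problems.append("Top P must be between 0.0 and 1.0")
    if top_k is not None and top_k < 1:
        problems.append("Top K must be at least 1")
    return problems
```

Collecting every problem in a list, rather than stopping at the first one, makes it easier to log all configuration mistakes in a single run.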

Usage Notes

File Upload

  • Files are uploaded to Claude's secure servers before analysis
  • File IDs can be reused for multiple analyses of the same document
  • Files are automatically cleaned up by Claude after some time
  • Large files may require longer timeout values

Document Analysis

  • Be specific in your User Prompt about what you want to extract or analyze
  • Claude can handle multiple documents in a single request
  • For multi-document analysis, Claude can compare and cross-reference content

Model Selection

  • Use Opus models for complex, detailed document analysis
  • Use Sonnet for balanced performance on most documents
  • Use Haiku for quick analysis of simple documents

Extended Thinking

  • Enable for complex document analysis requiring reasoning
  • Useful for legal documents, technical papers, or financial reports
  • The thinking output shows Claude's analytical process

Examples

Example 1: PDF Invoice Analysis

Inputs:

  • Connection Id: (from Connect node)
  • User Prompt: "Extract the invoice number, date, total amount, and all line items from this invoice"
  • File Paths: ["/path/to/invoice.pdf"]
  • System Prompt: "You are a document processing assistant. Extract information accurately and format it clearly."

Configuration:

  • Model: Claude Sonnet 4
  • Max Tokens: 2000
  • Temperature: 0.1 (for accuracy)

Output: Claude will extract all requested information from the PDF invoice in a structured format.
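
If the User Prompt also asks for the result as a JSON object, a later step can parse the Text output into fields. The Python sketch below shows one tolerant way to do that, assuming the reply contains a single JSON object that may be surrounded by extra prose; the function name is hypothetical.

```python
import json


def extract_json_object(text: str) -> dict:
    """Parse the first-to-last brace span of the text as a JSON object."""
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("No JSON object found in the response text")
    return json.loads(text[start:end + 1])
```

The function raises an error when no object is found or the span is not valid JSON, so a flow can route such replies to manual review instead of using partial data.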


Example 2: Contract Review with Extended Thinking

Inputs:

  • Connection Id: (from Connect node)
  • User Prompt: "Review this contract and identify any unusual clauses, potential risks, or missing standard terms"
  • File Paths: ["/path/to/contract.pdf"]
  • System Prompt: "You are a legal document analyst. Be thorough and highlight important details."

Configuration:

  • Model: Claude Opus 4.5
  • Max Tokens: 4096
  • Thinking Mode: Auto
  • Temperature: 0.2

Outputs:

  • Text: Detailed analysis of the contract
  • Thinking: Claude's reasoning process for identifying risks

Example 3: Batch Document Processing

Inputs:

  • Connection Id: (from Connect node)
  • User Prompt: "Summarize each document and identify the main topics discussed"
  • Custom File Paths: msg.document_paths (array containing multiple file paths)

Configuration:

  • Model: Claude Sonnet 4
  • Max Tokens: 3000

Output: Claude will analyze all documents and provide summaries and topic identification for each.
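
For large document sets, the array passed to Custom File Paths can be split into smaller batches so that each run uploads fewer files. The Python sketch below shows one way to do the split; the helper name and the batch size are illustrative choices, not settings of the node.

```python
from typing import Iterator, List


def chunked(paths: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` paths, preserving order."""
    if size < 1:
        raise ValueError("Batch size must be at least 1")
    for start in range(0, len(paths), size):
        yield paths[start:start + size]
```

Each yielded batch would then be assigned to the message variable that Custom File Paths reads (msg.document_paths in this example) before the node runs.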


Example 4: Resume Screening

Inputs:

  • Connection Id: (from Connect node)
  • User Prompt: "Extract: candidate name, years of experience, technical skills, education, and previous employers. Rate the candidate's fit for a senior software engineer role."
  • File Paths: ["/path/to/resume.pdf"]

Configuration:

  • Model: Claude Sonnet 4
  • Max Tokens: 1500
  • Temperature: 0.3

Output: Structured extraction of resume information with a qualification assessment.


Example 5: Multi-Document Comparison

Inputs:

  • Connection Id: (from Connect node)
  • User Prompt: "Compare these three versions of the policy document and identify all changes between them"
  • File Paths: ["/path/to/policy_v1.pdf", "/path/to/policy_v2.pdf", "/path/to/policy_v3.pdf"]

Configuration:

  • Model: Claude Opus 4.5
  • Max Tokens: 5000
  • Temperature: 0.2

Output: Detailed comparison showing changes across all versions.


Example 6: Financial Report Analysis

Inputs:

  • Connection Id: (from Connect node)
  • User Prompt: "Analyze this financial report and provide: revenue trends, expense breakdown, profit margins, and key financial ratios. Highlight any concerning patterns."
  • File Paths: ["/path/to/financial_report.pdf"]
  • System Prompt: "You are a financial analyst. Provide quantitative analysis with specific numbers."

Configuration:

  • Model: Claude Opus 4
  • Max Tokens: 4000
  • Thinking Mode: Auto
  • Temperature: 0.2

Output: Comprehensive financial analysis with insights and concerns.


Example 7: Technical Documentation Understanding

Inputs:

  • Connection Id: (from Connect node)
  • User Prompt: "Create a step-by-step setup guide based on this technical documentation. Simplify complex terms for beginners."
  • File Paths: ["/path/to/technical_docs.pdf"]

Configuration:

  • Model: Claude Sonnet 4
  • Max Tokens: 3000
  • Temperature: 0.4

Output: Beginner-friendly setup guide derived from technical documentation.


Example 8: Reusing File IDs

First Analysis:

  • File Paths: ["/path/to/large_document.pdf"]
  • Keep File After Analysis: true

Output:

  • File IDs: ["file-xyz123"]

Subsequent Analyses: You can reference the uploaded file by its ID in future requests without re-uploading, saving time and bandwidth.
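
One way to keep track of reusable file IDs is a small cache keyed by a hash of the file contents, so the same document is uploaded only once even if its path changes. The Python sketch below illustrates the idea; the function names are hypothetical, and the `upload` callable stands in for whatever step performs the upload and returns the file ID.

```python
import hashlib
from typing import Callable, Dict


def file_fingerprint(path: str) -> str:
    """Return the SHA-256 hex digest of the file contents, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def get_or_upload(path: str, cache: Dict[str, str],
                  upload: Callable[[str], str]) -> str:
    """Return a cached file ID for this content, uploading only on a cache miss."""
    key = file_fingerprint(path)
    if key not in cache:
        cache[key] = upload(path)  # `upload` is a placeholder for the real upload step
    return cache[key]
```

A cached ID is only useful while the file still exists on the server, so entries should be dropped when Keep File After Analysis is false or when a request reports that the file is no longer available.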

Best Practices

  1. File Management:

    • Keep file sizes reasonable to avoid timeout issues
    • Use appropriate timeout values for large documents
    • Consider whether to keep files for reuse or let them be auto-cleaned
  2. Prompt Engineering:

    • Be specific about what information you want extracted
    • Provide context about the document type
    • Ask for structured output when needed (JSON, tables, lists)
  3. Model Selection:

    • Use Opus for complex documents requiring deep understanding
    • Use Sonnet for general document analysis
    • Use Haiku for simple text extraction
  4. Accuracy:

    • Use low temperature (0.1-0.3) for factual extraction
    • Use extended thinking for complex analytical tasks
    • Provide clear System Prompts for consistent results
  5. Batch Processing:

    • Use Custom File Paths for processing multiple documents
    • Consider processing files in batches to avoid timeouts
    • Handle errors gracefully when processing many files
  6. Performance:

    • Increase timeout for large or complex documents
    • Cap response length, and therefore output token usage, with the Max Tokens setting
    • Reuse file IDs when analyzing the same document multiple times
  7. Error Handling:

    • Validate file paths before processing
    • Check that file formats are supported
    • Implement retry logic for transient failures
  8. Security:

    • Remember that files are uploaded to Claude's servers
    • Don't upload highly sensitive documents unless necessary
    • Files are automatically cleaned up after some time
  9. Quality:

    • Ensure document quality is good (clear scans, readable PDFs)
    • For scanned documents, consider OCR preprocessing if needed
    • Test with sample documents before batch processing
  10. Cost Management:

    • Be mindful of token usage with large documents
    • Use appropriate models for the task complexity
    • Consider document size when setting Max Tokens
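
The retry advice under Error Handling can be sketched in Python as exponential backoff around a call that may fail transiently. This is a generic illustration: `ApiError`, `with_retries`, and the delay values are hypothetical, and the set of retryable status codes is taken from the rate limit and service errors listed in the Error Handling section.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

RETRYABLE_STATUS = {429, 500, 503}  # rate limit and service errors


class ApiError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed request."""

    def __init__(self, status: int):
        super().__init__(f"API request failed with status {status}")
        self.status = status


def with_retries(call: Callable[[], T], max_attempts: int = 3,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call`, retrying retryable failures with doubling delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except ApiError as err:
            # Give up on non-retryable errors (such as 401) or on the last attempt.
            if err.status not in RETRYABLE_STATUS or attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Authentication errors are re-raised immediately in this sketch, since retrying with the same invalid key would not help.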