Extract Key Values

Extracts key-value pairs from forms and structured documents using Google Document AI's form parsing, which identifies field labels and their corresponding values.

Common Properties

Name - The custom name of the node.
Color - The custom color of the node.
Delay Before (sec) - Waits in seconds before executing the node.
Delay After (sec) - Waits in seconds after executing the node.
Continue On Error - Automation will continue regardless of any error. The default value is false.

Info

If the Continue On Error property is true, errors raised by this node are not caught when the project is executed, even if a Catch node is used.

Inputs

File Path - The local file path of the document to process. Supports PDF, PNG, JPG, JPEG, TIFF, GIF, BMP, and WEBP formats.
MIME Type - The MIME type of the document (e.g., application/pdf, image/png, image/jpeg). If left empty, the MIME type will be automatically detected from the file content.
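
If you prefer to set MIME Type explicitly rather than rely on automatic detection, a minimal sketch like the following maps a file extension to one of the supported types. The helper name `guessMimeType` and the lookup table are illustrative assumptions, not part of the node, and this is extension-based matching, which is a simpler technique than the content-based detection the node performs.

```javascript
const path = require('path');

// Extension-to-MIME lookup for the formats listed under File Path.
// '.tif' is included as a common alternative spelling of '.tiff'.
const MIME_BY_EXTENSION = {
  '.pdf': 'application/pdf',
  '.png': 'image/png',
  '.jpg': 'image/jpeg',
  '.jpeg': 'image/jpeg',
  '.tif': 'image/tiff',
  '.tiff': 'image/tiff',
  '.gif': 'image/gif',
  '.bmp': 'image/bmp',
  '.webp': 'image/webp'
};

// Hypothetical helper: returns a MIME type, or '' when the extension is unknown.
function guessMimeType(filePath) {
  const extension = path.extname(filePath).toLowerCase();
  return MIME_BY_EXTENSION[extension] || '';
}

console.log(guessMimeType('/data/forms/Invoice.PDF'));
```

Returning an empty string for unknown extensions leaves the MIME Type input empty, so the node's own detection can take over.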

Output

Pages - An array of page objects, where each object contains:
- page (number): The page number (starting from 1)
- key_value_pairs (array): Array of key-value pair objects, where each object contains:
  - key (string): The field label or form field name
  - value (string): The corresponding value or filled-in content

Options

Credentials - Service Account credentials for Document AI API authentication. Select from the Robomotion vault or provide the JSON key file content.
Project Id - The Google Cloud project ID where Document AI is enabled (e.g., "my-project-123").
Location - The processor location/region. Default is "us". Available options: "us", "eu", "asia". Choose the region where your processor is deployed.
Processor Id - The Document AI processor ID to use for form parsing (e.g., "a1b2c3d4e5f6g7h8"). Found in Google Cloud Console under Document AI processors.
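
To see how these three options fit together, the sketch below builds the processor resource name and regional endpoint in the form the Document AI API uses (`projects/.../locations/.../processors/...` and `<location>-documentai.googleapis.com`). The helper `buildProcessorTarget` is hypothetical, and whether the node assembles these values in exactly this way is an assumption.

```javascript
// Hypothetical helper: combines Project Id, Location, and Processor Id
// into the resource name and regional endpoint used by Document AI.
function buildProcessorTarget(projectId, location, processorId) {
  if (!projectId) throw new Error('Project ID cannot be empty');
  if (!processorId) throw new Error('Processor ID cannot be empty');

  const region = location || 'us'; // "us" is the documented default
  return {
    endpoint: `${region}-documentai.googleapis.com`,
    name: `projects/${projectId}/locations/${region}/processors/${processorId}`
  };
}

const target = buildProcessorTarget('my-project-123', 'eu', 'a1b2c3d4e5f6g7h8');
console.log(target.endpoint);
console.log(target.name);
```

A mismatch between Location and the region where the processor was created is a common reason a processor cannot be found, so it can help to log these values while troubleshooting.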

How It Works

The Extract Key Values node integrates with Google Document AI to extract form fields from documents. When executed, the node:

Validates the provided file path and checks file accessibility
Detects MIME type automatically if not specified
Authenticates with Google Document AI using the provided Service Account credentials
Reads the document file content from the local file system
Sends the document to the specified Document AI processor
Processes the document using form parsing machine learning models
Identifies form fields, labels, and their spatial relationships
Extracts key-value pairs by matching field labels to their values
Handles checkboxes, text fields, and various form elements
Organizes results by page with structured key-value pairs
Returns extracted data ready for processing or database storage
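
The last steps above (extracting pairs and organizing them by page) can be sketched as follows. This is not the node's actual implementation, which is not shown here. It assumes the Document AI `Document` JSON shape, where each page carries `formFields` whose `fieldName` and `fieldValue` point into the document's full `text` through `textAnchor.textSegments`. The function names and the sample document are illustrative.

```javascript
// Resolve a layout's text anchor into the substring(s) of the full document text.
// In the JSON form, indices may arrive as strings and startIndex may be omitted for 0.
function anchorText(fullText, layout) {
  const anchor = layout && layout.textAnchor;
  const segments = (anchor && anchor.textSegments) || [];
  return segments
    .map(seg => fullText.substring(Number(seg.startIndex || 0), Number(seg.endIndex || 0)))
    .join('')
    .trim();
}

// Convert a Document AI document into the Pages output shape described above.
function toPages(document) {
  return (document.pages || []).map((page, index) => ({
    page: page.pageNumber || index + 1,
    key_value_pairs: (page.formFields || []).map(field => ({
      key: anchorText(document.text, field.fieldName),
      value: anchorText(document.text, field.fieldValue)
    }))
  }));
}

// Hand-written sample, not real API output.
const sampleDocument = {
  text: 'Invoice Number: INV-2024-001\nDate: 01/15/2024\n',
  pages: [{
    pageNumber: 1,
    formFields: [
      {
        fieldName: { textAnchor: { textSegments: [{ endIndex: '15' }] } },
        fieldValue: { textAnchor: { textSegments: [{ startIndex: '16', endIndex: '28' }] } }
      },
      {
        fieldName: { textAnchor: { textSegments: [{ startIndex: '29', endIndex: '34' }] } },
        fieldValue: { textAnchor: { textSegments: [{ startIndex: '35', endIndex: '45' }] } }
      }
    ]
  }]
};

const samplePages = toPages(sampleDocument);
console.log(JSON.stringify(samplePages, null, 2));
```

Note that the keys in this sample keep their trailing colon ("Invoice Number:"), which is why normalizing keys before lookup is usually worthwhile.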

Requirements

Google Cloud Setup:
- Active Google Cloud project with billing enabled
- Document AI API enabled in the project
- Form Parser processor created and deployed in Document AI
- Service Account with Document AI User role

Robomotion Setup:
- Service Account JSON key stored in the Robomotion vault
- Document file accessible on the local file system
- File size under the 20 MB limit
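
A pre-flight check before calling the node can catch the file-related requirements early. The sketch below is one possible approach; the helper `checkDocumentFile` is hypothetical, and treating 20 MB as 20 * 1024 * 1024 bytes is an assumption.

```javascript
const fs = require('fs');

// Assumption: the 20 MB limit is interpreted as 20 * 1024 * 1024 bytes.
const MAX_BYTES = 20 * 1024 * 1024;

// Hypothetical helper: returns a list of problems, empty when the file looks usable.
function checkDocumentFile(filePath) {
  if (!filePath) {
    return ['File path cannot be empty'];
  }

  let stats;
  try {
    fs.accessSync(filePath, fs.constants.R_OK); // throws if missing or unreadable
    stats = fs.statSync(filePath);
  } catch (err) {
    return [`Cannot read file: ${err.message}`];
  }

  const problems = [];
  if (!stats.isFile()) {
    problems.push('Path is not a regular file');
  }
  if (stats.size >= MAX_BYTES) {
    problems.push(`File is ${stats.size} bytes, which is not under the 20 MB limit`);
  }
  return problems;
}
```

An empty result means the path is readable and within the size limit; anything else can be logged or routed to an error branch before any API call is made.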

Practical Examples

Example 1: Extract Invoice Fields

// Extract key fields from an invoice
// Outputs: Pages array with key-value pairs in $pages

$pages.forEach(page => {
  console.log(`Processing Page ${page.page}`);

  // Convert array to object for easier access
  const fields = {};
  page.key_value_pairs.forEach(pair => {
    fields[pair.key] = pair.value;
  });

  // Extract common invoice fields
  const invoiceNumber = fields['Invoice Number'] || fields['Invoice #'];
  const invoiceDate = fields['Date'] || fields['Invoice Date'];
  const dueDate = fields['Due Date'] || fields['Payment Due'];
  const total = fields['Total'] || fields['Amount Due'];
  const vendor = fields['Vendor'] || fields['From'];

  console.log('Invoice Details:');
  console.log(`  Number: ${invoiceNumber}`);
  console.log(`  Date: ${invoiceDate}`);
  console.log(`  Due Date: ${dueDate}`);
  console.log(`  Total: ${total}`);
  console.log(`  Vendor: ${vendor}`);
});

Example 2: Process Application Form

// Extract data from application form
const applicantData = {};

$pages.forEach(page => {
  page.key_value_pairs.forEach(pair => {
    // Normalize key names and drop underscores left by leading or trailing colons
    const key = pair.key.trim().toLowerCase().replace(/[:\s]+/g, '_').replace(/^_+|_+$/g, '');
    applicantData[key] = (pair.value || '').trim();
  });
});

// Access extracted data
console.log('Applicant Information:');
console.log(`Name: ${applicantData.full_name || applicantData.name}`);
console.log(`Email: ${applicantData.email || applicantData.email_address}`);
console.log(`Phone: ${applicantData.phone || applicantData.phone_number}`);
console.log(`Address: ${applicantData.address}`);
console.log(`Date of Birth: ${applicantData.date_of_birth || applicantData.dob}`);

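// Validate required fields, accepting any of the alternative key names used above
const requiredFields = [['full_name', 'name'], ['email', 'email_address'], ['phone', 'phone_number']];
const missing = requiredFields
  .filter(aliases => !aliases.some(alias => applicantData[alias]))
  .map(aliases => aliases[0]);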

if (missing.length > 0) {
  console.warn(`Missing required fields: ${missing.join(', ')}`);
}

Example 3: Save to Database

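// Extract form data and save to database
// './database' is a placeholder for your own module exposing an async insert(table, record)
const db = require('./database');

// await is only valid inside an async function in CommonJS code
async function savePages(pages) {
  for (const page of pages) {
    const record = {
      page_number: page.page,
      extracted_at: new Date(),
      fields: {}
    };

    // Build fields object
    page.key_value_pairs.forEach(pair => {
      record.fields[pair.key] = pair.value;
    });

    // Save to database
    await db.insert('form_submissions', record);
    console.log(`Saved page ${page.page} to database`);
  }
}

savePages($pages).catch(err => console.error(`Database save failed: ${err.message}`));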

Example 4: Compare Form Values

// Compare extracted values against expected values
const expectedValues = {
  'Status': 'Approved',
  'Reviewed By': 'John Smith',
  'Signature': 'Yes'
};

const discrepancies = [];

$pages.forEach(page => {
  page.key_value_pairs.forEach(pair => {
    if (Object.prototype.hasOwnProperty.call(expectedValues, pair.key)) {
      if (pair.value !== expectedValues[pair.key]) {
        discrepancies.push({
          field: pair.key,
          expected: expectedValues[pair.key],
          actual: pair.value
        });
      }
    }
  });
});

if (discrepancies.length > 0) {
  console.warn('Found discrepancies:');
  discrepancies.forEach(d => {
    console.log(`  ${d.field}: expected "${d.expected}", got "${d.actual}"`);
  });
} else {
  console.log('All values match expected values');
}

Example 5: Generate Summary Report

// Generate summary of all extracted key-value pairs
let report = 'Extracted Form Data\n';
report += '==================\n\n';

$pages.forEach(page => {
  report += `Page ${page.page}:\n`;
  report += '-'.repeat(50) + '\n';

  if (page.key_value_pairs.length === 0) {
    report += '  No fields found\n';
  } else {
    page.key_value_pairs.forEach(pair => {
      const key = pair.key.padEnd(30);
      report += `  ${key}: ${pair.value}\n`;
    });
  }

  report += '\n';
});

// Save report to file
const fs = require('fs');
fs.writeFileSync('extraction_report.txt', report);
console.log('Report saved to extraction_report.txt');

Example 6: Handle Multi-Page Forms

// Combine data from multi-page form
const combinedData = {};

$pages.forEach(page => {
  console.log(`Processing page ${page.page}/${$pages.length}`);

  page.key_value_pairs.forEach(pair => {
    // Handle duplicate keys across pages (own-property check so empty values still count)
    if (Object.prototype.hasOwnProperty.call(combinedData, pair.key)) {
      // If key exists, convert to array or append
      if (Array.isArray(combinedData[pair.key])) {
        combinedData[pair.key].push(pair.value);
      } else {
        combinedData[pair.key] = [combinedData[pair.key], pair.value];
      }
    } else {
      combinedData[pair.key] = pair.value;
    }
  });
});

console.log('Combined form data:', combinedData);

Tips for Effective Use

Form Design

Clear Labels: Use clear, distinct labels for form fields
Consistent Layout: Maintain consistent spacing between labels and values
Standard Fonts: Use standard, readable fonts (avoid decorative fonts)
Label Proximity: Keep labels close to their corresponding value fields

Key Matching

Keys are extracted as-is from the document
Handle variations in label text (e.g., "Phone" vs "Phone Number")
Use case-insensitive matching when searching for specific fields
Trim whitespace from keys and values
Normalize punctuation in field names
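
The matching tips above can be combined into a small lookup helper. This is a minimal sketch; `canonical` and `findField` are hypothetical names, and the alias lists are examples you would adapt to your own forms.

```javascript
// Reduce a label to lowercase words: case, punctuation, and extra spaces are ignored.
function canonical(label) {
  return String(label).toLowerCase().replace(/[^a-z0-9]+/g, ' ').trim();
}

// Return the trimmed value of the first pair whose key matches any alias.
function findField(pairs, aliases) {
  const wanted = aliases.map(canonical);
  const match = pairs.find(pair => wanted.includes(canonical(pair.key)));
  return match ? String(match.value).trim() : undefined;
}

// Sample pairs in the shape of page.key_value_pairs.
const samplePairs = [
  { key: ' Phone Number: ', value: ' 555-0100 ' },
  { key: 'E-mail', value: 'jane@example.com' }
];

console.log(findField(samplePairs, ['Phone', 'Phone Number']));
console.log(findField(samplePairs, ['Email', 'E-mail']));
```

Because punctuation is replaced by a space rather than removed, "E-mail" and "Email" remain distinct labels, so both spellings are listed as aliases.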

Checkbox Handling

Checkboxes are detected and values extracted (often "Yes"/"No" or checked/unchecked)
Handle boolean conversions from text values
Consider common checkbox representations ("X", "☑", "checked", etc.)
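
A boolean conversion for checkbox values might look like the sketch below. The exact strings the node returns for checkboxes are not specified here, so the two sets are assumptions to extend as you observe real output; `checkboxToBoolean` is a hypothetical helper.

```javascript
// Assumed text representations; adjust after inspecting real extraction results.
const CHECKED_VALUES = new Set(['yes', 'y', 'true', 'x', 'checked', 'on', '☑', '☒', '✓', '✔']);
const UNCHECKED_VALUES = new Set(['no', 'n', 'false', 'unchecked', 'off', '☐']);

// Returns true, false, or null when the value is empty or not recognized.
function checkboxToBoolean(value) {
  const text = String(value == null ? '' : value).trim().toLowerCase();
  if (CHECKED_VALUES.has(text)) return true;
  if (UNCHECKED_VALUES.has(text)) return false;
  return null;
}

console.log(checkboxToBoolean(' Yes '));
console.log(checkboxToBoolean('☐'));
console.log(checkboxToBoolean('maybe'));
```

Returning `null` for unrecognized text keeps ambiguous values visible for review instead of silently treating them as unchecked.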

Multi-Page Forms

Process all pages to capture complete form data
Handle fields that span multiple pages
Combine or aggregate data appropriately
Track which page each field came from if needed
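
To track which page each field came from, one option is to index every occurrence of a key together with its page number. The helper `indexFieldsByPage` and the sample data below are illustrative.

```javascript
// Build a Map from key to a list of { page, value } occurrences.
function indexFieldsByPage(pages) {
  const index = new Map();
  for (const page of pages) {
    for (const pair of page.key_value_pairs) {
      if (!index.has(pair.key)) {
        index.set(pair.key, []);
      }
      index.get(pair.key).push({ page: page.page, value: pair.value });
    }
  }
  return index;
}

// Sample data in the shape of the Pages output.
const multiPageSample = [
  { page: 1, key_value_pairs: [{ key: 'Name', value: 'Jane Doe' }, { key: 'Initials', value: 'JD' }] },
  { page: 2, key_value_pairs: [{ key: 'Initials', value: 'JD' }, { key: 'Signature', value: 'Yes' }] }
];

const fieldIndex = indexFieldsByPage(multiPageSample);
console.log(fieldIndex.get('Initials'));
```

A `Map` is used instead of a plain object so that keys taken verbatim from documents cannot collide with built-in property names.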

Data Validation

Required Fields: Check for presence of required fields
Format Validation: Validate dates, emails, phone numbers, etc.
Value Ranges: Verify numeric values are within expected ranges
Cross-Field Validation: Ensure related fields are consistent
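
The four validation types above can be sketched for an invoice record as follows. The field names, the MM/DD/YYYY date format, and the upper bound of 1,000,000 are assumptions chosen for illustration; all three helpers are hypothetical.

```javascript
// Parse a currency string such as "$1,250.00" into a number (NaN if not numeric).
function parseAmount(text) {
  const cleaned = String(text == null ? '' : text).replace(/[^0-9.-]/g, '');
  return cleaned === '' ? NaN : Number(cleaned);
}

// Parse MM/DD/YYYY into a UTC Date, or null for malformed or impossible dates.
function parseUsDate(text) {
  const match = /^(\d{1,2})\/(\d{1,2})\/(\d{4})$/.exec(String(text == null ? '' : text).trim());
  if (!match) return null;
  const month = Number(match[1]);
  const day = Number(match[2]);
  const date = new Date(Date.UTC(Number(match[3]), month - 1, day));
  // Reject rollovers such as 02/30, which Date would silently move into March.
  return date.getUTCMonth() === month - 1 && date.getUTCDate() === day ? date : null;
}

function validateInvoice(fields) {
  const errors = [];

  // Required field
  if (!fields.invoice_number) errors.push('invoice_number is required');

  // Value range (the upper bound is an arbitrary example)
  const total = parseAmount(fields.total);
  if (Number.isNaN(total) || total <= 0 || total > 1000000) errors.push('total is out of range');

  // Format validation
  const issued = parseUsDate(fields.date);
  const due = parseUsDate(fields.due_date);
  if (!issued) errors.push('date is not MM/DD/YYYY');
  if (!due) errors.push('due_date is not MM/DD/YYYY');

  // Cross-field validation
  if (issued && due && due < issued) errors.push('due_date is before date');

  return errors;
}

console.log(validateInvoice({ invoice_number: 'INV-2024-001', total: '$1,250.00', date: '01/15/2024', due_date: '02/14/2024' }));
```

The function collects every problem rather than stopping at the first, which suits review queues where a person fixes all issues in one pass.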

Common Errors and Solutions

Error: "File path cannot be empty"

Cause: No file path provided to the node.

Solution: Ensure the File Path input is populated with a valid path.

Error: "Failed to read document file"

Cause: File not found, permission denied, or path is incorrect.

Solution:

Verify the file exists at the specified path
Check file permissions allow reading
Use absolute paths instead of relative paths
Ensure the file hasn't been moved or deleted

Error: "Project ID cannot be empty"

Cause: Project ID option is not configured.

Solution: Add your Google Cloud project ID in the node options.

Error: "Processor ID cannot be empty"

Cause: Processor ID option is not configured.

Solution:

Create a Form Parser processor in Google Cloud Console
Copy the processor ID from the processor details page
Add it to the node options

Error: "Invalid credentials format: missing content field"

Cause: Credentials are not properly formatted or stored.

Solution:

Re-download Service Account JSON key from Google Cloud Console
Save the complete JSON content in Robomotion vault
Select the correct credential from the dropdown

No key-value pairs found

Cause: Document doesn't contain detectable form fields or layout isn't recognized.

Solution:

Verify document contains actual form fields (labels with associated values)
Ensure labels and values are properly aligned and spaced
Check document quality (not blurry or skewed)
Try using a specialized processor for the document type
Consider using Extract Text if structure is too complex

Incorrect key-value pairing

Cause: Spatial relationship between label and value not detected correctly.

Solution:

Ensure consistent spacing between labels and values
Avoid complex layouts with multiple columns
Use clear visual separation between fields
Check for proper alignment of labels and values
Manually map fields if automated extraction is inconsistent

Missing values for some fields

Cause: Empty fields or low-contrast text not detected.

Solution:

Verify all fields are filled in the original document
Check for sufficient contrast between text and background
Ensure handwritten values are clear and legible
Increase scan resolution for better detection
Use default values for optional empty fields
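
Applying default values to optional fields that came back empty can be done with a small merge step. This is a sketch with a hypothetical helper name and example defaults.

```javascript
// Fill in defaults for keys that are missing, null, or blank after trimming.
function applyDefaults(fields, defaults) {
  const result = { ...fields };
  for (const [key, fallback] of Object.entries(defaults)) {
    const current = result[key];
    if (current == null || String(current).trim() === '') {
      result[key] = fallback;
    }
  }
  return result;
}

const completed = applyDefaults(
  { name: 'Jane Doe', middle_name: '  ', country: 'Canada' },
  { middle_name: 'N/A', country: 'Unknown', fax: 'N/A' }
);
console.log(completed);
```

Only optional fields should receive defaults; required fields that are empty are better reported as validation errors.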

Use Cases

Invoice Processing

Extract vendor information, invoice numbers, dates, amounts, and payment terms from invoices for automated accounts payable workflows.

Form Digitization

Convert paper forms (applications, registrations, surveys) to structured digital data for database entry and processing.

Insurance Claims

Extract policy numbers, claim dates, claimant information, and claim details from insurance claim forms for automated processing.

Medical Records

Extract patient information, medical history, diagnoses, and prescriptions from medical forms and health records.

Tax Documents

Extract taxpayer information, income details, deductions, and other data from tax forms (W-2, 1099, etc.).

Loan Applications

Extract applicant details, employment information, financial data, and supporting documentation from loan application forms.

Compliance Documents

Extract required information from regulatory forms, certifications, and compliance documentation for auditing and reporting.

Contract Data

Extract key terms, parties, dates, amounts, and conditions from contracts and agreements for contract management systems.

Output Structure Example

// Example output structure
{
  pages: [
    {
      page: 1,
      key_value_pairs: [
        {
          key: 'Invoice Number',
          value: 'INV-2024-001'
        },
        {
          key: 'Date',
          value: '01/15/2024'
        },
        {
          key: 'Total Amount',
          value: '$1,250.00'
        },
        {
          key: 'Payment Terms',
          value: 'Net 30'
        },
        {
          key: 'Vendor Name',
          value: 'Acme Corporation'
        }
      ]
    }
  ]
}

Field Name Normalization

Consider normalizing field names for consistency:

// Normalize field names
function normalizeFieldName(name) {
  return name
    .toLowerCase()
    .trim()
    .replace(/[:\s]+/g, '_')
    .replace(/[^a-z0-9_]/g, '')
    .replace(/^_+|_+$/g, ''); // drop underscores left by leading or trailing colons
}

// Use normalized names
const normalizedData = {};
$pages.forEach(page => {
  page.key_value_pairs.forEach(pair => {
    const normalizedKey = normalizeFieldName(pair.key);
    normalizedData[normalizedKey] = pair.value;
  });
});

Performance Considerations

Processing Time: Typically 2-10 seconds per page depending on form complexity
Field Count: Documents with many fields take slightly longer to process
Layout Complexity: Complex multi-column layouts may impact accuracy
Concurrent Requests: Implement queuing for batch processing to respect rate limits
Network Latency: Choose processor location near your automation deployment
Cost: Charged per page processed; monitor usage in Google Cloud Console
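
For batch processing, a simple way to queue work is to cap the number of documents in flight at once. The sketch below is a generic concurrency limiter; `processInBatches` is a hypothetical helper, the `worker` function stands in for whatever triggers the extraction in your flow, and a suitable concurrency value depends on your project's quota.

```javascript
// Run worker(item) for every item with at most `concurrency` calls in flight.
// Results are returned in the same order as the input items.
async function processInBatches(items, worker, { concurrency = 2 } = {}) {
  const results = new Array(items.length);
  let nextIndex = 0;

  // Each runner repeatedly claims the next unprocessed index until none remain.
  async function runner() {
    while (nextIndex < items.length) {
      const index = nextIndex++;
      results[index] = await worker(items[index]);
    }
  }

  const runnerCount = Math.max(1, Math.min(concurrency, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

If any worker call rejects, the returned promise rejects as well, so wrap the worker in your own error handling if one failed document should not stop the batch.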
