
Extract Table

Extracts tabular data from HTML content and converts it to structured JSON format.

Common Properties

  • Name - The custom name of the node.
  • Color - The custom color of the node.
  • Delay Before (sec) - Time to wait, in seconds, before executing the node.
  • Delay After (sec) - Time to wait, in seconds, after executing the node.
  • Continue On Error - Automation will continue regardless of any error. The default value is false.

Info: If the ContinueOnError property is true, no error is caught when the project is executed, even if a Catch node is used.

Inputs

  • HTML Element - The HTML content containing table elements to extract.

Options

  • Vertical - When enabled, extracts vertical tables where data is organized in key-value pairs rather than traditional horizontal rows.

Output

  • Table - A JSON object containing the extracted table data with the following structure:
    • columns - An array of column headers
    • rows - An array of row objects, where each object represents a row with column names as keys
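
To make the output shape concrete, here is a minimal sketch of what the Table value might look like for a two-column table. The header names, the cell values, and the choice to represent cells as strings are illustrative assumptions, and `column_values` is a hypothetical helper rather than part of the node.

```python
import json

# Hypothetical Table output for a table with headers "Name" and "Price".
# Cell values are shown as strings; that is an assumption of this sketch.
table = {
    "columns": ["Name", "Price"],
    "rows": [
        {"Name": "Widget", "Price": "9.99"},
        {"Name": "Gadget", "Price": "24.50"},
    ],
}


def column_values(table, column):
    """Collect one column's values across all rows (None if a key is absent)."""
    return [row.get(column) for row in table["rows"]]


if __name__ == "__main__":
    print(json.dumps(table, indent=2))
    print(column_values(table, "Price"))
```

Because every row object is keyed by the column headers, a later step can look up a cell by header name instead of by position.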

How It Works

The Extract Table node parses HTML content and extracts table data into a structured JSON format. When executed, the node:

  1. Retrieves the HTML Element input variable
  2. Validates that the HTML content is not empty
  3. Checks the Vertical option to determine extraction method:
    • Horizontal mode (default) - For traditional tables with headers in thead and data in tbody:
      • Extracts column headers from thead th elements
      • Extracts row data from tbody tr elements
      • Maps cell data to corresponding column headers
    • Vertical mode - For tables organized as key-value pairs:
      • Extracts key-value pairs from tbody tr elements
      • Treats the first cell in each row as the key
      • Treats subsequent cells as values
  4. Processes the HTML using goquery to parse DOM elements
  5. Constructs a TableData structure with columns and rows
  6. Sets the structured table data as the output variable
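
The steps above can be sketched roughly as follows. This is an illustration only: it uses Python's standard `html.parser` in place of goquery, and the names `TableParser` and `extract_table` are hypothetical. Several behaviors are assumptions not stated in this page: missing cells become empty strings, vertical mode emits a single row keyed by the first cell of each row, and multiple value cells are joined with a space.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects header text from thead th and cell text from tbody tr."""

    def __init__(self):
        super().__init__()
        self.headers = []     # text of each thead th
        self.body_rows = []   # one list of cell texts per tbody tr
        self._section = None  # "thead", "tbody", or None
        self._cell = None     # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag in ("thead", "tbody"):
            self._section = tag
        elif tag == "tr" and self._section == "tbody":
            self.body_rows.append([])
        elif tag in ("th", "td") and self._section:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested elements (links, spans) is collected as well.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("thead", "tbody"):
            self._section = None
        elif tag in ("th", "td") and self._cell is not None:
            text = "".join(self._cell).strip()
            if self._section == "thead" and tag == "th":
                self.headers.append(text)
            elif self._section == "tbody" and self.body_rows:
                self.body_rows[-1].append(text)
            self._cell = None


def extract_table(html, vertical=False):
    if not html or not html.strip():
        raise ValueError("HTML Element input cannot be empty")
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if vertical:
        # Assumption: first cell is the key, remaining cells form the value.
        pairs = [(c[0], " ".join(c[1:])) for c in parser.body_rows if c]
        return {"columns": [k for k, _ in pairs],
                "rows": [dict(pairs)] if pairs else []}
    columns = parser.headers
    rows = [
        # Assumption: a missing cell is filled with an empty string.
        {col: (cells[i] if i < len(cells) else "") for i, col in enumerate(columns)}
        for cells in parser.body_rows
    ]
    return {"columns": columns, "rows": rows}


HORIZONTAL = ("<table><thead><tr><th>Name</th><th>Price</th></tr></thead>"
              "<tbody><tr><td>Widget</td><td>9.99</td></tr>"
              "<tr><td>Gadget</td></tr></tbody></table>")
VERTICAL = ("<table><tbody><tr><th>Color</th><td>Red</td></tr>"
            "<tr><th>Weight</th><td>2 kg</td></tr></tbody></table>")

if __name__ == "__main__":
    print(extract_table(HORIZONTAL))
    print(extract_table(VERTICAL, vertical=True))
```

The second row of the horizontal sample deliberately omits a cell to show one plausible way of handling short rows; the node's actual behavior for such rows may differ.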

Requirements

  • Valid HTML content containing table elements
  • Non-empty HTML Element input

Error Handling

The node will return specific errors in the following cases:

  • Empty or invalid HTML Element input - "HTML Element input cannot be empty"
  • Malformed HTML that cannot be parsed
  • Issues with goquery document creation

Usage Notes

  • Supports both traditional horizontal tables and vertical key-value tables
  • For horizontal tables, looks for thead th elements for column headers
  • For horizontal tables, looks for tbody tr elements for row data
  • For vertical tables, treats the first cell in each row as a key and subsequent cells as values
  • Returns empty columns and rows arrays if no table data is found
  • Handles tables with missing cells gracefully
  • Column headers are used as keys in the row objects
  • The output JSON structure makes it easy to process table data in subsequent nodes
  • Useful for web scraping, data extraction, and content analysis tasks
  • Can handle nested HTML elements within table cells
  • Enable the Vertical option for key-value tables, such as tables laid out like definition lists
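
As one example of processing the output in a later step, the sketch below converts a Table value to CSV text using Python's standard `csv` module. The function name `table_to_csv` and the sample data are hypothetical; the sketch only assumes the documented `columns` and `rows` keys and fills any absent cell with an empty field.

```python
import csv
import io


def table_to_csv(table):
    """Render a {"columns": [...], "rows": [...]} value as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=table["columns"],  # header order follows the columns array
        restval="",                   # absent cells become empty fields
        extrasaction="ignore",        # keys not listed in columns are skipped
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(table["rows"])
    return buffer.getvalue()


if __name__ == "__main__":
    sample = {
        "columns": ["Name", "Price"],
        "rows": [{"Name": "Widget", "Price": "9.99"}, {"Name": "Gadget"}],
    }
    print(table_to_csv(sample))
```

Using the `columns` array for field order keeps the CSV header aligned with the original table, since the key order inside individual row objects is not something this page guarantees.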