Extract Table
Extracts tabular data from HTML content and converts it to structured JSON format.
Common Properties
- Name - The custom name of the node.
- Color - The custom color of the node.
- Delay Before (sec) - Waits in seconds before executing the node.
- Delay After (sec) - Waits in seconds after executing the node.
- Continue On Error - Automation will continue regardless of any error. The default value is false.
Note: If the ContinueOnError property is true, no error is caught when the project is executed, even if a Catch node is used.
Inputs
- HTML Element - The HTML content containing table elements to extract.
Options
- Vertical - When enabled, extracts vertical tables where data is organized in key-value pairs rather than traditional horizontal rows.
Output
- Table - A JSON object containing the extracted table data, with the following structure:
  - columns - An array of column headers
  - rows - An array of row objects, one per table row, with column names as keys
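To make the shape concrete, here is a small illustrative sample written as a Python literal. The column names and cell values are invented for the example, and representing cell values as strings is an assumption; only the columns and rows keys come from the description above.

```python
import json

# Hypothetical output for a two-column table; the data is illustrative only.
table = {
    "columns": ["Name", "Price"],
    "rows": [
        {"Name": "Widget", "Price": "9.99"},
        {"Name": "Gadget", "Price": "19.50"},
    ],
}

# Each row object uses the column headers as its keys.
for row in table["rows"]:
    assert set(row) <= set(table["columns"])

print(json.dumps(table, indent=2))
```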
How It Works
The Extract Table node parses HTML content and extracts table data into a structured JSON format. When executed, the node:
- Retrieves the HTML Element input variable
- Validates that the HTML content is not empty
- Checks the Vertical option to determine the extraction method:
  - Horizontal mode (default) - For traditional tables with headers in thead and data in tbody:
    - Extracts column headers from thead th elements
    - Extracts row data from tbody tr elements
    - Maps cell data to the corresponding column headers
  - Vertical mode - For tables organized as key-value pairs:
    - Extracts key-value pairs from tbody tr elements
    - Treats the first cell in each row as the key
    - Treats subsequent cells as values
- Processes the HTML using goquery to parse DOM elements
- Constructs a TableData structure with columns and rows
- Sets the structured table data as the output variable
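The two extraction modes described above can be approximated with a short sketch. This is not the node's implementation, which uses goquery. It is a rough Python equivalent built on the standard library's html.parser, with input validation omitted. The names TableSketch, extract_horizontal, and extract_vertical are hypothetical. Two behaviors are assumptions rather than documented facts: a row with fewer cells than headers simply omits the missing keys, and vertical mode is shown as a plain key-to-values mapping because the exact way the node folds those pairs into columns and rows is not specified here.

```python
from html.parser import HTMLParser


class TableSketch(HTMLParser):
    """Collects thead th text as headers and tbody tr cells as raw rows."""

    def __init__(self):
        super().__init__()
        self.section = None      # "thead" or "tbody" while inside one
        self.cell = None         # text fragments of the cell being read
        self.current_row = None  # cells of the tbody row being read
        self.headers = []
        self.raw_rows = []

    def handle_starttag(self, tag, attrs):
        if tag in ("thead", "tbody"):
            self.section = tag
        elif tag == "tr" and self.section == "tbody":
            self.current_row = []
        elif tag in ("th", "td") and self.section:
            self.cell = []

    def handle_data(self, data):
        # Text inside nested elements (for example <b>) is still collected.
        if self.cell is not None:
            self.cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self.cell is not None:
            text = "".join(self.cell).strip()
            self.cell = None
            if self.section == "thead" and tag == "th":
                self.headers.append(text)
            elif self.section == "tbody" and self.current_row is not None:
                self.current_row.append(text)
        elif tag == "tr" and self.current_row is not None:
            self.raw_rows.append(self.current_row)
            self.current_row = None
        elif tag in ("thead", "tbody"):
            self.section = None


def _parse(html):
    parser = TableSketch()
    parser.feed(html)
    parser.close()
    return parser


def extract_horizontal(html):
    """Map each tbody row onto the thead headers."""
    parser = _parse(html)
    rows = [dict(zip(parser.headers, cells)) for cells in parser.raw_rows]
    return {"columns": parser.headers, "rows": rows}


def extract_vertical(html):
    """Treat the first cell of each row as the key, the rest as values."""
    parser = _parse(html)
    return {cells[0]: cells[1:] for cells in parser.raw_rows if cells}


horizontal_html = """
<table>
  <thead><tr><th>Name</th><th>Price</th></tr></thead>
  <tbody>
    <tr><td>Widget</td><td><b>9.99</b></td></tr>
    <tr><td>Gadget</td></tr>
  </tbody>
</table>
"""

vertical_html = """
<table><tbody>
  <tr><th>Color</th><td>Red</td></tr>
  <tr><th>Sizes</th><td>S</td><td>M</td></tr>
</tbody></table>
"""

horizontal = extract_horizontal(horizontal_html)
vertical = extract_vertical(vertical_html)
```

In this sketch the second horizontal row has no Price cell, so its row object contains only the Name key.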
Requirements
- Valid HTML content containing table elements
- Non-empty HTML Element input
Error Handling
The node will return specific errors in the following cases:
- Empty or invalid HTML Element input - "HTML Element input cannot be empty"
- Malformed HTML that cannot be parsed
- Issues with goquery document creation
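As a rough illustration of the first case, the empty-input check might look like the sketch below. The function name validate_html_input is hypothetical, and treating whitespace-only input as empty is an assumption. Only the error message text is taken from the list above.

```python
def validate_html_input(html):
    """Reject missing or blank input before any parsing is attempted."""
    if html is None or not html.strip():
        raise ValueError("HTML Element input cannot be empty")
    return html


# A blank input triggers the documented message.
try:
    validate_html_input("   ")
except ValueError as err:
    message = str(err)
```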
Usage Notes
- Supports both traditional horizontal tables and vertical key-value tables
- For horizontal tables, looks for thead th elements for column headers
- For horizontal tables, looks for tbody tr elements for row data
- For vertical tables, treats the first cell in each row as a key and subsequent cells as values
- Returns empty columns and rows arrays if no table data is found
- Handles tables with missing cells gracefully
- Column headers are used as keys in the row objects
- The output JSON structure makes it easy to process table data in subsequent nodes
- Useful for web scraping, data extraction, and content analysis tasks
- Can handle nested HTML elements within table cells
- Enable the Vertical option for tables laid out as key-value pairs, such as definition-style tables where each row holds a label followed by its values
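As one example of downstream processing, the sketch below converts a Table output into CSV text. It assumes the output has already been loaded into a Python dictionary with the documented columns and rows keys. The helper name table_to_csv and the sample data are hypothetical. Rows with missing cells are padded with empty strings, which is a choice made for this example, not documented node behavior.

```python
import csv
import io


def table_to_csv(table):
    """Render a {"columns": [...], "rows": [...]} object as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=table["columns"],
        restval="",            # fill cells that are missing from a row
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(table["rows"])
    return buf.getvalue()


# Illustrative data only; the second row is missing its Price cell.
sample = {
    "columns": ["Name", "Price"],
    "rows": [{"Name": "Widget", "Price": "9.99"}, {"Name": "Gadget"}],
}
csv_text = table_to_csv(sample)
```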