DOM Parser

Parse and extract data from HTML documents using CSS selectors and XPath expressions.

Overview

The DOM Parser package provides nodes for parsing HTML and extracting structured data from it. Use it to scrape data from HTML content, extract tables, locate specific elements, or process saved HTML documents offline.

Key Features

  • Element Selection - Find elements using CSS selectors or XPath expressions
  • Table Extraction - Parse HTML tables into structured data
  • Text Extraction - Get text content from elements
  • Image Extraction - Find and extract image URLs
  • Attribute Access - Get element attribute values

Available Nodes

  • Find - Find a single element matching a selector
  • Find All - Find all elements matching a selector
  • Get Value - Get a single value from an element
  • Get Values - Get multiple values from matched elements
  • Extract Text - Extract text content from HTML
  • Extract Table - Parse HTML tables into arrays
  • Extract Image - Get image sources from elements
  • Count Words - Count words in HTML content
  • Escape - HTML-encode special characters (for example, < becomes &lt;)
  • Unescape - Decode HTML entities
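The Extract Text, Count Words, Escape, and Unescape nodes correspond to common text operations. The sketch below approximates them with Python's standard library (html.escape, html.unescape, and html.parser). It is an illustration only: extract_text and count_words are hypothetical helpers, skipping <script>/<style> content is an assumption about what "text content" means, and words are counted as whitespace-separated tokens.

```python
import html
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Gathers text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; inside text
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip = 0  # >0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def extract_text(markup):
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    # Join fragments with spaces (so adjacent blocks don't merge),
    # then collapse runs of whitespace into single spaces
    return " ".join(" ".join(parser.parts).split())


def count_words(markup):
    # Counts whitespace-separated tokens, so a lone "&" counts as a word
    return len(extract_text(markup).split())


sample = "<p>Tea &amp; cake</p><style>p{color:red}</style><p>Open <b>daily</b></p>"
print(extract_text(sample))             # Tea & cake Open daily
print(count_words(sample))              # 5
print(html.escape("a < b & c"))         # a &lt; b &amp; c
print(html.unescape("&lt;p&gt;"))       # <p>
```

Joining fragments with a space keeps separate paragraphs apart, at the cost of splitting a word that an inline tag cuts in two (for example, <b>Hel</b>lo).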

When to Use This Package

  • Web Scraping: Extract data from downloaded HTML pages
  • Email Parsing: Process HTML emails to extract data
  • Report Processing: Parse HTML reports into structured data
  • Content Analysis: Analyze HTML structure and content
  • Data Migration: Extract data from HTML exports

Selector Types

  • CSS Selectors: div.className, #elementId, table tr td
  • XPath: //div[@class='content'], //table//tr
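To show how the XPath examples above select elements, the following sketch runs them through Python's xml.etree.ElementTree, which supports a limited XPath subset. This is an assumption-laden stand-in, not the package's selector engine: ElementTree is an XML parser, so the input must be well-formed XHTML, and absolute paths like //div must be written in the relative form .//div. The standard library has no CSS selector engine, so only the XPath side is shown.

```python
import xml.etree.ElementTree as ET

# Assumes well-formed XHTML; ElementTree rejects typical real-world HTML
# (unclosed <br>, attributes without values, and so on).
page = """
<html>
  <body>
    <div class="content"><p>First</p></div>
    <div class="sidebar"><p>Ads</p></div>
    <table>
      <tr><td>a</td><td>b</td></tr>
      <tr><td>c</td><td>d</td></tr>
    </table>
  </body>
</html>
"""

root = ET.fromstring(page.strip())

# //div[@class='content']  ->  .//div[@class='content'] in ElementTree's subset
content_divs = root.findall(".//div[@class='content']")

# //table//tr  ->  .//table//tr
rows = root.findall(".//table//tr")

print([d.find("p").text for d in content_divs])  # ['First']
print(len(rows))                                 # 2
```

The attribute predicate keeps only the div whose class is exactly "content"; note that an XPath equality test on @class does not match elements with several classes, which is one reason CSS selectors such as div.className are often easier to use.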

Typical Workflow

  1. Get HTML content (for example, from a browser, an HTTP request, or a file)
  2. Use Find or Find All to locate target elements
  3. Use Extract Text, Get Value, or Extract Table to get data
  4. Process extracted data in your workflow
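The four steps above can be sketched in plain Python for a common task: collecting every image URL on a page. This is a hypothetical illustration, not the package's API. AttributeCollector and image_urls are invented names, matching is by tag name only rather than by a full CSS selector or XPath expression, and a literal string stands in for the HTML that step 1 would normally fetch.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AttributeCollector(HTMLParser):
    """Records one attribute's value on every occurrence of one tag.

    A rough stand-in for "Find All" followed by "Get Value":
    it matches by tag name only, not by a CSS selector or XPath.
    """

    def __init__(self, tag, attribute):
        super().__init__()
        self.tag = tag
        self.attribute = attribute
        self.values = []

    def handle_starttag(self, tag, attrs):
        # Step 2: locate every matching element
        if tag == self.tag:
            # Step 3: read the attribute; attrs is a list of (name, value)
            # pairs and value is None for attributes written without one
            for name, value in attrs:
                if name == self.attribute and value:
                    self.values.append(value)


def image_urls(markup, base_url):
    collector = AttributeCollector("img", "src")
    collector.feed(markup)
    collector.close()
    # Step 4: resolve relative src values so later steps get absolute URLs
    return [urljoin(base_url, src) for src in collector.values]


# Step 1: a literal string stands in for HTML from a browser, HTTP request, or file
markup = '<div><img src="/logo.png"><img src="https://cdn.example.com/a.jpg" alt="A"></div>'
print(image_urls(markup, "https://example.com/products/"))
# ['https://example.com/logo.png', 'https://cdn.example.com/a.jpg']
```

Resolving relative paths against the page URL is a design choice worth keeping in any real workflow: a bare "/logo.png" is not usable by a later download step on its own.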