Web Fetch

Fetches content from a URL, converts it to markdown, and processes it using an AI model.

Common Properties

  • Name - The custom name of the node.
  • Color - The custom color of the node.
  • Delay Before (sec) - The number of seconds to wait before executing the node.
  • Delay After (sec) - The number of seconds to wait after executing the node.
  • Continue On Error - Whether the automation continues when the node raises an error. The default value is false.

Inputs

  • URL - string - The URL to fetch content from (required; must be a valid HTTP or HTTPS URL).
  • Prompt - string - The prompt to run on the fetched content (required).

Options

  • Timeout - int - Request timeout in milliseconds (default: 30000, 30 seconds).
  • Max Size - int - Maximum content size in bytes (default: 10485760, 10MB).
  • User Agent - string - User agent string for requests (default: Mozilla/5.0 compatible).
  • Follow Redirects - bool - Follow HTTP redirects (default: true).
  • Accept Cookies - bool - Accept and send cookies (default: true).
  • Use Robomotion Proxy - bool - Use Robomotion proxy for requests (default: false).

Outputs

  • Content - string - Fetched content converted to markdown format.
  • Response - string - AI model's response about the content based on the prompt.
  • Final URL - string - Final URL after any redirects.
  • Status Code - int - HTTP status code from the request.
  • Metadata - object - Response metadata:
    • contentType - Content-Type header value
    • contentLength - Size of fetched content in bytes
    • headers - HTTP response headers
    • fetchTime - Timestamp when content was fetched
    • truncated - Whether content was truncated due to size limit
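
As an illustration of the Metadata output's shape, the sketch below uses the field names listed above; the values themselves are hypothetical.

```python
# Hypothetical example of the Metadata output for a successful fetch.
# Field names follow the list above; the values are illustrative only.
metadata = {
    "contentType": "text/html; charset=utf-8",  # Content-Type header value
    "contentLength": 48213,                     # bytes actually read
    "headers": {
        "server": "nginx",
        "cache-control": "max-age=3600",
    },
    "fetchTime": "2024-01-15T10:30:00Z",        # when the content was fetched
    "truncated": False,                         # True if Max Size cut the body
}
```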

How It Works

The Web Fetch node retrieves and processes web content. When executed, the node:

  1. Validates the URL format
  2. Upgrades HTTP URLs to HTTPS automatically
  3. Creates HTTP client with specified options:
    • Sets timeout
    • Configures redirect following
    • Optionally uses proxy
  4. Sends GET request with appropriate headers:
    • User-Agent
    • Accept (HTML and XML)
    • Accept-Language
  5. Checks for cross-domain redirects:
    • If redirected to a different host, returns redirect info instead of following it
  6. Validates HTTP status code (200-299)
  7. Reads response body up to max size limit
  8. Converts HTML content to markdown format
  9. Processes content with AI model using the provided prompt
  10. Returns markdown content, AI response, and metadata
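
A few of the steps above (URL validation, the HTTPS upgrade, and status checking) can be sketched in Python. The helper names are hypothetical; this is not the node's actual implementation.

```python
from urllib.parse import urlparse

def upgrade_to_https(url: str) -> str:
    """Step 2: upgrade plain-HTTP URLs to HTTPS."""
    if urlparse(url).scheme == "http":
        return url.replace("http://", "https://", 1)
    return url

def validate_url(url: str) -> None:
    """Step 1: basic validation, mirroring the error messages below."""
    parsed = urlparse(url)
    if not parsed.scheme or not parsed.netloc:
        raise ValueError(f"Invalid URL: {url}")
    if parsed.scheme not in ("http", "https"):
        raise ValueError("Only HTTP(S) URLs are supported")

def status_ok(code: int) -> bool:
    """Step 6: only 2xx status codes count as success."""
    return 200 <= code <= 299
```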

Requirements

  • Valid HTTP or HTTPS URL
  • Network connectivity
  • Target server accessibility
  • For proxied requests: Robomotion proxy configuration

Error Handling

The node will return specific errors in the following cases:

  • Missing URL - "URL is required"
  • Missing prompt - "Prompt is required"
  • Invalid URL - "Invalid URL: {{error}}"
  • Unsupported protocol - "Only HTTP(S) URLs are supported"
  • Client creation failed - "Failed to create HTTP client: {{error}}"
  • Request creation failed - "Failed to create request: {{error}}"
  • Fetch failed - "Failed to fetch URL: {{error}}"
  • HTTP error - "HTTP error: {{code}} {{status}}"
  • Read error - "Failed to read content: {{error}}"
  • Conversion error - "Failed to convert HTML to Markdown: {{error}}"

Usage Examples

Basic Web Fetch

URL: https://example.com/article
Prompt: Summarize the main points of this article

Extract Specific Information

URL: https://docs.example.com/api
Prompt: List all API endpoints mentioned in this documentation

With Custom User Agent

URL: https://website.com/data
Prompt: Extract the data table and convert to JSON
User Agent: MyBot/1.0

With Timeout

URL: https://slow-site.com
Prompt: Get the page title and description
Timeout: 60000

Through Proxy

URL: https://restricted-site.com
Prompt: Extract the main content
Use Robomotion Proxy: true

Content Conversion

The node converts HTML to markdown using these transformations:

HTML Elements to Markdown

  • <h1>–<h6> - #–###### - Headings
  • <p> - Plain text paragraphs
  • <a> - [text](url) - Links
  • <strong>, <b> - **text** - Bold
  • <em>, <i> - *text* - Italic
  • <code> - `code` - Inline code
  • <pre> - Fenced code blocks
  • <ul>, <ol> - Bulleted/numbered lists
  • <table> - Markdown tables
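
The inline transformations above can be sketched with a few regex rules. This is a deliberately minimal illustration; the node's real converter handles nesting, attributes, and block elements far more robustly.

```python
import re

# Minimal sketch of the inline HTML-to-markdown rules listed above.
RULES = [
    (re.compile(r"<h1>(.*?)</h1>"), r"# \1"),
    (re.compile(r"<(?:strong|b)>(.*?)</(?:strong|b)>"), r"**\1**"),
    (re.compile(r"<(?:em|i)>(.*?)</(?:em|i)>"), r"*\1*"),
    (re.compile(r"<code>(.*?)</code>"), r"`\1`"),
    (re.compile(r'<a href="(.*?)">(.*?)</a>'), r"[\2](\1)"),
    (re.compile(r"</?p>"), ""),  # paragraphs become plain text
]

def to_markdown(html: str) -> str:
    for pattern, replacement in RULES:
        html = pattern.sub(replacement, html)
    return html.strip()
```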

Cleanup

  • Removes excessive newlines
  • Preserves code formatting
  • Maintains link references
  • Keeps table structure
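
The newline cleanup can be sketched as a single regex pass; a real implementation would also skip fenced code blocks so their formatting is preserved.

```python
import re

def cleanup(markdown: str) -> str:
    # Collapse runs of three or more newlines into a single blank line.
    # Illustrative only: a production pass would exempt code fences.
    return re.sub(r"\n{3,}", "\n\n", markdown)
```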

Redirect Handling

When a URL redirects to a different host:

  • The node detects the redirect
  • Returns redirect information in Response output
  • Includes both original and redirect URLs
  • Does not fetch content from the new host
  • User can make a new request to the redirect URL if desired
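
The cross-host check behind this behavior amounts to comparing the host of the original URL with the host in the redirect target; a sketch:

```python
from urllib.parse import urlparse

def is_cross_host_redirect(original_url: str, location: str) -> bool:
    """True when the redirect target points at a different host than the
    original request; such redirects are reported but not followed."""
    return urlparse(original_url).netloc != urlparse(location).netloc
```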

Usage Notes

  • HTTP URLs are automatically upgraded to HTTPS
  • Content larger than Max Size is truncated
  • Output includes truncation indicator in metadata
  • Cookies are handled automatically when enabled
  • Redirects within same host are followed automatically
  • Cross-host redirects are reported but not followed
  • User-Agent can be customized for site compatibility
  • Markdown conversion removes scripts and styles
  • AI processing is simulated (placeholder in current implementation)
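
The Max Size truncation noted above can be sketched as reading at most the limit from the response body and flagging the overflow, which is what the `truncated` metadata field reports:

```python
import io

def read_limited(stream: io.BufferedIOBase, max_size: int) -> tuple[bytes, bool]:
    """Read at most max_size bytes; the second value mirrors the
    `truncated` metadata flag. Illustrative sketch only."""
    data = stream.read(max_size + 1)  # read one extra byte to detect overflow
    if len(data) > max_size:
        return data[:max_size], True
    return data, False
```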

Common Use Cases

Content Extraction

Fetch articles, documentation, or web pages for processing.

Web Scraping

Extract structured data from websites.

Documentation Analysis

Analyze API documentation or technical guides.

Content Summarization

Summarize long articles or blog posts.

Data Collection

Collect information from multiple web sources.

Competitive Analysis

Monitor competitor websites for changes.

Best Practices

  • Set appropriate timeouts for different sites
  • Respect robots.txt and site policies
  • Use custom User-Agent to identify your bot
  • Handle redirects appropriately
  • Check status codes before processing content
  • Set reasonable max size to prevent memory issues
  • Use proxy when accessing restricted content
  • Craft specific prompts for better AI responses
  • Cache results to avoid repeated fetches
  • Rate limit requests to avoid overwhelming servers
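
Caching and rate limiting, the last two practices above, can be combined in a small wrapper. This is a minimal sketch; `fetch` stands in for the actual Web Fetch call and is hypothetical.

```python
import time

_cache: dict[str, str] = {}
_last_request = 0.0
MIN_INTERVAL = 1.0  # minimum seconds between outbound requests

def cached_fetch(url: str, fetch) -> str:
    """Serve repeated URLs from an in-memory cache and space out
    real requests by MIN_INTERVAL seconds."""
    global _last_request
    if url in _cache:                      # cache hit: no network call
        return _cache[url]
    wait = MIN_INTERVAL - (time.monotonic() - _last_request)
    if wait > 0:
        time.sleep(wait)                   # rate-limit outbound requests
    _last_request = time.monotonic()
    _cache[url] = fetch(url)
    return _cache[url]
```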

Security Notes

  • URLs are validated before fetching
  • HTTPS is preferred over HTTP
  • Content size is limited to prevent DoS
  • Timeout prevents hanging requests
  • Cookies can be disabled if needed
  • Proxy can be used for additional security

Performance Tips

  • Reduce the timeout to fail fast on unreachable sites
  • Set Max Size based on expected content size
  • Disable Follow Redirects if not needed
  • Use proxy only when necessary
  • Cache fetched content when possible

AI Processing

The prompt parameter guides how the AI model processes the fetched content:

Prompt Examples

  • "Summarize this article in 3 bullet points"
  • "Extract all email addresses from this page"
  • "List all products with their prices"
  • "What are the main topics discussed?"
  • "Convert this content to a structured format"

Note: Current implementation provides a placeholder for AI processing. In production, this would integrate with an actual AI model.

Comparison with Other Nodes

  • Web Fetch vs Web Search: Web Fetch retrieves specific URL content; Web Search queries search engines
  • Web Fetch vs Read: Web Fetch retrieves web content; Read reads local files
  • Web Fetch vs Grep: Web Fetch gets web content; Grep searches local file contents