Web Fetch

Fetches content from a URL, converts it to markdown, and processes it using an AI model.

Common Properties

  • Name - The custom name of the node.
  • Color - The custom color of the node.
  • Delay Before (sec) - The number of seconds to wait before executing the node.
  • Delay After (sec) - The number of seconds to wait after executing the node.
  • Continue On Error - Whether the automation continues when the node raises an error. The default value is false.

Inputs

  • URL - string - The URL to fetch content from (required; must be a valid HTTP or HTTPS URL).
  • Prompt - string - The prompt to run on the fetched content (required).

Options

  • Timeout - int - Request timeout in milliseconds (default: 30000, 30 seconds).
  • Max Size - int - Maximum content size in bytes (default: 10485760, 10MB).
  • User Agent - string - User agent string for requests (default: Mozilla/5.0 compatible).
  • Follow Redirects - bool - Follow HTTP redirects (default: true).
  • Accept Cookies - bool - Accept and send cookies (default: true).
  • Use Robomotion Proxy - bool - Use Robomotion proxy for requests (default: false).

Outputs

  • Content - string - Fetched content converted to markdown format.
  • Response - string - AI model's response about the content based on the prompt.
  • Final URL - string - Final URL after any redirects.
  • Status Code - int - HTTP status code from the request.
  • Metadata - object - Response metadata:
    • contentType - Content-Type header value
    • contentLength - Size of fetched content in bytes
    • headers - HTTP response headers
    • fetchTime - Timestamp when content was fetched
    • truncated - Whether content was truncated due to size limit
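
As an illustration of the Metadata output's shape, the sketch below uses the field names listed above; the values themselves are hypothetical.

```python
# Hypothetical example of the Metadata output for a successful fetch.
# Field names follow the list above; the values are illustrative only.
metadata = {
    "contentType": "text/html; charset=utf-8",  # Content-Type header value
    "contentLength": 48213,                     # bytes actually read
    "headers": {
        "server": "nginx",
        "cache-control": "max-age=3600",
    },
    "fetchTime": "2024-01-15T10:30:00Z",        # when the content was fetched
    "truncated": False,                         # True if Max Size cut the body
}
```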

How It Works

The Web Fetch node retrieves and processes web content. When executed, the node:

  1. Validates the URL format
  2. Upgrades HTTP URLs to HTTPS automatically
  3. Creates HTTP client with specified options:
    • Sets timeout
    • Configures redirect following
    • Optionally uses proxy
  4. Sends GET request with appropriate headers:
    • User-Agent
    • Accept (HTML and XML)
    • Accept-Language
  5. Checks for cross-domain redirects:
    • If redirected to a different host, returns redirect info instead of following it
  6. Validates HTTP status code (200-299)
  7. Reads response body up to max size limit
  8. Converts HTML content to markdown format
  9. Processes content with AI model using the provided prompt
  10. Returns markdown content, AI response, and metadata
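
A few of the steps above (URL validation, the HTTPS upgrade, and status checking) can be sketched in Python. The helper names are hypothetical; this is not the node's actual implementation.

```python
from urllib.parse import urlparse

def upgrade_to_https(url: str) -> str:
    """Step 2: upgrade plain-HTTP URLs to HTTPS."""
    if urlparse(url).scheme == "http":
        return url.replace("http://", "https://", 1)
    return url

def validate_url(url: str) -> None:
    """Step 1: basic validation, mirroring the error messages below."""
    parsed = urlparse(url)
    if not parsed.scheme or not parsed.netloc:
        raise ValueError(f"Invalid URL: {url}")
    if parsed.scheme not in ("http", "https"):
        raise ValueError("Only HTTP(S) URLs are supported")

def status_ok(code: int) -> bool:
    """Step 6: only 2xx status codes count as success."""
    return 200 <= code <= 299
```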

Requirements

  • Valid HTTP or HTTPS URL
  • Network connectivity
  • Target server accessibility
  • For proxied requests: Robomotion proxy configuration

Error Handling

The node will return specific errors in the following cases:

  • Missing URL - "URL is required"
  • Missing prompt - "Prompt is required"
  • Invalid URL - "Invalid URL: {{error}}"
  • Unsupported protocol - "Only HTTP(S) URLs are supported"
  • Client creation failed - "Failed to create HTTP client: {{error}}"
  • Request creation failed - "Failed to create request: {{error}}"
  • Fetch failed - "Failed to fetch URL: {{error}}"
  • HTTP error - "HTTP error: {{code}} {{status}}"
  • Read error - "Failed to read content: {{error}}"
  • Conversion error - "Failed to convert HTML to Markdown: {{error}}"

Usage Examples

Basic Web Fetch

URL: https://example.com/article
Prompt: Summarize the main points of this article

Extract Specific Information

URL: https://docs.example.com/api
Prompt: List all API endpoints mentioned in this documentation

With Custom User Agent

URL: https://website.com/data
Prompt: Extract the data table and convert to JSON
User Agent: MyBot/1.0

With Timeout

URL: https://slow-site.com
Prompt: Get the page title and description
Timeout: 60000

Through Proxy

URL: https://restricted-site.com
Prompt: Extract the main content
Use Robomotion Proxy: true

Content Conversion

The node converts HTML to markdown using these transformations:

HTML Elements to Markdown

  • <h1>–<h6> - #–###### - Headings
  • <p> - Plain text paragraphs
  • <a> - [text](url) - Links
  • <strong>, <b> - **text** - Bold
  • <em>, <i> - *text* - Italic
  • <code> - `code` - Inline code
  • <pre> - Fenced code blocks
  • <ul>, <ol> - Bulleted/numbered lists
  • <table> - Markdown tables
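
The inline transformations above can be sketched with a few regex rules. This is a deliberately minimal illustration; the node's real converter handles nesting, attributes, and block elements far more robustly.

```python
import re

# Minimal sketch of the inline HTML-to-markdown rules listed above.
RULES = [
    (re.compile(r"<h1>(.*?)</h1>"), r"# \1"),
    (re.compile(r"<(?:strong|b)>(.*?)</(?:strong|b)>"), r"**\1**"),
    (re.compile(r"<(?:em|i)>(.*?)</(?:em|i)>"), r"*\1*"),
    (re.compile(r"<code>(.*?)</code>"), r"`\1`"),
    (re.compile(r'<a href="(.*?)">(.*?)</a>'), r"[\2](\1)"),
    (re.compile(r"</?p>"), ""),  # paragraphs become plain text
]

def to_markdown(html: str) -> str:
    for pattern, replacement in RULES:
        html = pattern.sub(replacement, html)
    return html.strip()
```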

Cleanup

  • Removes excessive newlines
  • Preserves code formatting
  • Maintains link references
  • Keeps table structure
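
The newline cleanup can be sketched as a single regex pass; a real implementation would also skip fenced code blocks so their formatting is preserved.

```python
import re

def cleanup(markdown: str) -> str:
    # Collapse runs of three or more newlines into a single blank line.
    # Illustrative only: a production pass would exempt code fences.
    return re.sub(r"\n{3,}", "\n\n", markdown)
```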

Redirect Handling

When a URL redirects to a different host:

  • The node detects the redirect
  • Returns redirect information in Response output
  • Includes both original and redirect URLs
  • Does not fetch content from the new host
  • User can make a new request to the redirect URL if desired
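
The cross-host check behind this behavior amounts to comparing the host of the original URL with the host in the redirect target; a sketch:

```python
from urllib.parse import urlparse

def is_cross_host_redirect(original_url: str, location: str) -> bool:
    """True when the redirect target points at a different host than the
    original request; such redirects are reported but not followed."""
    return urlparse(original_url).netloc != urlparse(location).netloc
```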

Usage Notes

  • HTTP URLs are automatically upgraded to HTTPS
  • Content larger than Max Size is truncated
  • Output includes truncation indicator in metadata
  • Cookies are handled automatically when enabled
  • Redirects within same host are followed automatically
  • Cross-host redirects are reported but not followed
  • User-Agent can be customized for site compatibility
  • Markdown conversion removes scripts and styles
  • AI processing is simulated (placeholder in current implementation)
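
The Max Size truncation noted above can be sketched as reading at most the limit from the response body and flagging the overflow, which is what the `truncated` metadata field reports:

```python
import io

def read_limited(stream: io.BufferedIOBase, max_size: int) -> tuple[bytes, bool]:
    """Read at most max_size bytes; the second value mirrors the
    `truncated` metadata flag. Illustrative sketch only."""
    data = stream.read(max_size + 1)  # read one extra byte to detect overflow
    if len(data) > max_size:
        return data[:max_size], True
    return data, False
```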

Common Use Cases

Content Extraction

Fetch articles, documentation, or web pages for processing.

Web Scraping

Extract structured data from websites.

Documentation Analysis

Analyze API documentation or technical guides.

Content Summarization

Summarize long articles or blog posts.

Data Collection

Collect information from multiple web sources.

Competitive Analysis

Monitor competitor websites for changes.

Best Practices

  • Set appropriate timeouts for different sites
  • Respect robots.txt and site policies
  • Use custom User-Agent to identify your bot
  • Handle redirects appropriately
  • Check status codes before processing content
  • Set reasonable max size to prevent memory issues
  • Use proxy when accessing restricted content
  • Craft specific prompts for better AI responses
  • Cache results to avoid repeated fetches
  • Rate limit requests to avoid overwhelming servers
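
Caching and rate limiting, the last two practices above, can be combined in a small wrapper. This is a minimal sketch; `fetch` stands in for the actual Web Fetch call and is hypothetical.

```python
import time

_cache: dict[str, str] = {}
_last_request = 0.0
MIN_INTERVAL = 1.0  # minimum seconds between outbound requests

def cached_fetch(url: str, fetch) -> str:
    """Serve repeated URLs from an in-memory cache and space out
    real requests by MIN_INTERVAL seconds."""
    global _last_request
    if url in _cache:                      # cache hit: no network call
        return _cache[url]
    wait = MIN_INTERVAL - (time.monotonic() - _last_request)
    if wait > 0:
        time.sleep(wait)                   # rate-limit outbound requests
    _last_request = time.monotonic()
    _cache[url] = fetch(url)
    return _cache[url]
```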

Security Notes

  • URLs are validated before fetching
  • HTTPS is preferred over HTTP
  • Content size is limited to prevent DoS
  • Timeout prevents hanging requests
  • Cookies can be disabled if needed
  • Proxy can be used for additional security

Performance Tips

  • Reduce the timeout to fail fast on unreachable sites
  • Set Max Size based on expected content size
  • Disable Follow Redirects if not needed
  • Use proxy only when necessary
  • Cache fetched content when possible

AI Processing

The prompt parameter guides how the AI model processes the fetched content:

Prompt Examples

  • "Summarize this article in 3 bullet points"
  • "Extract all email addresses from this page"
  • "List all products with their prices"
  • "What are the main topics discussed?"
  • "Convert this content to a structured format"

Note: Current implementation provides a placeholder for AI processing. In production, this would integrate with an actual AI model.

Comparison with Other Nodes

  • Web Fetch vs Web Search: Web Fetch retrieves specific URL content; Web Search queries search engines
  • Web Fetch vs Read: Web Fetch retrieves web content; Read reads local files
  • Web Fetch vs Grep: Web Fetch gets web content; Grep searches local file contents