Web Fetch
Fetches content from a URL, converts it to markdown, and processes it using an AI model.
Common Properties
- Name - The custom name of the node.
- Color - The custom color of the node.
- Delay Before (sec) - Waits in seconds before executing the node.
- Delay After (sec) - Waits in seconds after executing the node.
- Continue On Error - Automation will continue regardless of any error. The default value is false.
Inputs
- URL - string - The URL to fetch content from (required, must be a valid HTTP/HTTPS URL).
- Prompt - string - The prompt to run on the fetched content (required).
Options
- Timeout - int - Request timeout in milliseconds (default: 30000, 30 seconds).
- Max Size - int - Maximum content size in bytes (default: 10485760, 10MB).
- User Agent - string - User agent string for requests (default: Mozilla/5.0 compatible).
- Follow Redirects - bool - Follow HTTP redirects (default: true).
- Accept Cookies - bool - Accept and send cookies (default: true).
- Use Robomotion Proxy - bool - Use Robomotion proxy for requests (default: false).
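As a quick reference, the documented defaults can be collected in one place. The sketch below is illustrative only: the class name `WebFetchOptions` and its field names are hypothetical, and the exact default User-Agent string is an assumption, since the documentation only describes it as "Mozilla/5.0 compatible".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WebFetchOptions:
    """Hypothetical container mirroring the documented option defaults."""

    timeout_ms: int = 30_000            # Timeout: 30000 ms (30 seconds)
    max_size_bytes: int = 10_485_760    # Max Size: 10 * 1024 * 1024 bytes (10MB)
    user_agent: str = "Mozilla/5.0 (compatible)"  # assumed spelling of the default
    follow_redirects: bool = True       # Follow Redirects
    accept_cookies: bool = True         # Accept Cookies
    use_robomotion_proxy: bool = False  # Use Robomotion Proxy

    @property
    def timeout_seconds(self) -> float:
        # Many HTTP clients expect seconds, while the node takes milliseconds.
        return self.timeout_ms / 1000


defaults = WebFetchOptions()
slow_site = WebFetchOptions(timeout_ms=60_000)
```

Keeping the timeout in milliseconds and converting at the edge avoids mixing units when the values are passed to an HTTP client.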
Outputs
- Content - string - Fetched content converted to markdown format.
- Response - string - AI model's response about the content based on the prompt.
- Final URL - string - Final URL after any redirects.
- Status Code - int - HTTP status code from the request.
- Metadata - object - Response metadata:
  - contentType - Content-Type header value
  - contentLength - Size of fetched content in bytes
  - headers - HTTP response headers
  - fetchTime - Timestamp when content was fetched
  - truncated - Whether content was truncated due to size limit
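To make the Metadata fields concrete, here is a minimal sketch of how a size-limited read could populate them. It is not the node's implementation: `read_limited` and `build_metadata` are hypothetical helpers, the timestamp format is an assumption, and an in-memory stream stands in for a real HTTP response body.

```python
import io
from datetime import datetime, timezone


def read_limited(stream, max_size):
    """Read at most max_size bytes and report whether more data remained."""
    data = stream.read(max_size + 1)  # one extra byte reveals an overflow
    truncated = len(data) > max_size
    return data[:max_size], truncated


def build_metadata(body, truncated, headers):
    """Assemble a dict using the documented Metadata field names."""
    return {
        "contentType": headers.get("Content-Type", ""),
        "contentLength": len(body),
        "headers": dict(headers),
        "fetchTime": datetime.now(timezone.utc).isoformat(),  # assumed format
        "truncated": truncated,
    }


body, truncated = read_limited(io.BytesIO(b"<html>hello</html>"), max_size=6)
metadata = build_metadata(body, truncated, {"Content-Type": "text/html"})
```

In a workflow, checking `truncated` before trusting the Content output helps avoid summarizing a page that was cut off at the size limit.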
How It Works
The Web Fetch node retrieves and processes web content. When executed, the node:
- Validates the URL format
- Upgrades HTTP URLs to HTTPS automatically
- Creates an HTTP client with the specified options:
  - Sets timeout
  - Configures redirect following
  - Optionally uses proxy
- Sends a GET request with appropriate headers:
  - User-Agent
  - Accept (HTML and XML)
  - Accept-Language
- Checks for cross-domain redirects:
  - If redirected to a different host, returns redirect info
- Validates HTTP status code (200-299)
- Reads response body up to max size limit
- Converts HTML content to markdown format
- Processes content with AI model using the provided prompt
- Returns markdown content, AI response, and metadata
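A few of the steps above (the HTTPS upgrade, the request headers, and the status check) can be sketched in isolation. This is a minimal illustration under stated assumptions, not the node's code: the function names are hypothetical, and the `Accept` and `Accept-Language` values are assumed, since the documentation only says that HTML and XML are accepted.

```python
from urllib.parse import urlsplit


def upgrade_to_https(url: str) -> str:
    """Rewrite an http:// URL to https://, leaving other parts untouched."""
    parts = urlsplit(url)
    if parts.scheme == "http":  # urlsplit lowercases the scheme
        parts = parts._replace(scheme="https")
    return parts.geturl()


def build_headers(user_agent: str) -> dict:
    """Build the three documented request headers (values are assumptions)."""
    return {
        "User-Agent": user_agent,
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
    }


def is_success(status_code: int) -> bool:
    """Accept only the 200-299 range, as described in the steps above."""
    return 200 <= status_code <= 299


url = upgrade_to_https("http://example.com/article?x=1")
headers = build_headers("MyBot/1.0")
```

Note that this sketch leaves an explicit port in place, so a URL such as `http://example.com:80/` would need extra handling in real code.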
Requirements
- Valid HTTP or HTTPS URL
- Network connectivity
- Target server accessibility
- For proxied requests: Robomotion proxy configuration
Error Handling
The node will return specific errors in the following cases:
- Missing URL - "URL is required"
- Missing prompt - "Prompt is required"
- Invalid URL - "Invalid URL: {{error}}"
- Unsupported protocol - "Only HTTP(S) URLs are supported"
- Client creation failed - "Failed to create HTTP client: {{error}}"
- Request creation failed - "Failed to create request: {{error}}"
- Fetch failed - "Failed to fetch URL: {{error}}"
- HTTP error - "HTTP error: {{code}}{{status}}"
- Read error - "Failed to read content: {{error}}"
- Conversion error - "Failed to convert HTML to Markdown: {{error}}"
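The input-related errors can be reproduced with a small validation routine. The sketch below reuses the documented message texts, but the exception class `WebFetchError`, the function name, and the order of the checks are assumptions made for illustration.

```python
from urllib.parse import urlsplit


class WebFetchError(Exception):
    """Hypothetical error type carrying one of the documented messages."""


def validate_inputs(url: str, prompt: str) -> None:
    """Raise WebFetchError for the input problems listed above."""
    if not url:
        raise WebFetchError("URL is required")
    if not prompt:
        raise WebFetchError("Prompt is required")
    try:
        parts = urlsplit(url)
    except ValueError as exc:  # e.g. a malformed IPv6 host
        raise WebFetchError(f"Invalid URL: {exc}") from exc
    if parts.scheme not in ("http", "https"):
        raise WebFetchError("Only HTTP(S) URLs are supported")
    if not parts.hostname:
        raise WebFetchError("Invalid URL: missing host")


def error_message(url: str, prompt: str):
    """Return the error text for the given inputs, or None if they are valid."""
    try:
        validate_inputs(url, prompt)
    except WebFetchError as exc:
        return str(exc)
    return None
```

When Continue On Error is enabled, matching on these message prefixes in a downstream step is one way to tell input mistakes apart from network failures.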
Usage Examples
Basic Web Fetch
URL: https://example.com/article
Prompt: Summarize the main points of this article
Extract Specific Information
URL: https://docs.example.com/api
Prompt: List all API endpoints mentioned in this documentation
With Custom User Agent
URL: https://website.com/data
Prompt: Extract the data table and convert to JSON
User Agent: MyBot/1.0
With Timeout
URL: https://slow-site.com
Prompt: Get the page title and description
Timeout: 60000
Through Proxy
URL: https://restricted-site.com
Prompt: Extract the main content
Use Robomotion Proxy: true
Content Conversion
The node converts HTML to markdown using these transformations:
HTML Elements to Markdown
- <h1> - # - Headings
- <p> - Text paragraphs
- <a> - [text](url) - Links
- <strong>, <b> - **text** - Bold
- <em>, <i> - *text* - Italic
- <code> - `code` - Inline code
- <pre> - ```code``` - Code blocks
- <ul>, <ol> - Bulleted/numbered lists
- <table> - Markdown tables
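To show what such a mapping looks like in practice, here is a deliberately small converter built on Python's standard `html.parser`. It covers only headings, paragraphs, links, bold, italic, and inline code, and it drops script and style content. It is a sketch under those assumptions, not the converter the node uses; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class MiniMarkdown(HTMLParser):
    """Tiny HTML-to-markdown sketch for a handful of elements."""

    INLINE = {"strong": "**", "b": "**", "em": "*", "i": "*", "code": "`"}
    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}

    def __init__(self):
        super().__init__()
        self.out = []    # collected markdown fragments
        self.skip = 0    # > 0 while inside <script> or <style>
        self.hrefs = []  # stack of link targets for open <a> tags

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1
        elif tag in self.HEADINGS:
            self.out.append("\n\n" + self.HEADINGS[tag])
        elif tag == "p":
            self.out.append("\n\n")
        elif tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag == "a":
            self.hrefs.append(dict(attrs).get("href") or "")
            self.out.append("[")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip = max(0, self.skip - 1)
        elif tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag == "a" and self.hrefs:
            self.out.append(f"]({self.hrefs.pop()})")

    def handle_data(self, data):
        if not self.skip:  # ignore script and style bodies
            self.out.append(data)


def html_to_markdown(html: str) -> str:
    parser = MiniMarkdown()
    parser.feed(html)
    parser.close()
    return "".join(parser.out).strip()


sample = '<h1>Title</h1><p>See <a href="https://example.com">the <b>docs</b></a>.</p><script>x()</script>'
markdown = html_to_markdown(sample)
```

Lists, tables, and `<pre>` blocks need more state than this sketch keeps, which is why real converters are considerably larger.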
Cleanup
- Removes excessive newlines
- Preserves code formatting
- Maintains link references
- Keeps table structure
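The first two cleanup rules can pull against each other, because blank lines inside a code block are part of the code. One possible approach, sketched below as an assumption about how such a pass might work rather than a description of the node's behavior, collapses runs of blank lines only outside fenced blocks.

```python
FENCE = "`" * 3  # markdown code fence marker


def collapse_blank_lines(markdown: str) -> str:
    """Reduce runs of blank lines to one, leaving fenced code untouched."""
    out = []
    in_fence = False
    blank_run = 0
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # entering or leaving a code block
        if in_fence or line.strip():
            blank_run = 0
            out.append(line)
        else:
            blank_run += 1
            if blank_run == 1:  # keep only the first blank line of a run
                out.append("")
    return "\n".join(out).strip("\n")


messy = "a\n\n\n\nb\n" + FENCE + "\nx\n\n\ny\n" + FENCE + "\n\n\n\nc"
clean = collapse_blank_lines(messy)
```

The two blank lines between `x` and `y` survive because they sit inside the fence, while the runs around `b` and `c` shrink to a single blank line.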
Redirect Handling
When a URL redirects to a different host:
- The node detects the redirect
- Returns redirect information in Response output
- Includes both original and redirect URLs
- Does not fetch content from the new host
- User can make a new request to the redirect URL if desired
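The decision described above comes down to comparing hosts. The following sketch shows one way to make that comparison; the function name and the `originalUrl` and `redirectUrl` keys are hypothetical, and comparing hostnames without the port is an assumption about what "different host" means.

```python
from urllib.parse import urljoin, urlsplit


def classify_redirect(original_url: str, location: str) -> dict:
    """Decide whether a redirect stays on the same host."""
    # Location headers may be relative, so resolve against the original URL.
    target = urljoin(original_url, location)
    same_host = urlsplit(original_url).hostname == urlsplit(target).hostname
    return {
        "follow": same_host,  # same-host redirects are followed
        "originalUrl": original_url,
        "redirectUrl": target,
    }


internal = classify_redirect("https://example.com/a", "/b")
external = classify_redirect("https://example.com/a", "https://other.example.org/x")
```

When `follow` is false, a workflow can pass `redirectUrl` to a second Web Fetch node if fetching from the new host is acceptable.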
Usage Notes
- HTTP URLs are automatically upgraded to HTTPS
- Content larger than Max Size is truncated
- Output includes truncation indicator in metadata
- Cookies are handled automatically when enabled
- Redirects within the same host are followed automatically
- Cross-host redirects are reported but not followed
- User-Agent can be customized for site compatibility
- Markdown conversion removes scripts and styles
- AI processing is simulated (placeholder in current implementation)
Common Use Cases
Content Extraction
Fetch articles, documentation, or web pages for processing.
Web Scraping
Extract structured data from websites.
Documentation Analysis
Analyze API documentation or technical guides.
Content Summarization
Summarize long articles or blog posts.
Data Collection
Collect information from multiple web sources.
Competitive Analysis
Monitor competitor websites for changes.
Best Practices
- Set appropriate timeouts for different sites
- Respect robots.txt and site policies
- Use custom User-Agent to identify your bot
- Handle redirects appropriately
- Check status codes before processing content
- Set reasonable max size to prevent memory issues
- Use proxy when accessing restricted content
- Craft specific prompts for better AI responses
- Cache results to avoid repeated fetches
- Rate limit requests to avoid overwhelming servers
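The last two practices, caching and rate limiting, can be combined in a thin wrapper around whatever performs the fetch. The sketch below is illustrative: `PoliteFetcher` is a hypothetical name, the one-second default interval is an arbitrary assumption, and `fetch` is any callable that takes a URL, so no real network call is implied.

```python
import time


class PoliteFetcher:
    """Cache results per URL and space out uncached requests."""

    def __init__(self, fetch, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.fetch = fetch                # callable that takes a URL
        self.min_interval = min_interval  # minimum seconds between requests
        self.clock = clock
        self.sleep = sleep
        self.cache = {}
        self.last_call = None

    def get(self, url):
        if url in self.cache:             # cached: no request, no waiting
            return self.cache[url]
        if self.last_call is not None:
            wait = self.min_interval - (self.clock() - self.last_call)
            if wait > 0:
                self.sleep(wait)
        self.last_call = self.clock()
        result = self.fetch(url)
        self.cache[url] = result
        return result
```

Injecting `clock` and `sleep` keeps the wrapper testable without real delays. A production version would likely also need cache expiry and per-host intervals.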
Security Notes
- URLs are validated before fetching
- HTTPS is preferred over HTTP
- Content size is limited to prevent DoS
- Timeout prevents hanging requests
- Cookies can be disabled if needed
- Proxy can be used for additional security
Performance Tips
- Reduce the timeout to fail fast on unreachable sites
- Set Max Size based on expected content size
- Disable Follow Redirects if not needed
- Use proxy only when necessary
- Cache fetched content when possible
AI Processing
The prompt parameter guides how the AI model processes the fetched content:
Prompt Examples
- "Summarize this article in 3 bullet points"
- "Extract all email addresses from this page"
- "List all products with their prices"
- "What are the main topics discussed?"
- "Convert this content to a structured format"
Note: Current implementation provides a placeholder for AI processing. In production, this would integrate with an actual AI model.
Comparison with Other Nodes
- Web Fetch vs Web Search: Web Fetch retrieves specific URL content; Web Search queries search engines
- Web Fetch vs Read: Web Fetch retrieves web content; Read reads local files
- Web Fetch vs Grep: Web Fetch gets web content; Grep searches local file contents