Read Document
Reads the text content of a Google Docs document.
Common Properties
- Name - The custom name of the node.
- Color - The custom color of the node.
- Delay Before (sec) - Waits in seconds before executing the node.
- Delay After (sec) - Waits in seconds after executing the node.
- Continue On Error - Automation will continue regardless of any error. The default value is false.
If the ContinueOnError property is true, no error is caught when the project is executed, even if a Catch node is used.
Inputs
- Document Id - The ID of the Google Docs document to read.
Output
- Text - The text content of the Google Docs document.
How It Works
The Read Document node integrates with Google Docs to extract the text content from a document. When executed, the node:
- Validates the provided input (Document Id)
- Connects to the specified Google Docs document
- Reads the entire text content of the document
- Returns the extracted text as output
Requirements
- A valid Google Docs document ID
- Valid Google Docs credentials
- Valid Google Docs permissions to access the document
Error Handling
The node will return specific errors in the following cases:
- Empty or invalid Document Id
- Google Docs service errors
- Insufficient permissions to access the document
Usage Notes
- The Document Id can be found in the URL of the Google Docs document
- The node extracts only the text content, not formatting or images
- The extracted text will be returned as a single string in the Text output
- This node is useful for processing the content of existing Google Docs documents
- Complex formatting, tables, and images will not be included in the extracted text
- The text extraction preserves the order of content as it appears in the document
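As a rough illustration of the first note above, the Document Id is the path segment that follows `/document/d/` in a typical Google Docs URL. The helper below is a minimal sketch, not part of the node: `extractDocumentId` is a hypothetical name, and the pattern assumes the common `https://docs.google.com/document/d/<ID>/edit` URL shape.

```javascript
// Hypothetical helper (not part of the Read Document node).
// Assumes the common URL shape: https://docs.google.com/document/d/<ID>/edit
function extractDocumentId(url) {
  // Capture the run of ID characters that follows "/document/d/".
  const match = /\/document\/d\/([a-zA-Z0-9_-]+)/.exec(String(url || ""));
  return match ? match[1] : null; // null when the URL has no document segment
}

// Example with a made-up ID:
console.log(extractDocumentId("https://docs.google.com/document/d/1AbC_dEf-123/edit"));
// prints "1AbC_dEf-123"
```

Returning `null` rather than throwing lets the calling workflow decide how to handle a URL that is not a Google Docs link.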
Practical Examples
Example 1: Read Document Content
Extract all text from a document:
Inputs:
- Document Id: $.document_id (from Open Document or Create Document)
Output:
- Text: Stored in $.text, which contains all text content from the document
Use Case: Extract document content for processing, analysis, or archiving
Example 2: Extract Data from Multiple Documents
Read content from multiple documents in a loop:
Workflow:
1. For Each row in document_list
↓
2. Open Document (URL: $.row.url)
↓
3. Read Document (Document Id: $.document_id)
↓
4. Store text in results array
Use Case: Aggregate content from multiple documents into a single dataset
Example 3: Search Document Content
Read document and search for specific keywords:
Workflow:
1. Open Document
↓
2. Read Document
↓
3. JavaScript (search for keywords in $.text)
↓
4. Store results if keywords found
Use Case: Find documents containing specific terms or phrases
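The keyword search in step 3 could look something like the sketch below. It is plain JavaScript that takes the extracted text as a string; how the text reaches the function inside a JavaScript node depends on the platform, and `findKeywords` is a hypothetical helper name.

```javascript
// Hypothetical helper: return the keywords that occur in the text.
// Matching is case-insensitive and substring-based, so "invoice" also
// matches "invoices". Adjust if whole-word matching is required.
function findKeywords(text, keywords) {
  const haystack = String(text || "").toLowerCase();
  return keywords.filter((kw) => haystack.includes(String(kw).toLowerCase()));
}

const found = findKeywords("The Invoice is overdue.", ["invoice", "refund"]);
console.log(found); // prints [ 'invoice' ]
```

An empty result array means none of the keywords were found, which maps directly onto the "store results if keywords found" step.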
Example 4: Document Content Analysis
Analyze document content for reporting:
Workflow:
1. Read Document
↓
2. JavaScript (count words, characters, or analyze sentiment)
↓
3. Store metrics in data table
Output Example:
- Word count: $.text.split(' ').length
- Character count: $.text.length
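The `split(' ')` expression above is a quick approximation: it does not split at line breaks, it counts the gaps between consecutive spaces as extra words, and it reports 1 for an empty string. A slightly more careful count, sketched here with a hypothetical `textMetrics` helper, splits on any run of whitespace and drops empty pieces.

```javascript
// Hypothetical helper: word and character counts for extracted text.
function textMetrics(text) {
  const value = String(text || "");
  // Split on runs of spaces, tabs, or line breaks, then drop empty pieces
  // so that an empty document yields a word count of 0.
  const words = value.trim().split(/\s+/).filter(Boolean);
  return { wordCount: words.length, characterCount: value.length };
}

console.log(textMetrics("a\nb")); // prints { wordCount: 2, characterCount: 3 }
console.log(textMetrics(""));     // prints { wordCount: 0, characterCount: 0 }
```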
Example 5: Extract for Translation
Read document content for translation:
Workflow:
1. Open Document
↓
2. Read Document
↓
3. Translate Text (use translation service)
↓
4. Create new document with translated text
Tips for Effective Use
- Always open first - Use Open Document before Read Document to establish a connection
- Plain text only - Remember this extracts text without formatting, styles, or images
- Preserve original - Reading doesn't modify the document
- Process large documents - For very large documents, consider processing text in chunks
- Combine with search - Use JavaScript or regex to extract specific sections
- Cache results - Store read text in variables to avoid repeated reads
- Handle empty documents - Check if text output is empty before processing
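Two of the tips above, checking for empty text and processing large documents in chunks, can be sketched in a few lines. Both `isBlank` and `chunkText` are hypothetical helper names, and fixed-size character chunks are only one possible strategy; splitting on paragraph boundaries may suit some workflows better.

```javascript
// Hypothetical helper: true when the text is missing or only whitespace.
function isBlank(text) {
  return String(text || "").trim().length === 0;
}

// Hypothetical helper: cut the text into pieces of at most `size` characters.
// Note: slicing works on UTF-16 code units, so a chunk boundary can fall
// inside an emoji or other character stored as a surrogate pair.
function chunkText(text, size) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const value = String(text || "");
  const chunks = [];
  for (let i = 0; i < value.length; i += size) {
    chunks.push(value.slice(i, i + size));
  }
  return chunks;
}

console.log(isBlank("   "));           // prints true
console.log(chunkText("abcdefg", 3));  // prints [ 'abc', 'def', 'g' ]
```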
Common Errors and Solutions
Error: "Document ID cannot be empty"
Cause: Document Id input is empty or undefined
Solution:
- Ensure Open Document or Create Document was called first
- Verify $.document_id variable exists and has a value
- Check that previous nodes executed successfully
Error: "Document not found"
Cause: No document connection exists for the given ID
Solution:
- Always call Open Document or Create Document before Read Document
- Ensure document_id is passed correctly from the previous node
- Check that the document wasn't closed and the connection wasn't lost
Error: "ErrInternal" - Failed to get document
Cause: Google API error or permission issue
Solution:
- Verify account has read access to the document
- Check credentials are valid and not expired
- Ensure Google Docs API is enabled
- Try re-opening the document
Error: Incomplete text extraction
Cause: Document contains special elements not extracted as text
Solution:
- Be aware that tables, images, and drawings aren't included in the text output
- Headers and footers are not included in body text extraction
- Some special characters may not be preserved
- For complete content, consider using Google Docs API directly
Integration Patterns
Pattern 1: Document Content Archival
Archive document content to database:
1. Get list of document URLs from source
↓
2. For Each URL
↓
3. Open Document
↓
4. Read Document
↓
5. Store text in database with metadata (date, author, etc.)
Pattern 2: Document Search and Index
Build searchable index of documents:
1. Read Document
↓
2. Extract keywords and metadata
↓
3. Store in search index (Elasticsearch, database, etc.)
↓
4. Enable search functionality across documents
Pattern 3: Content Validation
Validate document content meets requirements:
1. Read Document
↓
2. JavaScript validation (check for required sections, word count, etc.)
↓
3. If validation fails, send notification
↓
4. If validation passes, proceed with workflow
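The validation in step 2 might be sketched as follows. `validateDocument` and the shape of the `rules` object (`requiredSections`, `minWords`) are assumptions made for illustration, not a built-in feature of the platform.

```javascript
// Hypothetical helper: check extracted text against simple content rules.
// rules.requiredSections: strings that must appear somewhere in the text.
// rules.minWords: minimum number of whitespace-separated words.
function validateDocument(text, rules) {
  const value = String(text || "");
  const problems = [];

  for (const section of rules.requiredSections) {
    if (!value.includes(section)) {
      problems.push(`Missing section: ${section}`);
    }
  }

  const wordCount = value.trim().split(/\s+/).filter(Boolean).length;
  if (wordCount < rules.minWords) {
    problems.push(`Too short: ${wordCount} words, need at least ${rules.minWords}`);
  }

  return { valid: problems.length === 0, problems };
}

const result = validateDocument("Summary\nAll good here", {
  requiredSections: ["Summary", "Budget"],
  minWords: 3,
});
console.log(result);
// prints { valid: false, problems: [ 'Missing section: Budget' ] }
```

Collecting every problem instead of stopping at the first one gives the notification step a complete list to report.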
Pattern 4: Template Data Extraction
Extract data from template-filled documents:
1. Read Document
↓
2. Parse text using regex or string manipulation
↓
3. Extract structured data (names, dates, amounts, etc.)
↓
4. Store extracted data in structured format
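Steps 2 and 3 could be approached with one regular expression per field, as in the sketch below. It assumes a hypothetical template in which each field sits on its own line as `Label: value`; the labels `Name`, `Date`, and `Amount` and the helper name `extractFields` are illustrative only.

```javascript
// Hypothetical helper: pull "Label: value" fields out of extracted text.
// Assumes one field per line, for example "Name: Ada Lovelace".
function extractFields(text) {
  const value = String(text || "");

  // The "m" flag makes ^ and $ match at line boundaries, so each field
  // is read from its own line. Labels here are fixed, plain words; escape
  // them first if they could contain regex metacharacters.
  const grab = (label) => {
    const match = new RegExp(`^${label}:\\s*(.+)$`, "m").exec(value);
    return match ? match[1].trim() : null;
  };

  return { name: grab("Name"), date: grab("Date"), amount: grab("Amount") };
}

const sample = "Name: Ada Lovelace\nDate: 2024-03-01\nAmount: 120.50";
console.log(extractFields(sample));
// prints { name: 'Ada Lovelace', date: '2024-03-01', amount: '120.50' }
```

Fields that are absent come back as `null`, so a later step can tell a missing value apart from an empty one. Values are returned as strings; converting dates or amounts is left to the storing step.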
Text Output Format
The text output includes:
- All paragraph text in document order
- Line breaks between paragraphs
- Text from the main body only (no headers/footers)
- Plain text without any formatting information
Not included:
- Text formatting (bold, italic, color, etc.)
- Images or drawings
- Table structure (table text is included, but the row and column layout is lost)
- Headers and footers
- Comments or suggestions
- Document properties or metadata
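Because the output is a single string with line breaks between paragraphs, it can be turned back into a list of paragraphs when needed. This is a minimal sketch with a hypothetical `toParagraphs` helper; it assumes the breaks are `\n` or `\r\n`, which may differ in practice.

```javascript
// Hypothetical helper: split extracted text into non-empty paragraphs.
// Assumes paragraphs are separated by "\n" or "\r\n".
function toParagraphs(text) {
  return String(text || "")
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

console.log(toParagraphs("Intro\n\nBody text \r\nEnd"));
// prints [ 'Intro', 'Body text', 'End' ]
```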