
Read Document

Reads the text content of a Google Docs document.

Common Properties

  • Name - The custom name of the node.
  • Color - The custom color of the node.
  • Delay Before (sec) - Waits in seconds before executing the node.
  • Delay After (sec) - Waits in seconds after executing the node.
  • Continue On Error - Automation will continue regardless of any error. The default value is false.
Note: If the ContinueOnError property is true, no error is caught when the project is executed, even if a Catch node is used.

Inputs

  • Document Id - The ID of the Google Docs document to read.

Output

  • Text - The text content of the Google Docs document.

How It Works

The Read Document node integrates with Google Docs to extract the text content from a document. When executed, the node:

  1. Validates the provided input (Document Id)
  2. Connects to the specified Google Docs document
  3. Reads the entire text content of the document
  4. Returns the extracted text as output

Requirements

  • A valid Google Docs document ID
  • Valid Google Docs credentials
  • Valid Google Docs permissions to access the document

Error Handling

The node will return specific errors in the following cases:

  • Empty or invalid Document Id
  • Google Docs service errors
  • Insufficient permissions to access the document

Usage Notes

  • The Document Id can be found in the URL of the Google Docs document
  • The node extracts only the text content, not formatting or images
  • The extracted text will be returned as a single string in the Text output
  • This node is useful for processing the content of existing Google Docs documents
  • Complex formatting, table structure, and images will not be included in the extracted text
  • The text extraction preserves the order of content as it appears in the document
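When only a full document URL is available, the ID segment can be pulled out of it with a small helper. The sketch below assumes the usual Google Docs URL shape, where the ID follows /document/d/ in the path. The function name extractDocumentId is hypothetical. This page does not state whether Read Document accepts this raw ID directly; the examples on this page pass the value returned by Open Document or Create Document.

```javascript
// Hypothetical helper: pulls the document ID out of a Google Docs URL.
// Assumes the ID is the path segment that follows "/document/d/".
function extractDocumentId(url) {
  const match = /\/document\/d\/([A-Za-z0-9_-]+)/.exec(String(url));
  return match ? match[1] : null; // null when the URL has no ID segment
}

const exampleUrl = "https://docs.google.com/document/d/1AbC_dEf-123/edit#heading=h.xyz";
console.log(extractDocumentId(exampleUrl)); // prints "1AbC_dEf-123"
```

Returning null for an unrecognized URL lets the workflow stop early instead of passing an empty Document Id to the node.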

Practical Examples

Example 1: Read Document Content

Extract all text from a document:

Inputs:

  • Document Id: $.document_id (from Open Document or Create Document)

Output:

  • Text: Stored in $.text - contains all text content from the document

Use Case: Extract document content for processing, analysis, or archiving

Example 2: Extract Data from Multiple Documents

Read content from multiple documents in a loop:

Workflow:

1. For Each row in document_list

2. Open Document (URL: $.row.url)

3. Read Document (Document Id: $.document_id)

4. Store text in results array

Use Case: Aggregate content from multiple documents into a single dataset

Example 3: Search Document Content

Read document and search for specific keywords:

Workflow:

1. Open Document

2. Read Document

3. JavaScript (search for keywords in $.text)

4. Store results if keywords found

Use Case: Find documents containing specific terms or phrases
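The keyword search step in this workflow might look like the following sketch. It is plain JavaScript working on a string; how the JavaScript node reads $.text and writes results back is not shown, and the function name findKeywords is hypothetical.

```javascript
// Hypothetical helper: returns the keywords that occur in the text,
// compared case-insensitively. Keywords are returned as they were passed in.
function findKeywords(text, keywords) {
  const haystack = (text || "").toLowerCase();
  return keywords.filter((kw) => haystack.includes(kw.toLowerCase()));
}

const sampleText = "Quarterly report\nRevenue grew while costs stayed flat.";
const found = findKeywords(sampleText, ["revenue", "forecast", "Costs"]);
console.log(found); // prints [ 'revenue', 'Costs' ]
```

An empty result array means no keyword matched, which is the condition for skipping the store step.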

Example 4: Document Content Analysis

Analyze document content for reporting:

Workflow:

1. Read Document

2. JavaScript (count words, characters, or analyze sentiment)

3. Store metrics in data table

Output Example:

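  • Word count: $.text.trim().split(/\s+/).length (splits on any whitespace, so line breaks between paragraphs are counted as word separators)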
  • Character count: $.text.length
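A slightly more defensive version of these metrics is sketched below. Splitting on a single space character miscounts text that contains line breaks or repeated spaces, and an empty document should report zero words. The function name textMetrics is hypothetical, and the wiring to $.text inside a JavaScript node is left out.

```javascript
// Hypothetical helper: computes word and character counts for extracted text.
function textMetrics(text) {
  const value = text || "";
  const trimmed = value.trim();
  // Split on any run of whitespace (spaces, tabs, line breaks).
  const words = trimmed === "" ? [] : trimmed.split(/\s+/);
  return { wordCount: words.length, characterCount: value.length };
}

console.log(textMetrics("Hello world\nsecond  line")); // prints { wordCount: 4, characterCount: 24 }
console.log(textMetrics(""));                          // prints { wordCount: 0, characterCount: 0 }
```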

Example 5: Extract for Translation

Read document content for translation:

Workflow:

1. Open Document

2. Read Document

3. Translate Text (use translation service)

4. Create new document with translated text

Tips for Effective Use

  1. Always open first - Use Open Document before Read Document to establish the connection
  2. Plain text only - Remember this extracts text without formatting, styles, or images
  3. Preserve original - Reading doesn't modify the document
  4. Process large documents - For very large documents, consider processing text in chunks
  5. Combine with search - Use JavaScript or regex to extract specific sections
  6. Cache results - Store read text in variables to avoid repeated reads
  7. Handle empty documents - Check if text output is empty before processing
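Tips 4 and 7 can be combined in one helper, sketched here under the assumption that the text is processed in a JavaScript node. The function chunkText and the maxChars limit are hypothetical. The sketch breaks on line breaks where it can and hard-splits only paragraphs that exceed the limit on their own.

```javascript
// Hypothetical helper: splits text into chunks of at most maxChars characters,
// preferring to break at line breaks between paragraphs.
function chunkText(text, maxChars) {
  if (!Number.isInteger(maxChars) || maxChars < 1) {
    throw new RangeError("maxChars must be a positive integer");
  }
  const chunks = [];
  let current = "";
  for (const paragraph of (text || "").split("\n")) {
    let rest = paragraph;
    // A single paragraph longer than the limit is hard-split.
    while (rest.length > maxChars) {
      if (current) {
        chunks.push(current);
        current = "";
      }
      chunks.push(rest.slice(0, maxChars));
      rest = rest.slice(maxChars);
    }
    const candidate = current ? current + "\n" + rest : rest;
    if (candidate.length > maxChars) {
      chunks.push(current);
      current = rest;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks; // an empty document yields an empty array
}

console.log(chunkText("aaa\nbbb\nccc", 7)); // prints [ 'aaa\nbbb', 'ccc' ]
console.log(chunkText("", 7).length);       // prints 0
```

Checking for a zero-length result before further processing covers the empty-document case.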

Common Errors and Solutions

Error: "Document ID cannot be empty"

Cause: Document Id input is empty or undefined

Solution:

  • Ensure Open Document or Create Document was called first
  • Verify $.document_id variable exists and has a value
  • Check that previous nodes executed successfully

Error: "Document not found"

Cause: No document connection exists for the given ID

Solution:

  • Always call Open Document or Create Document before Read Document
  • Ensure document_id is passed correctly from the previous node
  • Check that the document wasn't closed and the connection wasn't lost

Error: "ErrInternal" - Failed to get document

Cause: Google API error or permission issue

Solution:

  • Verify that the account has read access to the document
  • Check that the credentials are valid and not expired
  • Ensure the Google Docs API is enabled
  • Try re-opening the document

Error: Incomplete text extraction

Cause: Document contains special elements not extracted as text

Solution:

  • Be aware that table structure, images, and drawings aren't included in the text output
  • Headers and footers are not included in body text extraction
  • Some special characters may not be preserved
  • For complete content, consider using the Google Docs API directly
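For the last option, the sketch below walks a document object shaped like the response of the Google Docs API documents.get method, where text lives in body.content[].paragraph.elements[].textRun.content and headers and footers are keyed maps with their own content arrays. It does not fetch anything and is not a description of how the Read Document node works internally. The function names and the sample object are hypothetical.

```javascript
// Collects text from an array of structural elements (paragraphs and tables).
function collectText(elements) {
  let out = "";
  for (const element of elements || []) {
    if (element.paragraph) {
      for (const part of element.paragraph.elements || []) {
        if (part.textRun && part.textRun.content) out += part.textRun.content;
      }
    } else if (element.table) {
      // Table cells contain their own structural elements, so recurse.
      for (const row of element.table.tableRows || []) {
        for (const cell of row.tableCells || []) {
          out += collectText(cell.content);
        }
      }
    }
  }
  return out;
}

// Returns body, header, and footer text separately.
function extractAllText(doc) {
  const fromMap = (map) =>
    Object.values(map || {}).map((section) => collectText(section.content)).join("");
  return {
    body: collectText(doc.body && doc.body.content),
    headers: fromMap(doc.headers),
    footers: fromMap(doc.footers),
  };
}

// Hypothetical sample, trimmed to the fields the sketch reads.
const sampleDoc = {
  body: {
    content: [
      { sectionBreak: {} },
      { paragraph: { elements: [{ textRun: { content: "Intro\n" } }] } },
      { table: { tableRows: [{ tableCells: [
        { content: [{ paragraph: { elements: [{ textRun: { content: "A1\n" } }] } }] },
        { content: [{ paragraph: { elements: [{ textRun: { content: "B1\n" } }] } }] },
      ] }] } },
    ],
  },
  headers: { h1: { content: [{ paragraph: { elements: [{ textRun: { content: "Header\n" } }] } }] } },
};

console.log(JSON.stringify(extractAllText(sampleDoc)));
// prints {"body":"Intro\nA1\nB1\n","headers":"Header\n","footers":""}
```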

Integration Patterns

Pattern 1: Document Content Archival

Archive document content to database:

1. Get list of document URLs from source

2. For Each URL

3. Open Document

4. Read Document

5. Store text in database with metadata (date, author, etc.)

Pattern 2: Document Search and Index

Build searchable index of documents:

1. Read Document

2. Extract keywords and metadata

3. Store in search index (Elasticsearch, database, etc.)

4. Enable search functionality across documents

Pattern 3: Content Validation

Validate document content meets requirements:

1. Read Document

2. JavaScript validation (check for required sections, word count, etc.)

3. If validation fails, send notification

4. If validation passes, proceed with workflow
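The validation step in this pattern could be sketched as follows. The function name validateDocument, the list of required section titles, and the minimum word count are all hypothetical and would come from your own requirements.

```javascript
// Hypothetical helper: checks extracted text for required section titles
// and a minimum word count.
function validateDocument(text, requiredSections, minWords) {
  const value = text || "";
  const trimmed = value.trim();
  const wordCount = trimmed === "" ? 0 : trimmed.split(/\s+/).length;
  // Section titles are matched as plain, case-sensitive substrings.
  const missingSections = requiredSections.filter((title) => !value.includes(title));
  return {
    valid: missingSections.length === 0 && wordCount >= minWords,
    wordCount,
    missingSections,
  };
}

const report = validateDocument(
  "Summary\nAll targets met.\nNext Steps\nPlan the rollout.",
  ["Summary", "Risks", "Next Steps"],
  5
);
console.log(report.valid);           // prints false
console.log(report.missingSections); // prints [ 'Risks' ]
```

The missingSections list can be included in the notification sent when validation fails.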

Pattern 4: Template Data Extraction

Extract data from template-filled documents:

1. Read Document

2. Parse text using regex or string manipulation

3. Extract structured data (names, dates, amounts, etc.)

4. Store extracted data in structured format
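Steps 2 and 3 might be implemented along these lines. The field labels (Name, Date, Amount), their formats, and the function name extractFields are hypothetical; a real template needs its own patterns.

```javascript
// Hypothetical helper: extracts labeled fields from template-style text,
// one "Label: value" pair per line.
function extractFields(text) {
  const patterns = {
    name: /^Name:\s*(.+)$/m,
    date: /^Date:\s*(\d{4}-\d{2}-\d{2})\s*$/m,
    amount: /^Amount:\s*(\d+(?:\.\d+)?)\s*$/m,
  };
  const result = {};
  for (const [field, pattern] of Object.entries(patterns)) {
    const match = pattern.exec(text || "");
    result[field] = match ? match[1].trim() : null; // null when the field is absent
  }
  return result;
}

const filled = "Name: Jane Doe\nDate: 2024-03-15\nAmount: 1250.50\n";
console.log(extractFields(filled));
// prints { name: 'Jane Doe', date: '2024-03-15', amount: '1250.50' }
```

Values are returned as strings; converting the amount to a number or the date to a date object is left to the step that stores the data.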

Text Output Format

The text output includes:

  • All paragraph text in document order
  • Line breaks between paragraphs
  • Text from the main body only (no headers/footers)
  • Plain text without any formatting information

Not included:

  • Text formatting (bold, italic, color, etc.)
  • Images or drawings
  • Table structure (table text is included, but the row and column layout is lost)
  • Headers and footers
  • Comments or suggestions
  • Document properties or metadata