Read Word

Reads the content from a Word document, including text from each page and optionally headers and footers. This node extracts all readable content from the document for further processing.

Common Properties

  • Name - The custom name of the node.
  • Color - The custom color of the node.
  • Delay Before (sec) - Waits in seconds before executing the node.
  • Delay After (sec) - Waits in seconds after executing the node.
  • Continue On Error - Automation will continue regardless of any error. The default value is false.

Info

If the Continue On Error property is true, no error is caught when the project is executed, even if a Catch node is used.

Input

  • File Descriptor - File descriptor ID from the Open Word node. Default variable is word_fd.

Output

  • Result - Array of strings, where each element represents the text content of a page in the document. Default variable name is result.

  • Header and Footer - Array of objects containing header and footer text for each section of the document. Default variable name is header_footer. Each object has the structure:

    {
      "header": "Header text content",
      "footer": "Footer text content"
    }
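
As a rough illustration of reading these objects, the sketch below builds one summary line per section. The summarizeSections helper and the sample data are hypothetical and not part of the node. The fallback for a missing value is an assumption, since this page does not state what header_footer contains when Get Header and Footer is disabled.

```javascript
// Sample data shaped like the Header and Footer output described above.
// In a flow, this value would come from the header_footer variable.
const header_footer = [
  { header: "Company Report 2024", footer: "Page 1 - Confidential" },
  { header: "Appendix", footer: "" }
];

// Hypothetical helper (not part of the node): one summary line per section.
function summarizeSections(sections) {
  // Assumption: treat a missing value as "no sections".
  return (sections || []).map((section, index) => {
    const header = section.header || "(none)";
    const footer = section.footer || "(none)";
    return "Section " + (index + 1) + ": header=" + header + ", footer=" + footer;
  });
}

const sectionSummary = summarizeSections(header_footer);
console.log(sectionSummary.join("\n"));
```

Empty header or footer strings are reported as "(none)" here so that sections without a header or footer remain visible in the summary.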

Options

  • Get Header and Footer - Whether to extract headers and footers from the document sections. Default is false.
    • true - Extracts both page content and header/footer information
    • false - Extracts only page content

Use Cases

  • Extracting text content from Word documents for data processing
  • Reading template placeholders before replacement
  • Analyzing document structure and content
  • Archiving document text to databases
  • Converting Word content to other formats
  • Validating document content in automated workflows
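
To make the template placeholder use case more concrete, here is a minimal sketch that collects placeholder names from the page texts. The {{name}} placeholder syntax, the findPlaceholders helper, and the sample pages are all assumptions for illustration; adjust the pattern to whatever convention your templates use.

```javascript
// Sample data shaped like the Result output; in a flow this comes from result.
const pages = [
  "Dear {{customer_name}}, your order {{order_id}} has shipped.",
  "Regards, {{sender}}. Reference: {{order_id}}"
];

// Hypothetical helper: returns the unique placeholder names found across all pages.
// The {{name}} syntax is an assumption, not something the node defines.
function findPlaceholders(pageTexts) {
  const pattern = /\{\{\s*([A-Za-z0-9_]+)\s*\}\}/g;
  const names = new Set();
  for (const text of pageTexts) {
    for (const match of text.matchAll(pattern)) {
      names.add(match[1]); // captured name without the braces
    }
  }
  return Array.from(names);
}

const placeholders = findPlaceholders(pages);
console.log(placeholders); // [ 'customer_name', 'order_id', 'sender' ]
```

Using a Set keeps each placeholder once, in order of first appearance, even when it occurs on several pages.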

Example

Reading a multi-page Word document:

  1. Use Open Word to open the document
  2. Add the Read Word node
  3. Set File Descriptor to word_fd
  4. Enable Get Header and Footer if needed
  5. Access the content from the result variable
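
Once the steps above have run, a downstream script can work with the result array directly. The following sketch joins the pages into one string; the joinPages helper and the sample data are hypothetical stand-ins for the value the node produces.

```javascript
// Sample data; in a flow, result is produced by the Read Word node.
const result = ["First page text", "Second page text", ""];

// Hypothetical helper: joins non-empty pages into a single string,
// separated by a blank line so page boundaries stay visible.
function joinPages(pageTexts) {
  return pageTexts
    .map(text => text.trim())
    .filter(text => text.length > 0)
    .join("\n\n");
}

const fullText = joinPages(result);
console.log(fullText);
```

Pages that contain only whitespace are dropped before joining, so trailing blank pages do not add empty separators.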

Example output structure:

// result array (page content)
[
  "This is the content of page 1...",
  "This is the content of page 2...",
  "This is the content of page 3..."
]

// header_footer array (when Get Header and Footer is enabled)
[
  {
    "header": "Company Report 2024",
    "footer": "Page 1 - Confidential"
  },
  {
    "header": "Company Report 2024",
    "footer": "Page 2 - Confidential"
  }
]

Processing the Results

// Iterate through pages
for (let i = 0; i < result.length; i++) {
  console.log("Page " + (i + 1) + ": " + result[i]);
}

// Access a specific page (index 0 is page 1)
const firstPage = result[0];

// Case-insensitive search for text across all pages
const searchText = "invoice";
const found = result.some(page => page.toLowerCase().includes(searchText.toLowerCase()));
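
Building on the search above, it is often useful to know which pages matched rather than only whether any did. This is a minimal sketch; the findPages helper and the sample pages are hypothetical.

```javascript
// Sample data; in a flow, result is produced by the Read Word node.
const result = [
  "Invoice #1001 for ACME",
  "Terms and conditions",
  "Total due on invoice"
];

// Hypothetical helper: returns the 1-based page numbers whose text
// contains the search term, ignoring case.
function findPages(pageTexts, term) {
  const needle = term.toLowerCase();
  const matches = [];
  pageTexts.forEach((text, index) => {
    if (text.toLowerCase().includes(needle)) {
      matches.push(index + 1); // convert array index to page number
    }
  });
  return matches;
}

const invoicePages = findPages(result, "Invoice");
console.log(invoicePages); // [ 1, 3 ]
```

Both the page text and the search term are lowercased, so the match does not depend on how the term is capitalized.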

Tip

The extracted text includes all visible text but keeps only minimal formatting. Use this node when you need raw text from a document.

Common Errors

  • "Invalid File Descriptor" - Ensure the Open Word node was executed first and the file descriptor variable is correct.
  • Empty result - The document may be empty or contain only images/objects without text.
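
To handle the empty result case explicitly, a flow can check the output before processing it. The sketch below is one possible guard; hasReadableText is a hypothetical helper, and treating whitespace-only pages as empty is an assumption about what counts as usable text.

```javascript
// Hypothetical helper: true only if at least one page contains
// something other than whitespace.
function hasReadableText(pageTexts) {
  return Array.isArray(pageTexts) &&
    pageTexts.some(text => typeof text === "string" && text.trim().length > 0);
}

console.log(hasReadableText(["", "  \n"]));   // false
console.log(hasReadableText(["", "Summary"])); // true
console.log(hasReadableText(undefined));       // false
```

A check like this lets the flow branch to an error path or a fallback, such as flagging the file for manual review, instead of passing empty text downstream.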

Notes

  • Each page is returned as a separate array element in the result output.
  • Headers and footers are organized by document sections, not pages.
  • Special characters and formatting may be preserved as plain text.
  • The node reads from the currently active document associated with the file descriptor.