Read Word
Reads the content from a Word document, including text from each page and optionally headers and footers. This node extracts all readable content from the document for further processing.
Common Properties
- Name - The custom name of the node.
- Color - The custom color of the node.
- Delay Before (sec) - Waits in seconds before executing the node.
- Delay After (sec) - Waits in seconds after executing the node.
- Continue On Error - Automation will continue regardless of any error. The default value is false.
If the Continue On Error property is true, no error is caught when the project is executed, even if a Catch node is used.
Input
- File Descriptor - File descriptor ID from the Open Word node. Default variable is word_fd.
Output
- Result - Array of strings, where each element represents the text content of a page in the document. Default variable name is result.
- Header and Footer - Array of objects containing header and footer text for each section of the document. Default variable name is header_footer. Each object has the structure:
{
"header": "Header text content",
"footer": "Footer text content"
}
Options
- Get Header and Footer - Whether to extract headers and footers from the document sections. Default is false.
  - true - Extracts both page content and header/footer information
  - false - Extracts only page content
Use Cases
- Extracting text content from Word documents for data processing
- Reading template placeholders before replacement
- Analyzing document structure and content
- Archiving document text to databases
- Converting Word content to other formats
- Validating document content in automated workflows
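As an illustration of the placeholder use case above, the following is a minimal sketch that scans the page strings for template placeholders. It assumes placeholders are written as {{name}}, which is only an illustrative convention and not something this node defines; samplePages and findPlaceholders are hypothetical names standing in for the result output and your own helper.

```javascript
// Sample data standing in for the node's `result` output (one string per page).
const samplePages = [
  "Dear {{customer_name}}, your order {{order_id}} has shipped.",
  "Contact {{agent_name}} with questions about {{order_id}}."
];

// Collect unique placeholder names, assuming the {{name}} convention.
function findPlaceholders(pages) {
  const names = new Set();
  for (const page of pages) {
    for (const match of page.matchAll(/\{\{\s*([\w.-]+)\s*\}\}/g)) {
      names.add(match[1]);
    }
  }
  return [...names];
}

const placeholders = findPlaceholders(samplePages);
console.log(placeholders); // [ 'customer_name', 'order_id', 'agent_name' ]
```

If your templates use a different marker style, only the regular expression needs to change.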
Example
Reading a multi-page Word document:
- Use Open Word to open the document
- Add the Read Word node
- Set File Descriptor to word_fd
- Enable Get Header and Footer if needed
- Access the content from the result variable
Example output structure:
// result array (page content)
[
"This is the content of page 1...",
"This is the content of page 2...",
"This is the content of page 3..."
]
// header_footer array (when Get Header and Footer is enabled; one object per document section)
[
{
"header": "Company Report 2024",
"footer": "Page 1 - Confidential"
},
{
"header": "Company Report 2024",
"footer": "Page 2 - Confidential"
}
]
Processing the Results
// Iterate through pages
for (let i = 0; i < result.length; i++) {
console.log("Page " + (i + 1) + ": " + result[i]);
}
// Access specific page
let firstPage = result[0];
// Search for text across all pages (case-insensitive)
let searchText = "invoice";
let found = result.some(page => page.toLowerCase().includes(searchText.toLowerCase()));
The extracted text includes all visible text but retains only minimal formatting. Use this node when you need raw text data from a document.
Common Errors
- "Invalid File Descriptor" - Ensure the Open Word node was executed first and the file descriptor variable is correct.
- Empty result - The document may be empty or contain only images/objects without text.
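To catch the empty-result case before later steps run, a small guard can check whether any page holds non-whitespace text. This is a sketch only; hasReadableText is a hypothetical helper name, and the arrays passed to it below are sample data standing in for the result output.

```javascript
// Returns true when at least one page contains non-whitespace text.
function hasReadableText(pages) {
  return Array.isArray(pages) &&
    pages.some(page => typeof page === "string" && page.trim().length > 0);
}

// Sample inputs standing in for the node's `result` output.
const withText = hasReadableText(["Page one text", ""]);
const imagesOnly = hasReadableText(["", "   \n"]);
const missing = hasReadableText(undefined);

console.log(withText, imagesOnly, missing); // true false false
```

When the guard returns false, the flow could branch to a fallback path or raise its own error rather than continue with empty text.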
Notes
- Each page is returned as a separate array element in the result output.
- Headers and footers are organized by document sections, not pages.
- Special characters and formatting may be preserved as plain text.
- The node reads from the currently active document associated with the file descriptor.
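Because headers and footers are grouped by section, they are best handled separately from the page array. The sketch below walks a header_footer-style array and collects the distinct header texts; sampleHeaderFooter is sample data copied from the example output, and summarizeSections is a hypothetical helper name.

```javascript
// Sample data standing in for the node's `header_footer` output (one object per section).
const sampleHeaderFooter = [
  { header: "Company Report 2024", footer: "Page 1 - Confidential" },
  { header: "Company Report 2024", footer: "Page 2 - Confidential" }
];

// Attach a 1-based section number and tolerate missing header/footer fields.
function summarizeSections(sections) {
  return sections.map((entry, index) => ({
    section: index + 1,
    header: (entry.header || "").trim(),
    footer: (entry.footer || "").trim()
  }));
}

const sections = summarizeSections(sampleHeaderFooter);
const uniqueHeaders = [...new Set(sections.map(s => s.header))];

console.log(sections.length, uniqueHeaders); // 2 [ 'Company Report 2024' ]
```

The section index here does not correspond to a page index in the result array, since one section can span several pages.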
See Also
- Open Word - Open a document before reading
- Replace Text - Replace text in the document
- Add Text - Add new text to the document