Extract Text

Extracts text from documents using Google Document AI.

Common Properties

  • Name - The custom name of the node.
  • Color - The custom color of the node.
  • Delay Before (sec) - The number of seconds to wait before executing the node.
  • Delay After (sec) - The number of seconds to wait after executing the node.
  • Continue On Error - Automation will continue regardless of any error. The default value is false.
Info: If the ContinueOnError property is true, no error is caught when the project is executed, even if a Catch node is used.

Inputs

  • File Path - The local file path of the document to process.
  • MIME Type - The MIME type of the document. If not provided, it will be auto-detected.
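
How the node detects the MIME type internally is not documented here. As a rough illustration of content-based detection, the sketch below checks the leading bytes of a file against a few well-known signatures. The function names `sniff_mime_type` and `resolve_mime_type` are hypothetical, and the signature table covers only a handful of formats.

```python
from typing import Optional

# Leading byte signatures for a few common document and image formats.
# This table is illustrative, not the node's actual detection logic.
_SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"II*\x00", "image/tiff"),  # little-endian TIFF
    (b"MM\x00*", "image/tiff"),  # big-endian TIFF
]


def sniff_mime_type(data: bytes) -> Optional[str]:
    """Return a MIME type based on the first bytes of the file, or None."""
    for signature, mime in _SIGNATURES:
        if data.startswith(signature):
            return mime
    return None


def resolve_mime_type(data: bytes, mime_type: str = "") -> str:
    """Prefer an explicitly supplied MIME type; otherwise detect from content."""
    if mime_type:
        return mime_type
    detected = sniff_mime_type(data)
    if detected is None:
        raise ValueError("could not detect MIME type; set MIME Type explicitly")
    return detected
```

Setting MIME Type explicitly skips detection entirely, which is the safer choice for formats whose signatures are ambiguous.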

Output

  • Text - The full extracted text from the document.
  • Pages - An array of pages, each containing extracted text.
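
The page-organized output makes it easy to locate content by page. The sketch below assumes that Pages arrives as a list of plain strings, one per page. That shape is an assumption, since the exact structure of each page entry is not specified here, and `pages_mentioning` is a hypothetical helper.

```python
from typing import List


def pages_mentioning(pages: List[str], keyword: str) -> List[int]:
    """Return 1-based page numbers whose text contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [n for n, text in enumerate(pages, start=1) if needle in text.lower()]


# Example with made-up page text standing in for the node's Pages output.
pages = ["Invoice 42\nTotal due", "Terms and conditions", "Total paid"]
print(pages_mentioning(pages, "total"))  # [1, 3]
```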

Options

  • Credentials - Google Document AI credentials used to authenticate with the service.
  • Project Id - The Google Cloud project ID associated with your Document AI processor.
  • Location - The location of the Document AI processor. Default is "us".
  • Processor Id - The ID of the Document AI processor to use for text extraction.
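
Project Id, Location, and Processor Id together identify one processor. The sketch below shows how they combine into the resource name and regional API endpoint that Google Document AI uses. The helper names are hypothetical and the IDs in the example are placeholders.

```python
def processor_name(project_id: str, location: str, processor_id: str) -> str:
    """Build the full Document AI processor resource name."""
    return f"projects/{project_id}/locations/{location}/processors/{processor_id}"


def api_endpoint(location: str = "us") -> str:
    """Build the regional Document AI endpoint for a processor location."""
    return f"{location}-documentai.googleapis.com"


# Placeholder values for illustration only.
print(processor_name("my-project", "eu", "abc123"))
# projects/my-project/locations/eu/processors/abc123
print(api_endpoint("eu"))
# eu-documentai.googleapis.com
```

A processor created in one location cannot be reached through another location's endpoint, so Location must match where the processor was created.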

How It Works

The Extract Text node integrates with Google Document AI to extract text from documents. When executed, the node:

  1. Validates the provided inputs and options (File Path, MIME Type, Project Id, Processor Id)
  2. Authenticates with Google Document AI using the provided credentials
  3. Reads the specified document file
  4. Processes the document using the specified Document AI processor
  5. Extracts text from the document
  6. Returns both the full text and page-organized text as output
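
The steps above can be approximated outside the node with the `google-cloud-documentai` Python client. This is a minimal sketch, not the node's implementation. It assumes the package is installed and that Application Default Credentials are configured instead of the node's Credentials option. The function name `extract_text` and the returned dictionary shape are hypothetical.

```python
from pathlib import Path


def extract_text(file_path: str, mime_type: str, project_id: str,
                 processor_id: str, location: str = "us") -> dict:
    # 1. Validate inputs before touching the network.
    if not file_path or not Path(file_path).is_file():
        raise ValueError(f"invalid File Path: {file_path!r}")
    if not project_id or not processor_id:
        raise ValueError("Project Id and Processor Id are required")

    # Imported lazily so validation works even without the package installed.
    from google.api_core.client_options import ClientOptions
    from google.cloud import documentai

    # 2. Authenticate (assumption: Application Default Credentials).
    client = documentai.DocumentProcessorServiceClient(
        client_options=ClientOptions(
            api_endpoint=f"{location}-documentai.googleapis.com"
        )
    )

    # 3. Read the document file.
    content = Path(file_path).read_bytes()

    # 4. Send the document to the specified processor.
    request = documentai.ProcessRequest(
        name=client.processor_path(project_id, location, processor_id),
        raw_document=documentai.RawDocument(content=content, mime_type=mime_type),
    )
    document = client.process_document(request=request).document

    # 5. Slice each page's text out of the full text via its text segments.
    pages = []
    for page in document.pages:
        segments = page.layout.text_anchor.text_segments
        pages.append("".join(
            document.text[int(s.start_index):int(s.end_index)] for s in segments
        ))

    # 6. Return both the full text and the per-page text.
    return {"text": document.text, "pages": pages}
```

Document AI returns one full text string for the whole document, and each page refers to spans of it by character offsets, which is why the per-page text is built by slicing.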

Requirements

  • Valid Google Document AI credentials
  • A valid Google Cloud project ID
  • A configured Document AI processor for text extraction
  • A valid local file path to a document
  • Valid processor ID and location

Error Handling

The node will return specific errors in the following cases:

  • Empty or invalid File Path
  • Empty or invalid Project Id
  • Empty or invalid Processor Id
  • Google Document AI service errors
  • Unable to read the specified file
  • Invalid credentials
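
Several of these cases can be caught before the node runs. The sketch below is a hypothetical pre-flight check that covers only the local conditions (file path, Project Id, Processor Id). The messages are illustrative and are not the node's actual error text. Service and credential errors can only surface at run time.

```python
import os
from typing import List


def validate_inputs(file_path: str, project_id: str, processor_id: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not file_path.strip():
        problems.append("File Path is empty")
    elif not os.path.isfile(file_path):
        problems.append(f"File Path does not point to a file: {file_path}")
    elif not os.access(file_path, os.R_OK):
        problems.append(f"File is not readable: {file_path}")
    if not project_id.strip():
        problems.append("Project Id is empty")
    if not processor_id.strip():
        problems.append("Processor Id is empty")
    return problems
```

Collecting every problem in one pass, rather than stopping at the first, lets a flow report all misconfigured fields at once.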

Usage Notes

  • The File Path should point to a local document file (PDF, images, etc.)
  • If MIME Type is not provided, it will be auto-detected from the file content
  • The Project Id and Processor Id are required for Document AI processing; Location defaults to "us" if not set
  • The output includes both the full extracted text and text organized by pages
  • This node is useful for converting documents to plain text for further processing
  • The Location option specifies the geographic location of your Document AI processor (e.g., "us", "eu")
  • The extracted text preserves the reading order of the original document