Pdf To Text

Extracts text from PDF documents stored in Google Cloud Storage using the document text detection feature of the Google Cloud Vision API.

Common Properties

Name - The custom name of the node.
Color - The custom color of the node.
Delay Before (sec) - The number of seconds to wait before executing the node.
Delay After (sec) - The number of seconds to wait after executing the node.
Continue On Error - Automation will continue regardless of any error. The default value is false.

Note: If Continue On Error is true, no error is caught when the project is executed, even if a Catch node is used.

Inputs

Vision Client Id - The unique identifier of the Vision API connection, typically obtained from the Connect node.
GCS Source URI - The Google Cloud Storage URI of the PDF file to process (e.g., gs://bucket-name/file.pdf).
GCS Destination URI - The Google Cloud Storage URI where the output JSON files will be stored (e.g., gs://bucket-name/output/).

Options

No additional options available for this node.

Output

Converted Path - The path to the output JSON file containing the extracted text.
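To download or inspect the result with a Cloud Storage client, you usually need the bucket name and the object name separately. The sketch below splits a gs:// URI into those two parts. It assumes Converted Path is a full gs:// URI. This page does not confirm that format, and the helper name split_gcs_uri and the example object name are hypothetical.

```python
from typing import Tuple


def split_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split a gs://bucket/object URI into (bucket, object_name).

    Hypothetical helper; assumes Converted Path is a full gs:// URI.
    """
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a GCS URI: {uri!r}")
    # Everything before the first "/" after the scheme is the bucket name.
    bucket, _, object_name = uri[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name in: {uri!r}")
    return bucket, object_name


if __name__ == "__main__":
    # "example.json" is a made-up object name, used only for illustration.
    print(split_gcs_uri("gs://bucket-name/output/example.json"))
```

The bucket and object name can then be passed to whichever storage client your project already uses.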

How It Works

The Pdf To Text node uses the Google Cloud Vision API to extract text from PDF documents and save the results as JSON files in Google Cloud Storage. When executed, the node:

Retrieves the Vision API client using the provided client ID
Validates that both source and destination URIs are not empty
Calls the AsyncBatchAnnotateFiles method to initiate asynchronous text extraction
Waits for the operation to complete
Returns the path to the output JSON file containing the extracted text
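At the HTTP level, AsyncBatchAnnotateFiles is exposed as POST https://vision.googleapis.com/v1/files:asyncBatchAnnotate. The call returns a long-running operation, which the caller polls until it completes. The sketch below only builds the JSON request body for a single PDF, so the shape of the request is easy to see. It assumes the node requests the DOCUMENT_TEXT_DETECTION feature, as the description above suggests. The function name is hypothetical. The batchSize value of 1 is an arbitrary choice, because this page does not document the node's own setting.

```python
import json


def build_async_pdf_request(source_uri: str, destination_uri: str, batch_size: int = 1) -> dict:
    """Build a files:asyncBatchAnnotate request body for one PDF (hypothetical helper)."""
    return {
        "requests": [
            {
                # Where to read the PDF from, and its MIME type.
                "inputConfig": {
                    "gcsSource": {"uri": source_uri},
                    "mimeType": "application/pdf",
                },
                # Dense-text OCR, suited to documents.
                "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
                # Where to write the JSON results; batchSize is pages per output file.
                "outputConfig": {
                    "gcsDestination": {"uri": destination_uri},
                    "batchSize": batch_size,
                },
            }
        ]
    }


if __name__ == "__main__":
    body = build_async_pdf_request("gs://bucket-name/file.pdf", "gs://bucket-name/output/")
    print(json.dumps(body, indent=2))
```

Sending this body also requires an authenticated HTTP client. That part is left out here because it depends on how credentials are provided.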

Requirements

A valid connection to Vision API established with the Connect node
Valid Google Cloud credentials with appropriate permissions
A PDF file stored in Google Cloud Storage
The Vision API enabled in your Google Cloud project
Proper permissions to read from the source GCS bucket and write to the destination GCS bucket

Error Handling

The node will return specific errors in the following cases:

Empty or invalid Vision Client ID
Empty GCS Source URI
Empty GCS Destination URI
Invalid GCS URIs
Network connectivity issues
Vision API service errors
Authentication failures
Insufficient permissions to access GCS buckets
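The first few cases in this list can be detected locally, before any request is sent. Network, authentication, permission, and Vision API service errors only appear once the API is called. The sketch below shows such a local pre-check. The function name and error messages are hypothetical and are not the node's actual error text. The URI check is deliberately loose: it only requires the gs:// scheme and a bucket name.

```python
def validate_inputs(client_id: str, source_uri: str, destination_uri: str) -> None:
    """Raise ValueError for inputs that are empty or not gs:// URIs (hypothetical check)."""
    if not client_id.strip():
        raise ValueError("Vision Client Id cannot be empty")
    if not source_uri.strip():
        raise ValueError("GCS Source URI cannot be empty")
    if not destination_uri.strip():
        raise ValueError("GCS Destination URI cannot be empty")
    for label, uri in (("GCS Source URI", source_uri), ("GCS Destination URI", destination_uri)):
        # A usable URI needs the gs:// scheme followed by a non-empty bucket name.
        bucket = uri[len("gs://"):].split("/", 1)[0] if uri.startswith("gs://") else ""
        if not bucket:
            raise ValueError(f"{label} is not a valid gs:// URI: {uri!r}")


if __name__ == "__main__":
    validate_inputs("client-id", "gs://bucket-name/file.pdf", "gs://bucket-name/output/")
    print("inputs look valid")
```

Passing this check does not guarantee the call will succeed. The bucket may still not exist, or the credentials may lack access.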

Usage Notes

The Vision Client ID must be obtained from a successful Connect node execution
Both source and destination must be valid Google Cloud Storage URIs
The source PDF file must be accessible from the specified GCS URI
The destination GCS URI should end with a forward slash (/)
This is an asynchronous operation that may take some time to complete depending on the PDF size
The output is saved as JSON files in the specified destination GCS bucket
Multi-page PDF files are supported
The output JSON contains structured text data with positional information
This node is useful for processing large documents and extracting searchable text
The operation uses Google Cloud Storage for both input and output
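Each output JSON file holds a "responses" list with one entry per page. The page text is in fullTextAnnotation.text. The positional information sits under fullTextAnnotation.pages, as blocks, paragraphs, and words with bounding boxes. The sketch below pulls out the plain text of each page from one downloaded output file. The function name is hypothetical, and the sample JSON is a hand-made, heavily trimmed stand-in for a real output file.

```python
import json
from typing import List


def extract_page_texts(output_json: str) -> List[str]:
    """Return the full text of each page in one Vision output JSON file."""
    data = json.loads(output_json)
    texts = []
    for response in data.get("responses", []):
        # A page with no detected text may have no fullTextAnnotation at all.
        annotation = response.get("fullTextAnnotation", {})
        texts.append(annotation.get("text", ""))
    return texts


if __name__ == "__main__":
    # Hand-made sample; real files also contain "pages" with positional data.
    sample = json.dumps({
        "responses": [
            {"fullTextAnnotation": {"text": "Page one\n"}, "context": {"pageNumber": 1}},
            {"fullTextAnnotation": {"text": "Page two\n"}, "context": {"pageNumber": 2}},
        ]
    })
    print(extract_page_texts(sample))  # ['Page one\n', 'Page two\n']
```

A large PDF can be split across several output files. In that case, run the function on each file and join the results in page order.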
