Google Vision

The Google Cloud Vision API uses machine learning to analyze images at scale: it can detect objects, extract text, and assess content safety.

Overview

The Robomotion Google Vision package integrates with the Google Cloud Vision API so that you can:

  • Extract text from images using OCR (Optical Character Recognition)
  • Extract text from PDF documents stored in Google Cloud Storage
  • Detect and extract labels describing image content
  • Analyze images for potentially unsafe content
  • Process images for automation workflows

Key Features

Text Extraction (OCR)

  • Image to Text: Extract printed and handwritten text from images
  • PDF to Text: Extract text from multi-page PDF documents
  • Multi-language Support: Recognize text in multiple languages
  • High Accuracy: OCR backed by Google's machine learning models
  • Confidence Scores: Per-page confidence metrics
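
As a rough illustration of what an OCR call involves at the API level, the sketch below builds a JSON body for the Vision REST method `images:annotate`. It only constructs the payload and does not send it. The helper name `build_ocr_request`, its parameters, and the sample bytes are hypothetical, and this is not a description of how the Robomotion nodes call the API internally.

```python
import base64
import json


def build_ocr_request(image_bytes, language_hints=None, dense_text=True):
    """Build a body for POST https://vision.googleapis.com/v1/images:annotate.

    Hypothetical helper for illustration; not part of the Robomotion package.
    """
    # DOCUMENT_TEXT_DETECTION is aimed at dense text and handwriting,
    # TEXT_DETECTION at sparse text such as signs in photos.
    feature_type = "DOCUMENT_TEXT_DETECTION" if dense_text else "TEXT_DETECTION"
    request = {
        "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
        "features": [{"type": feature_type}],
    }
    if language_hints:
        # Language hints are optional; the API can detect languages on its own.
        request["imageContext"] = {"languageHints": list(language_hints)}
    return {"requests": [request]}


body = build_ocr_request(b"fake image bytes", language_hints=["en", "tr"])
print(json.dumps(body, indent=2))
```

Leaving `language_hints` empty lets the API detect the language itself, which is usually a reasonable starting point.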

Image Understanding

  • Label Detection: Automatically identify objects, locations, activities, and concepts
  • Safe Search Detection: Detect adult, violent, medical, racy, and spoof content
  • Batch Processing: Analyze multiple images efficiently

Document Processing

  • PDF Processing: Extract structured text from PDF documents
  • Cloud Storage Integration: Work directly with files in Google Cloud Storage
  • Asynchronous Processing: Handle large documents efficiently
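
PDF extraction differs from image OCR in that both the input and the output live in Google Cloud Storage and the job runs asynchronously. The sketch below builds a request body for the Vision REST method `files:asyncBatchAnnotate` under those assumptions. The helper name, the bucket paths, and the `pages_per_file` value are hypothetical, and the payload is only constructed, not sent.

```python
def build_pdf_ocr_request(source_uri, output_prefix, pages_per_file=20):
    """Build a body for POST https://vision.googleapis.com/v1/files:asyncBatchAnnotate.

    Hypothetical helper for illustration; not part of the Robomotion package.
    """
    # Both locations must be Cloud Storage URIs; local paths are not accepted.
    for name, uri in (("source_uri", source_uri), ("output_prefix", output_prefix)):
        if not uri.startswith("gs://"):
            raise ValueError(f"{name} must be a gs:// URI, got {uri!r}")
    return {
        "requests": [
            {
                "inputConfig": {
                    "gcsSource": {"uri": source_uri},
                    "mimeType": "application/pdf",
                },
                "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
                "outputConfig": {
                    "gcsDestination": {"uri": output_prefix},
                    # Number of PDF pages written into each output JSON file.
                    "batchSize": pages_per_file,
                },
            }
        ]
    }


pdf_body = build_pdf_ocr_request(
    "gs://example-bucket/in/report.pdf",
    "gs://example-bucket/out/report-",
)
```

The results are written as JSON files under the output prefix, which is why the bucket needs write access as well as read access.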

Authentication Options

The package supports two authentication methods:

  1. Connect Node + Credentials: Establish a persistent connection using Google Cloud service account credentials
  2. Direct Credentials: Provide credentials directly to each node (useful for one-off operations)
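
Whichever method is used, a malformed service account key is a common cause of failed connections. The following is a minimal sketch of a pre-flight check on the key JSON, assuming the usual layout of a Google Cloud service account key file. The function name and the chosen set of required fields are assumptions for illustration, not the package's own validation.

```python
import json

# Fields a service account key file normally contains (assumed minimum set).
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email", "token_uri")


def check_service_account_key(raw_json):
    """Return a list of problems; an empty list means the key looks usable."""
    try:
        key = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(key, dict):
        return ["top-level JSON value must be an object"]
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not key.get(name)]
    if key.get("type") and key["type"] != "service_account":
        problems.append(f"unexpected type: {key['type']}")
    return problems


sample = json.dumps({
    "type": "service_account",
    "project_id": "example-project",
    "private_key": "PLACEHOLDER",
    "client_email": "robot@example-project.iam.gserviceaccount.com",
})
print(check_service_account_key(sample))  # ['missing field: token_uri']
```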

Getting Started

Basic Workflow

  1. Connect: Establish a connection using the Connect node with your Google Cloud credentials
  2. Analyze: Use any Vision node (Image to Text, Extract Labels, etc.) with the connection ID
  3. Process Results: Access extracted data through output variables

Alternative Workflow (Direct Credentials)

  1. Configure Node: Add credentials directly to the Vision node (ImageToText, ExtractImageLabels, etc.)
  2. Process: Run the node without a separate Connect node
  3. Get Results: Access output variables

Common Use Cases

Document Automation

  • Digitize scanned invoices and receipts
  • Extract data from forms and applications
  • Process insurance claims documents
  • Archive physical documents as searchable text

Content Moderation

  • Filter user-uploaded images for inappropriate content
  • Automate content review workflows
  • Flag potentially unsafe images for manual review
  • Classify content by safety categories
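
To make the "flag for manual review" step concrete: the Vision API reports each Safe Search category as a likelihood level rather than a number. The sketch below compares those levels against a threshold. The function name, the default threshold, and the sample annotation are hypothetical; the category and likelihood names follow the API's Safe Search annotation.

```python
# Likelihood levels in increasing order, as used by the Vision API.
LIKELIHOOD_ORDER = [
    "UNKNOWN", "VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY",
]
CATEGORIES = ("adult", "spoof", "medical", "violence", "racy")


def flagged_categories(safe_search, threshold="POSSIBLE"):
    """Return the categories whose likelihood is at or above the threshold."""
    limit = LIKELIHOOD_ORDER.index(threshold)
    return [
        category
        for category in CATEGORIES
        if LIKELIHOOD_ORDER.index(safe_search.get(category, "UNKNOWN")) >= limit
    ]


annotation = {
    "adult": "VERY_UNLIKELY",
    "spoof": "UNLIKELY",
    "medical": "POSSIBLE",
    "violence": "LIKELY",
    "racy": "VERY_UNLIKELY",
}
print(flagged_categories(annotation))  # ['medical', 'violence']
```

A stricter policy can pass `threshold="LIKELY"` so that only stronger signals reach the manual review queue.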

Image Organization

  • Auto-tag photos with descriptive labels
  • Categorize product images
  • Build searchable image databases
  • Generate image metadata automatically
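
Auto-tagging usually means keeping only the labels the API is reasonably sure about. A minimal sketch, assuming label results arrive as a list of objects with `description` and `score` fields, as in the Vision API's label annotations. The function name, threshold, and sample data are hypothetical.

```python
def labels_to_tags(label_annotations, min_score=0.8, max_tags=5):
    """Turn label annotations into a short list of lowercase tags."""
    kept = [
        label for label in label_annotations
        if label.get("score", 0.0) >= min_score
    ]
    # Highest-scoring labels first, then cap the number of tags.
    kept.sort(key=lambda label: label.get("score", 0.0), reverse=True)
    return [label["description"].lower() for label in kept[:max_tags]]


sample_labels = [
    {"description": "Grass", "score": 0.85},
    {"description": "Dog", "score": 0.97},
    {"description": "Toy", "score": 0.62},
]
print(labels_to_tags(sample_labels))  # ['dog', 'grass']
```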

Data Entry Automation

  • Extract text from business cards
  • Process handwritten forms
  • Digitize paper records
  • Import data from image-based documents

Requirements

  • Google Cloud Platform account
  • Vision API enabled in your GCP project
  • Service account with appropriate permissions:
    • roles/cloudvision.user or roles/cloudvision.admin
  • For PDF processing: Google Cloud Storage bucket with read/write access

Supported Image Formats

  • JPEG
  • PNG
  • GIF
  • BMP
  • TIFF
  • WebP
  • RAW
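
File extensions can be wrong, so it can help to check a file's leading bytes before sending it for analysis. The sketch below recognizes most of the formats listed above by their well-known signatures. The function name is hypothetical, and RAW is not detected because camera RAW formats share no single signature.

```python
def sniff_image_format(header):
    """Guess the image format from the first bytes of a file, or return None."""
    if header.startswith(b"\xff\xd8\xff"):
        return "JPEG"
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "PNG"
    if header.startswith((b"GIF87a", b"GIF89a")):
        return "GIF"
    if header.startswith(b"BM"):
        return "BMP"
    # TIFF comes in little-endian ("II") and big-endian ("MM") variants.
    if header.startswith((b"II*\x00", b"MM\x00*")):
        return "TIFF"
    # WebP is a RIFF container with "WEBP" at byte offset 8.
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "WebP"
    return None


print(sniff_image_format(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))  # PNG
print(sniff_image_format(b"plain text, not an image"))         # None
```

In practice the header would come from reading the first 12 bytes of the file in binary mode.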

Best Practices

Image Quality

  • Use high-resolution images for better OCR accuracy
  • Ensure good lighting and contrast
  • Avoid blurry or distorted images
  • Keep text orientation upright when possible

Performance

  • Use the Connect node to reuse connections across multiple operations
  • Batch similar operations together
  • Reduce image file sizes where possible; smaller files upload and process faster
  • Use Google Cloud Storage for large PDF files
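
Batching can be sketched as splitting a list of images into groups and building one `images:annotate` request body per group. The batch size of 16 used below is an assumption about the per-request image limit and should be checked against current Vision API quotas. The helper names and the example URIs are hypothetical.

```python
def chunked(items, size):
    """Split a list into consecutive groups of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_batches(image_uris, feature_type="LABEL_DETECTION", batch_size=16):
    """Build one images:annotate request body per group of image URIs."""
    batches = []
    for group in chunked(image_uris, batch_size):
        batches.append({
            "requests": [
                {
                    "image": {"source": {"imageUri": uri}},
                    "features": [{"type": feature_type}],
                }
                for uri in group
            ]
        })
    return batches


uris = [f"gs://example-bucket/photos/{n}.jpg" for n in range(40)]
batches = build_batches(uris)
print([len(batch["requests"]) for batch in batches])  # [16, 16, 8]
```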

Error Handling

  • Enable "Continue On Error" when processing multiple images, so that one failed image does not stop the whole run
  • Validate image paths before processing
  • Handle "No text found" scenarios gracefully
  • Monitor API quotas and limits
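
The same ideas can be expressed in plain code: keep going when one image fails, and treat an empty OCR result as a normal outcome rather than an error. In the sketch below, `ocr` stands in for whatever performs the extraction; it and `fake_ocr` are hypothetical placeholders, not Robomotion or Google APIs.

```python
def extract_all(paths, ocr):
    """Run `ocr` on every path, collecting texts and failures separately."""
    results, failures = {}, {}
    for path in paths:
        try:
            text = ocr(path)
        except Exception as exc:  # continue on error, but record what failed
            failures[path] = str(exc)
            continue
        # An image without text is a valid result: store an empty string.
        results[path] = text or ""
    return results, failures


def fake_ocr(path):
    """Stand-in OCR function used only to demonstrate the control flow."""
    if path == "missing.png":
        raise FileNotFoundError(f"no such file: {path}")
    return {"invoice.png": "Total: 42.00"}.get(path)


results, failures = extract_all(["invoice.png", "blank.png", "missing.png"], fake_ocr)
print(results)   # {'invoice.png': 'Total: 42.00', 'blank.png': ''}
print(failures)  # {'missing.png': 'no such file: missing.png'}
```

Keeping failures in a separate mapping makes it straightforward to retry or report them after the run finishes.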

Cost Optimization

  • Process only necessary images
  • Use a resolution suited to the task (higher isn't always better): once text is clearly legible, larger files mainly add upload and processing time
  • Cache results when possible
  • Monitor API usage in Google Cloud Console
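
One way to cache results is to key them on a hash of the image content, so that identical files are analyzed only once regardless of their names. This is a minimal in-memory sketch; the `ResultCache` class and the `fake_analyze` stand-in are hypothetical, and a real workflow would likely persist the cache between runs.

```python
import hashlib


class ResultCache:
    """Cache analysis results by image content hash and feature type."""

    def __init__(self, analyze):
        self._analyze = analyze  # callable(image_bytes, feature) -> result
        self._store = {}
        self.api_calls = 0       # how many times `analyze` was actually invoked

    def get(self, image_bytes, feature):
        key = (hashlib.sha256(image_bytes).hexdigest(), feature)
        if key not in self._store:
            self.api_calls += 1
            self._store[key] = self._analyze(image_bytes, feature)
        return self._store[key]


def fake_analyze(image_bytes, feature):
    """Stand-in for a billable API call."""
    return {"feature": feature, "size": len(image_bytes)}


cache = ResultCache(fake_analyze)
cache.get(b"same image", "LABEL_DETECTION")
cache.get(b"same image", "LABEL_DETECTION")  # served from the cache
cache.get(b"same image", "TEXT_DETECTION")   # different feature, new call
print(cache.api_calls)  # 2
```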

Available Nodes