Google Vision
The Google Cloud Vision API uses machine learning to analyze images, enabling you to detect objects, extract text, check content safety, and understand images at scale.
Overview
The Robomotion Google Vision package integrates with the Google Cloud Vision API, enabling you to:
- Extract text from images using OCR (Optical Character Recognition)
- Extract text from PDF documents stored in Google Cloud Storage
- Detect and extract labels describing image content
- Analyze images for potentially unsafe content
- Process images for automation workflows
Key Features
Text Extraction (OCR)
- Image to Text: Extract printed and handwritten text from images
- PDF to Text: Extract text from multi-page PDF documents
- Multi-language Support: Recognize text in multiple languages
- High Accuracy: Industry-leading OCR accuracy
- Confidence Scores: Per-page confidence metrics
Image Understanding
- Label Detection: Automatically identify objects, locations, activities, and concepts
- Safe Search Detection: Detect adult, violent, medical, racy, and spoof content
- Batch Processing: Analyze multiple images efficiently
Document Processing
- PDF Processing: Extract structured text from PDF documents
- Cloud Storage Integration: Work directly with files in Google Cloud Storage
- Asynchronous Processing: Handle large documents efficiently
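For orientation, the sketch below shows roughly what an asynchronous PDF OCR request to the public Cloud Vision REST method `files:asyncBatchAnnotate` looks like. It illustrates the underlying API, not how the Pdf To Text node is implemented; the helper name `build_pdf_ocr_request` and the bucket paths are hypothetical, and authentication and the HTTP call are left out.

```python
import json


def build_pdf_ocr_request(source_uri: str, output_prefix: str, batch_size: int = 20) -> dict:
    """Build a JSON body for POST https://vision.googleapis.com/v1/files:asyncBatchAnnotate."""
    # Both the input PDF and the output location must live in Cloud Storage.
    for uri in (source_uri, output_prefix):
        if not uri.startswith("gs://"):
            raise ValueError(f"Expected a gs:// URI, got: {uri}")
    return {
        "requests": [
            {
                "inputConfig": {
                    "gcsSource": {"uri": source_uri},
                    "mimeType": "application/pdf",
                },
                "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
                # Results are written as JSON files, batch_size pages per file.
                "outputConfig": {
                    "gcsDestination": {"uri": output_prefix},
                    "batchSize": batch_size,
                },
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical bucket and object names, for illustration only.
    body = build_pdf_ocr_request("gs://my-bucket/invoices/scan.pdf", "gs://my-bucket/ocr-output/")
    print(json.dumps(body, indent=2))
```

Because the operation is asynchronous, the results are read from the output location once the operation completes, which is why the bucket needs write access as well as read access.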
Authentication Options
The package supports two authentication methods:
- Connect Node + Credentials: Establish a persistent connection using Google Cloud service account credentials
- Direct Credentials: Provide credentials directly to each node (useful for one-off operations)
Getting Started
Basic Workflow
- Connect: Establish a connection using the Connect node with your Google Cloud credentials
- Analyze: Use any Vision node (Image to Text, Extract Labels, etc.) with the connection ID
- Process Results: Access extracted data through output variables
Alternative Workflow (Direct Credentials)
- Configure Node: Add credentials directly to the Vision node (ImageToText, ExtractImageLabels, etc.)
- Process: Run the node without a separate Connect node
- Get Results: Access output variables
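To make the Analyze step more concrete, here is a minimal sketch of the JSON body accepted by the public Cloud Vision REST method `images:annotate`. This is an assumption about the kind of request involved, not a description of the Robomotion nodes' internals; the helper name `build_annotate_request` and the sample bytes are hypothetical, and credentials and the HTTP call are omitted.

```python
import base64
import json


def build_annotate_request(image_bytes: bytes, feature_type: str = "TEXT_DETECTION") -> dict:
    """Build a JSON body for POST https://vision.googleapis.com/v1/images:annotate."""
    return {
        "requests": [
            {
                # Image bytes are sent inline as a base64-encoded string.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                # One feature per analysis type, e.g. TEXT_DETECTION,
                # LABEL_DETECTION, or SAFE_SEARCH_DETECTION.
                "features": [{"type": feature_type}],
            }
        ]
    }


if __name__ == "__main__":
    body = build_annotate_request(b"placeholder image bytes", "LABEL_DETECTION")
    print(json.dumps(body, indent=2))
```

The same body shape serves OCR, label detection, and safe search; only the feature type changes.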
Common Use Cases
Document Automation
- Digitize scanned invoices and receipts
- Extract data from forms and applications
- Process insurance claims documents
- Archive physical documents as searchable text
Content Moderation
- Filter user-uploaded images for inappropriate content
- Automate content review workflows
- Flag potentially unsafe images for manual review
- Classify content by safety categories
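A flagging rule like the ones above can be sketched as a threshold on the likelihood levels that Cloud Vision safe search reports for each category. The example assumes the result is available as a dictionary shaped like the REST `safeSearchAnnotation` object; the actual output variable of the Check Image Safety node may be structured differently, and the function name and default threshold are hypothetical.

```python
# Likelihood levels used by the Cloud Vision API, from least to most likely.
LIKELIHOOD_ORDER = [
    "UNKNOWN",
    "VERY_UNLIKELY",
    "UNLIKELY",
    "POSSIBLE",
    "LIKELY",
    "VERY_LIKELY",
]

# Safe search categories reported by the API.
CATEGORIES = ("adult", "spoof", "medical", "violence", "racy")


def flagged_categories(annotation: dict, threshold: str = "LIKELY") -> list:
    """Return the categories whose likelihood is at or above the threshold."""
    limit = LIKELIHOOD_ORDER.index(threshold)
    return [
        category
        for category in CATEGORIES
        # Missing categories are treated as UNKNOWN and never flagged
        # unless the threshold itself is UNKNOWN.
        if LIKELIHOOD_ORDER.index(annotation.get(category, "UNKNOWN")) >= limit
    ]


if __name__ == "__main__":
    sample = {"adult": "VERY_UNLIKELY", "violence": "LIKELY", "racy": "POSSIBLE"}
    print(flagged_categories(sample))              # ['violence']
    print(flagged_categories(sample, "POSSIBLE"))  # ['violence', 'racy']
```

A stricter threshold such as `POSSIBLE` sends more images to manual review; a looser one such as `VERY_LIKELY` reduces reviewer load at the cost of missed cases.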
Image Organization
- Auto-tag photos with descriptive labels
- Categorize product images
- Build searchable image databases
- Generate image metadata automatically
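Auto-tagging usually means keeping only the labels the model is reasonably sure about. The sketch below assumes label results shaped like the REST `labelAnnotations` list, where each entry has a `description` and a `score` between 0 and 1; the Extract Image Labels node's actual output format may differ, and `labels_to_tags` and the 0.7 cutoff are hypothetical choices.

```python
def labels_to_tags(label_annotations: list, min_score: float = 0.7) -> list:
    """Keep labels at or above min_score and normalize them to lowercase tags."""
    tags = []
    for label in label_annotations:
        # Entries without a score are treated as score 0 and dropped.
        if label.get("score", 0.0) >= min_score:
            tags.append(label["description"].lower())
    return tags


if __name__ == "__main__":
    sample = [
        {"description": "Dog", "score": 0.98},
        {"description": "Grass", "score": 0.55},
        {"description": "Pet", "score": 0.91},
    ]
    print(labels_to_tags(sample))  # ['dog', 'pet']
```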
Data Entry Automation
- Extract text from business cards
- Process handwritten forms
- Digitize paper records
- Import data from image-based documents
Requirements
- Google Cloud Platform account
- Vision API enabled in your GCP project
- Service account with appropriate permissions: roles/cloudvision.user or roles/cloudvision.admin
- For PDF processing: Google Cloud Storage bucket with read/write access
Supported Image Formats
- JPEG
- PNG
- GIF
- BMP
- TIFF
- WebP
- RAW
Best Practices
Image Quality
- Use high-resolution images for better OCR accuracy
- Ensure good lighting and contrast
- Avoid blurry or distorted images
- Keep text orientation upright when possible
Performance
- Use the Connect node to reuse connections across multiple operations
- Batch similar operations together
- Consider image file sizes for faster processing
- Use Google Cloud Storage for large PDF files
Error Handling
- Enable "Continue On Error" for processing multiple images
- Validate image paths before processing
- Handle "No text found" scenarios gracefully
- Monitor API quotas and limits
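Two of these checks, validating paths before processing and handling empty OCR output, are easy to sketch in a pre- or post-processing script. The helper names and the extension list below are assumptions for illustration; RAW is left out because it has no single file extension.

```python
from pathlib import Path

# Hypothetical extension list derived from the supported formats
# (RAW omitted: camera vendors use many different extensions).
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tif", ".tiff", ".webp"}


def validate_image_path(path: str) -> Path:
    """Raise early if the file is missing or has an unsupported extension."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"Image not found: {path}")
    if p.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported image format: {p.suffix or '(none)'}")
    return p


def text_or_default(ocr_text, default: str = "") -> str:
    """Treat missing or whitespace-only OCR output as 'no text found'."""
    if ocr_text is None or not ocr_text.strip():
        return default
    return ocr_text.strip()
```

Failing before the API call avoids spending quota on requests that cannot succeed, and a default value keeps downstream steps from breaking on blank pages.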
Cost Optimization
- Process only necessary images
- Use appropriate image resolution (higher isn't always better)
- Cache results when possible
- Monitor API usage in Google Cloud Console
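Caching can be as simple as keying results by a hash of the image content, so that identical files are analyzed only once. This is a minimal in-memory sketch; the `ResultCache` class and the `analyze` callback are hypothetical stand-ins for whatever actually invokes the Vision node or API.

```python
import hashlib


class ResultCache:
    """In-memory cache keyed by analysis type and SHA-256 of the image bytes."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def key_for(image_bytes: bytes, feature: str) -> str:
        # Hash the content, not the file name, so renamed copies still hit the cache.
        digest = hashlib.sha256(image_bytes).hexdigest()
        return f"{feature}:{digest}"

    def get_or_compute(self, image_bytes: bytes, feature: str, analyze):
        """Return the cached result, calling analyze(image_bytes) only on a miss."""
        key = self.key_for(image_bytes, feature)
        if key not in self._store:
            self._store[key] = analyze(image_bytes)
        return self._store[key]
```

Including the analysis type in the key keeps OCR and label results for the same image separate. For long-running workflows, a persistent store would be needed in place of the dictionary.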
Available Nodes
- Connect: Robomotion.GoogleVision.Connect
- Extract Image Labels: Robomotion.GoogleVision.ExtractImageLabels
- Check Image Safety: Robomotion.GoogleVision.CheckImageSafety
- Image To Text: Robomotion.GoogleVision.ImageToText
- Pdf To Text: Robomotion.GoogleVision.PdfToText