Google Gemini

Google Gemini is Google's family of multimodal large language models (LLMs). Through the Gemini API it covers text generation, image understanding, audio processing, and advanced reasoning tasks, with image and video generation provided by the companion Nano Banana and Veo models.

Overview

The Robomotion Google Gemini package provides comprehensive integration with Google's Gemini API, enabling you to:

  • Generate high-quality text content with advanced reasoning
  • Conduct multi-turn chat conversations with context retention
  • Process and understand images, documents, and multimedia files
  • Generate images with Gemini image models (Nano Banana)
  • Create videos with Veo models (with native audio support)
  • Generate semantic embeddings for similarity search and classification
  • Edit images using mask-based inpainting
  • Access Google Search and code execution capabilities

Key Features

Text Generation

  • Multiple Models: Choose from Gemini 3 Pro, Gemini 2.5 Pro/Flash, and Gemini 2.0 Flash variants
  • Thinking Mode: Control reasoning depth with dynamic, budget-based, or level-based thinking
  • Multimodal Input: Include images, documents, audio, and video files alongside text prompts
  • Structured Output: JSON mode with schema validation for reliable data extraction
  • Tools: Enable Google Search grounding and code execution
  • Safety Controls: Granular content filtering across multiple categories
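
As a rough illustration of the Structured Output bullet above, the sketch below checks a JSON reply against a small schema using only the Python standard library. The schema shape, the `validate_reply` helper, and the sample invoice fields are hypothetical and chosen for illustration; the package's own schema validation may work differently.

```python
import json

# Hypothetical schema: field name -> expected Python type.
INVOICE_SCHEMA = {"invoice_number": str, "total": float, "currency": str}


def validate_reply(raw_reply: str, schema: dict) -> dict:
    """Parse a JSON reply and check every schema field is present with the right type."""
    data = json.loads(raw_reply)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, expected_type in schema.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        value = data[field]
        # JSON does not distinguish ints from floats, so accept ints for float fields.
        if expected_type is float and isinstance(value, int) and not isinstance(value, bool):
            value = float(value)
        if not isinstance(value, expected_type):
            raise ValueError(f"field {field!r} should be {expected_type.__name__}")
        data[field] = value
    return data


# Sample reply text, standing in for what a model might return in JSON mode.
sample = '{"invoice_number": "INV-001", "total": 120, "currency": "EUR"}'
invoice = validate_reply(sample, INVOICE_SCHEMA)
print(invoice["total"])
```

Validating the reply before it enters the rest of a flow keeps a malformed extraction from silently propagating into later steps.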

Chat Conversations

  • Stateful Conversations: Maintain context across multiple messages
  • History Management: Full control over conversation history
  • File Attachments: Include multimedia files in chat messages
  • Streaming Support: Real-time response generation
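
To make the History Management bullet concrete, here is a minimal sketch of a conversation history kept as a list of turns in the role/parts shape that the Gemini API uses for chat content. The `ChatHistory` class and its `max_turns` cap are hypothetical; they are not part of the Robomotion package.

```python
from typing import Dict, List


class ChatHistory:
    """Hypothetical helper that stores turns as {"role": ..., "parts": [{"text": ...}]}."""

    def __init__(self, max_turns: int = 20) -> None:
        self.max_turns = max_turns
        self.turns: List[Dict] = []

    def add(self, role: str, text: str) -> None:
        # The Gemini API labels the two sides of a chat "user" and "model".
        if role not in ("user", "model"):
            raise ValueError("role must be 'user' or 'model'")
        self.turns.append({"role": role, "parts": [{"text": text}]})
        # Drop the oldest turns once the cap is exceeded.
        if len(self.turns) > self.max_turns:
            self.turns = self.turns[-self.max_turns:]


history = ChatHistory(max_turns=4)
history.add("user", "What is RPA?")
history.add("model", "Robotic process automation.")
history.add("user", "Give one example.")
print(len(history.turns))
```

Capping the history is one simple way to keep request size bounded; summarizing older turns is another option.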

File Management

  • Upload: Upload large files (up to 2GB) for use in prompts
  • List & Retrieve: Manage your uploaded files with pagination
  • Metadata: Track file state, expiration, and processing status
  • Automatic Cleanup: Files expire automatically after 48 hours
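
The 48-hour expiry above can be tracked on the client side with a small calculation. This is a sketch only: the `expires_at` and `is_expired` helpers are hypothetical, and in practice the expiration time reported in the file metadata should be treated as authoritative.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

FILE_TTL = timedelta(hours=48)  # retention period stated in the list above


def expires_at(uploaded_at: datetime) -> datetime:
    """Return the moment an uploaded file is expected to expire."""
    return uploaded_at + FILE_TTL


def is_expired(uploaded_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True once the retention period has elapsed."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= expires_at(uploaded_at)


uploaded = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
print(expires_at(uploaded).isoformat())  # 2025-01-03T12:00:00+00:00
```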

Image Generation

  • Multiple Models: Nano Banana (2.5 Flash) and Nano Banana Pro (3 Pro Preview)
  • Aspect Ratios: Support for 1:1, 16:9, 9:16, 4:3, 3:4, and more
  • Reference Images: Use up to 14 reference images for style consistency
  • Google Search: Ground image generation with web search results
  • Multiple Outputs: Generate up to 4 variations in a single request

Video Generation

  • Veo Models: Veo 3.1 (with audio), Veo 3.1 Fast, and Veo 3.0
  • Native Audio: Automatic audio generation for Veo 3.0 and later models
  • Image-to-Video: Transform still images into dynamic videos
  • Frame Control: Specify first and last frames for precise control
  • Flexible Duration: 4-, 6-, or 8-second videos
  • HD Quality: 720p or 1080p resolution options
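
Because duration and resolution accept only a few values, it can help to check them before starting a long-running video job. The sketch below assumes the allowed values listed above; the `check_video_options` helper and the option key names are hypothetical and do not reflect the node's actual property names.

```python
ALLOWED_DURATIONS = (4, 6, 8)            # seconds, per the list above
ALLOWED_RESOLUTIONS = ("720p", "1080p")  # per the list above


def check_video_options(duration_seconds: int, resolution: str) -> dict:
    """Validate the options and return them as a plain dict (hypothetical key names)."""
    if duration_seconds not in ALLOWED_DURATIONS:
        raise ValueError(f"duration must be one of {ALLOWED_DURATIONS}")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {ALLOWED_RESOLUTIONS}")
    return {"duration_seconds": duration_seconds, "resolution": resolution}


options = check_video_options(8, "1080p")
print(options)
```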

Embeddings

  • Multiple Task Types: Optimized for retrieval, classification, clustering, and more
  • Batch Processing: Generate embeddings for multiple texts efficiently
  • Semantic Comparison: Built-in similarity calculation with multiple metrics
  • Dimension Reduction: Custom output dimensionality (256, 512, 768)
  • File-Based Workflows: Load/save embeddings for offline processing
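
For readers unfamiliar with the similarity metrics mentioned under Semantic Comparison, the following sketch implements three common ones in plain Python. Which metrics the package actually exposes is not stated here, so treat the selection as an assumption; the short vectors stand in for real embeddings, which have hundreds of dimensions.

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b: 1 means same direction, 0 means orthogonal."""
    norm_a = math.sqrt(dot(a, a))
    norm_b = math.sqrt(dot(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot(a, b) / (norm_a * norm_b)


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between a and b: 0 means identical."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy vectors used only for illustration.
first = [1.0, 0.0, 1.0]
second = [1.0, 1.0, 0.0]
print(round(cosine_similarity(first, second), 3))
print(round(euclidean_distance(first, second), 3))
```

Cosine similarity ignores vector length and compares direction only, which is why it is a common default for comparing text embeddings.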

Authentication Options

The package supports two authentication methods:

  1. Direct API Key: Use your own Google AI Studio API key
  2. Robomotion AI Credits: Pay-per-use billing through Robomotion's managed service (no API key required)

Getting Started

  1. Connect: Establish a connection using your API key or Robomotion credits
  2. Generate: Use any generation node (text, image, video) with your prompts
  3. Process Results: Access generated content through output variables
  4. Disconnect: Close the connection when done (optional; the connection is cleaned up automatically when the flow ends)
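
For orientation, the Generate and Process Results steps above correspond roughly to one request and one response against the public Gemini REST API. The sketch below builds such a request and extracts the text from a response without sending anything, so it runs without a key or network access. It is not a description of how the Robomotion nodes are implemented; the `build_request` and `extract_text` helpers are hypothetical, and `YOUR_API_KEY` is a placeholder.

```python
import json
import urllib.request

API_URL = "https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent"


def build_request(api_key: str, prompt: str, model: str = "gemini-2.5-flash") -> urllib.request.Request:
    """Build (but do not send) a generateContent request for a single text prompt."""
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        API_URL.format(model=model),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


def extract_text(response_json: dict) -> str:
    """Join the text parts of the first candidate in a generateContent response."""
    parts = response_json["candidates"][0]["content"]["parts"]
    return "".join(part.get("text", "") for part in parts)


request = build_request("YOUR_API_KEY", "Summarize RPA in one sentence.")
# Sending is left commented out so the sketch runs offline:
# with urllib.request.urlopen(request) as response:
#     print(extract_text(json.load(response)))
print(request.full_url)
```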

Common Use Cases

Document Processing

  • Extract structured data from invoices, receipts, and forms
  • Summarize long documents and reports
  • Translate documents while preserving formatting
  • Answer questions about uploaded PDFs and images

Content Creation

  • Generate marketing copy and product descriptions
  • Create social media posts and captions
  • Write code documentation and technical guides
  • Produce image assets for presentations and marketing

Data Analysis

  • Classify and categorize text data
  • Perform sentiment analysis on customer feedback
  • Find similar items using semantic embeddings
  • Generate insights from business data

Automation

  • Build AI-powered chatbots and assistants
  • Automate customer support responses
  • Process and route incoming communications
  • Generate reports and summaries on a schedule

Model Selection Guide

Text Generation

  • Gemini 3 Pro: Best quality, advanced reasoning, highest cost
  • Gemini 2.5 Pro: Balanced performance and cost
  • Gemini 2.5 Flash: Fast, cost-effective for most tasks
  • Gemini 2.5 Flash Lite: Ultra-fast, lowest cost for simple tasks
  • Gemini 2.0 Flash: Legacy model with good performance

Image Generation

  • Nano Banana Pro (3 Pro Preview): Best quality, supports up to 14 reference images
  • Nano Banana (2.5 Flash): Fast, good quality for most use cases

Video Generation

  • Veo 3.1: Best quality with native audio support
  • Veo 3.1 Fast: Faster generation, good quality
  • Veo 3.0: Legacy model with audio support

Embeddings

  • text-embedding-004: Latest model, best performance
  • text-embedding-005: Alternative option
  • text-multilingual-embedding-002: For multilingual content

Available Nodes