Google Gemini

Google Gemini is Google's family of multimodal large language models (LLMs), with capabilities for text generation, image understanding, video generation, audio processing, and advanced reasoning.

Overview

The Robomotion Google Gemini package provides comprehensive integration with Google's Gemini API, enabling you to:

Generate high-quality text content with advanced reasoning
Conduct multi-turn chat conversations with context retention
Process and understand images, documents, and multimedia files
Generate images with Gemini image models (Nano Banana)
Create videos with Veo models (with native audio support)
Generate semantic embeddings for similarity search and classification
Edit images using mask-based inpainting
Ground responses with Google Search and run code with the code execution tool

Key Features

Text Generation

Multiple Models: Choose from Gemini 3 Pro, Gemini 2.5 Pro/Flash, and Gemini 2.0 Flash variants
Thinking Mode: Control reasoning depth with dynamic, budget-based, or level-based thinking
Multimodal Input: Include images, documents, audio, and video files alongside text prompts
Structured Output: JSON mode with schema validation for reliable data extraction
Tools: Enable Google Search grounding and code execution
Safety Controls: Granular content filtering across multiple categories
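
To illustrate why schema-checked JSON output makes data extraction more reliable, here is a minimal sketch that parses a model reply and checks it against a small hand-written schema. The field names (`invoice_number`, `total`, `currency`) and the helper `parse_structured_reply` are hypothetical and not part of the package; the node's own validation may work differently.

```python
import json

# Hypothetical schema for illustration: field name -> accepted Python type(s).
INVOICE_SCHEMA = {
    "invoice_number": str,
    "total": (int, float),
    "currency": str,
}


def parse_structured_reply(reply_text, schema):
    """Parse a JSON reply and verify that required fields exist with the right types."""
    data = json.loads(reply_text)  # raises json.JSONDecodeError (a ValueError) on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, expected_type in schema.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        value = data[field]
        # bool is a subclass of int in Python; this schema has no boolean
        # fields, so booleans are rejected outright.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise ValueError(f"wrong type for field: {field}")
    return data


sample_reply = '{"invoice_number": "INV-001", "total": 129.5, "currency": "EUR"}'
invoice = parse_structured_reply(sample_reply, INVOICE_SCHEMA)
print(invoice["total"])
```

A reply that omits a field or returns the wrong type raises `ValueError`, so a flow can retry or route the item for manual review instead of passing bad data downstream.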

Chat Conversations

Stateful Conversations: Maintain context across multiple messages
History Management: Full control over conversation history
File Attachments: Include multimedia files in chat messages
Streaming Support: Real-time response generation
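
As a sketch of what history management can look like, the example below keeps a conversation as a list of turns in the role/parts shape used by the Gemini API (`user` and `model` roles). The helpers `append_turn` and `trim_history` are hypothetical names for illustration; how the package stores history internally is an assumption here.

```python
def append_turn(history, role, text):
    """Return a new history list with one more turn appended."""
    if role not in ("user", "model"):
        raise ValueError("role must be 'user' or 'model'")
    return history + [{"role": role, "parts": [{"text": text}]}]


def trim_history(history, max_turns):
    """Keep only the most recent max_turns entries to bound prompt size."""
    if max_turns <= 0:
        return []
    return history[-max_turns:]


history = []
history = append_turn(history, "user", "What is the capital of France?")
history = append_turn(history, "model", "Paris.")
history = append_turn(history, "user", "And its population?")

# Keep the last two turns before sending the next request.
recent = trim_history(history, 2)
print([turn["role"] for turn in recent])
```

Trimming trades context for cost: older turns no longer influence the answer, but each request carries fewer tokens.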

File Management

Upload: Upload large files (up to 2 GB) for use in prompts
List & Retrieve: List uploaded files page by page, or retrieve a single file
Metadata: Track file state, expiration, and processing status
Automatic Cleanup: Files expire automatically after 48 hours
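
Because uploaded files expire after 48 hours, long-running flows may want to check whether a file is still usable before referencing it. The following is a minimal sketch of that check; the helper names and the idea of tracking the upload timestamp yourself are assumptions, since the file metadata already reports expiration.

```python
from datetime import datetime, timedelta, timezone

FILE_TTL = timedelta(hours=48)  # retention period stated above


def expiration_time(uploaded_at):
    """Return the moment an uploaded file is expected to expire."""
    return uploaded_at + FILE_TTL


def is_expired(uploaded_at, now):
    """True once the retention period has fully elapsed."""
    return now >= expiration_time(uploaded_at)


uploaded = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
print(expiration_time(uploaded).isoformat())  # 2025-01-03T12:00:00+00:00
```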

Image Generation

Multiple Models: Nano Banana (2.5 Flash) and Nano Banana Pro (3 Pro Preview)
Aspect Ratios: Support for 1:1, 16:9, 9:16, 4:3, 3:4, and more
Reference Images: Use up to 14 reference images for style consistency (Nano Banana Pro)
Google Search: Ground image generation with web search results
Multiple Outputs: Generate up to 4 variations in a single request

Video Generation

Veo Models: Veo 3.1 (with audio), Veo 3.1 Fast, and Veo 3.0
Native Audio: Audio is generated automatically by Veo 3.0 and later models
Image-to-Video: Transform still images into dynamic videos
Frame Control: Specify first and last frames for precise control
Flexible Duration: 4, 6, or 8 second videos
HD Quality: 720p or 1080p resolution options
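
The duration and resolution options listed above lend themselves to a pre-flight check before a long-running video request is submitted. This is a sketch only: `validate_video_request` is a hypothetical helper, and the actual node may enforce additional constraints (for example, combinations of model, duration, and resolution) that are not described here.

```python
ALLOWED_DURATIONS = {4, 6, 8}            # seconds, from the list above
ALLOWED_RESOLUTIONS = {"720p", "1080p"}  # from the list above


def validate_video_request(duration_seconds, resolution):
    """Return the settings as a dict, or raise ValueError if an option is unsupported."""
    if duration_seconds not in ALLOWED_DURATIONS:
        raise ValueError(f"duration must be one of {sorted(ALLOWED_DURATIONS)}")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(ALLOWED_RESOLUTIONS)}")
    return {"duration_seconds": duration_seconds, "resolution": resolution}


settings = validate_video_request(8, "1080p")
print(settings)
```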

Embeddings

Multiple Task Types: Optimized for retrieval, classification, clustering, and more
Batch Processing: Generate embeddings for multiple texts efficiently
Semantic Comparison: Built-in similarity calculation with multiple metrics
Dimension Reduction: Custom output dimensionality (256, 512, 768)
File-Based Workflows: Load/save embeddings for offline processing
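
For readers who want to see what semantic comparison computes, here is a minimal sketch of cosine similarity, one common metric for comparing embeddings, together with a simple way to shorten a vector and rescale it to unit length. Both helpers are illustrative; the package's built-in comparison and its dimension handling may be implemented differently, and the toy vectors stand in for real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def truncate_and_normalize(vector, dims):
    """Keep the first dims components and rescale the result to unit length."""
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


# Toy 4-dimensional vectors standing in for real embeddings.
doc_a = [0.9, 0.1, 0.0, 0.4]
doc_b = [0.8, 0.2, 0.1, 0.5]
doc_c = [0.0, 0.9, 0.4, 0.0]

print(cosine_similarity(doc_a, doc_b) > cosine_similarity(doc_a, doc_c))  # True
```

Scores close to 1 indicate similar direction (similar meaning), while scores near 0 indicate unrelated content.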

Authentication Options

The package supports two authentication methods:

Direct API Key: Use your own Google AI Studio API key
Robomotion AI Credits: Pay-per-use billing through Robomotion's managed service (no API key required)

Getting Started

Connect: Establish a connection using your API key or Robomotion AI Credits
Generate: Use any generation node (text, image, video) with your prompts
Process Results: Access generated content through output variables
Disconnect: Close the connection when done (optional; connections are cleaned up automatically when the flow ends)

Common Use Cases

Document Processing

Extract structured data from invoices, receipts, and forms
Summarize long documents and reports
Translate documents while preserving formatting
Answer questions about uploaded PDFs and images

Content Creation

Generate marketing copy and product descriptions
Create social media posts and captions
Write code documentation and technical guides
Produce image assets for presentations and marketing

Data Analysis

Classify and categorize text data
Perform sentiment analysis on customer feedback
Find similar items using semantic embeddings
Generate insights from business data

Automation

Build AI-powered chatbots and assistants
Automate customer support responses
Process and route incoming communications
Generate reports and summaries on schedule

Model Selection Guide

Text Generation

Gemini 3 Pro: Best quality, advanced reasoning, highest cost
Gemini 2.5 Pro: Balanced performance and cost
Gemini 2.5 Flash: Fast, cost-effective for most tasks
Gemini 2.5 Flash Lite: Ultra-fast, lowest cost for simple tasks
Gemini 2.0 Flash: Legacy model with good performance
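
The guidance above can be captured as a small lookup so that a flow picks a model from a stated priority rather than a hard-coded name. This is a sketch under assumptions: the priority labels and `pick_text_model` are hypothetical, and the values are the display names from this guide, not API model identifiers.

```python
# Display names copied from the guide above; these are not API model IDs.
TEXT_MODELS_BY_PRIORITY = {
    "quality": "Gemini 3 Pro",
    "balanced": "Gemini 2.5 Pro",
    "speed": "Gemini 2.5 Flash",
    "lowest_cost": "Gemini 2.5 Flash Lite",
}


def pick_text_model(priority):
    """Map a priority label to the model this guide recommends for it."""
    try:
        return TEXT_MODELS_BY_PRIORITY[priority]
    except KeyError:
        options = ", ".join(sorted(TEXT_MODELS_BY_PRIORITY))
        raise ValueError(f"unknown priority {priority!r}; expected one of: {options}") from None


print(pick_text_model("speed"))  # Gemini 2.5 Flash
```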

Image Generation

Nano Banana Pro (3 Pro Preview): Best quality, supports up to 14 reference images
Nano Banana (2.5 Flash): Fast, good quality for most use cases