
Image Automation

Automate desktop applications using visual recognition: find and click images, read text from the screen, and interact with any application regardless of the technology it is built with.

Overview

The Image Automation package enables visual desktop automation using image recognition and OCR. Use it to automate applications that don't expose accessible UI elements, such as legacy systems and applications running in Citrix or other virtual environments, or in any scenario where traditional element-based automation isn't possible.
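Image recognition of this kind is commonly built on template matching: the stored image is slid across a screenshot and every position is scored for similarity. The sketch below assumes grayscale pixel grids and uses zero-mean normalized cross-correlation in plain Python. It illustrates the general technique only and is not this package's documented algorithm. The `threshold` parameter plays the role of a confidence setting, and more than one hit above it means the captured image is not distinctive.

```python
from math import sqrt

# Grayscale image as rows of pixel intensities: image[y][x].
Image = list[list[float]]


def ncc(screen: Image, template: Image, x: int, y: int) -> float:
    """Zero-mean normalized cross-correlation between `template` and the
    screen window whose top-left corner is (x, y). Returns a score in [-1, 1]."""
    th, tw = len(template), len(template[0])
    window = [row[x:x + tw] for row in screen[y:y + th]]
    n = th * tw
    mean_w = sum(map(sum, window)) / n
    mean_t = sum(map(sum, template)) / n
    num = var_w = var_t = 0.0
    for w_row, t_row in zip(window, template):
        for w, t in zip(w_row, t_row):
            dw, dt = w - mean_w, t - mean_t
            num += dw * dt
            var_w += dw * dw
            var_t += dt * dt
    if var_w == 0 or var_t == 0:
        return 0.0  # a flat window or template has no pattern to correlate
    return num / sqrt(var_w * var_t)


def find_matches(screen: Image, template: Image, threshold: float = 0.9):
    """Slide the template over every position and return (x, y, score) for
    each window scoring at least `threshold`, best match first."""
    th, tw = len(template), len(template[0])
    hits = []
    for y in range(len(screen) - th + 1):
        for x in range(len(screen[0]) - tw + 1):
            score = ncc(screen, template, x, y)
            if score >= threshold:
                hits.append((x, y, score))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


# A 4x8 "screenshot" containing the same 2x2 pattern twice.
screen = [[0] * 8 for _ in range(4)]
for left in (1, 5):
    screen[1][left], screen[1][left + 1] = 10, 200
    screen[2][left], screen[2][left + 1] = 200, 10
template = [[10, 200], [200, 10]]
print(find_matches(screen, template))  # [(1, 1, 1.0), (5, 1, 1.0)]
```

Two perfect hits tell you the template would match in more than one place, which is why a click based on it would be ambiguous.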

Key Features

  • Image Recognition - Find and interact with UI elements by their visual appearance
  • Text Recognition (OCR) - Read and locate text anywhere on screen
  • Visual Clicking - Click on images or text patterns
  • Screenshot Capture - Take full or partial screenshots
  • Wait Conditions - Wait for images or text to appear

Available Nodes

  • Find Image - Locate an image pattern on screen and return its coordinates
  • Click Image - Find and click on an image pattern
  • Wait Image - Wait until a specific image appears on screen
  • Find Text - Locate text on screen using OCR
  • Click Text - Find and click on specific text
  • Text Exists - Check whether text is visible on screen
  • Get Text - Extract text from a screen region using OCR
  • Take Screenshot - Capture screen or region as image
  • Select Copy - Select and copy text from screen
  • Click Type - Click on a location and type text
  • Actions - Perform various mouse/keyboard actions at an image's location
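Find Image returns coordinates, while Click Image and Click Type act on a single point. The sketch below assumes, hypothetically, that a match is reported as a top-left corner plus a size, and that clicks target the center of the match with an optional offset. The `Match` type and `click_point` helper are illustrative only and are not part of the package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """Hypothetical Find Image result: top-left corner, size in pixels, score."""
    x: int
    y: int
    width: int
    height: int
    score: float


def click_point(match: Match, offset_x: int = 0, offset_y: int = 0) -> tuple[int, int]:
    """Screen point to click: the center of the matched region, optionally
    shifted (e.g. to reach an input field beside a label image)."""
    center_x = match.x + match.width // 2
    center_y = match.y + match.height // 2
    return center_x + offset_x, center_y + offset_y


label = Match(x=100, y=40, width=60, height=20, score=0.97)
print(click_point(label))               # (130, 50): center of the label
print(click_point(label, offset_x=80))  # (210, 50): field to the right of it
```

Offsetting from a stable anchor image is a common way to reach controls that have no distinctive appearance of their own, such as empty text fields.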

When to Use This Package

  • Citrix/Remote Desktop: Automate applications in virtual environments
  • Legacy Applications: Work with older software that has no accessible UI
  • Image-Based Workflows: When UI elements can't be identified by selectors
  • Cross-Platform Apps: Automate any application that is visible on screen, regardless of how it is built
  • Visual Verification: Confirm expected images appear on screen

Typical Workflow

  1. Use Find Image or Find Text to locate the target on screen
  2. Use Click Image or Click Text to interact with it
  3. Use Wait Image to wait for the application to respond
  4. Use Get Text to extract data from screen regions
  5. Use Take Screenshot to capture the screen for logging or verification
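Step 3 relies on polling the screen rather than sleeping for a fixed time. A minimal sketch of that idea, assuming a hypothetical `find` callable that captures the screen, runs an image search, and returns a match or `None`; the `wait_for` name and its timeout and interval defaults are illustrative, not the package's settings.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for(find: Callable[[], Optional[T]], timeout: float = 10.0,
             interval: float = 0.5) -> Optional[T]:
    """Call `find` repeatedly until it returns something other than None or
    `timeout` seconds elapse. Returns the result, or None on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        result = find()
        if result is not None:
            return result
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return None
        time.sleep(min(interval, remaining))  # never sleep past the deadline


attempts = 0


def dialog_visible():
    """Stand-in for 'take a screenshot and run Find Image': succeeds on the 3rd poll."""
    global attempts
    attempts += 1
    return (420, 310) if attempts >= 3 else None


print(wait_for(dialog_visible, timeout=2.0, interval=0.05))  # (420, 310)
```

Using `time.monotonic()` keeps the deadline correct even if the system clock is adjusted while the automation runs.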

Tips

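  • Capture distinctive images that match only one location on screen
  • Use the confidence/threshold settings to control how closely a match must resemble the pattern (fuzzy matching)
  • Combine clicks with Wait Image so each step starts only after the screen is ready
  • Account for differences in screen resolution and display scaling across machines; an image captured at one resolution or scaling level may not match at another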
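One way to handle a display-scaling mismatch is to resize the captured template before searching. The sketch below assumes the application's UI scales by the same factor as the operating system's display scaling, and it uses simple nearest-neighbor resizing; `scale_factor` and `resize_nearest` are hypothetical helpers, not package features. Recapturing the image on the target machine, or searching at several scales, are common alternatives.

```python
# Grayscale image as rows of pixel values: image[y][x].
Image = list[list[int]]


def scale_factor(captured_scaling: float, current_scaling: float) -> float:
    """Ratio between the display scaling (e.g. 100, 125, 150 percent) on the
    current machine and on the machine where the template was captured."""
    return current_scaling / captured_scaling


def resize_nearest(image: Image, factor: float) -> Image:
    """Nearest-neighbor resize: each output pixel copies the source pixel it
    maps back onto."""
    h, w = len(image), len(image[0])
    new_h, new_w = max(1, round(h * factor)), max(1, round(w * factor))
    return [
        [image[min(h - 1, int(y / factor))][min(w - 1, int(x / factor))]
         for x in range(new_w)]
        for y in range(new_h)
    ]


template = [[1, 2], [3, 4]]              # captured at 100% scaling
factor = scale_factor(100, 150)          # target machine runs at 150%
print(resize_nearest(template, factor))  # [[1, 1, 2], [1, 1, 2], [3, 3, 4]]
```

Nearest-neighbor resizing introduces blocky edges, so a resized template usually needs a somewhat lower confidence threshold than one captured natively.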