Gemini 2.5 Computer Use Model: Google Brings Agentic UI Control to Developers
Google has introduced the Gemini 2.5 Computer Use model, now available in preview via the Gemini API. Built on the visual reasoning strengths of Gemini 2.5 Pro, this specialized model enables agents to interact directly with user interfaces (UIs) — bridging the gap between structured APIs and graphical applications.
A Step Toward True Computer Interaction
Traditional AI systems interact with software through structured APIs, but many digital workflows — like filling out online forms or managing dashboards — still depend on graphical interaction. The Gemini 2.5 Computer Use model allows agents to perform these actions autonomously by clicking, typing, and navigating web or mobile interfaces, similar to how humans do.
This capability marks a critical step toward general-purpose AI agents, enabling automated workflows such as form submissions, UI testing, and online data entry without human intervention.
How It Works
The model’s functionality is exposed through the new computer_use tool within the Gemini API. Developers provide three key inputs:
- The user request (task description)
- A screenshot of the environment
- A history of previous actions
Using these, the model generates a function call to perform UI actions such as clicks or text entry. It may also request user confirmation for high-impact operations like purchases. After each step, the client captures a new screenshot and URL, feeding them back to the model in an iterative loop until the task is completed or a safety stop occurs.
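To make that loop concrete, here is a minimal sketch in Python using the google-genai SDK. Treat it as an illustration under assumptions rather than a definitive implementation: the preview model name and the ComputerUse tool configuration follow the preview documentation and may change, while take_screenshot(), execute_action(), and current_url() are hypothetical helpers that a real client would back with a browser driver (one possible Playwright version appears under Getting Started below).

```python
# Minimal sketch of the agent loop, assuming the google-genai Python SDK.
# take_screenshot(), execute_action(), and current_url() are hypothetical
# helpers that a real client would implement with a browser driver.
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

config = types.GenerateContentConfig(
    tools=[types.Tool(computer_use=types.ComputerUse(
        environment=types.Environment.ENVIRONMENT_BROWSER
    ))]
)

# Initial turn: the task description plus a screenshot of the environment.
contents = [types.Content(role="user", parts=[
    types.Part(text="Fill out the contact form on the current page."),
    types.Part.from_bytes(data=take_screenshot(), mime_type="image/png"),
])]

while True:
    response = client.models.generate_content(
        model="gemini-2.5-computer-use-preview-10-2025",  # preview model name
        contents=contents,
        config=config,
    )
    candidate = response.candidates[0]
    calls = [p.function_call for p in candidate.content.parts if p.function_call]
    if not calls:
        break  # no further UI action requested: the task is complete

    # Keep the model's turn in the history, execute the proposed action,
    # then feed back the current URL and a fresh screenshot.
    contents.append(candidate.content)
    for call in calls:
        execute_action(call.name, call.args)  # e.g. a click or text entry
    contents.append(types.Content(role="user", parts=[
        types.Part.from_function_response(
            name=calls[0].name,
            response={"url": current_url()},
        ),
        types.Part.from_bytes(data=take_screenshot(), mime_type="image/png"),
    ]))
```

Each iteration appends the model's turn plus a fresh observation to the conversation, so the model always reasons over the full action history the article describes.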
The model is currently optimized for web browsers, but early results also show strong promise for mobile UI control; desktop OS-level interactions remain under development.
Performance and Safety
According to internal benchmarks and third-party evaluations by Browserbase, the Gemini 2.5 Computer Use model outperforms leading competitors on multiple web and mobile control benchmarks, offering superior accuracy and lower latency.
Google emphasized safety as a core design principle. To mitigate misuse risks — such as malicious automation or prompt injection — the model integrates trained-in safety mechanisms and external safeguards:
- A per-step safety service that evaluates each proposed action before execution.
- System instructions allowing developers to require confirmation for sensitive actions (sketched below).
These protections help ensure that agents operate within safe and ethical boundaries.
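On the client side, honoring such a confirmation request might look like the following sketch. The shape of the safety signal (a safety_decision entry carrying a require_confirmation decision) and the safety_acknowledgement reply are assumptions modeled on the preview documentation; ask_user() is a stand-in console prompt, and execute_action() is the hypothetical helper from the loop above.

```python
# Client-side confirmation gate. The "safety_decision" payload shape and
# the "safety_acknowledgement" reply are assumptions modeled on the
# preview docs; execute_action() is the hypothetical helper sketched above.
def ask_user(prompt: str) -> bool:
    """Console y/n prompt standing in for a real confirmation UI."""
    return input(prompt + " [y/N] ").strip().lower() == "y"

def handle_action(call) -> dict:
    safety = (call.args or {}).get("safety_decision")
    if safety and safety.get("decision") == "require_confirmation":
        question = (f"The agent wants to run {call.name}. "
                    f"Reason: {safety.get('explanation')}. Proceed?")
        if not ask_user(question):
            return {"error": "User declined the proposed action."}
        execute_action(call.name, call.args)
        return {"safety_acknowledgement": "true"}  # report the approval
    execute_action(call.name, call.args)
    return {}
```

Keeping the gate outside the model, in the client loop, means a compromised or prompt-injected agent still cannot complete a sensitive action without an explicit human decision.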
Real-World Applications
Early adopters are already seeing tangible results. Google’s internal teams use the model to automate UI testing, reducing software QA time. Project Mariner, Firebase Testing Agent, and Search’s AI Mode have also integrated it for resilient automation and error recovery.
External testers report similar gains: “Gemini 2.5 Computer Use is far ahead of the competition, often being 50% faster,” said Poke.com, a proactive AI assistant company.
“It outperformed other models at reliably parsing context in complex cases, increasing performance by up to 18%,” noted Autotab.
Getting Started
Developers can access the public preview now through the Gemini API in Google AI Studio and Vertex AI.
- Try it live: Experiment via Browserbase’s demo environment.
- Build locally: Use the reference documentation to deploy with Playwright or in a cloud VM (see the sketch after this list).
- Engage: Join Google’s developer forum to share feedback and guide future updates.
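For the build-locally path, the hypothetical helpers from the loop sketch above could be backed by Playwright roughly as follows. The action names (click_at, type_text_at, navigate) and the 0–999 normalized coordinate convention are assumptions drawn from the preview documentation and may not match the API exactly; the starting URL is illustrative.

```python
# Playwright-backed helpers for the loop sketch above. Action names and
# the 0-999 normalized coordinate convention are assumptions based on the
# preview documentation.
from playwright.sync_api import sync_playwright

pw = sync_playwright().start()
browser = pw.chromium.launch(headless=False)
page = browser.new_page(viewport={"width": 1280, "height": 800})
page.goto("https://example.com/contact")  # illustrative starting page

def take_screenshot() -> bytes:
    return page.screenshot(type="png")

def current_url() -> str:
    return page.url

def execute_action(name: str, args: dict) -> None:
    # Denormalize 0-999 model coordinates to the actual viewport size.
    x = int(args.get("x", 0)) / 1000 * 1280
    y = int(args.get("y", 0)) / 1000 * 800
    if name == "click_at":
        page.mouse.click(x, y)
    elif name == "type_text_at":
        page.mouse.click(x, y)
        page.keyboard.type(args.get("text", ""))
    elif name == "navigate":
        page.goto(args["url"])
    else:
        raise NotImplementedError(f"Unhandled action: {name}")
```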
The Gemini 2.5 Computer Use model represents a significant step toward autonomous, interface-aware AI agents, opening new horizons in workflow automation, testing, and intelligent system design.
