Summary:
1. Google DeepMind released Gemini 2.5 Computer Use, a customized version of its Gemini 2.5 Pro LLM that can browse the web and take actions on websites.
2. The model is accessible through Browserbase and lets developers build autonomous agents for interface-driven tasks.
3. In reported benchmarks, Gemini 2.5 Computer Use leads other AI models in interface-control accuracy and runs at lower latency.
Article:
Google DeepMind has introduced Gemini 2.5 Computer Use, a customized version of its Gemini 2.5 Pro LLM. Given a single text prompt, the model can navigate the web, retrieve information, complete forms, and take actions on websites. It is not directly available to consumers from Google, but it is accessible through Browserbase, a company founded by former Twilio engineer Paul Klein.
Gemini 2.5 Computer Use is designed to let developers build agents that autonomously perform interface-driven tasks such as clicking, typing, scrolling, and filling out forms. Unlike traditional AI systems that rely solely on APIs or structured inputs, this model works with software through its visual interface, much as a human user would.
In hands-on tests, Gemini 2.5 Computer Use successfully navigated to websites and completed tasks such as searching for products. However, the model currently has no direct file system access and cannot create files natively. Despite this limitation, it offers significant potential for developers who want to automate UI interactions.
In benchmark evaluations conducted by Browserbase and Google, Gemini 2.5 Computer Use achieved higher interface-control accuracy and lower latency than other AI systems, including OpenAI’s agents.
Agents built on the Computer Use model run in an interaction loop. At each step, the model receives the user’s task prompt, a screenshot of the interface, and a history of past actions, and it returns a recommended UI action. The model also includes safety measures: a per-step safety service checks each proposed action, and built-in safeguards are meant to keep the agent from taking actions that would compromise security.
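The interaction loop described above can be sketched in a few lines of Python. This is only an illustrative outline, not Google’s or Browserbase’s API: the names UIAction, is_safe, run_agent, and make_scripted_model are all hypothetical, the "model" is a stub that replays a fixed script, and the safety check is a toy rule standing in for the per-step safety service.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class UIAction:
    """One recommended UI action (hypothetical shape, not the real API schema)."""
    kind: str          # e.g. "click", "type", "scroll", or "done"
    target: str = ""   # free-text description of the UI element
    text: str = ""     # text to type, if any


def is_safe(action: UIAction) -> bool:
    """Toy stand-in for a per-step safety check: block one obviously risky click."""
    return not (action.kind == "click" and "delete account" in action.target.lower())


def run_agent(
    task: str,
    propose_action: Callable[[str, bytes, List[UIAction]], UIAction],
    take_screenshot: Callable[[], bytes],
    execute: Callable[[UIAction], None],
    max_steps: int = 10,
) -> List[UIAction]:
    """Loop: screenshot -> model proposes an action -> safety check -> execute."""
    history: List[UIAction] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()
        # The model sees the task, the current screenshot, and past actions.
        action = propose_action(task, screenshot, history)
        if action.kind == "done":
            break
        if not is_safe(action):
            # Stop instead of executing an action the safety check rejects.
            break
        execute(action)
        history.append(action)
    return history


def make_scripted_model(script: List[UIAction]):
    """Return a fake model that replays a fixed list of actions, then says "done"."""
    def propose(task: str, screenshot: bytes, history: List[UIAction]) -> UIAction:
        if len(history) < len(script):
            return script[len(history)]
        return UIAction("done")
    return propose


if __name__ == "__main__":
    script = [
        UIAction("click", "search box"),
        UIAction("type", "search box", "running shoes"),
        UIAction("click", "search button"),
    ]
    executed: List[UIAction] = []
    history = run_agent(
        "Search for running shoes",
        make_scripted_model(script),
        take_screenshot=lambda: b"",   # a real agent would capture the browser here
        execute=executed.append,       # a real agent would drive the browser here
    )
    print([a.kind for a in history])   # ['click', 'type', 'click']
```

In a real agent, take_screenshot and execute would be backed by a browser automation layer and propose_action by a call to the model; the step cap and the early exit on a rejected action are design choices made for this sketch, not documented behavior.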
Gemini 2.5 Computer Use is already being adopted across diverse domains, with teams reporting improved efficiency and performance in tasks involving complex data parsing and interface interactions. Its technical capabilities, API pricing, and features set it apart from Gemini 2.5 Pro, the model it is built on, and give developers a tool for building autonomous agents for a range of applications.