Google on Tuesday announced a brand-new AI model called Gemini 2.5 Computer Use, releasing it in preview to developers. If you’ve been following the AI industry, you might be familiar with the term “computer use,” which firms including Anthropic, OpenAI, and Google have adopted. It designates an AI’s ability to interact with parts of a computer on behalf of the user. OpenAI’s ChatGPT Agent is a good example: it works on complex prompts inside its own virtual computer, where it can browse the web and perform some actions for the user. Google’s Gemini 2.5 Computer Use isn’t quite an answer to ChatGPT Agent, but the two have one thing in common: Google’s model can also see and understand web browser user interfaces and perform actions like a human.
Gemini 2.5 Computer Use isn’t available to regular Gemini chatbot users. But it could become an essential technology for the agentic abilities Google is developing for the Gemini chatbot and even Google Search, where AI Mode already supports some agentic features. Computer Use might also be included in future versions of Gemini in Chrome, the AI assistant available to Chrome users. That’s just speculation at the time of this writing. For now, Computer Use is ready for testing, with Google’s benchmarks indicating the new AI model can be faster and more accurate than alternatives.
Gemini 2.5 Computer Use is available for testing via the Gemini API in Google AI Studio and Vertex AI. It’s also available in Browserbase, where users can tell the AI to play a game of 2048 in the browser and much more, as seen above.
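For developers who want to try it, a request through the google-genai Python SDK might look something like the sketch below. The model identifier and the computer-use tool configuration shown here are assumptions for illustration only; the exact values are listed in Google AI Studio and the Gemini API documentation.

```python
# Hedged sketch: calling the preview model via the google-genai Python SDK.
# The model id and the computer_use tool fields below are assumptions for
# illustration; check the Gemini API docs for the exact identifiers.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-computer-use-preview-10-2025",  # assumed preview model name
    contents="Go to sticky-note-jam.web.app and sort the notes into their categories.",
    config=types.GenerateContentConfig(
        tools=[
            types.Tool(
                computer_use=types.ComputerUse(environment="ENVIRONMENT_BROWSER")  # assumed config
            )
        ],
    ),
)

# The model responds with proposed UI actions (click, type, scroll, etc.)
# that the developer's own browser automation then executes.
print(response.candidates[0].content.parts)
```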
What can Gemini 2.5 Computer Use do?
Google describes the new AI tool as a “new specialized model built on Gemini 2.5 Pro’s visual understanding and reasoning capabilities that power agents capable of interacting with user interfaces (UIs).” The company also explains that Computer Use is needed so AI models can interact with graphical UIs in cases where the model can’t connect to software through an API. Actions like filling out and submitting forms involve navigating web pages as a person would: the AI has to scroll pages, click buttons, interact with menus, and type inside forms.
The new AI model looks at the user’s prompt, takes a screenshot of the environment, and reviews a history of recent actions. It then generates a UI action, such as a click or a keystroke, which is executed, and the loop restarts. The AI gets a new screenshot and, where needed, the current web page URL, so it can see the outcome of the previous action and continue with the task. Gemini 2.5 Computer Use repeats this procedure until it finishes the task.
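To make that loop concrete, here is a minimal sketch of how a developer might wire such a model into a browser. Playwright handles the screenshots and UI actions, while propose_action() is a hypothetical stand-in for the Gemini 2.5 Computer Use call; the action schema is illustrative, not Google’s actual response format.

```python
# Minimal sketch of the observe-act loop described above, assuming Playwright
# for the browser side. propose_action() is a hypothetical placeholder for the
# model call; the action dictionary format is illustrative only.
from playwright.sync_api import sync_playwright


def propose_action(prompt: str, screenshot: bytes, url: str, history: list) -> dict:
    """Hypothetical model call: returns the next UI action, e.g.
    {"type": "click", "x": 120, "y": 340} or {"type": "done"}."""
    raise NotImplementedError("Replace with a real Gemini API call")


def run_task(prompt: str, start_url: str, max_steps: int = 25) -> None:
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(start_url)
        history: list[dict] = []

        for _ in range(max_steps):
            screenshot = page.screenshot()                   # capture current page state
            action = propose_action(prompt, screenshot, page.url, history)

            if action["type"] == "done":                     # model says the task is finished
                break
            elif action["type"] == "click":
                page.mouse.click(action["x"], action["y"])   # click a button, link, sticky note...
            elif action["type"] == "type":
                page.keyboard.type(action["text"])           # fill in a form field
            elif action["type"] == "scroll":
                page.mouse.wheel(0, action["dy"])            # scroll the page

            history.append(action)                           # next turn sees what was just done

        browser.close()
```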
The video demo above shows the AI interacting with visual elements, following this prompt: “My art club brainstormed tasks ahead of our fair. The board is chaotic, and I need your help organizing the tasks into some categories I created. Go to sticky-note-jam.web.app and ensure notes are clearly in the right sections. Drag them there if not.” The AI clicks on the various sticky notes and drags them where they should be.
Google also shared benchmark results, saying that Gemini 2.5 Computer Use “demonstrates strong performance on multiple web and mobile control benchmarks.” The scores in the image above show the Google model outperforming rivals, including OpenAI’s Agent model. That said, Google’s model only works in the browser. It’s not optimized for operating system control.