We already have OpenAI's agent. It is called Operator, and it is a system capable of seeing our screen and autonomously performing actions in the browser based on our requests. It is something we had already seen with Anthropic's 'Computer Use' or DeepMind's Mariner, but here the company led by Sam Altman has its own special ingredient.
Computer-Using Agent (CUA). Operator uses a model called Computer-Using Agent (CUA) that is based on GPT-4o. CUA interprets screenshots and interacts with websites through standard browser controls: moving the cursor, clicking, and typing.
How CUA works. As explained in the OpenAI documentation, this system processes the "raw pixels" of the screenshots it takes and uses a virtual mouse and keyboard to complete its actions. Once it has the screenshot, it "reasons" through an inner chain of thought that takes its past actions into account before deciding what to do next.
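That screenshot-reason-act cycle can be sketched in a few lines. This is a minimal illustration, not OpenAI's actual implementation: every function and action name here is a hypothetical stand-in for the real model and browser plumbing.

```python
from dataclasses import dataclass, field

@dataclass
class AgentState:
    goal: str
    history: list = field(default_factory=list)  # past actions, fed back into reasoning

def take_screenshot() -> bytes:
    """Stand-in for grabbing the raw pixels of the remote browser."""
    return b"raw-pixels"

def reason(state: AgentState, screenshot: bytes) -> str:
    """Stand-in for the model: picks the next action from the goal,
    the current screenshot, and the actions taken so far.
    Here it just replays a fixed, made-up plan."""
    plan = ["click:search-box", "type:query", "click:search-button", "done"]
    step = len(state.history)
    return plan[min(step, len(plan) - 1)]

def execute(action: str) -> None:
    """Stand-in for driving the virtual mouse and keyboard."""
    pass

def run_agent(goal: str, max_steps: int = 10) -> list:
    state = AgentState(goal)
    for _ in range(max_steps):
        shot = take_screenshot()      # 1. observe: capture raw pixels
        action = reason(state, shot)  # 2. reason: pixels + past actions -> next action
        if action == "done":
            break
        execute(action)               # 3. act: click or type in the browser
        state.history.append(action)
    return state.history
```

The key point the loop captures is that each decision is conditioned on both the latest screenshot and the accumulated action history, which is what the documentation's "chain of thought" refers to.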
Promising performance. Several benchmarks now exist to evaluate the capabilities of these agentic models. According to OpenAI's internal tests, CUA achieves 38.1% on OSWorld (general computer use), compared with 22% for models such as Anthropic's. Humans, however, average 72.4%, which makes it clear that these systems still have plenty of room for improvement. In browser use, Operator also scores high on the WebArena and WebVoyager benchmarks: 58.1% and 87% respectively, compared with 36.2% and 56% for its competitors.
What about those screenshots Operator collects? Operator continuously takes screenshots to "see" the browser interface it interacts with. That browser does not run on our PC, but in a remote browser on OpenAI's servers. User data, including these screenshots, is used according to OpenAI's privacy policy. That is: it can be used to detect fraudulent activity and to improve the service. This implies that our data can be used to train and improve the model, although we can disable that option in Operator's settings. The user can also control how long this data is stored in Operator; by default, it is kept until the user decides to delete it.
An agent that asks for help (and confirmation) when it needs it. As we have seen in other agents such as Anthropic's 'Computer Use', Operator does not act recklessly. If it runs into an obstacle, such as a CAPTCHA or a request to enter a username and password on a website, it asks the user to take control, and it also asks for the user's final confirmation if, for example, it has to validate a reservation or a purchase it has found. The user can also take control at any time.
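The behavior described above amounts to a simple guard in front of every action. A minimal sketch, assuming hypothetical action names and trigger categories (OpenAI has not published its actual rules):

```python
# Situations where the agent hands the browser back to the user entirely
HANDOFF_TRIGGERS = {"captcha", "login"}

# Actions executed only after explicit user approval
CONFIRM_BEFORE = {"purchase", "reservation"}

def next_step(action: str, user_confirmed: bool = False) -> str:
    """Decide whether to execute an action, pause for confirmation,
    or hand control back to the user."""
    if action in HANDOFF_TRIGGERS:
        return "hand control to user"
    if action in CONFIRM_BEFORE and not user_confirmed:
        return "ask user for confirmation"
    return "execute"
```

The design choice is that irreversible or credential-sensitive steps never execute silently: they either block on confirmation or return control to the person.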
Do not let go of the steering wheel. This is reminiscent of assisted driving systems such as Tesla's FSD. It is true that it can take us from one place to another once we enter the destination address, but it is important to keep paying attention and keep our hands on the wheel in case something unforeseen happens. Something similar applies to Operator and the rest of the agents of this type.
There are things it cannot do. For now, Operator cannot complete specialized tasks such as managing complex calendar systems or interacting with highly customized or non-standard websites. It will also refuse to perform certain high-risk tasks that could cause harm: for example, sending emails, making financial transactions, or deleting calendar events. Its capabilities will no doubt grow, but they will do so gradually, always keeping the possibility of error as low as possible.
Image | OpenAI
In WorldOfSoftware | Generative AI seems stagnant. Big tech believes it has an ace up its sleeve: "agents" that do things for us