I’m watching artificial intelligence order my groceries. Armed with my shopping list, it types each item into the search bar of a supermarket website, then uses its cursor to click. Watching what appears to be a digital ghost do this usually mundane task is strangely transfixing. “Are you sure it’s not just a person in India?” my husband asks, peering over my shoulder.
I’m trying out Operator, a new AI “agent” from OpenAI, the maker of ChatGPT. Made available to UK users last month, it has a similar text interface and conversational tone to ChatGPT, but rather than just answering questions, it can actually do things – provided they involve navigating a web browser.
Hot on the heels of large language models, AI agents have been trumpeted as the next big thing, and you can see the appeal of an assistant that can do more than talk back. Similar to OpenAI’s offering, Anthropic introduced a “computer use” capability to its Claude chatbot towards the end of last year. Perplexity and Google have also released “agentic” features into their AI assistants, with further companies developing agents aimed at specific tasks.
There’s debate over what exactly counts as an AI agent, but the general idea is that they need to be able to take actions with some degree of autonomy. “As soon as something is starting to execute actions outside of the chat window, then it’s gone from being a chatbot to an agent,” says Margaret Mitchell, the chief ethics scientist at AI company Hugging Face.
It’s early days. Most commercially available agents come with a disclaimer that they’re still experimental – OpenAI describes Operator as a “research preview” – and they can make mistakes, such as spending $31 on a dozen eggs or trying to deliver groceries back to the shop they bought them from. Depending on who you ask, agents are just the next overhyped tech toy or the dawn of an AI future.
“In principle, they would be amazing, because they could automate a lot of drudgery,” says Gary Marcus, a scientist and sceptic of large language models. “But I don’t think they will work reliably any time soon, and it’s partly an investment in hype.”
I sign up for Operator to see for myself. With no food in the house, grocery shopping seems like a good first task. I type my request and it asks if I have a preferred shop or brand. I tell it to go with whichever is cheapest. A window appears showing a web browser and I see it search “UK online grocery delivery”. A mouse cursor selects the first result: Ocado. It starts searching for my requested items and filters the results by price. It selects products and clicks “Add to trolley”.
I’m impressed with Operator’s initiative; it does not pepper me with questions, instead making an executive decision when given only a brief item description, such as “salmon” or “chicken”. When it searches for eggs, it successfully scrolls past several non-egg items that appear as special offers. My list asks for “a few different vegetables”: it selects a head of broccoli, then asks if I’d like anything else specific. I tell it to choose two more and it goes for carrots and leeks – probably what I’d have picked myself. Emboldened, I tell it to add “a sweet treat” and watch as it literally types “sweet treat” into the search bar. I’m not sure why it chooses 70% chocolate – certainly not the cheapest option – but I tell it I don’t like dark chocolate and it swaps it for a Galaxy bar.
We hit a snag when Operator realises that Ocado has a minimum spend, so I add more items to the list. Then it comes to logging in, and the agent prompts me to intervene: while users can take over the browser at any point, OpenAI says Operator is designed to request this when sensitive information, such as login credentials or payment information, has to be entered into the browser. Although Operator usually takes constant screenshots in order to “see” what it is doing, OpenAI says it does not do so while the user is in control.
At the checkout, I test the waters by asking Operator to complete payment. I take back the reins, however, when it responds by asking for my card details. I’ve already given OpenAI my payment information (Operator requires a ChatGPT Pro account, which costs $200 a month) but I feel uncomfortable sharing this directly with an AI. Order placed, I await my delivery the following day. But that doesn’t solve dinner. I give Operator a new task: can it order me a cheeseburger and chips from a local, highly rated restaurant? It asks for my postcode, then loads the Deliveroo website and searches “cheeseburger”. Again, there’s a pause when I have to log in, but as Deliveroo already has my card details stored, Operator can proceed directly to payment.
The restaurant it selects is local, and it is highly rated – as a fish and chip shop. I end up with a passable cheeseburger and a large bag of chippy-style chips. Not exactly what I’d envisioned but not wrong, either. I’m mortified, however, when I realise Operator skipped over tipping the delivery rider. I sheepishly take my food and add a generous tip after the fact.
Of course, watching Operator in action rather defeats the time-saving point of using an AI agent for online tasks. Instead, you can leave it to work in the background while you focus on other tabs. While drafting this piece, I make another request: can it book me a gel manicure at a local salon?
Operator struggles more with this task. It goes to beauty booking platform Fresha but, when it prompts me to log in, I see it has chosen an appointment a week too late and more than an hour’s drive away from my home. I point out these issues and it finds a slot for the right date but in Leicester Square – still a distance away. Only then does it ask my location, and I realise it must not have retained this knowledge between tasks. By this point, I could have already made my own booking. Operator eventually finds a suitable appointment, but I abandon the task and chalk it up as a win for team human.
It’s clear that this first generation of AI agents has limitations. Having to stop and log in requires a fair amount of human oversight, though Operator stores cookies to allow users to stay logged into websites on subsequent visits (OpenAI says it requires closer supervision on “particularly sensitive” sites, such as email clients or financial services). The results, while usually accurate, aren’t always what I have in mind. When my groceries arrive, I find that Operator has ordered smoked salmon rather than fillets and has doubled up on yogurt, possibly because of a special offer. It interpreted “some fish cakes” to mean three packs (I intended just one) and was only saved the indignity of buying chocolate milk instead of plain as the product was out of stock. To be fair to the bot, I had the opportunity to review the order, and I would have got better results if I’d been more specific in my prompts (“a pack of two raw salmon fillets”), but this would also detract from the effort saved.
Despite current flaws, my experience with Operator feels like a glimpse of something to come. As such systems improve, and reduce in cost, I could easily see them becoming embedded in everyday life. You might already write your shopping list on an app; why wouldn’t it also place the order? Agents are also infiltrating workflows beyond the realm of a personal assistant. OpenAI’s chief executive, Sam Altman, has predicted that AI agents will “join the workforce” this year.
Software developers are among the early adopters; coding platform GitHub recently added agentic capabilities to its AI Copilot tool. GitHub’s CEO, Thomas Dohmke, says developers are used to some level of automated assistance; the difference with AI agents is the level of autonomy. “Instead of you just asking a question and it gives you an answer, you give it a problem and then it iterates on that problem together with the code that it has access to,” he says.
GitHub is already working on an agent with greater autonomy, which it calls Project Padawan (a Star Wars term referring to a Jedi apprentice). This would allow an AI agent to work asynchronously rather than requiring constant oversight; a developer could have teams of agents reporting to them, producing code for their review. Dohmke says he doesn’t believe developers’ jobs are at risk, as their skills will find increasing demand. “I’d argue the amount of work that AI has added to most developers’ backlog is higher than the amount of work it has taken over,” he says. Agents could also make coding tasks, such as building an app, more accessible to non-technical people.
Outside software development, Dohmke envisions a future when everyone has their own personal Jarvis, the talking AI in Iron Man. Your agent will learn your habits and become customised to your tastes, making it more useful. He’d use his to book holidays for his family.
The more autonomy agents have, however, the greater the risks they pose. Mitchell, from Hugging Face, co-authored a paper warning against the development of fully autonomous agents. “Fully autonomous means that human control has been fully ceded,” she says. Rather than working within set boundaries, a fully autonomous agent could gain access to things you do not realise or behave in unexpected ways, especially if it can write its own code. It’s not a big deal if an AI agent gets your takeout order wrong, but what if it starts sharing your personal information with scam websites or posting horrific social media content under your name? High-risk workplaces could introduce particularly hazardous scenarios: what if it can access a missile command system?
Mitchell hopes technologists, legislators and policymakers will incentivise guardrails to mitigate such risks. For now, she foresees agentic capabilities becoming more refined for specific tasks. Soon, she says, we’ll see agents interacting with agents – your agent could work with mine to set up a meeting, for example.
This proliferation of agents could reshape the internet. Currently, a lot of information online is tailored to human language, but if AIs are increasingly interacting with websites, this could change. “We’re going to see more and more information available through the internet that is not directly human language, but is the information that would be necessary for an agent to act on,” says Mitchell.
Dohmke echoes this idea. He believes that the concept of the homepage will lose importance, and interfaces will be designed with AI agents in mind. Brands may start competing for AI attention over human eyeballs.
One day, agents may even escape the confines of the computer. We could see AI agents embodied in robots, which would open up a world of physical tasks for them to help with. “My prediction is that we’re going to see agents that can do our laundry for us and do our dishes and make us breakfast,” says Mitchell. “Just don’t give them access to weapons.”