OpenAI today launched a new artificial intelligence agent that can perform complex tasks in the user’s browser.
ChatGPT agent, as the feature is called, is powered by a new reasoning-optimized AI model. OpenAI says that the algorithm outperforms its earlier AI systems across a range of benchmarks.
The agent is designed to automate tasks that require the user to perform actions in multiple cloud applications. A developer, for example, could have it download a code file from GitHub and save it in a Google Drive folder. ChatGPT could also be instructed to run the file through a vulnerability scanner before saving it.
ChatGPT agent uses two different browsers to interact with online services. The first browser, which is mainly optimized to process text, powers “simpler reasoning-based web queries.” The second browser allows ChatGPT agent to interact with websites via their graphical interfaces similarly to how a user would.
ChatGPT asks for permission before performing sensitive actions such as making a purchase. Furthermore, OpenAI requires users to actively supervise the tool while it carries out such tasks. The built-in controls make it possible to stop a task, complete it manually or provide ChatGPT agent with updated instructions.
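The permission step described above can be pictured as a simple confirmation gate. The following is a minimal sketch, not OpenAI's implementation: the `Action` type, the `SENSITIVE_KINDS` set and the `run_with_confirmation` function are hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical action categories treated as sensitive in this sketch.
SENSITIVE_KINDS = {"purchase", "send_email", "delete_file"}


@dataclass
class Action:
    kind: str          # category of the action, e.g. "browse" or "purchase"
    description: str   # human-readable summary shown to the user


def run_with_confirmation(
    actions: List[Action],
    confirm: Callable[[Action], bool],
) -> List[Tuple[str, str]]:
    """Process actions in order, asking `confirm` before sensitive ones.

    Returns (description, status) pairs, where status is "done" or
    "skipped". Nothing is actually executed; the log stands in for that.
    """
    log = []
    for action in actions:
        if action.kind in SENSITIVE_KINDS and not confirm(action):
            # The user declined, so the sensitive step is not carried out.
            log.append((action.description, "skipped"))
            continue
        log.append((action.description, "done"))
    return log


plan = [
    Action("browse", "open store page"),
    Action("purchase", "buy item"),
]
# A user who declines every request blocks only the purchase.
print(run_with_confirmation(plan, confirm=lambda action: False))
```

In this sketch the `confirm` callback stands in for the prompt shown to the user, so non-sensitive steps run without interruption while sensitive ones wait for approval.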
Browsers aren’t the only type of application with which the tool can interact. Users may give it access to a terminal, a program that makes it possible to interact with a computer’s operating system using text commands and scripts. ChatGPT agent can harness the terminal for tasks such as editing files.
“The model can choose to open a page using the text browser or visual browser, download a file from the web, manipulate it by running a command in the terminal, and then view the output back in the visual browser,” OpenAI staffers wrote in a blog post.
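The sequence in that quote, in which each tool's result feeds the next step, can be sketched as a small dispatch loop. This is an illustrative assumption about how such chaining might look, not a description of OpenAI's system: the tool stubs, the `TOOLS` registry, the `{prev}` placeholder and the example URLs are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins for the agent's tools. Each stub only returns a
# string describing what a real tool would have produced.
def text_browser(url: str) -> str:
    return f"text of {url}"

def download(url: str) -> str:
    # Pretend the file was saved locally under its original name.
    return "/tmp/" + url.rsplit("/", 1)[-1]

def terminal(command: str) -> str:
    return f"output of `{command}`"

def visual_browser(target: str) -> str:
    return f"displayed {target}"

TOOLS: Dict[str, Callable[[str], str]] = {
    "text_browser": text_browser,
    "download": download,
    "terminal": terminal,
    "visual_browser": visual_browser,
}


def run_plan(plan: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Run (tool, argument) steps in order.

    The placeholder "{prev}" in an argument is replaced with the result
    of the previous step, which is how one tool's output reaches the next.
    """
    trace = []
    previous = ""
    for tool_name, argument in plan:
        filled = argument.replace("{prev}", previous)
        previous = TOOLS[tool_name](filled)
        trace.append((tool_name, previous))
    return trace


steps = [
    ("text_browser", "https://example.com/data"),
    ("download", "https://example.com/data/report.csv"),
    ("terminal", "sort {prev}"),
    ("visual_browser", "{prev}"),
]
for tool_name, result in run_plan(steps):
    print(tool_name, "->", result)
```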
ChatGPT agent is powered by a new AI model that outperforms o4-mini and o3 at certain reasoning tasks. In one internal test, OpenAI had the three algorithms tackle the FrontierMath mathematical benchmark, which is considered to be the most difficult in its category. ChatGPT agent’s model scored 27.4%, while o4-mini and o3 managed 19.3% and 10.3%, respectively.
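Gaps between benchmark scores can be stated either in percentage points or as a relative improvement, and the two readings differ considerably. The short sketch below works through both for the FrontierMath figures quoted above; the `compare` helper is a hypothetical name used only for this illustration.

```python
# FrontierMath scores quoted in the article, in percent.
AGENT_MODEL = 27.4
O4_MINI = 19.3
O3 = 10.3


def compare(new: float, old: float) -> tuple:
    """Return (percentage-point gap, relative improvement in percent)."""
    points = round(new - old, 1)
    relative = round((new - old) / old * 100, 1)
    return points, relative


print(compare(AGENT_MODEL, O4_MINI))  # (8.1, 42.0)
print(compare(AGENT_MODEL, O3))       # (17.1, 166.0)
```

On these numbers, the new model leads o4-mini by 8.1 percentage points, which corresponds to a relative improvement of about 42%.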
In another evaluation, OpenAI tested ChatGPT agent’s spreadsheet know-how using a benchmark called SpreadsheetBench. Its score was about 25 percentage points higher than that of the version of Microsoft Copilot included in Excel.
OpenAI developed a new set of guardrails for ChatGPT agent to prevent hackers from misusing its capabilities. The safeguards place particular emphasis on blocking malicious prompts hidden in webpages. “We’ve trained and tested the agent on identifying and resisting prompt injections, in addition to using monitoring to rapidly detect and respond to prompt injection attacks,” the OpenAI staffers detailed.
The agent is available today in the Pro, Plus and Team tiers of ChatGPT.
Image: OpenAI