Researchers from MIT and other collaborating institutions have published a report analyzing 30 of the most widely used agentic AI systems. The study’s conclusions (“The 2025 AI Index: Documenting the Sociotechnical Characteristics of Deployed Agentic AI Systems”) are, to varying degrees, worrying, describing AI that is “out of control” and a “security nightmare”.
Agentic technology is moving fully into the artificial intelligence mainstream, and last week’s announcement that OpenAI had hired Peter Steinberg, creator of the open source software framework OpenClaw, reinforces the industry’s commitment to a capability widely regarded as the ‘star’ of the AI revolution.
The OpenClaw software attracted attention not only for its wild capabilities (agents that can, for example, send and receive email on the user’s behalf), but also for its dramatic security flaws, including vulnerabilities that allowed personal computers to be completely hijacked.
Given the fascination that agents generate and how little is still known about their advantages and disadvantages, it matters that prestigious researchers have analyzed how they operate. And the news is not good, at least on the security front: the researchers describe a technology marked by a lack of disclosure, a lack of transparency, and a surprising absence of basic protocols governing how agents should operate.
Lack of transparency in AI agents
The report’s biggest revelation is how difficult it is to identify all the problems that could arise with AI agents, mainly because of the lack of transparency on the part of developers. “We identify persistent limitations in reporting on ecosystem and safety-related characteristics of agentic systems,” wrote Leon Staufer, the lead author of a study involving researchers from MIT, the University of Cambridge, the University of Washington, Harvard University, Stanford University, the University of Pennsylvania, and the Hebrew University of Jerusalem.
Across eight different disclosure categories, the authors found that most agent systems provide no information for most categories. The omissions range from failing to disclose potential risks to failing to disclose third-party testing, if any was performed.
The report is full of criticism about how little of current agentic AI technology can be tracked, monitored, and controlled. For example, “for many enterprise actors, it is not clear from publicly available information whether monitoring of individual execution traces exists,” meaning there is no clear way to track exactly what an agentic AI program is doing.
“Twelve of the thirty agents do not offer usage monitoring or only notify when users reach the rate limit,” the authors noted. In other words, you cannot even track how much of a given IT resource an agent consumes, a key concern for companies that must budget for those resources.
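The per-agent usage monitoring the report finds missing could be as simple as a meter that records every call rather than only alerting at a rate limit. A minimal sketch; the class, call names, and budget figures are illustrative, not taken from any of the reviewed products:

```python
# Sketch of per-agent usage monitoring: record every call so spend can
# be audited, instead of only notifying at a rate limit.
class UsageMeter:
    def __init__(self, budget_tokens: int):
        self.budget = budget_tokens
        self.used = 0
        self.log = []  # per-call record, so consumption can be audited later

    def record(self, call_name: str, tokens: int) -> bool:
        """Log one call; return False once the budget is exceeded."""
        self.used += tokens
        self.log.append((call_name, tokens))
        return self.used <= self.budget

meter = UsageMeter(budget_tokens=1000)
meter.record("web_search", 400)
within = meter.record("summarize", 700)
print(meter.used, within)  # 1100 False — over budget, and we know exactly why
```

With a log like this, a company can answer "what did this agent spend, and on what?" rather than only learning it hit a ceiling.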
Most of these agents also do not signal to the outside world that they are artificial, so there is no way to know whether you are dealing with a human or a bot. “Most agents do not disclose their AI nature to end users or third parties by default,” the authors point out. Here, disclosure would include measures such as watermarking a generated image file to make clear it was created by AI, or honoring a website’s “robots.txt” file and identifying the agent as an automation rather than a human visitor.
Some of these software tools offer no way at all to stop a given agent once it is running. Alibaba’s MobileAgent, HubSpot’s Breeze, IBM’s watsonx, and automations created by German software maker n8n “lack documented stop options despite autonomous execution”.
“For enterprise platforms, sometimes there is only the option to stop all agents or roll back the deployment,” they point out. That is a big problem: discovering that you cannot stop something that is doing the wrong thing must be one of the worst possible scenarios for a large organization, one where the harmful results outweigh the benefits of automation.
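A per-agent stop control of the kind the report finds missing can be as simple as a flag the agent checks before every action. A minimal sketch; the `StoppableAgent` class is illustrative, not any vendor’s actual design:

```python
import threading
import time

class StoppableAgent:
    """Minimal sketch of an agent loop with a per-agent stop control."""
    def __init__(self, name: str):
        self.name = name
        self._stop = threading.Event()
        self.steps_done = 0

    def run(self, steps: int):
        for _ in range(steps):
            if self._stop.is_set():   # checked before every action
                break
            self.steps_done += 1      # stand-in for one autonomous action
            time.sleep(0.01)

    def stop(self):
        self._stop.set()

agent = StoppableAgent("demo")
worker = threading.Thread(target=agent.run, args=(100,))
worker.start()
time.sleep(0.05)
agent.stop()    # halt just this agent, not the whole deployment
worker.join()
print(agent.steps_done < 100)  # True: stopped well before finishing
```

The point is granularity: stopping one misbehaving agent without rolling back an entire deployment.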
The authors expect these transparency and control problems to persist among AI agents and even become more pronounced. “The governance challenges documented here (ecosystem fragmentation, tensions in network behavior, lack of agent-specific assessments) will become important as agent capabilities increase,” they warn.
Staufer and his team also say they spent four weeks seeking feedback from the companies whose software was reviewed. Approximately a quarter of those contacted responded, “but only 3 out of 30 with substantial comments”. Those comments were incorporated into the report, the authors wrote, and they also provide companies with a form for ongoing corrections.
An expanding panorama
Agentic artificial intelligence is a branch of machine learning that has emerged in recent years to extend the capabilities of large language models and chatbots. Definitions of an AI agent vary by use case, product offering, and vendor, but in general an agent can be defined as a software tool that performs tasks autonomously.
And there is more. Rather than simply being assigned a single task dictated by a text instruction, agents are AI programs that connect to external resources, such as databases, and that have been given a measure of “autonomy” to pursue goals beyond the scope of a text-based dialogue.

This autonomy can include performing several steps in a corporate workflow, such as receiving a purchase order by email, entering it into a database, and checking availability in an inventory system. Agents have also been used to automate various stages of customer service interaction, replacing some of the basic phone, email, or SMS inquiries that would traditionally have been handled by a customer service representative.
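The purchase-order workflow described above can be sketched in a few lines. The parsing and data stores here are toy stand-ins, not a real agent framework:

```python
# Hypothetical multi-step workflow of the kind described above:
# receive a purchase order, record it, check inventory.
inventory = {"widget": 10, "gadget": 0}  # toy stand-in for an inventory system
orders_db = []                           # toy stand-in for an orders database

def handle_purchase_order(email_body: str) -> str:
    # Step 1: "receive" the order (here, parsed from a simple email line).
    item, qty = email_body.split(":")[1].strip().split(" x ")
    qty = int(qty)
    # Step 2: enter it into the database.
    orders_db.append({"item": item, "qty": qty})
    # Step 3: check availability in the inventory system.
    in_stock = inventory.get(item, 0) >= qty
    return "confirmed" if in_stock else "backordered"

print(handle_purchase_order("ORDER: widget x 3"))  # confirmed
print(handle_purchase_order("ORDER: gadget x 1"))  # backordered
```

Each step touches a different system, which is exactly why an agent doing this autonomously needs logging and controls at every stage.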
The study authors sorted agentic AI into three categories: chatbots with additional capabilities, such as Anthropic’s Claude Code tool; web browser extensions or AI-dedicated browsers, such as OpenAI’s Atlas browser; and business software solutions such as Microsoft’s Office 365 Copilot. This is just a sample: other studies, they noted, have covered hundreds of agentic technology solutions.
However, most agents “are based on a small set of closed source frontier models,” said Staufer and his team: chiefly OpenAI’s GPT, Anthropic’s Claude, and Google’s Gemini.
The good and bad of AI agents
The study is not based on directly testing agentic tools, but on the “annotation” of documentation provided by developers and vendors, using “only public information from documentation, websites, demos, published articles and governance documents,” as the authors explain. However, user accounts were created on some of the systems to verify how the software actually operates.
The authors offered three examples that go deeper into the topic and confirm that AI agents function at very different levels. A positive example, they wrote, is OpenAI’s ChatGPT Agent, which can interact with websites when a user requests a web task in a prompt. The Agent stands out positively as the only one of the agent systems analyzed that provides a means of tracking behavior by cryptographically signing browser requests.
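The report does not describe the actual signing scheme, but the idea of cryptographically verifiable requests can be sketched with a simple HMAC over the request method and URL. The key and message format here are purely illustrative:

```python
import hashlib
import hmac

# Illustration only: ChatGPT Agent's real signing scheme is not public.
# A shared HMAC key stands in for whatever key material is actually used.
SECRET = b"agent-operator-shared-key"  # hypothetical key

def sign_request(method: str, url: str, secret: bytes = SECRET) -> str:
    """Produce a signature binding this request to the agent's key."""
    msg = f"{method} {url}".encode()
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()

def verify_request(method: str, url: str, signature: str,
                   secret: bytes = SECRET) -> bool:
    """Constant-time check that a request really came from the agent."""
    return hmac.compare_digest(sign_request(method, url, secret), signature)

sig = sign_request("GET", "https://example.com/page")
print(verify_request("GET", "https://example.com/page", sig))   # True
print(verify_request("GET", "https://example.com/other", sig))  # False
```

A website that can verify such signatures knows both that the visitor is an agent and which operator it belongs to, which is what makes behavior trackable.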
In contrast, Perplexity’s Comet web browser looks like a security disaster. Staufer and his team found that the program has no agent-specific safety assessments, no third-party testing, and no comparative performance disclosures. Perplexity has also not documented the methodology or results of Comet’s security assessment, nor any approach to sandboxing or containment beyond prompt-injection mitigations.
The third example is the Breeze agent suite from enterprise software provider HubSpot, a set of automations that can interact with systems of record such as customer relationship management systems. The researchers found that Breeze tools offer a mix of advantages and disadvantages. On the one hand, they are certified for numerous corporate compliance measures, such as SOC 2, GDPR, and HIPAA.
On the other hand, HubSpot offers no information on security testing. It states that Breeze agents were evaluated by third-party security company PacketLabs, “but does not provide methodology, results or details of the testing entity”. The practice of advertising compliance approval while not disclosing actual security assessments is “typical of business platforms,” Staufer and his team said.

Assume responsibilities
What the report does not examine are real-world incidents, cases in which agentic technology produced unexpected or unwanted behavior with undesirable results. That means we still do not know the full impact of the deficiencies the authors identify. But one thing is clear: agentic AI is the product of development teams making specific decisions. These agents are tools created and distributed by humans.
The responsibility to document software, audit programs for security issues, and provide control measures therefore falls squarely on the vendors: OpenAI, Anthropic, Google, Microsoft, Perplexity, and the other organizations involved. It is up to them to remedy the serious deficiencies identified, or else face future regulation and a loss of trust for failing to deliver on the, perhaps overly optimistic, promises of AI agents.
* AI generated cover image
