AI Agents Are Taking On More And More Work: Why Verification Is Becoming A Central Problem

Although modern AI models are extremely powerful, they still pose numerous challenges in practice. At this year’s “Fortune Brainstorm Tech” conference, executives from various companies shared their experiences with AI agents. Many reported similar problems – especially with regard to the traceability of the results.

AI agents must work more transparently

Some tech companies have pushed the use of AI agents aggressively. Nvidia CEO Jensen Huang reportedly said that his employees are “crazy” if they don’t use AI for as many tasks as possible. Meta takes a similar approach. However, this is not without consequences: Researcher Summer Yue wanted to have her mailbox managed by an OpenClaw agent. Instead, the agent deleted all of her emails. Given such incidents, the traceability and reliability of AI agents were high on the agenda at this year’s Fortune Technology and Innovation Conference.

“A key question that concerns us is how to design a system that works correctly as often as possible,” said Edwin Olson, founder and CEO of May Mobility. Since errors are inevitable, transparency plays a crucial role: you have to understand why an error occurred in order to avoid it in the future. Thomson Reuters, which offers AI-powered legal and tax compliance services, also focused on accountability early on. According to Chief Data Officer Caitlin Halferty, transparency is one of the company’s four pillars of “fiduciary” product quality, alongside data protection, subject matter expertise and reliable content.

Reviewing the results is time-consuming

Several participants also emphasized the importance of self-regulating systems. At May Mobility, this means equipping autonomous vehicles with systems that can simulate and evaluate several scenarios at the same time. Elena Kvochko, founder and CEO of Trustguard AI, describes a similar method in which AI systems monitor each other. This is comparable to working in an editorial office: one agent is the author of a text and the other is the editor, whose only job is to find errors or inaccuracies. It is crucial that the review takes place in separate systems: “You don’t want AI to evaluate its own work,” says Kvochko.
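
The author-and-editor arrangement Kvochko describes can be illustrated with a minimal Python sketch. It is not Trustguard AI's implementation: the names `produce_with_review`, `stub_author` and `stub_reviewer` are hypothetical, and the two stub functions merely stand in for two separately hosted AI systems.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-ins: in practice each would call a different AI system.
Author = Callable[[str], str]
Reviewer = Callable[[str], List[str]]


@dataclass
class ReviewedDraft:
    text: str
    issues: List[str]

    @property
    def approved(self) -> bool:
        # A draft counts as approved only if the reviewer found nothing.
        return not self.issues


def produce_with_review(task: str, author: Author, reviewer: Reviewer) -> ReviewedDraft:
    """Let one system write and a different one look for errors."""
    if author is reviewer:
        # Mirrors the rule that an AI should not evaluate its own work.
        raise ValueError("reviewer must be a separate system from the author")
    draft = author(task)
    return ReviewedDraft(text=draft, issues=reviewer(draft))


def stub_author(task: str) -> str:
    # Assumed behaviour for illustration: the author leaves an open point.
    return f"Draft for: {task} (TODO: verify figures)"


def stub_reviewer(draft: str) -> List[str]:
    # Assumed behaviour for illustration: the reviewer only flags open TODOs.
    return ["unresolved TODO in draft"] if "TODO" in draft else []


result = produce_with_review("quarterly summary", stub_author, stub_reviewer)
print(result.approved, result.issues)
```

The sketch only enforces that the author and the reviewer are different objects. Whether two real systems are independent enough to catch each other's mistakes is a separate question that the code does not answer.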

Such structures are becoming increasingly important as AI takes on more and more tasks and its output exceeds what humans can check. “You end up in a situation where so much work has been done and there is so much to review that you can’t really be held accountable,” said Gregor Stewart, chief AI officer at SentinelOne. This discrepancy is particularly clear in programming: Waydev CEO Alex Circei told TechCrunch that although AI produces more code, that code needs to be revised more often. The initial acceptance rate was 80 to 90 percent, but once subsequent corrections were taken into account, it fell to 10 to 30 percent.

AI agents still often cause additional work

AI agents often create additional work instead of saving time. Assessments vary by position, however: according to a survey by the consulting firm Section, 40 percent of employees report no time savings from AI, while 19 percent of managers said they saved more than twelve hours a week. To achieve real added value, the problem of time-consuming verification must be solved. Instead of reviewing tens of thousands of lines of code manually, teams are looking for ways to automate this process. According to Stewart, methods originally developed for safety-critical industries could be applied here.