Artificial intelligence model training and testing tools startup Patronus AI Inc. today announced the availability of a new offering called “Generative Simulators” that’s designed to help evaluate and improve autonomous AI agents.
The new simulators are a core element of Patronus AI’s reinforcement learning environments, which are simulated worlds that enable thorough testing of AI agents. The simulators can adapt these simulated worlds on the fly, continuously creating new tasks, scenarios and rules to ensure that AI agents are constantly learning new things and never go stale.
Within Patronus AI’s RL environments, AI agents can learn new skills and capabilities through trial and error in a virtual setting that mimics real-world workflows. Each environment incorporates domain-specific rules, best practices and verifiable rewards that give AI agents an incentive to optimize their performance on a range of work-related tasks. The environments enable developers to expose agents to new kinds of reasoning challenges and interruptions so they can evolve over time, and they also serve to evaluate the agents’ skills, as in the sketch below.
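The idea of a “verifiable reward” is easiest to see in code: the score comes from a deterministic check against ground truth rather than a fuzzy rubric. The following minimal sketch is purely illustrative; the `WorkflowEnv` and `ExpenseReportTask` names, the toy expense-totaling task and the reset/step interface are assumptions for the example, not Patronus AI’s actual API.

```python
import random
from dataclasses import dataclass


@dataclass
class ExpenseReportTask:
    """One hypothetical work task: total a list of expense line items."""
    line_items: list

    @property
    def ground_truth(self) -> float:
        return round(sum(self.line_items), 2)


class WorkflowEnv:
    """Toy RL-style environment with a verifiable reward.

    The reward is computed by a deterministic check against ground truth,
    so an agent cannot game it the way it might game a vague rubric.
    """

    def __init__(self, seed: int = 0):
        self.rng = random.Random(seed)
        self.task = None

    def reset(self) -> str:
        """Sample a fresh task and return its prompt (the observation)."""
        items = [round(self.rng.uniform(1, 500), 2)
                 for _ in range(self.rng.randint(3, 8))]
        self.task = ExpenseReportTask(items)
        return f"Total these expenses: {items}"

    def step(self, answer: str):
        """Score the agent's answer: 1.0 for a correct total, 0.0 otherwise."""
        try:
            correct = abs(float(answer) - self.task.ground_truth) < 0.01
        except ValueError:
            correct = False
        return (1.0 if correct else 0.0), True  # reward, episode done


if __name__ == "__main__":
    env = WorkflowEnv(seed=42)
    prompt = env.reset()
    # A real agent (an LLM) would answer here; we fake a correct answer.
    reward, done = env.step(str(env.task.ground_truth))
    print(prompt, "->", reward)
```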
The startup says training AI agents and evolving them over time remains a key challenge for foundation model labs. AI agents are designed to perform tasks autonomously with minimal human supervision, and are therefore a whole different ball game compared to standard generative AI chatbots.
One of the main problems is that the static tests and training data used to create the large language models that power AI agents do not reflect the dynamic and interactive nature of real-world workflows. As a result, agents that perform well on static benchmarks can fall apart when they’re deployed in the real world and the requirements of a task evolve. Agents must also learn to use third-party tools successfully and stay on track over long periods of time.
Patronus AI co-founder and Chief Executive Anand Kannappan said traditional benchmarks are good for measuring isolated capabilities of AI models, but they don’t take into account the constant context switching, interruptions and multilayered decision-making that occur when agents are doing real work. “For agents to perform tasks at human-comparable levels, they need to learn the way humans do – through dynamic, feedback-driven experience that captures real-world nuance,” he said.
Evaluations are performed by Patronus AI’s Glider LLM, which was purpose-built as a fast, impartial and highly flexible “judge” for third-party AI models. Any required improvements can be carried out by Percival, a second model developed by the company that’s designed to find and automatically fix AI malfunctions. Percival automates this process, analyzing an agent’s workflows to identify the specific substeps that cause problems and then suggesting a way to fix them.
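In rough terms, an LLM judge of this kind takes a task and an agent’s step-by-step trace and returns a structured verdict that a repair step can act on. The sketch below is a generic illustration of that pattern, not Glider’s or Percival’s actual interface; `call_llm` stands in for whatever model client is used, and the rubric and JSON schema are assumptions for the example.

```python
import json
from typing import Callable

JUDGE_RUBRIC = """You are an impartial judge. Score the agent trace below
from 0 to 1 on whether it completed the task, and list any failing substeps.
Respond with JSON: {"score": float, "failing_steps": [int, ...]}"""


def judge_trace(call_llm: Callable[[str], str], task: str, trace: list) -> dict:
    """Ask a judge model to grade an agent's numbered, step-by-step trace.

    `call_llm` is a placeholder for whatever client reaches the judge model.
    """
    numbered = "\n".join(f"{i}: {step}" for i, step in enumerate(trace))
    prompt = f"{JUDGE_RUBRIC}\n\nTask: {task}\n\nTrace:\n{numbered}"
    return json.loads(call_llm(prompt))


def suggest_fixes(verdict: dict, trace: list) -> list:
    """Map failing substeps back to the trace (an illustrative repair step)."""
    return [f"Revisit step {i}: {trace[i]}"
            for i in verdict.get("failing_steps", [])]
```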
The new Generative Simulators are meant to facilitate this kind of learning. They can generate new “assignments” for agents alongside the surrounding conditions, oversight process and so on, and then adapt these continuously based on the agent’s behavior.
So instead of a fixed training environment, they act more like a “living practice world” that continually creates newer and more relevant challenges and feedback. As a result, the company said, AI agents never stop learning and improving.
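One generic way to picture such a “living practice world” is a loop that generates a fresh assignment each round and adjusts the next one based on how the agent just performed. The sketch below is a hypothetical illustration under that assumption; `generate_task`, `evaluate` and the simple difficulty rule are placeholders, not how Patronus AI’s simulators are actually implemented.

```python
def run_generative_simulation(agent, generate_task, evaluate, rounds: int = 20):
    """Hypothetical 'living practice world' loop.

    `generate_task(difficulty, history)` creates a fresh assignment and its
    surrounding conditions, and `evaluate(task, attempt)` returns a score;
    both stand in for whatever task generator and grader a real system uses.
    """
    difficulty, history = 1, []
    for _ in range(rounds):
        task = generate_task(difficulty, history)   # new assignment + conditions
        attempt = agent(task)                       # the agent does the work
        score = evaluate(task, attempt)             # verifiable or judged reward
        history.append((task, attempt, score))
        # Adapt the curriculum to the agent's behavior: harder when it
        # succeeds, easier (or more scaffolded) when it struggles.
        difficulty = difficulty + 1 if score >= 0.8 else max(1, difficulty - 1)
    return history
```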
The simulators also support a new training technique Patronus AI has devised that’s called Open Recursive Self-Improvement or ORSI. Within its training environments, ORSI allows agents to improve their performance on new tasks through interactions and feedback, without the need for a full retraining cycle between attempts.
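Improving without a retraining cycle generally means the model’s weights stay fixed and each new attempt simply carries forward the feedback from earlier ones. The sketch below shows one generic feedback loop of that kind; it makes no claim to match ORSI’s internals, and `call_llm` and `grade` are assumed placeholders for a model client and a verifier.

```python
def improve_in_context(call_llm, task: str, grade, max_attempts: int = 3) -> str:
    """Sketch of feedback-driven improvement with no retraining step.

    The model's weights never change; each retry carries the previous
    attempt and the grader's feedback forward in the prompt.
    """
    feedback_log = []
    attempt = call_llm(f"Task: {task}")
    for _ in range(max_attempts - 1):
        ok, feedback = grade(task, attempt)  # grade returns (passed, feedback)
        if ok:
            break
        feedback_log.append(feedback)
        notes = "\n".join(f"- {f}" for f in feedback_log)
        attempt = call_llm(
            f"Task: {task}\nPrevious attempt:\n{attempt}\n"
            f"Feedback so far:\n{notes}\nProduce an improved attempt."
        )
    return attempt
```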
“When a coding agent can decompose a complex task, handle distractions mid-implementation, coordinate with teammates on priorities and verify its work, that’s when we’re seeing true value,” said Patronus AI Chief Technology Officer Rebecca Qian. “Our RL environments give foundation model labs the training infrastructure to develop agents that don’t just perform well on predefined tests, but work in the real world.”
Image: Patronus AI
