Google LLC’s artificial intelligence research lab DeepMind has introduced a new video game-playing agent called SIMA 2 that can navigate 3D virtual worlds it has never encountered before and solve a wide range of problems within them.
It’s a key step toward the creation of general-purpose agents that will ultimately power real-world robots, the research outfit said. Announced today, SIMA 2 builds on the release of DeepMind’s original video game-playing agent SIMA, which stands for “scalable instructable multiworld agent.” SIMA debuted around 18 months ago and displayed an impressive level of autonomy, but it was far from complete, failing to perform many kinds of tasks.
However, DeepMind’s researchers said SIMA 2 is built on top of Gemini, Google’s most powerful large language model, and that foundation gives it a massive performance boost. In a blog post, the SIMA Team said SIMA 2 can complete a much wider variety of more complex tasks in virtual worlds, and in many cases it can figure out how to solve challenges it has never come across before. It can also chat with users, and it improves over time, learning by trial and error as it tackles more difficult tasks repeatedly.
“This is a significant step in the direction of Artificial General Intelligence (AGI), with important implications for the future of robotics and AI-embodiment in general,” the SIMA team said.
The original SIMA learned to perform tasks inside of virtual worlds by watching the screen and using a virtual keyboard and mouse to control video game characters. But SIMA 2 goes further, because Gemini gives it the ability to think for itself, DeepMind said.
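To make that interface concrete, here is a minimal sketch of a pixels-in, keyboard-and-mouse-out control loop of the kind described above. Every name in it (Action, PixelPolicy, run_episode and the env object) is a hypothetical placeholder for illustration, not DeepMind’s actual code.

```python
from dataclasses import dataclass

@dataclass
class Action:
    keys: list[str]      # keyboard keys held this step
    mouse_dx: float      # relative mouse movement, x axis
    mouse_dy: float      # relative mouse movement, y axis
    click: bool          # left mouse button pressed

class PixelPolicy:
    """Stand-in for the learned policy mapping screen pixels to controls."""
    def predict(self, frame) -> Action:
        # A real policy would run a neural network over the frame;
        # this placeholder just returns a no-op action.
        return Action(keys=[], mouse_dx=0.0, mouse_dy=0.0, click=False)

def run_episode(env, policy: PixelPolicy, max_steps: int = 1000) -> None:
    """Drive the game purely from screen observations, as the article describes."""
    frame = env.reset()
    for _ in range(max_steps):
        action = policy.predict(frame)    # pixels in...
        frame, done = env.step(action)    # ...keyboard and mouse out
        if done:
            break
```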
SIMA 2 is our most capable AI agent for virtual 3D worlds. 👾🌐
Powered by Gemini, it goes beyond following basic instructions to think, understand, and take actions in interactive environments – meaning you can talk to it through text, voice, or even images. Here’s how 🧵 pic.twitter.com/DuVWGJXW7W
— Google DeepMind (@GoogleDeepMind) November 13, 2025
According to the researchers, Gemini enables SIMA 2 to interpret high-level goals, talk users through the steps it intends to take, and collaborate with other agents or humans in games, with reasoning skills far beyond those of the original SIMA. They claim it shows stronger generalization across virtual environments and can complete longer, more complicated tasks, following instructions given as logic prompts, sketches drawn on the screen and even emojis.
“SIMA 2’s performance is significantly closer to that of a human player on a wide range of tasks,” the SIMA Team wrote, highlighting that it achieved a task completion rate of 65%, way ahead of SIMA 1’s 31% and just shy of the average human rate of 71%.
The model was also able to interpret instructions and act inside virtual worlds that had been freshly generated by another DeepMind model known as Genie 3, which is designed to create interactive environments from images and natural language prompts. When exposed to a new environment, SIMA 2 would immediately orient itself, work out its surroundings and its goals, and then take meaningful actions.
It does this by applying skills learned in earlier worlds to the new surroundings it finds itself in, the researchers explained. “It can transfer learned concepts like ‘mining’ from one game, and apply it to ‘harvesting’ in another game,” they said. “[It’s] like connecting the dots between similar tasks.”
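As a loose illustration of that idea, the toy sketch below reuses a skill learned under one name (“mining”) for a related task (“harvesting”). The hand-written similarity table stands in for the learned representations a real agent would use; every name here is an assumption made for the example.

```python
# Toy illustration of concept transfer across games; everything here is
# hypothetical and far simpler than a real learned representation.

SKILL_LIBRARY = {
    "mining": "approach the resource, equip a tool, strike until collected",
}

RELATED_CONCEPTS = {   # stand-in for similarity learned from experience
    "harvesting": "mining",
    "chopping": "mining",
}

def recall_skill(task: str):
    """Reuse a known skill for a new but related task, if one exists."""
    if task in SKILL_LIBRARY:
        return SKILL_LIBRARY[task]
    related = RELATED_CONCEPTS.get(task)
    return SKILL_LIBRARY.get(related) if related else None

print(recall_skill("harvesting"))  # falls back to the learned "mining" skill
```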
SIMA 2 can also learn from human demonstrations before switching to self-directed play, where it uses trial and error and feedback from Gemini to create “experience data.” That data is then fed back into the model in a kind of training loop, so it can attempt new tasks, learn what it did right and wrong, and apply those lessons the next time it tries. In other words, it’s designed not to make the same mistake twice.
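A rough sketch of that loop, under the assumption that Gemini acts as a critic scoring each attempt, might look like this; the function names (attempt_task, gemini_feedback, fine_tune) are placeholders, not DeepMind’s published interfaces.

```python
# Hedged sketch of the self-improvement loop: attempt tasks, score the
# attempts with a Gemini-style critic, and feed the scored trajectories
# ("experience data") back into training for the next round.

def gemini_feedback(trajectory: dict) -> float:
    """Stand-in for Gemini judging an attempt: 1.0 means task completed."""
    return 1.0 if trajectory.get("completed") else 0.0

def fine_tune(agent_state: dict, experience: list) -> dict:
    """Stand-in for a training step; a real system would update model weights."""
    agent_state["experience_seen"] += len(experience)
    return agent_state

def self_improvement(agent_state: dict, attempt_task, tasks, rounds: int = 3) -> dict:
    for _ in range(rounds):
        experience = []
        for task in tasks:
            trajectory = attempt_task(agent_state, task)       # self-directed play
            experience.append((trajectory, gemini_feedback(trajectory)))
        agent_state = fine_tune(agent_state, experience)       # feed the data back in
    return agent_state
```

The design point the researchers emphasize is that the agent generates its own training data, with Gemini rather than a human supplying the feedback signal.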
DeepMind Senior Staff Research Engineer Frederic Besse told media during a press briefing that the endgame for SIMA 2 is to develop a new generation of AI agents that can be deployed inside robots, so they can operate autonomously in real-world environments. The skills it learns in virtual environments, such as navigation, tool use and collaboration with humans, could then carry over to settings such as a factory or a warehouse.
“If we think of what a system needs to do to perform tasks in the real world, like a robot, I think there are two components of it,” Besse said. “First, there is a high-level understanding of the real world and what needs to be done, as well as some reasoning. Then there are lower-level actions, such as controlling things like physical joints and wheels.”
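Read as an architecture, that split suggests something like the sketch below: a high-level planner that reasons about what needs doing, and a low-level controller that turns each step into actuator commands. The class names, the fixed plan and the print-based “motor control” are illustrative assumptions, not a description of any real robotics stack.

```python
# Illustrative two-layer agent: high-level reasoning on top, low-level
# actuation below. Everything here is a placeholder for the idea only.

class HighLevelReasoner:
    """Planner in the Gemini mold: understands the scene and decides what to do."""
    def plan(self, instruction: str, scene: str) -> list[str]:
        # A real planner would query a large model; this returns a fixed plan.
        return ["navigate_to:shelf", "grasp:box", "navigate_to:loading_dock"]

class LowLevelController:
    """Turns each symbolic step into actuator commands (joints, wheels)."""
    def execute(self, step: str) -> None:
        action, _, target = step.partition(":")
        print(f"executing {action} -> {target}")  # placeholder for motor control

def run_task(instruction: str, scene: str) -> None:
    planner, controller = HighLevelReasoner(), LowLevelController()
    for step in planner.plan(instruction, scene):
        controller.execute(step)

run_task("move the box to the loading dock", "warehouse aisle with shelves")
```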
Holger Mueller of Constellation Research said DeepMind’s gains with SIMA 2 are extremely impressive, doubling its performance and getting close to the level of humans within just 18 months. “Video games are the perfect proving ground for AI agents and reasoning,” he said. “The question now is when will SIMA be able to surpass humans with its problem solving capabilities. It may come soon, as we live in exciting times.”
DeepMind’s researchers said SIMA 2 is a massive step forward for AI agents, but they admitted there are still weaknesses in the system to be addressed. For instance, the model still struggles with very long, multistep tasks and is constrained by a limited memory window. It also has trouble with some visual interpretation scenarios, they said.
Images: Google DeepMind
