Google LLC’s DeepMind research unit today announced a major update to two of its artificial intelligence models designed to make robots smarter. With the update, robots powered by the models can perform more complex, multistep tasks and even search the web for information to aid in their endeavors.
The newly released models include Gemini Robotics 1.5, which drives the robots, and Gemini Robotics-ER 1.5, which is an embodied reasoning model that helps them to think.
DeepMind originally released these models in March, but at the time they could only perform single tasks, such as unzipping a bag or folding a piece of paper. Now they can do much more. For instance, they can separate clothes in a laundry basket into light and dark colors, and they can pack a suitcase for someone, choosing clothes that are suitable for the predicted weather conditions in London or New York, DeepMind said.
For the latter task they’ll need to search the web for the latest weather forecast. They can also search the web for information they need to perform other tasks, such as sorting out the recyclables in a trash basket, based on the guidelines of whatever location they’re in.
In a blog post, DeepMind Head of Robotics Carolina Parada said the models will help developers to build “more capable and versatile robots” that actively understand their environments.
In a press briefing, Parada added that the two models work in tandem with each other, giving robots the ability to think multiple steps ahead before they start taking actions. “The models up to now were able to do really well at doing one instruction at a time in a way that is very general,” she said. “With this update, we’re now moving from one instruction to actually genuine understanding and problem-solving for physical tasks.”
Gemini Robotics 1.5 and Gemini Robotics-ER 1.5 play different roles. The former is a “vision-language-action,” or VLA, model: it transforms visual information and instructions into motor commands that enable robots to perform tasks. It thinks before it takes action, and shows this thinking process, which helps robots assess and complete complex tasks in the most efficient way.
Gemini Robotics-ER 1.5, meanwhile, is the embodied reasoning model, designed to reason about the physical environment the robot is operating in. It can use digital tools such as a web browser and then create detailed, multistep plans for fulfilling a specific task or mission. Once a plan is ready, it passes it on to Gemini Robotics 1.5 to carry out.
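To make that division of labor concrete, the sketch below shows one way a planner-plus-controller loop of this kind could be wired together. It is an illustration only: every class, method and parameter name is hypothetical, and DeepMind has not published this interface.

```python
# Illustrative sketch only: hypothetical names showing how an embodied-reasoning
# planner and a VLA controller might be orchestrated. Not DeepMind's actual API.

from dataclasses import dataclass


@dataclass
class Step:
    instruction: str  # natural-language sub-task, e.g. "pick up the dark shirt"


class ReasoningPlanner:
    """Stands in for an embodied reasoning model (Gemini Robotics-ER 1.5 style)."""

    def plan(self, mission: str, scene_image: bytes) -> list[Step]:
        # The real model can also call digital tools (e.g. a web search for
        # local recycling rules) before returning its multi-step plan.
        raise NotImplementedError("placeholder for a model call")


class VLAController:
    """Stands in for a vision-language-action model (Gemini Robotics 1.5 style)."""

    def act(self, step: Step, scene_image: bytes) -> list[float]:
        # Returns motor commands for the current sub-task; the real model also
        # surfaces its intermediate "thinking" before acting.
        raise NotImplementedError("placeholder for a model call")


def run_mission(mission: str, planner: ReasoningPlanner,
                controller: VLAController, get_camera_frame) -> None:
    """Plan once, then execute each sub-task against a fresh camera frame."""
    frame = get_camera_frame()
    for step in planner.plan(mission, frame):
        frame = get_camera_frame()              # re-observe before every sub-task
        motor_commands = controller.act(step, frame)
        # hand motor_commands to the robot's control stack here
```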
Parada said the models can also “learn” from each other, even if they’re configured to run on different robots. In tests, DeepMind found that tasks only assigned to the ALOHA2 robot, which has two mechanical arms, could later be performed just as well by the bi-arm Franka robot and Apptronik’s humanoid robot Apollo.
“This enables two things for us,” Parada said. “One is to control very different robots, including a humanoid, with a single model. And secondly, skills that are learned on one robot can now be transferred to another robot.”
Google said Gemini Robotics-ER 1.5 is being made available to any developer who wants to experiment with it through the Gemini application programming interface in Google AI Studio, a platform for building and fine-tuning AI models and integrating them with applications. Developers can read this resource to get started with building robotic AI applications.
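For developers trying the model out, a call through the Gemini API might look roughly like the minimal sketch below, which uses the google-genai Python SDK. The model identifier and the prompt are assumptions for illustration; check the model list in Google AI Studio for the exact ID.

```python
# Minimal sketch of querying an embodied-reasoning model through the Gemini API
# with the google-genai Python SDK. The model ID below is an assumption.

from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # key from Google AI Studio

with open("workbench.jpg", "rb") as f:
    scene = f.read()

response = client.models.generate_content(
    model="gemini-robotics-er-1.5-preview",    # assumed model ID; verify in AI Studio
    contents=[
        types.Part.from_bytes(data=scene, mime_type="image/jpeg"),
        "List, step by step, how a two-armed robot should sort the items "
        "on this bench into recycling and trash.",
    ],
)
print(response.text)  # the model's multi-step plan as text
```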
Gemini Robotics 1.5 is more exclusive, and is only being made available to “select partners” at this time, Parada said.