Robotics and artificial intelligence (AI) go hand in hand. There would be little point in developing humanoid robots that can lift tons and carry state-of-the-art sensors if we had no intelligent system that let them interpret their environment and act accordingly. Without AI, a modern robot would be little more than a pile of sophisticated but useless hardware. It is advanced algorithms that turn that raw power into machines capable of learning, optimizing their performance and responding autonomously to the challenges they face.
From Asimo, the iconic Honda robot of the 2000s, to Sophia, Tesla's Optimus or Figure, AI has steadily made its way into humanoid robotics. However, we are still far from seeing machines that truly match the versatility of the human body. As advanced as they are, they still have trouble moving through uncontrolled environments, and manipulating everyday objects can be a real challenge.
Gemini Robotics: Google’s bet to take AI to the physical world
Meanwhile, in the digital world, AI is advancing at a completely different pace. It can already hold conversations very close to those of a person, pass exams with surprisingly high scores and solve complex problems at a speed that seemed like science fiction only a few years ago. The contrast makes it clear that, although artificial intelligence is progressing by leaps and bounds, there is still a long way to go in integrating it with robotics.
These challenges are giving rise to a new generation of AI models built specifically for this discipline. Google, as expected, does not want to be left behind and is already working on solutions that promise to take humanoid robots one step further. Its bet rests on Gemini 2.0, which now has two versions designed to improve the interaction with and control of these machines.
On the one hand, Gemini Robotics is a vision-language-action (VLA) model, which allows it to take direct control of robots and improve their responsiveness in dynamic environments. On the other, Gemini Robotics-ER is designed for robotics experts, giving them the tools they need to develop and run their own programs with advanced reasoning capabilities.
Gemini Robotics-ER stands out in spatial reasoning, with pointing and 3D object detection
Google has identified three essential qualities that, as it explains, robots must have to be truly useful to people.
- Generality. A good robot should not only execute predefined tasks, but also adapt to novel situations and solve problems on the fly. It must be able to operate in new environments, handle unfamiliar objects and interpret varied instructions without depending on prior training. According to internal tests, Gemini Robotics more than doubles the performance of other state-of-the-art vision-language-action models on unforeseen tasks.
- Interactivity. In a constantly changing world, robots must be able to communicate naturally and respond to instructions in real time. Gemini Robotics understands commands phrased in everyday language, and in multiple languages, adapting its behavior to the conversation or the environment. In addition, it continually monitors what is happening around it and adjusts its actions in response to new orders or changes in its surroundings.
- Dexterity. Many tasks that humans perform effortlessly require extremely precise motor skills, something most robots have not yet mastered. Gemini Robotics, however, is capable of performing complex multi-step tasks that require precise manipulation, such as folding origami or packing a snack into a Ziploc bag, demonstrating a higher level of dexterity.
Gemini Robotics not only stands out at solving unforeseen tasks; its ability to generalize also far exceeds the performance of other vision-language-action models. According to Google's technical report, it is able to adapt to novel scenarios and make decisions without prior training, bringing robots closer to real autonomy.
In addition, it has been designed to work with different types of robots. Although it was trained mainly on ALOHA 2, a two-armed platform, it has also been shown to control systems such as the Franka arms used in laboratories, and even more advanced humanoids such as Apollo, developed by Apptronik. This flexibility makes it a model adaptable to a range of applications, from industry to assistance.
For now, there is no scheduled date for a general rollout of Gemini Robotics or Gemini Robotics-ER. The technology is still under development and, for the moment, only a small group of companies has access to these tools.
Google DeepMind is collaborating with Apptronik to build the next generation of humanoid robots, exploring how to integrate these AI models into more advanced systems. In addition, some trusted testers, such as Agile Robots, Agility Robotics, Boston Dynamics and Enchanted Tools, are already testing Gemini Robotics-ER, although it is not clear whether that access will be expanded in the future.
Meanwhile, Google DeepMind continues to work on new safety frameworks and benchmarks to evaluate the possible risks of AI in physical environments. All this makes it clear that, although the project is progressing, there is still a long way to go before this technology reaches the general public.
Images | Google DeepMind