Google has presented the new version of its range of Gemini Artificial Intelligence models, adding improvements in multimodality, such as native image and audio output, and native tool use that enables the development of AI agent experiences capable of planning, remembering and acting on the instructions provided by the user. Gemini 2.0 therefore represents the company’s definitive entry into the world of Artificial Intelligence agents.
The first model in the Gemini 2.0 family, now available, is an experimental version of Gemini 2.0 Flash, a reference model focused on improved performance and low latency. Developers who want to use it can do so through the Gemini API in Google AI Studio and Vertex AI.
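For developers, access works through the standard Gemini API. The snippet below is a minimal sketch of such a call, assuming the google-generativeai Python SDK and an API key created in Google AI Studio; the model identifier gemini-2.0-flash-exp reflects the experimental naming used at launch and may differ over time.

```python
# Minimal sketch: calling the experimental Gemini 2.0 Flash model via the Gemini API.
# Assumes `pip install google-generativeai` and that GEMINI_API_KEY holds a key
# created in Google AI Studio; the model name is an assumption based on the
# experimental identifier used at launch.
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])

model = genai.GenerativeModel("gemini-2.0-flash-exp")
response = model.generate_content(
    "Summarize the key features of Gemini 2.0 in three bullet points."
)
print(response.text)
```

The same model can be reached through Vertex AI for teams already working in Google Cloud, with authentication handled by the platform rather than an API key.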
Additionally, Gemini users can access a chat-optimized experimental version of 2.0 Flash Experimental. To do so, they simply select it from the model drop-down menu on the desktop and mobile web. Gemini Advanced users will also be able to access Deep Research, a research assistant that uses AI to explore complex topics on the user’s behalf and then presents its findings in a detailed report.
On the other hand, Google is bringing the advanced reasoning capabilities of Gemini 2.0 to AI Overviews, with the aim of tackling more complex topics and making it easier to answer multi-step questions, including advanced mathematical equations, multimodal queries and programming-related questions.
Prototypes and research experiments: Astra, Mariner, Jules and other agents
Google wanted to demonstrate how agent experiences can work safely and in practical conditions. To this end, it has developed several prototypes and research experiments, which it has made available to its community of trusted testers.
The first is Project Astra, a research prototype exploring the capabilities of a universal AI assistant, which has improved with version 2.0 and is now available to testers. With Gemini 2.0, the assistant can also use Google Search, Lens and Maps, and has a greater capacity to remember things while keeping the user in control of what it retains.
It now has up to 10 minutes of in-session memory and can remember more of its past conversations with the user, enabling better personalization. With new streaming capabilities and native audio understanding, the agent can understand language with a latency close to that of a human conversation.
As for Project Mariner, it is the first prototype built with Gemini 2.0 to explore the future of human-agent interaction, starting with the browser. It is also available in trials to a select group of testers. It can understand and reason over the information on the browser screen, including pixels and web elements such as text, code, images and forms. With this information, it uses an experimental Chrome extension to complete tasks on the user’s behalf.
To keep it safe and ensure responsible use, Google identified new types of risks that may arise from its use and developed measures to mitigate them. Project Mariner can therefore only type, scroll or click in the active browser tab, and it asks users for final confirmation before performing certain actions, such as making a purchase.
The third is Jules, an experimental AI-powered coding agent for developers built with Gemini 2.0 that integrates directly into a GitHub workflow. Available to testers, it can tackle an issue, develop a plan and execute it, always under the direction and supervision of a human. This is a step in Google’s broader aim of building AI agents that are useful in all possible areas, including software development.
In addition to these three agents, Google has built others with Gemini 2.0 designed to help navigate the virtual worlds of video games, as well as agents that apply this new version of Gemini’s spatial reasoning capabilities to robotics.