Google has just presented Gemini 2, the latest version of its proprietary AI model. The company heavily emphasizes that this new iteration marks the start of what it calls the “agentic era,” where systems like this will increasingly be able to accomplish many tasks without requiring human intervention.
As a reminder, Gemini, the first of its name, was launched a little over a year ago, with the objective of positioning the Mountain View firm among the leaders of this high-potential industry. After a shaky start marked by memorable blunders, this AI assistant has gradually established itself as one of the most capable on the market, in particular thanks to its advanced multimodality – its ability to work with almost any type of media, well beyond plain text.
Google claims that version 2 significantly outperforms the previous one across the board. The firm promises “increased performance for consistently fast response times”, both for the base model and for the Flash version – a lighter and therefore slightly less powerful variant, but considerably faster.
Like OpenAI with its GPT-4o, Google also highlights “advanced reasoning capabilities”, in particular thanks to an even larger context window. In practice, this term refers to the amount of information the model can take into account when generating responses during a conversation or accomplishing any other task. The new model should therefore be significantly better at identifying logical relationships between different pieces of information, and thus at providing complete, coherent and nuanced answers.
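The idea of a context window can be sketched in a few lines of Python. This is a toy illustration only, not how Gemini actually works: real models count tokens rather than words, and the function and conversation below are invented for the example.

```python
# Toy sketch of a context window: the model only "sees" the most
# recent slice of the conversation that fits within the window.
# Here we approximate tokens with whitespace-separated words.

def build_context(history, context_window):
    """Keep only the most recent tokens that fit in the window."""
    tokens = " ".join(history).split()
    return tokens[-context_window:]

history = [
    "User: My name is Ada.",
    "Assistant: Nice to meet you, Ada.",
    "User: What is my name?",
]

# A small window truncates the oldest turns, so the model can no
# longer "see" the user's name; a larger window retains it.
small = build_context(history, 5)
large = build_context(history, 50)
print("Ada." in small)  # False
print("Ada." in large)  # True
```

A larger window simply means more of the prior conversation (or document) survives this truncation step, which is why it helps the model connect distant pieces of information.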
The Age of Agentic AI
But Gemini 2.0 doesn’t just perform better on tasks its predecessor could already handle. It also arrives with new features such as native audio and image generation, and Google intends to put this multimodality at the service of what the industry calls artificial intelligence “agents” – AI systems capable of acting proactively to make life easier for the user by anticipating their requests and needs.
With these new capabilities, Gemini will be increasingly integrated into Google’s software ecosystem. The model will play an increasingly important role in the famous search engine as well as on the YouTube platform and within the Android operating system.
To give us a taste of this new “agentic” era, Google presented the new features of its Project Astra, an embryonic “universal AI agent” that can analyze a smartphone’s video stream in real time, hold conversations in many different languages, or even interpret data from Google Maps.
At the same time, Google also revealed several new tools that all follow this agentic philosophy. Among them is Project Mariner, a prototype extension for the Chrome browser that assists the user while browsing by analyzing the information displayed on screen, much as Microsoft’s Copilot already does.
In a video, the company also revealed a collaboration with the publisher Supercell (Clash of Clans, Clash Royale, etc.) through a demo in which one of these “agents” proved capable of interpreting the gameplay sequences displayed on screen and offering real-time suggestions.
It will be interesting to see whether the public embraces this new paradigm, whose objective is clearly to place these AI systems at the center of our digital lives.