Announced last December, the Gemini 2.0 family of models now has a new member, Gemini 2.0 Flash-Lite, which Google says is cost-optimized for large-scale text output use cases and is now available in preview. Along with Flash-Lite, Google also announced Gemini 2.0 Pro.
According to Google, Gemini 2.0 Flash-Lite matches the speed and cost of 1.5 Flash while providing better quality, and it offers the same 1-million-token context window as 2.0 Flash. Compared with 2.0 Flash, however, 2.0 Flash-Lite does not support image or audio output. It also lacks “search as a tool” and “code execution as a tool”, two grounding techniques aimed at improving the model’s answers by using Google Search or code execution to help check their correctness. As a final limitation, 2.0 Flash-Lite cannot be used with the Multimodal Live API, which aims to enable natural, human-like voice conversation through low-latency, bidirectional voice and audio interactions.
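For illustration, the sketch below shows how those grounding tools can be enabled on 2.0 Flash, which does support them. It is a minimal example assuming the google-genai Python SDK and a placeholder API key, not Google’s canonical usage:

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key from Google AI Studio

# Grounding with Google Search: supported by 2.0 Flash, not by 2.0 Flash-Lite.
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Who won the most recent Nobel Prize in Physics?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
print(response.text)

# Code execution as a tool: the model writes and runs code to verify its answer.
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="What is the sum of the first 50 prime numbers?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(code_execution=types.ToolCodeExecution())],
    ),
)
print(response.text)
```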
On the performance side, 2.0 Flash-Lite significantly beats 1.5 Flash on the SimpleQA benchmark, which tests factual world knowledge, and on Bird-SQL, which evaluates natural-language-to-SQL conversion. However, it performs slightly worse than 1.5 Flash on a few benchmarks, including MRCR, which evaluates long-context understanding, and LiveCodeBench, which tests Python coding. Interestingly, 2.0 Flash-Lite is comparable to or better than 1.5 Pro on several benchmarks, including Bird-SQL, FACTS Grounding, MATH, and MMMU.
Along with 2.0 Flash and 2.0 Flash-Lite, Google has also released 2.0 Pro as an experimental model. According to Google, 2.0 Pro is their best model yet for coding performance and complex prompts. Indeed, 2.0 Pro turns out to be the best Gemini model to date on most benchmarks, particularly on SimpleQA, where it improves on the second-best model, 2.0 Flash, by 50%. Exceptions are FACTS Grounding, where 2.0 Flash excels, and long-context understanding (MRCR), where 1.5 Pro performs slightly better. Since 2.0 Pro is still experimental, its results could change before general availability.
As a final note about Gemini 2.0 models, Google recently released Gemini 2.0 Flash Thinking in experimental mode. This model follows the recent trend of AI reasoning models, which aim to break a prompt down into a sequence of smaller tasks, devise a strategy to solve them individually while taking their relationships into account, and explain their “thought” process along the way.
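For readers who want to try it, a thinking model is called like any other Gemini model. The minimal sketch below assumes the google-genai Python SDK and the experimental model id published at the time of writing, which may change:

```python
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

response = client.models.generate_content(
    model="gemini-2.0-flash-thinking-exp",  # experimental id at the time of writing
    contents="A farmer has 17 sheep. All but 9 run away. How many are left?",
)
# The model works through intermediate reasoning steps before answering.
print(response.text)
```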
Google’s announcement drew some negative comments on Reddit for the “pretty mediocre” improvement the new models provide over Gemini 1.5. While this is uncontroversially true when comparing the models on benchmarks, others report that the performance they deliver in real use cases is significantly better.
Additionally, the 1- and 2-million-token context windows provided by Gemini 2.0 Flash and Pro led some Hacker News users to posit that they could make RAG techniques superfluous in many use cases. On the positive side, Gemini 2.0 models seem to deliver on their promise to handle such large contexts, but cost concerns and performance that degrades as the context grows might still argue for using RAG instead.
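To make the trade-off concrete, the sketch below (again assuming the google-genai Python SDK, with a hypothetical document file) stuffs an entire document into the prompt instead of retrieving relevant chunks; counting tokens first gives a rough handle on the cost that a RAG pipeline would avoid:

```python
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

# Long-context alternative to RAG: pass the whole document in the prompt.
with open("manual.txt") as f:  # hypothetical document
    document = f.read()

# Billing scales with input tokens, which is where RAG's
# retrieve-only-what-you-need approach can still win on cost.
usage = client.models.count_tokens(model="gemini-2.0-flash", contents=document)
print(f"Input tokens: {usage.total_tokens}")  # must fit the 1M-token window

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[document, "Summarize the warranty terms in this manual."],
)
print(response.text)
```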
Slightly tongue-in-cheek, according to ChatGPT at the time of this writing, Gemini 2.0 models and GPT-4 are comparable in performance, with Gemini 2.0 leading in text-based understanding, code generation, and multimodal integration, and GPT-4 maintaining an edge in commonsense reasoning tasks.
Developers can use all Gemini 2.0 models in Google AI Studio and Vertex AI.
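As a starting point, a minimal call through either surface might look as follows; the project settings are illustrative, and the preview and experimental model ids reflect those published at the time of writing:

```python
from google import genai

# Gemini Developer API via Google AI Studio (placeholder key):
client = genai.Client(api_key="YOUR_API_KEY")

# Or Vertex AI instead (illustrative project and region):
# client = genai.Client(vertexai=True, project="my-project", location="us-central1")

for model in (
    "gemini-2.0-flash",
    "gemini-2.0-flash-lite-preview-02-05",  # preview id at the time of writing
    "gemini-2.0-pro-exp-02-05",             # experimental id at the time of writing
):
    response = client.models.generate_content(
        model=model,
        contents="Explain context windows in one sentence.",
    )
    print(f"{model}: {response.text}")
```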