Google LLC today rolled out Gemini 2.5 Flash in preview through its developer platforms so that artificial intelligence engineers and users can get a head start with the AI model.
Gemini 2.5 Flash builds on the foundation of 2.0 Flash, the company’s existing low-latency, high-performance model designed to power AI agents. Google said the new model has enhanced reasoning capabilities and is a “thinking” model, meaning it can break down complex tasks into step-by-step plans before responding.
The new model is available starting today via the Gemini application programming interface on Google AI Studio and on Vertex AI, Google Cloud’s fully managed machine learning platform for building, training and deploying AI models.
“Gemini 2.5 Flash is our first fully hybrid reasoning model, giving developers the ability to turn thinking on or off,” Google said in the announcement. “The model also allows developers to set thinking budgets to find the right tradeoff between quality, cost and latency.”
Google acknowledged that the thinking capability consumes tokens, the units used for processing information, which can increase both response time and cost. To give developers flexibility in how the model operates, Google lets them cap the maximum number of tokens the model will spend thinking. A higher budget improves quality but slows responses; a smaller budget returns answers faster and more cheaply.
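In practice, the cap is passed as part of the request configuration. The sketch below builds a request body the way the Gemini API’s documented generateContent endpoint shapes it; the field names (`generationConfig`, `thinkingConfig`, `thinkingBudget`) are assumptions drawn from Google’s API docs and should be checked against the current reference before use.

```python
# Illustrative sketch of a Gemini API request body with a capped thinking
# budget. Field names mirror the documented generationConfig.thinkingConfig
# shape, but treat them as assumptions and verify against current docs.

def build_request(prompt: str, thinking_budget: int) -> dict:
    """Builds a generateContent request body with a thinking-token cap.

    thinking_budget is the maximum number of tokens the model may spend
    "thinking"; 0 turns thinking off for the lowest cost and latency.
    """
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingBudget": thinking_budget}
        },
    }

# A budget of 0 disables thinking entirely, per Google's guidance.
fast_request = build_request("How many provinces does Canada have?", 0)
# A larger budget lets the model plan step by step on harder tasks.
careful_request = build_request("Build a daily schedule from my events.", 1024)
```

The same tradeoff applies whichever client is used: raising the budget buys quality at the expense of latency and token spend.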
The model is also trained to set a budget automatically based on the complexity of a given prompt. For example, simple questions such as “How do you say ‘thank you’ in Spanish?” or “How many provinces does Canada have?” require little reasoning, since the answers are likely covered by the model’s general training data or can be found in a single internet search.
Medium-level reasoning might involve tasks such as building a daily schedule for a user from a set of calendar events or calculating the probability of rolling a given total with a pair of dice. High-level reasoning would be asking the AI to write an entire Python function that performs complex math. Some users have previously asked Gemini to help them code entire web games, with mixed results.
Google said setting the thinking budget to 0 will result in the lowest cost and latency.
Gemini 2.5 Flash costs 15 cents per million input tokens and 60 cents per million output tokens without reasoning. With thinking active, the output price rises to $3.50 per million tokens.
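Those rates make request costs easy to estimate. The following back-of-the-envelope calculator uses only the per-million-token prices quoted above; the rates come from this article and should be verified against Google’s current price list before budgeting real workloads.

```python
# Cost estimate using the per-million-token rates quoted in the article:
# $0.15 input; $0.60 output with thinking off; $3.50 output with thinking on.
# Verify against Google's current pricing before relying on these numbers.

INPUT_RATE = 0.15             # dollars per million input tokens
OUTPUT_RATE_NO_THINK = 0.60   # dollars per million output tokens, thinking off
OUTPUT_RATE_THINK = 3.50      # dollars per million output tokens, thinking on

def estimate_cost(input_tokens: int, output_tokens: int, thinking: bool) -> float:
    """Returns the estimated cost of one request in dollars."""
    output_rate = OUTPUT_RATE_THINK if thinking else OUTPUT_RATE_NO_THINK
    return (input_tokens * INPUT_RATE + output_tokens * output_rate) / 1_000_000

# Example: a request with 2,000 input tokens and 500 output tokens.
print(estimate_cost(2_000, 500, thinking=False))  # 0.0006
print(estimate_cost(2_000, 500, thinking=True))   # 0.00205
```

The example shows why the budget matters: with thinking on, the same output is several times more expensive, so capping or disabling thinking on simple prompts directly lowers the bill.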
According to Google, 2.5 Flash has proven to be a significant upgrade over 2.0 Flash, especially in its reasoning capability. With reasoning active, its ability to break down complex tasks that require multiple steps, such as solving mathematical problems and research questions, has been greatly enhanced.
Gemini 2.5 Flash scored 12.1% on Humanity’s Last Exam compared with 2.0 Flash at 5.1%. This benchmark is designed to test AI systems using the most challenging questions humans can create in fields such as mathematics, humanities and natural sciences.
Google said 2.5 Flash continues to lead the market in price-to-performance. It also performs strongly on Hard Prompts in LMArena, the chatbot evaluation leaderboard, second only to Gemini 2.5 Pro, which was released last month.
Image: Google