Google LLC today made Gemini 2.5 Pro, an advanced large language model it debuted last month, available in public preview.
Until now, the LLM was accessible through a free application programming interface with low usage limits. Developers could only send 25 requests per day at a rate of up to five a minute. The new public preview version of Gemini 2.5 Pro brings paid tiers that offer significantly higher rate limits.
According to Google, developers can now send the model up to 2,000 requests per minute with no daily maximum. Gemini 2.5 Pro is capable of processing up to 8 million tokens’ worth of data per minute. That makes the model suitable for powering production applications with a large number of users.
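Applications that approach the 2,000-requests-per-minute cap typically throttle on the client side rather than rely on server-side rejections. A minimal sliding-window throttle might look like the following; the class name and structure are illustrative, not part of any Google SDK:

```python
import time
from collections import deque


class RequestThrottle:
    """Client-side sliding-window throttle to stay under a requests-per-minute cap."""

    def __init__(self, max_per_minute: int = 2000):
        self.max_per_minute = max_per_minute
        self.sent = deque()  # monotonic timestamps of requests in the last 60 seconds

    def acquire(self) -> float:
        """Record a send if allowed and return 0.0, else return seconds to wait."""
        now = time.monotonic()
        # Drop timestamps that have aged out of the 60-second window.
        while self.sent and now - self.sent[0] >= 60:
            self.sent.popleft()
        if len(self.sent) < self.max_per_minute:
            self.sent.append(now)
            return 0.0
        # Window is full: wait until the oldest request ages out.
        return 60 - (now - self.sent[0])
```

A caller would sleep for the returned duration before retrying; the same pattern extends to the per-minute token budget by tracking token counts alongside timestamps.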
For prompts of up to 200,000 tokens, Google is charging $1.25 per 1 million input tokens and $10 per 1 million output tokens. Those rates increase to $2.50 and $15, respectively, when a prompt’s token count exceeds 200,000. Google’s pricing makes Gemini 2.5 Pro more expensive than DeepSeek-R1 but cheaper than Anthropic PBC’s Claude 3.7 Sonnet.
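The per-million-token rates translate into per-request costs as a simple weighted sum. The sketch below is illustrative arithmetic, not an official billing tool, and takes the tier rates as parameters; the example uses the long-context tier ($2.50 input, $15 output per million tokens):

```python
def gemini_cost(input_tokens: int, output_tokens: int,
                input_rate: float, output_rate: float) -> float:
    """Cost in dollars given token counts and per-1M-token rates."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# A 250,000-token prompt (above the 200,000-token threshold) with a
# 4,000-token response, billed at the long-context tier:
cost = gemini_cost(250_000, 4_000, input_rate=2.50, output_rate=15.0)
# → $0.685
```

Because the tier is chosen by prompt size, a request just over the 200,000-token threshold is billed at the higher rates for all of its tokens, not just the excess.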
“The experimental version of Gemini 2.5 Pro remains available for free with lower rate limits,” Google senior product manager Logan Kilpatrick wrote in a blog post today.
At its debut last month, Gemini 2.5 Pro topped the popular LMArena LLM benchmark by a significant margin. The benchmark compares AI models’ performance based on feedback from users. Gemini 2.5 Pro also achieved an 86.7% score on AIME 2025, a qualifying exam for the U.S. Math Olympiad.
Notably, Google says the LLM outperformed several reasoning-optimized models without using “test-time techniques that increase cost.” Test-time compute is a machine learning method that boosts an LLM’s output quality by increasing the amount of time and hardware resources it invests in completing tasks. The technique can significantly increase inference costs.
Under the hood, Gemini 2.5 Pro is an improved version of a model called Gemini 2.0 Pro that Google debuted in December. According to the search giant, its engineers enhanced both the base model and the post-training workflow. Post-training is the practice of improving an LLM’s output quality after it’s trained by providing it with additional data.
Google’s Gemini model series and open-source LLMs such as R1 are creating more competition for OpenAI. To maintain its market position, the ChatGPT developer will launch a pair of new reasoning models in the next two weeks. OpenAI Chief Executive Officer Sam Altman detailed the plan today in a post on X.
The company intends to release the o3 reasoning model it previewed in December, as well as a previously unannounced LLM called o4-mini. OpenAI originally had no plans to make o3 available as a standalone service.
In a few months, the company will follow up the two models by releasing GPT-5. It’s described as an AI system that combines the reasoning-optimized o3 model with several other features. OpenAI will use GPT-5 to power both the free and paid editions of ChatGPT.
Image: Google