Google LLC today debuted Gemini 3.1 Flash-Lite, the latest addition to its Gemini series of multimodal artificial intelligence models.
The company’s engineers developed the algorithm with cost-efficiency in mind. Gemini 3.1 Pro, Google’s most capable model, starts at $2 per million input tokens and $18 per million output tokens. Those rates increase significantly for demanding workloads. Gemini 3.1 Flash-Lite is priced at $0.25 per million input tokens, while generating a million output tokens costs $1.50.
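To put those rates in perspective, the short Python sketch below estimates the bill for a single workload under each model's starting price. The dictionary keys are illustrative labels rather than official API model identifiers, the workload size is a made-up example, and the calculation ignores the higher rates that apply to demanding workloads.

```python
# Starting list prices quoted in this article, in USD per million tokens.
# The keys are illustrative labels, not official API model identifiers.
PRICES = {
    "gemini-3.1-pro": {"input": 2.00, "output": 18.00},
    "gemini-3.1-flash-lite": {"input": 0.25, "output": 1.50},
}


def estimate_cost(model, input_tokens, output_tokens):
    """Return the estimated cost in USD for one workload at starting rates."""
    rates = PRICES[model]
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000


# Hypothetical workload: 10 million input tokens, 2 million output tokens.
for name in PRICES:
    cost = estimate_cost(name, 10_000_000, 2_000_000)
    print(f"{name}: ${cost:.2f}")
```

For this hypothetical workload, the estimate comes to $56.00 at Gemini 3.1 Pro's starting rates and $5.50 at Gemini 3.1 Flash-Lite's, roughly a tenfold difference.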
Google says that the algorithm is also faster than other Gemini models. In an internal test, the company compared it against Gemini 2.5 Flash, an earlier model that is likewise optimized for cost-efficiency. Gemini 3.1 Flash-Lite’s overall answer generation speed is 45% higher, while the wait for the first output token is 2.5 times shorter.
The model can process multimodal prompts with up to 1 million tokens’ worth of data. It generates responses with up to 64,000 tokens of text. That text can include software code, which enables Gemini 3.1 Flash-Lite to generate code-based visual assets such as business intelligence dashboards.
Google ran 11 benchmark tests to evaluate the model’s output quality. Gemini 3.1 Flash-Lite achieved the top score on six of the tests, besting OpenAI’s GPT-5 mini and Anthropic PBC’s Claude 4.5 Haiku. One of the benchmarks that the model completed more accurately is GPQA Diamond, which contains nearly 200 doctorate-level science questions.
The model achieved a 16% score on Humanity’s Last Exam, or HLE, one of the world’s most difficult AI benchmarks. Google’s top-end Gemini 3.1 Pro scored 44.4%.
The company sees developers using Gemini 3.1 Flash-Lite for high-volume tasks that don’t require extensive reasoning capabilities. An e-commerce marketplace operator, for example, could use it to translate third-party product listings and block items that breach its terms of service.
The model also lends itself to certain other tasks. A demo video posted by Google shows a developer using Gemini 3.1 Flash-Lite to generate a weather tracking dashboard with natural language prompts. In another demo, the model added hundreds of illustrative product listings to an e-commerce website prototype.
The new model is based on Gemini 3 Pro, which was until recently Google’s flagship reasoning model. The latter algorithm features a mixture-of-experts architecture, which means that it only activates some of its parameters to answer prompts. That approach helps reduce inference costs.
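Google has not published the internals of its architecture, so the following Python sketch only illustrates the general mixture-of-experts idea using a common top-k gating scheme. The toy experts, the gate weights and every function name are assumptions made for illustration, not Google's implementation.

```python
import math


def softmax(values):
    """Convert raw scores into weights that sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    """Toy stand-in for an expert network: multiplies every input value by `scale`."""
    return lambda x: [scale * v for v in x]


def moe_forward(x, gate_weights, experts, top_k=2):
    """Route input x through only the top_k highest-scoring experts."""
    # One gate score per expert: dot product of the input with that expert's gate vector.
    scores = [sum(w * v for w, v in zip(row, x)) for row in gate_weights]
    # Select the indices of the top_k experts; the rest are never evaluated.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Normalize the selected scores so the mixing weights sum to 1.
    weights = softmax([scores[i] for i in chosen])
    output = [0.0] * len(x)
    for weight, index in zip(weights, chosen):
        expert_out = experts[index](x)
        output = [o + weight * v for o, v in zip(output, expert_out)]
    return output, chosen


# Four toy experts, of which only two run for any given input.
experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
output, chosen = moe_forward([2.0, 1.0], gate_weights, experts, top_k=2)
print("experts used:", chosen)
```

In this toy setup only two of the four experts are evaluated per input. Skipping the rest is the source of the savings: compute per prompt scales with the number of active experts rather than with the model's total parameter count.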
Gemini 3.1 Flash-Lite is available in preview through Google Cloud’s Vertex AI suite of AI services. It’s also accessible via the Google AI Studio code generation tool, which enables developers to build simple applications with natural language prompts.
Image: Google
