World of Software
Copyright © All Rights Reserved. World of Software.
Google Stax Aims to Make AI Model Evaluation Accessible for Developers

News Room
Last updated: 2025/09/29 at 6:34 PM
Published 29 September 2025

Google Stax is a framework designed to replace subjective evaluations of AI models with an objective, data-driven, and repeatable process for measuring model output quality. Google says this will allow AI developers to tailor the evaluation process to their specific use cases rather than relying on generic benchmarks.

According to Google, evaluation is key to selecting the right model for a given solution by comparing quality, latency, and cost. It is also essential for assessing how effective prompt engineering and fine-tuning efforts actually are in improving results. Another area where repeatable benchmarks are valuable is agent orchestration, where they help ensure that agents and other components work reliably together.
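The sketch below shows one way aggregated evaluation results could drive the quality/latency/cost trade-off. It assumes each candidate model has already been scored. It is not Stax's API: the `ModelResult` record, its fields, `pick_model`, and the "best quality within budget" rule are all hypothetical. The model names and numbers are made-up inputs, not measured results.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ModelResult:
    """Aggregated evaluation results for one candidate model (hypothetical record)."""
    name: str
    quality: float         # mean evaluator score in [0.0, 1.0]
    p95_latency_ms: float  # 95th-percentile response latency in milliseconds
    cost_per_1k: float     # cost per 1,000 requests, in any consistent currency


def pick_model(results: Sequence[ModelResult],
               max_latency_ms: float,
               max_cost_per_1k: float) -> Optional[ModelResult]:
    """Return the highest-quality model that fits both budgets, or None if none does."""
    eligible = [r for r in results
                if r.p95_latency_ms <= max_latency_ms and r.cost_per_1k <= max_cost_per_1k]
    if not eligible:
        return None
    # Break ties on quality in favour of the cheaper model.
    return max(eligible, key=lambda r: (r.quality, -r.cost_per_1k))


# Illustrative, made-up candidates (not real benchmark data).
candidates = [
    ModelResult("model-a", quality=0.91, p95_latency_ms=1800, cost_per_1k=4.0),
    ModelResult("model-b", quality=0.87, p95_latency_ms=600, cost_per_1k=1.2),
    ModelResult("model-c", quality=0.78, p95_latency_ms=300, cost_per_1k=0.3),
]
best = pick_model(candidates, max_latency_ms=1000, max_cost_per_1k=2.0)
print(best.name if best else "no model fits")  # model-b
```

Here `model-a` scores highest but exceeds the latency budget, so `model-b` wins. This hard-budget rule is only one possible policy. A weighted score across the three dimensions would be an equally reasonable design.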

Stax provides data and tools to build benchmarks that combine human judgement and automated evaluators. Developers can import production-ready datasets or create their own, either by uploading existing data or by using LLMs to generate synthetic datasets. Likewise, Stax includes a suite of default evaluators for common metrics such as verbosity and summarization, while allowing the creation of custom evaluators for more specific or fine-grained criteria.
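To illustrate the dataset-plus-evaluators structure described above, here is a minimal, self-contained sketch. All of the following are assumptions for illustration and do not describe how Stax works internally:

- the `Example` row type and `run_benchmark`;
- a word-count heuristic standing in for a "verbosity" evaluator;
- an exact-match evaluator.

Stax's own default evaluators may be LLM-based and compute scores differently.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Dict, List


@dataclass
class Example:
    """One dataset row (hypothetical schema)."""
    input: str
    output: str
    expected_output: str = ""
    metadata: Dict[str, str] = field(default_factory=dict)


# An evaluator maps one example to a score in [0.0, 1.0].
Evaluator = Callable[[Example], float]


def verbosity_evaluator(ex: Example, max_ratio: float = 2.0) -> float:
    """Heuristic stand-in: penalize outputs much longer than the expected answer.

    Returns 1.0 when the output has no more words than the reference, falling
    linearly to 0.0 when it is `max_ratio` times as long or longer.
    """
    ref_len = max(len(ex.expected_output.split()), 1)
    ratio = len(ex.output.split()) / ref_len
    if ratio <= 1.0:
        return 1.0
    return max(0.0, 1.0 - (ratio - 1.0) / (max_ratio - 1.0))


def exact_match_evaluator(ex: Example) -> float:
    """Score 1.0 if the case- and whitespace-normalized output equals the expected output."""
    return float(ex.output.strip().lower() == ex.expected_output.strip().lower())


def run_benchmark(dataset: List[Example],
                  evaluators: Dict[str, Evaluator]) -> Dict[str, float]:
    """Apply every evaluator to every example and report the mean score per evaluator."""
    return {name: mean(fn(ex) for ex in dataset) for name, fn in evaluators.items()}


dataset = [
    Example(input="Capital of France?", output="Paris", expected_output="Paris"),
    Example(input="2 + 2?", output="The answer is four", expected_output="4"),
]
scores = run_benchmark(dataset, {"verbosity": verbosity_evaluator,
                                 "exact_match": exact_match_evaluator})
print(scores)  # {'verbosity': 0.5, 'exact_match': 0.5}
```

Keeping each evaluator a plain function from example to score means default and custom evaluators can be mixed freely in the same benchmark run.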

A custom evaluator can be created in a few steps, beginning with selecting the base LLM that will act as a judge. The judge is given a prompt that instructs it how to evaluate the tested model's output. The prompt must contain definitions of the categories the judge will use for grading, each associated with a numerical score between 0.0 and 1.0. It must also include instructions on the preferred response format, and it may use the variables {{output}}, {{input}}, {{history}}, {{expected_output}}, and {{metadata.key}}.

To ensure the evaluator is reliable, it should be calibrated against trusted human ratings using a classical supervised learning approach. The evaluator prompt can then be refined iteratively to improve the agreement between its ratings and those of the trusted human evaluator.
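The following sketch ties together the custom-evaluator steps described above: a rubric mapping categories to scores in [0.0, 1.0], a judge prompt using the `{{...}}` variables, parsing of the judge's reply, and a simple calibration check against human ratings.

Everything here is an assumption for illustration, not Stax's implementation:

- the rubric categories;
- the prompt wording;
- the `RATING:` reply format;
- the names `render_prompt`, `parse_rating`, and `calibration_report`;
- the choice of exact agreement and mean absolute error as calibration metrics.

The actual call to a judge LLM is omitted.

```python
import re
from typing import Dict, List, Mapping

# Hypothetical rubric: each grading category maps to a score in [0.0, 1.0].
RUBRIC: Dict[str, float] = {"GOOD": 1.0, "PARTIAL": 0.5, "BAD": 0.0}

# Hypothetical judge prompt using {{...}} variables.
JUDGE_TEMPLATE = """You are grading a model's answer.
Question: {{input}}
Reference answer: {{expected_output}}
Model answer: {{output}}
Domain: {{metadata.domain}}

Categories:
GOOD - fully correct and complete.
PARTIAL - partly correct or missing key details.
BAD - incorrect or off-topic.

Respond with exactly one line: RATING: <GOOD|PARTIAL|BAD>"""

_VAR = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render_prompt(template: str, fields: Mapping[str, str],
                  metadata: Mapping[str, str]) -> str:
    """Substitute {{name}} and {{metadata.key}} placeholders; unknown names raise KeyError."""
    def lookup(match: re.Match) -> str:
        name = match.group(1)
        if name.startswith("metadata."):
            return metadata[name[len("metadata."):]]
        return fields[name]
    return _VAR.sub(lookup, template)


def parse_rating(judge_reply: str) -> float:
    """Map the judge's 'RATING: <category>' line to its numeric score."""
    m = re.search(r"RATING:\s*([A-Z]+)", judge_reply)
    if m is None or m.group(1) not in RUBRIC:
        raise ValueError(f"unparseable judge reply: {judge_reply!r}")
    return RUBRIC[m.group(1)]


def calibration_report(judge_scores: List[float],
                       human_scores: List[float]) -> Dict[str, float]:
    """Compare judge scores with trusted human ratings on the same examples."""
    if len(judge_scores) != len(human_scores) or not judge_scores:
        raise ValueError("need two non-empty score lists of equal length")
    pairs = list(zip(judge_scores, human_scores))
    return {
        "exact_agreement": sum(j == h for j, h in pairs) / len(pairs),
        "mean_abs_error": sum(abs(j - h) for j, h in pairs) / len(pairs),
    }


prompt = render_prompt(
    JUDGE_TEMPLATE,
    {"input": "Capital of France?", "expected_output": "Paris", "output": "Paris"},
    {"domain": "geography"},
)
score = parse_rating("RATING: PARTIAL")  # 0.5
report = calibration_report([1.0, 0.5, 0.0, 1.0], [1.0, 1.0, 0.0, 1.0])
print(report)  # {'exact_agreement': 0.75, 'mean_abs_error': 0.125}
```

In an iterative loop, you would edit `JUDGE_TEMPLATE` (for example, by tightening the category definitions), re-score the human-labelled examples, and keep the version whose calibration report shows the closest agreement.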

Google Stax is not the only solution available for AI model evaluation. Its competitors include OpenAI Evals, DeepEval, MLflow LLM Evaluate, and many others, each differing significantly in approach and capabilities.

Currently, Stax supports benchmarking for a growing list of model providers, including OpenAI, Anthropic, Mistral, Grok, DeepSeek, and Google itself. In addition, it can be used with custom model endpoints. It is free to use while in beta, but Google says it may introduce a pricing model after that.

A final note on data privacy: Google states that it will neither own user data, including prompts, custom datasets, and evaluators, nor use it to train its language models. However, users should be aware that when they evaluate models from other providers, those providers' data policies also apply.
