“Users use fewer tokens and simply work more efficiently,” he explains. “And a lot of that has to do with the ability to design prompts efficiently.”
Local AI usage
New AI hardware that generates free tokens on-premises could also partially alleviate the cost crisis.
At GTC Taipei, Nvidia and Microsoft introduced RTX Spark, a desktop PC for agent-based AI that can run agents and models with 120 billion parameters locally on Windows. According to a statement from Microsoft CEO Satya Nadella, the goal is to bring unlimited intelligence to every home and every Windows workstation.
Some companies are also trying to reduce the cost of cloud AI by running their own hardware in data centers. Providers such as HPE and Dell supply servers that are installed in independent environments. On-premises AI is also becoming increasingly important due to demands for digital sovereignty and geopolitical risks – for example, after the recent conflicts in the Middle East, in which large data centers were hit by missiles.
“There are local, region-specific and cross-vendor AI solutions,” explains Max Goss, senior director analyst at Gartner. “All of these measures can help reduce the risk. But they will not eliminate it completely.”
Use of forward-deployed engineers
The task of reducing token costs may increasingly fall to so-called forward-deployed engineers (FDEs) who work directly in customer environments, explains Taimur Rashid, managing director of the AWS Generative AI Innovation Center.
“I expect that these teams will be able to design systems that take these cost requirements into account, whether by using a different model or a different use case that does not increase the cost per token,” says Rashid.
Companies might spend large sums on token consumption, but that’s no cause for concern as long as they generate revenue and the economics are right, says the AWS manager.
Measure results – instead of token consumption
Even with the current focus on cutting token consumption to lower costs, the metrics for assessing AI success are likely to change, Gartner analyst Deepak Seth believes. At some point, token-based pricing will shift toward a results-based model, in which actual business results, rather than word fragments, determine value.
“Some companies are already moving toward outcome-based pricing. Once companies realize the true cost of tokens, they will become more concerned with their efficiency,” predicts the Gartner analyst. (mb)
This article is based on a piece from Computerworld.
