Atul Deo’s goal is to make artificial-intelligence software both cheaper and smarter at the same time.
The up-and-coming executive, head of Amazon Bedrock and generative artificial intelligence for Amazon Web Services Inc., has only six months to show it can be done before the company’s blockbuster annual re:Invent conference in December.
“The AI space is moving faster than anything I’ve seen,” said Deo (pictured). “Models get better every few weeks — but customers won’t deploy them unless the economics pencil out.”
That tension between technological leap and production cost is now redefining the cloud industry’s next phase. For AWS, it means selling not only the most accurate models but also the plumbing that stops AI bills from spiraling.
For corporate chief information officers, it means shifting from flashy chatbots to “agentic” software that can execute multistep tasks and justify its price tag. Together those forces are turning Bedrock — the two-year-old service that hosts third-party and Amazon-built AI models — into one of the most closely watched products inside the $100-billion-a-year unit formerly steered by Amazon.com Inc. Chief Executive Andy Jassy.
The video below is part of our editorial series AWS and Ecosystem Leaders Halftime to Re:Invent Special Report digital event. Look for other articles from the event on News.
The model deluge
Since January, Bedrock has added seven headline models: Anthropic’s Claude Sonnet 4 and Opus 4, Meta’s open-source Llama 4, Chinese upstart DeepSeek and three versions of Amazon’s own Nova family, including Nova Premier, which Deo touts as “state-of-the-art accuracy at a discount.” Each drop lands with predictable fanfare on social media. Less visible is the licensing dance that lets AWS customers swap among them with a single application programming interface.
Model choices have become like interest rates: everyone has an opinion, and the landscape changes overnight. “Our job is to give customers the spread and let them pick,” Deo said. “We’ve seen a lot of capable models. Sonnet 4 and Opus 4 are really powerful. We launched Nova Premier, which comes with remarkably good price-performance.”
That spread now includes a Bedrock Marketplace, an “app store” for niche models: one used by pharmaceutical chemists, another tuned for call-center transcripts. Analysts see echoes of Amazon’s successful RDS database strategy, which lets clients run Oracle, PostgreSQL or the company’s home-grown Aurora engine on the same billing sheet.
Engineering out the bill
But choice alone doesn’t fix the line item that most frustrates finance chiefs: inference cost, or the price of running a model each time it produces an answer. Here Deo rattled off new levers to pull:
- Prompt caching — store the long, instruction-heavy part of a query so it isn’t rebilled. AWS says customers save up to 90%.
- Intelligent prompt routing — send easy questions to a cheaper, faster model and harder ones to a heavyweight, all in real time.
- Batch mode — process millions of requests overnight at half the per-unit rate of real-time calls.
- Model distillation — transfer knowledge from a giant model into a slimmed-down one specialized for a single workflow.
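The routing lever is easy to picture in code. The sketch below is an illustrative heuristic only, not AWS’ actual router: long or reasoning-heavy prompts go to a large model, everything else to a cheap one. The model names and per-token prices are hypothetical placeholders.

```python
# Illustrative sketch of intelligent prompt routing: a cheap heuristic
# decides whether a query goes to a small, inexpensive model or a large,
# expensive one. Model names and prices below are invented placeholders,
# not real Bedrock identifiers or rates.

ROUTES = {
    "light": {"model": "example-small-model", "usd_per_1k_tokens": 0.0002},
    "heavy": {"model": "example-large-model", "usd_per_1k_tokens": 0.0150},
}

# Crude signals that a prompt needs deeper reasoning (for illustration).
HARD_SIGNALS = ("prove", "derive", "multi-step", "analyze", "reconcile")

def route(prompt: str) -> str:
    """Pick a route: long or reasoning-heavy prompts go to the big model."""
    looks_hard = len(prompt.split()) > 200 or any(
        s in prompt.lower() for s in HARD_SIGNALS
    )
    return "heavy" if looks_hard else "light"

def estimated_cost(prompt: str, expected_output_tokens: int = 500) -> float:
    """Rough bill estimate: total tokens times the chosen route's rate."""
    r = ROUTES[route(prompt)]
    total_tokens = len(prompt.split()) + expected_output_tokens
    return total_tokens / 1000 * r["usd_per_1k_tokens"]
```

The point of the sketch: routing is a pre-flight decision, so the savings compound with every call that avoids the heavyweight model.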
Taken together, the features aim to break what cloud skeptics call the “token treadmill,” a reference to the basic unit of text that AI systems consume. More tokens mean more compute cycles; more compute cycles mean a bigger bill. The benchmark that matters is no longer “first-token latency”: companies now want to know the total cost and time it takes to get an answer or a task completed. That metric is where Deo has focused Bedrock’s entire roadmap.
From chatbot to agent
Cost discipline is also the prerequisite for AI’s next act: autonomous agents that perform tasks spanning minutes, hours or even days. Early experiments are already live. A mortgage startup now uses Bedrock agents to collect documents, scan them for errors and shepherd borrowers through underwriting “in days instead of weeks,” Deo said. Real-estate firms are shrinking property-sale timelines from three months to a fortnight by delegating diligence chores to similar bots.
What changed? Two ingredients arrived simultaneously. First, bigger models — Claude Opus 4, DeepSeek — learned to “think out loud,” iterating on their own answers rather than returning a single best guess. Second, AWS rolled out multi-agent collaboration, a Bedrock feature that splits a business process among specialized bots powered by different models. One agent might use Anthropic for deep reasoning, another Nova Lite for high-volume form checks, and a third a vertical model devoted to real-estate law.
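The division of labor described above can be sketched in a few lines. This is a minimal illustration of the multi-agent pattern, not Bedrock’s API: a supervisor fans a task out to role-specific agents, each of which would, in a real deployment, call a different model endpoint. The handlers and model names here are stubs.

```python
# A minimal sketch of multi-agent collaboration: a supervisor routes one
# business process through role-specific agents, each nominally backed by
# a different model. Agent behavior is stubbed; in production each handler
# would invoke a model endpoint.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Agent:
    role: str
    model: str                      # illustrative model name, not a real ID
    handle: Callable[[str], str]

def reasoning_agent(task: str) -> str:
    return f"reasoned plan for: {task}"

def form_check_agent(task: str) -> str:
    return f"form fields validated for: {task}"

TEAM = [
    Agent("deep-reasoning", "example-reasoning-model", reasoning_agent),
    Agent("form-checks", "example-lite-model", form_check_agent),
]

def run_process(task: str) -> dict:
    """Supervisor: pass the task to every specialist and collect
    their outputs, keyed by role."""
    return {agent.role: agent.handle(task) for agent in TEAM}
```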
“Think of it as a project team,” Deo said. “HR, finance, engineering — each agent has a role.”
The hardware moat
None of this works if cloud providers choke on their own demand. AI clusters require tens of thousands of high-end chips and the electricity to match. According to Deo, this is where Amazon brings its silicon portfolio to the table: Graviton-based CPUs for conventional workloads and Trainium 2 accelerators tailored for AI training and inference. Bedrock’s newest Nova models were trained entirely on Trainium 2 hardware, Deo said, a milestone that reduces Amazon’s reliance on Nvidia’s scarce GPUs.
“Custom silicon is how we bend the curve,” he added. “It’s the reason we can drop price while pushing capability.”
Rival Microsoft has announced a Maia AI chip; Google Cloud has TPUs. Yet AWS still spends the most on data-center investments, according to analyst firm Canalys.
Model Context Protocol: ‘USB-C for AI’
Another emerging pillar is technical but potentially transformative: the Model Context Protocol, or MCP. Deo calls it “USB-C for AI,” a standard that lets agents discover data sources and each other dynamically, maintain state across calls, and enforce security policies. AWS has quietly released MCP server implementations for popular services such as S3 storage and DynamoDB databases.
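Under the hood, MCP rides on JSON-RPC 2.0: a client first asks a server what tools it exposes, then invokes one by name. The sketch below only constructs the request envelopes to show that shape; the method names follow the public MCP specification, but this is not a working client and no server is contacted.

```python
# MCP message shapes in miniature. The protocol is JSON-RPC 2.0: discover
# tools with "tools/list", then invoke one with "tools/call". The tool
# name and arguments below are invented for the example.

import json

def jsonrpc(method: str, params: dict, msg_id: int) -> str:
    """Build a JSON-RPC 2.0 request envelope."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": msg_id,
        "method": method,
        "params": params,
    })

# 1. Ask the server which tools it offers (dynamic discovery).
list_req = jsonrpc("tools/list", {}, msg_id=1)

# 2. Call a discovered tool by name with arguments.
call_req = jsonrpc(
    "tools/call",
    {"name": "query_table", "arguments": {"table": "payroll"}},
    msg_id=2,
)
```

Because discovery happens at runtime, an agent can pick up a newly published payroll or CRM tool without anyone hard-coding its API, which is exactly the handshake Vellante describes.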
“If you want agents talking to payroll one minute and Salesforce the next without hard-coding APIs, MCP is the handshake,” said Dave Vellante, chief analyst at theCUBE Research, News’s sister market research firm.
Guardrails for regulated industries
As agents inch toward healthcare records and loan approvals, enterprises want proof that models won’t hallucinate. Bedrock’s answer is Automated Reasoning, a feature that runs an independent verifier — essentially a logical proof engine — against each response. If the verifier can’t reach 99.9% confidence, it flags the answer for human review.
The technique borrows from AWS’ own security tools such as IAM Access Analyzer. “We’ve used formal methods to validate permissions for years,” Deo said. Now AWS is applying them to language.
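The core idea is simple to demonstrate, even if the production feature uses a formal proof engine rather than the toy check below. In this sketch, an independent verifier encodes a hard policy rule; any model answer the rule cannot confirm gets flagged for human review. The underwriting rule and its threshold are invented for the example.

```python
# Toy illustration of the automated-reasoning pattern: never trust the
# model's answer directly; check it against provable policy rules and
# escalate anything that cannot be verified. The rule here is a made-up
# underwriting constraint, standing in for a real proof engine.

POLICY = {
    "max_loan_to_income": 5.0,   # hypothetical rule: loan <= 5x income
}

def verify_loan_answer(income: float, loan: float, model_says_approve: bool):
    """Return (verdict, needs_human_review)."""
    rule_ok = loan <= income * POLICY["max_loan_to_income"]
    if model_says_approve and not rule_ok:
        # The model's claim contradicts a provable rule: block and escalate.
        return ("rejected", True)
    if model_says_approve == rule_ok:
        # Model and verifier agree: safe to act automatically.
        return ("approve" if rule_ok else "deny", False)
    # Model denied something the rules would allow: still escalate.
    return ("deny", True)
```

The design choice worth noting: the verifier is independent of the model, so a hallucinated approval cannot slip through on its own authority.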
Pharmaceutical and banking CIOs like the concept, said JPMorgan Chase Chief Information Officer Lori Beer, who spoke with theCUBE at AWS’ re:Invent conference last year. “Gen AI is just another app to us — but its bar for cyber resilience is sky-high,” she said.
Observability: The next frontier
Even with proofs in place, companies must audit who — or what — did what, and when. Traditional application-performance monitoring stops at the API call; agents require X-ray vision across an entire pipeline. AWS logs every Bedrock prompt and response in CloudTrail, but Deo concedes that is only a start. “We’ll need agent evaluation, lineage tracing, rollback tools — the equivalent of Git history for autonomous workflows,” he said.
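The “Git history for autonomous workflows” idea Deo describes can be sketched as an append-only, hash-chained trail of agent steps: each entry points at the previous one, so auditors can trace lineage and roll state back to any point. This is an illustration of the concept, not an AWS service.

```python
# Sketch of an auditable agent trail: append-only, hash-chained records
# of every agent step, with rollback to any prior entry. Illustrative
# only; not a real AWS observability feature.

import hashlib
import json

class AgentTrail:
    def __init__(self):
        self.entries = []

    def record(self, agent: str, action: str, state: dict) -> str:
        """Append one step; chain it to the previous entry's hash."""
        prev = self.entries[-1]["hash"] if self.entries else "root"
        payload = json.dumps(
            {"agent": agent, "action": action, "state": state, "prev": prev},
            sort_keys=True,
        )
        h = hashlib.sha256(payload.encode()).hexdigest()
        self.entries.append({"agent": agent, "action": action,
                             "state": state, "prev": prev, "hash": h})
        return h

    def rollback_to(self, h: str) -> dict:
        """Restore the workflow state as of entry `h`, discarding later steps."""
        for i, entry in enumerate(self.entries):
            if entry["hash"] == h:
                self.entries = self.entries[: i + 1]
                return entry["state"]
        raise KeyError("unknown entry")
```

Because each hash covers the previous one, tampering with any step invalidates everything after it, which is what makes the trail audit-grade rather than just a log.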
Observers expect new services before re:Invent that will visualize agent flows and flag drift in accuracy or compliance.
A three-layer stack
Deo’s team pitches Bedrock as the middle layer of a three-tier strategy:
- Infrastructure – custom chips (Trainium, Graviton) and Amazon SageMaker for customers who want to build or surgically fine-tune their own models.
- Bedrock platform – off-the-shelf and third-party models, plus tooling such as prompt caching and multi-agent collaboration.
- Applications – fully managed software like Q Developer and Q Business, which let coders and business analysts write queries in plain English.
The goal: Let a hedge-fund quant bury herself in SageMaker while a nontechnical insurance adjuster drags a file into Q Business and gets an instant claims report — both underpinned by the same Bedrock primitives.
The one-person unicorn
Perhaps the most radical implication of the new stack is what Deo calls the solo-founder unicorn. “You’re going to have multibillion-dollar companies powered by a single individual — it’s a matter of when, not if,” he said. The tools now abstract away infrastructure, coding syntax and even business-process wiring.
That prospect thrills venture capitalists and unnerves incumbents. It also explains Amazon’s urgency: Every month Bedrock delays a feature is a month a garage startup might pick a different cloud.
Sprinting at scale
Can AWS keep sprinting while carrying the profit load of a trillion-dollar parent? The past six months’ results — seven models, four cost-savers, two new protocols — suggest it can. Yet Google and Microsoft will answer with their own price cuts and agent toolkits, and regulators from Europe to Washington are working to understand and regulate the AI supply chain.
Deo circles back to the common theme at AWS. “Speed is our advantage,” he says, echoing AWS Chief Executive Matt Garman’s mantra. “We have to deliver hardware, cost controls, guardrails and creativity faster than customers’ imagination.”
In other words, the world’s biggest cloud must behave like a startup — while running data centers the size of small cities. Atul Deo flips his notebook closed; another model launch is due next week.