The world is circular. Kubernetes made developers forget about infrastructure and helped enable AI. Now AI is making them remember. Hardware — which cloud relegated to the back room — is now in the spotlight again.
Industry experts discuss how AI is reshaping Kubernetes and putting hardware back in focus.
At KubeCon + CloudNativeCon this week, attendees kept facing the same conclusion when it came to Kubernetes and AI: You can’t scale generative AI, inference jobs or agentic systems with an outdated hardware stack dragging behind. AI training will be the responsibility of the few, but demand for AI inference engines will come from the masses. Inference workloads are everywhere. They’re in customer-facing apps. They’re in developer tools. They’re latency sensitive, cost sensitive and hardware hungry. How are vendors responding? Hardware-agnostic pipelines are the new grail — build once, run anywhere.
“We’re now in a place where we have to consider 400-gig networking because the models need stuff like that,” said Joep Piscaer (pictured, center), analyst at TLA Tech B.V.
Piscaer and Ned Bellavance (right), independent consultant and technical educator, spoke with theCUBE’s Rob Strechay (left) and Savannah Peterson at the KubeCon + CloudNativeCon NA event, during an exclusive broadcast on theCUBE, News Media’s livestreaming studio. They discussed the future of Kubernetes and AI and how the former is increasingly optimizing for the latter. (* Disclosure below.)
Vendors and technologists rethink stack for AI
Cloud providers, open-source projects and platform teams are all lining up to make inference workloads less taxing on infrastructure. Analysts noted that Google Cloud's Google Kubernetes Engine and other platforms are becoming hardware-agnostic, running AI on GPUs, TPUs or edge devices. Frameworks such as SynergAI show this in practice: Kubernetes orchestrating AI across heterogeneous hardware and cutting quality-of-service violations by a factor of 2.4.
“Inference is going to happen on hardware that people are touching, but it’s going to be Kubernetes built into all this stuff intuitively,” Peterson said.
In other words, the old "throw GPUs at it" approach no longer cuts it. Also needed are fine-grained control, hardware-aware scheduling, a fast network fabric and accelerator resource management. Infrastructure builders are increasingly optimizing for AI from the ground up.
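To make that concrete, here is a minimal sketch of accelerator-aware scheduling using the official Kubernetes Python client. The node label and container image are hypothetical placeholders, and the example assumes a cluster running NVIDIA's device plugin; treat it as an illustration of the pattern rather than any vendor's specific setup.

```python
# Minimal sketch: accelerator-aware scheduling via the Kubernetes Python client.
# Assumes NVIDIA's device plugin is installed and nodes carry a hypothetical
# "gpu-type=a100" label; the container image is a placeholder.
from kubernetes import client, config

config.load_kube_config()  # use load_incluster_config() when running in-cluster

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="inference-server"),
    spec=client.V1PodSpec(
        # Pin the pod to nodes that carry the accelerator it needs.
        node_selector={"gpu-type": "a100"},
        containers=[
            client.V1Container(
                name="model",
                image="example.com/inference:latest",  # placeholder image
                resources=client.V1ResourceRequirements(
                    # The device plugin exposes GPUs as a countable resource,
                    # so the scheduler only places the pod where one is free.
                    limits={"nvidia.com/gpu": "1"}
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```

The same spec pattern extends to TPUs or other accelerators by swapping the resource name and node selector, which is exactly the hardware-agnostic posture the panel described.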
If hardware is coming back, then the eyes and ears of the system have to evolve too. AI is placing heavy new demands on observability. To illustrate, Strechay related his experience at this year’s Infrastructure as Code Conference. “The last question we got was around, ‘How do I do observability for prompts?’ I think the complexity of AI and so many moving parts has everybody coming to the table because everybody’s freaked out,” he said.
Bellavance echoed this uncertainty: "We have these golden signals we normally observe for: 'What's my CPU utilization? What's my response times on things?' Now there's a new metric we have to watch, which is the prompt and also the response. That's going to be tough, and people need to start instrumenting for that."
Tooling such as OpenTelemetry and eBPF-powered platforms is popping up to track not just CPU or memory, but prompts, responses, token usage and retrieval accuracy. This kind of AI-native observability enables in-production troubleshooting and performance optimization for AI systems.
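As a rough illustration of what "observability for prompts" can look like, here is a minimal OpenTelemetry sketch in Python. The span and attribute names, and the call_model() helper, are hypothetical stand-ins for whatever inference API a team actually uses.

```python
# Minimal sketch: wrap an LLM call in an OpenTelemetry span and record the
# prompt, response and token counts as span attributes.
from opentelemetry import trace

tracer = trace.get_tracer("inference.service")

def call_model(prompt: str) -> dict:
    # Placeholder for a real inference call; returns text plus usage stats.
    return {"text": "...", "prompt_tokens": 42, "completion_tokens": 7}

def traced_inference(prompt: str) -> str:
    with tracer.start_as_current_span("llm.request") as span:
        span.set_attribute("llm.prompt", prompt)
        result = call_model(prompt)
        # Token usage drives both cost and latency, so it sits on the same
        # span as the classic golden signals.
        span.set_attribute("llm.response", result["text"])
        span.set_attribute("llm.usage.prompt_tokens", result["prompt_tokens"])
        span.set_attribute("llm.usage.completion_tokens", result["completion_tokens"])
        return result["text"]
```

With the OpenTelemetry SDK configured to export traces, those attributes land alongside CPU and latency metrics, which is the instrumentation Bellavance says teams need to start building.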
Kubernetes adapts for AI’s second arc
AI is forcing Kubernetes to evolve. It's no longer just about containers. Inference workloads, agentic applications and massive model deployments demand GPU scheduling, accelerator-aware orchestration and high-speed networking. The Certified Kubernetes AI Conformance Program that CNCF launched at the conference sets standards for GPU/TPU scheduling, telemetry and cluster orchestration specifically for AI workloads. Google Cloud's GKE Pod Snapshots reduce inference startup times by up to 80%.
Back in the before times, we asked: “What’s after Kubernetes?” In fact, early Kubernetes contributor Kelsey Hightower has said the platform was designed to last about 20 years — it turned 11 this year. The present question isn’t “what’s after K8s?” but “what happens with K8s when AI is the driving workload?” The answer is unfolding at KubeCon: K8s is becoming the central nervous system for the AI stack, with clusters optimized for inference, smarter scheduling, predictive scaling and advanced observability. Welcome to the second arc, according to the analysts.
Here’s the complete video interview, part of News’s and theCUBE’s coverage of the KubeCon + CloudNativeCon NA event:
(* Disclosure: TheCUBE is a paid media partner for the KubeCon + CloudNativeCon NA event. Neither Red Hat Inc., the primary sponsor of theCUBE’s event coverage, nor other sponsors have editorial control over content on theCUBE or News.)
Photo: News
