Red Hat has announced a new version of its enterprise AI platform, Red Hat AI 3. The platform integrates the latest innovations from AI Inference Server, Enterprise Linux AI (RHEL AI) and OpenShift AI. It aims to simplify high-performance AI inference at scale, making it easier to move workloads from proof of concept to production, and it improves collaboration around AI-enabled applications.
Additionally, Red Hat AI 3 enables more agile scaling and delivery of AI workloads across hybrid and multi-vendor environments, as well as improved cross-team collaboration on next-generation AI workloads, such as agents, on the same common platform. Red Hat AI 3 supports any model on any hardware accelerator, from data centers to public cloud and sovereign AI environments.
With this new version, the platform has evolved into a scalable and cost-effective inference offering, built on the vLLM and llm-d projects and on Red Hat’s model optimization capabilities, with the aim of delivering a production-grade LLM service.
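To illustrate the vLLM foundation the platform builds on, here is a minimal sketch of offline inference with the open source vLLM library; the model ID and prompt are placeholders rather than Red Hat defaults.

```python
from vllm import LLM, SamplingParams

# Load an open model with vLLM; any Hugging Face-style model ID works here.
# "Qwen/Qwen2.5-0.5B-Instruct" is an illustrative placeholder.
llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")

# Sampling parameters control decoding (temperature, nucleus sampling, length).
params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=128)

prompts = ["Summarize what distributed inference means in one sentence."]

# generate() batches the prompts and returns one RequestOutput per prompt.
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```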
Red Hat OpenShift AI 3.0 introduces general availability of llm-d, which enables intelligent distributed inference by combining the orchestration of Kubernetes with the performance of vLLM, alongside open source technologies such as the Kubernetes Gateway API Inference Extension, the NVIDIA Dynamo low-latency data transfer library (NIXL), and the DeepEP Mixture of Experts (MoE) communication library.
Thanks to this, organizations can reduce costs and improve response times through intelligent, inference-aware model scheduling and disaggregated serving. They also gain operational simplicity and reliability through prescriptive “Well-Lit Paths” that expedite the deployment of large-scale models on Kubernetes. In addition, llm-d’s cross-platform support for running LLM inference on various hardware accelerators gives organizations more flexibility.
llm-d is based on vLLM and transforms it into a distributed, consistent and scalable serving system. It is tightly integrated with Kubernetes and designed to enable predictable performance, measurable ROI, and effective infrastructure planning. These enhancements address the challenges of managing LLM workloads and serving massive models, such as Mixture-of-Experts (MoE) models.
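In practice, applications talk to such a deployment through the OpenAI-compatible HTTP API that vLLM exposes, so the distribution across Kubernetes stays invisible to clients. The following sketch assumes a hypothetical in-cluster gateway URL and model name:

```python
import requests

# Hypothetical in-cluster service URL for an llm-d/vLLM deployment; the actual
# host and model name depend on how the inference gateway is configured.
BASE_URL = "http://llm-gateway.example.svc.cluster.local:8000/v1"

payload = {
    "model": "example-moe-model",  # placeholder model identifier
    "messages": [
        {"role": "user", "content": "What does disaggregated serving mean?"}
    ],
    "max_tokens": 128,
}

# vLLM serves an OpenAI-compatible /v1/chat/completions endpoint, which is
# what clients talk to regardless of how the backend is sharded.
resp = requests.post(f"{BASE_URL}/chat/completions", json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```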
Red Hat AI 3: improvements in productivity and efficiency
Among the new platform features developed to improve productivity and efficiency are Model as a Service (MaaS) capabilities based on distributed inference. They allow IT teams to act as their own MaaS providers, serving common models centrally. This gives AI developers and applications on-demand access, improving cost management and supporting use cases that cannot run on public AI services due to data or privacy concerns.
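A sketch of how a developer might consume such an internal MaaS endpoint, assuming it speaks the OpenAI-compatible protocol; the base URL, token and model name are illustrative, not part of the product:

```python
from openai import OpenAI

# A centrally served MaaS endpoint would typically speak the OpenAI protocol;
# the base_url, API key, and model name below are illustrative placeholders.
client = OpenAI(
    base_url="https://maas.internal.example.com/v1",
    api_key="internal-team-token",
)

response = client.chat.completions.create(
    model="shared/granite-8b",  # a centrally hosted model, named by the platform team
    messages=[{"role": "user", "content": "Draft a release note for our new API."}],
)
print(response.choices[0].message.content)
```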
The AI Hub enables platform engineers to explore, deploy and manage foundational AI assets. It offers a central hub with a curated catalog of models, including validated and optimized generative AI models, as well as a registry to manage model lifecycles and a deployment environment for configuring and monitoring AI assets running on OpenShift AI.
Gen AI studio provides an environment for AI engineers to interact with models and prototype generative AI applications. Its AI asset endpoint feature lets them discover and consume available models and MCP servers, which are designed to streamline how models interact with external tools. The built-in playground provides an interactive, stateless environment for experimenting with models, testing prompts, and tuning parameters for use cases such as chat and retrieval-augmented generation (RAG).
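The RAG pattern mentioned here boils down to retrieving relevant context and splicing it into the prompt. A deliberately minimal, self-contained sketch, with a toy word-overlap retriever standing in for a real vector database and embedding model:

```python
from collections import Counter

# A tiny in-memory "document store"; real deployments would use a vector
# database and an embedding model instead of word-overlap scoring.
documents = [
    "Red Hat AI 3 integrates AI Inference Server, RHEL AI and OpenShift AI.",
    "llm-d turns vLLM into a distributed serving system on Kubernetes.",
    "The Gen AI studio includes a playground for testing prompts.",
]

def score(query: str, doc: str) -> int:
    """Count shared words between query and document (toy retriever)."""
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    return sum((q & d).values())

def build_rag_prompt(query: str) -> str:
    # Retrieve the best-matching document and splice it into the prompt:
    # the core pattern behind retrieval-augmented generation.
    context = max(documents, key=lambda doc: score(query, doc))
    return f"Answer using this context:\n{context}\n\nQuestion: {query}"

print(build_rag_prompt("What does llm-d do with vLLM?"))
```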
In addition, the platform offers new models validated and optimized by Red Hat to simplify development. Among them are popular open source models, such as OpenAI’s gpt-oss and DeepSeek-R1, as well as specialized ones, such as Whisper, for speech-to-text conversion, and Voxtral Mini, for voice-enabled agents.
The company has added a unified API layer based on Llama Stack that facilitates development by aligning it with industry standards, such as OpenAI-compatible LLM interface protocols. It has also adopted the Model Context Protocol (MCP), a standard that streamlines the way AI models interact with external tools.
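For a sense of what MCP looks like in code, here is a minimal sketch of an MCP server built with the FastMCP helper from the protocol's reference Python SDK; the server and tool names are invented for illustration:

```python
from mcp.server.fastmcp import FastMCP

# A minimal MCP server exposing one tool; any MCP-aware model runtime can
# discover and call it. Server and tool names here are illustrative.
mcp = FastMCP("inventory-tools")

@mcp.tool()
def count_open_tickets(team: str) -> int:
    """Return the number of open tickets for a team (stubbed demo data)."""
    demo_data = {"platform": 4, "ml": 7}
    return demo_data.get(team, 0)

if __name__ == "__main__":
    # Serves the tool over stdio, the default MCP transport.
    mcp.run()
```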
Red Hat AI 3 incorporates a new modular and extensible model customization toolkit. This kit builds on the existing InstructLab functionality, and offers specialized Python libraries that give developers more flexibility and control.
The toolkit is powered by open source projects such as Docling for data processing, which speeds up the ingestion of unstructured documents and their conversion into AI-readable formats. It also includes a flexible framework for synthetic data generation and a training hub for LLM fine-tuning. The evaluation hub helps AI engineers monitor and validate results, so they can leverage their proprietary data to improve the accuracy and relevance of the AI output they obtain.
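As a sketch of the Docling piece: the open source library converts an unstructured document into a structured representation that can be exported for downstream pipelines. The file path below is a placeholder:

```python
from docling.document_converter import DocumentConverter

# Convert an unstructured document (PDF, DOCX, ...) into a structured,
# AI-readable representation. The input path is a placeholder.
converter = DocumentConverter()
result = converter.convert("reports/q3-findings.pdf")

# Export to Markdown, a common format for feeding documents into
# fine-tuning or RAG pipelines.
markdown = result.document.export_to_markdown()
print(markdown[:500])
```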
