Red Hat AI Inference Server democratizes generative AI

News Room
Published 21 May 2025 (last updated 4:45 AM)

Red Hat AI Inference Server is the new enterprise inference server that advances Red Hat's vision of running any generative AI model, on any AI accelerator, in any cloud environment. It was presented at the Red Hat Summit the company is holding this week, alongside the unveiling of Red Hat Enterprise Linux 10, the star of the event.

Red Hat AI Inference Server joins the Red Hat AI ecosystem as a new offering. Born from the powerful vLLM community project and optimized through Red Hat's integration of Neural Magic technologies, it delivers greater speed, more efficient use of accelerators, and better cost-effectiveness in the hybrid cloud. The server can be deployed standalone or as an integrated component of Red Hat Enterprise Linux AI (RHEL AI) and Red Hat OpenShift AI.
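
To give a sense of what a standalone deployment looks like from an application's point of view, here is a minimal client sketch. It assumes a vLLM-based server is already running locally on port 8000 and serving a model; the endpoint, key, and model name are placeholders for illustration, not details from the announcement. Because vLLM exposes an OpenAI-compatible API, the standard openai Python client works against it:

```python
# Minimal client sketch against a vLLM-based inference server.
# Assumptions: the server runs at localhost:8000 and serves the model named
# below; neither detail comes from the announcement.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's OpenAI-compatible endpoint
    api_key="EMPTY",                      # vLLM accepts a dummy key by default
)

response = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.3",  # placeholder model id
    messages=[{"role": "user", "content": "What does an inference server do?"}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```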

Red Hat AI Inference Server: generative AI in the hybrid cloud

Red Hat explains that inference is the critical execution engine of AI, where pre-trained models turn data into practical applications. It is the key point of user interaction, demanding fast and precise answers, as Joe Fernandes, vice president and general manager of the AI Business Unit at Red Hat, puts it:

«Inference is where the true promise of generative AI is realized, where user interactions are answered quickly and accurately by a given model, but it must be done effectively and cost-efficiently. Red Hat AI Inference Server is designed to meet the demand for high-performance, scalable inference while keeping resource needs low, providing a common inference layer that supports any model, running on any accelerator, in any environment».

As generative models grow more complex and production deployments multiply, inference can become a major bottleneck, rapidly consuming hardware resources and threatening to cripple responsiveness and drive up operating costs.

Robust inference servers are no longer a luxury but a necessity for unlocking the true potential of AI at scale and taming the underlying complexities. The new Red Hat server addresses these challenges directly, proposing an open inference solution designed for high performance and equipped with leading model compression and optimization tools.

This allows organizations to make the most of the transformative power of generative AI, offering far more responsive user experiences and unprecedented freedom in their choice of AI accelerators, models, and IT environments.

In any deployment environment, Red Hat AI Inference Server provides users with a hardened and supported distribution of vLLM, along with:

  • Intelligent LLM compression tools that drastically reduce the size of both foundational and fine-tuned AI models, minimizing compute consumption while preserving, and potentially improving, model accuracy.
  • An optimized model repository, hosted in the Red Hat AI organization on Hugging Face, which offers instant access to a validated, optimized collection of leading models ready for inference deployment, helping improve efficiency by 2x to 4x without compromising model accuracy (see the sketch after this list).
  • Red Hat's enterprise support, backed by decades of experience bringing community projects to production environments.
  • Third-party support for greater deployment flexibility, allowing Red Hat AI Inference Server to run on non-Red Hat Linux and Kubernetes platforms under Red Hat's third-party support policy.
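
As a minimal sketch of serving one of those optimized models, the snippet below loads a quantized checkpoint with vLLM's offline API. It assumes vLLM is installed and a GPU is available, and the model id is a hypothetical stand-in for an entry in the Red Hat AI Hugging Face organization, not a name confirmed by the announcement:

```python
# Sketch: loading a compressed model with vLLM's offline API.
# Assumptions: vLLM installed (pip install vllm), a CUDA GPU available, and
# the model id below standing in for a real Red Hat AI Hugging Face entry.
from vllm import LLM, SamplingParams

llm = LLM(model="RedHatAI/Llama-3.1-8B-Instruct-quantized.w4a16")  # hypothetical id
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Summarize the benefits of quantized inference."], params)
print(outputs[0].outputs[0].text)
```

vLLM detects the quantization scheme from the model's configuration, so no extra flags are needed beyond the model id.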

vLLM: the key to inference innovation

Red Hat AI Inference Server is built on the industry-leading vLLM project, started by the University of California, Berkeley in mid-2023. This community project offers high-performance generative inference, support for long input contexts, multi-GPU model acceleration, and continuous batching, among other features.
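
A short sketch of two of those capabilities: a batch of prompts (which vLLM schedules internally with continuous batching) and multi-GPU tensor parallelism. The model id, GPU count, and context length are illustrative assumptions:

```python
# Sketch of vLLM features named above: a batch of prompts handled by the
# continuous-batching scheduler, sharded across GPUs via tensor parallelism.
# Model id, GPU count, and context length are illustrative assumptions.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model id
    tensor_parallel_size=2,   # shard the model across two GPUs
    max_model_len=32768,      # allow long input contexts
)
prompts = [
    "Explain continuous batching in one paragraph.",
    "Why does tensor parallelism speed up large models?",
]
outputs = llm.generate(prompts, SamplingParams(max_tokens=96))
for out in outputs:
    print(out.outputs[0].text)
```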

vLLM's broad support for publicly available models, along with its day-0 integration of leading models such as DeepSeek, Gemma, Llama, Llama Nemotron, Mistral, and Phi, as well as open, enterprise-grade reasoning models like Llama Nemotron, positions it as a de facto standard for future AI inference innovation. The growing adoption of vLLM by the main model providers consolidates its key role in shaping the future of generative AI.

Red Hat's vision

The future of AI «must be defined by unlimited opportunities, not by the limitations imposed by infrastructure silos», says the open source giant, which sees a future where organizations can deploy any model, on any accelerator, across any cloud, delivering an exceptional, more consistent user experience without exorbitant costs.

To unlock the true potential of their investments in generative AI, companies need a universal inference platform: «a standard for smoother, higher-performance AI innovation, both now and in the future».

Just as Red Hat pioneered the open enterprise by turning Linux into the foundation of modern IT, it is now prepared to shape the future of AI inference. vLLM has the potential to become the linchpin of standardized generative AI inference, and Red Hat is committed to building a thriving ecosystem not only around the vLLM community but also around llm-d for distributed inference at scale.

Red Hat's vision is clear: regardless of the AI model, the underlying accelerator, or the deployment environment, the company intends to turn vLLM into the definitive open standard for inference in the new hybrid cloud.

