Red Hat has announced the launch of the llm-d community, an open source project that addresses one of the key needs of generative AI going forward: large-scale inference. Built on inference technologies for large-scale generative AI, llm-d features a Kubernetes-native architecture, distributed inference based on vLLM, and intelligent, AI-aware network routing. This allows LLM inference clouds to meet production service-level objectives.
The llm-d community allows Red Hat and its partners to amplify the power of vLLM beyond the limits of a single server and unlock production-scale AI inference. Thanks to Kubernetes orchestration, llm-d integrates advanced inference capabilities into existing enterprise IT infrastructure. This unified platform lets IT teams meet the service demands of business-critical workloads.
vLLM, which has become the de facto open source standard inference server, offers day-zero model support for emerging frontier models, as well as support for a wide range of accelerators, including Google Cloud Tensor Processing Units (TPUs).
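For reference, a minimal vLLM offline-inference sketch looks roughly like the following; the model name and prompt are purely illustrative:

```python
# Minimal vLLM usage sketch (illustrative model name and prompt).
from vllm import LLM, SamplingParams

prompts = ["What is large-scale inference?"]
sampling_params = SamplingParams(temperature=0.7, max_tokens=64)

# Load a model onto the local accelerator and generate completions.
llm = LLM(model="facebook/opt-125m")
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.outputs[0].text)
```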
llm-d also offers prefill and decode disaggregation, which separates the input-context (prefill) phase and the token-generation (decode) phase of AI inference into discrete operations that can then be distributed across multiple servers. In addition, it provides KV (key-value) cache offloading, based on LMCache, which moves the KV cache load from GPU memory to more standard storage, such as CPU memory or network storage.
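To illustrate the offloading idea only (this is a toy sketch, not the LMCache or llm-d API), a tiered KV cache can spill least recently used entries from a fast "GPU" tier to a slower "CPU" tier instead of discarding them:

```python
from collections import OrderedDict

class TieredKVCache:
    """Toy illustration of KV-cache offloading: when the fast 'GPU' tier
    fills up, least recently used entries spill to a slower 'CPU' tier.
    Conceptual sketch only; not the LMCache or llm-d implementation."""

    def __init__(self, gpu_capacity: int):
        self.gpu_capacity = gpu_capacity
        self.gpu_tier: OrderedDict[str, bytes] = OrderedDict()  # fast, limited
        self.cpu_tier: dict[str, bytes] = {}                    # slower, larger

    def put(self, prompt_prefix: str, kv_blocks: bytes) -> None:
        self.gpu_tier[prompt_prefix] = kv_blocks
        self.gpu_tier.move_to_end(prompt_prefix)
        while len(self.gpu_tier) > self.gpu_capacity:
            evicted_key, evicted_blocks = self.gpu_tier.popitem(last=False)
            self.cpu_tier[evicted_key] = evicted_blocks  # offload, don't discard

    def get(self, prompt_prefix: str) -> bytes | None:
        if prompt_prefix in self.gpu_tier:
            self.gpu_tier.move_to_end(prompt_prefix)
            return self.gpu_tier[prompt_prefix]
        if prompt_prefix in self.cpu_tier:
            # Promote back to the fast tier on reuse.
            kv_blocks = self.cpu_tier.pop(prompt_prefix)
            self.put(prompt_prefix, kv_blocks)
            return kv_blocks
        return None
```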
It also provides Kubernetes-based clusters and controllers for more efficient scheduling of compute and storage resources as demand varies, maintaining performance with lower latency, along with AI-aware network routing that directs incoming requests to the servers and accelerators most likely to hold "hot" caches of previous inference computations.
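A toy sketch of cache-aware routing, assuming hypothetical server names and cached prefixes (llm-d's actual router is far more sophisticated), would pick the server whose cached prompt prefixes overlap most with the incoming request:

```python
# Route a request to the server with the warmest cache for this prompt.
def longest_cached_prefix(prompt: str, cached_prefixes: set[str]) -> int:
    return max((len(p) for p in cached_prefixes if prompt.startswith(p)), default=0)

def route(prompt: str, servers: dict[str, set[str]]) -> str:
    return max(servers, key=lambda name: longest_cached_prefix(prompt, servers[name]))

servers = {
    "vllm-pod-a": {"You are a helpful assistant."},
    "vllm-pod-b": {"You are a helpful assistant. Summarize the following report:"},
}
print(route("You are a helpful assistant. Summarize the following report: Q3 sales...",
            servers))
# -> vllm-pod-b, since its longer cached prefix means its KV cache is already "hot"
```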
Beyond this, the project includes high-performance communication APIs for faster and more efficient data transfer, with support for the NVIDIA Inference Xfer Library (NIXL).
The project already has the support of various generative AI model providers, AI accelerator developers and AI cloud platforms. Its founding contributors include CoreWeave, Google Cloud, IBM Research and NVIDIA, and its partners include AMD, Cisco, Hugging Face, Intel, Lambda and Mistral AI.
The llm-d community also has the support of the founders of the Sky Computing Lab at the University of California, creators of vLLM, and of the LMCache Lab at the University of Chicago, architects of LMCache.