Earlier this month Intel released LLM-Scaler 1.0 as part of its Project Battlematrix initiative. This Docker container effort aims to deliver fast AI inference with multi-GPU scaling, PCIe P2P support, and more.
Following that v1.0 announcement, yesterday Intel software engineers released “0.9.0-b3” as a new beta of the llm-scaler-vllm Docker build.
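For those wanting to experiment with it, deployment is container-based. Below is a minimal sketch of launching the container from Python; the image name, tag, and flags are illustrative assumptions rather than values confirmed by Intel, so consult the GitHub release notes for the exact invocation.

    # Minimal sketch: launching the llm-scaler-vllm container from Python.
    # The image name/tag and flags below are assumptions for illustration.
    import subprocess

    IMAGE = "intel/llm-scaler-vllm:0.9.0-b3"  # assumed image name and tag

    subprocess.run(
        [
            "docker", "run", "--rm", "-it",
            "--device", "/dev/dri",   # expose Intel GPUs to the container
            "--net", "host",          # publish the vLLM API on the host network
            "--shm-size", "16g",      # shared memory for multi-GPU workers
            IMAGE,
        ],
        check=True,
    )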
The updated LLM-Scaler vLLM beta adds support for Whisper models and GLM-4.5-Air, enables image input for GLM-4.1V-9B-Thinking, and enables the dots.ocr model. Beyond the additional models, yesterday’s beta also optimizes vLLM memory usage and enables the Ray back-end for pipeline parallelism.
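To make the model additions concrete, here is a rough sketch of exercising the new Whisper and image-input support through vLLM's OpenAI-compatible API. The endpoint address and model identifiers are assumptions for illustration, and in practice each model runs behind its own server instance.

    # Rough sketch: querying a vLLM server's OpenAI-compatible endpoints.
    # base_url, port, and model IDs are assumed, not taken from Intel's notes;
    # each model would normally be served by a separate vLLM instance.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

    # Whisper speech-to-text via the /v1/audio/transcriptions endpoint.
    with open("sample.wav", "rb") as audio:
        transcript = client.audio.transcriptions.create(
            model="openai/whisper-large-v3",  # assumed model ID
            file=audio,
        )
    print(transcript.text)

    # Image input for a vision model such as GLM-4.1V-9B-Thinking.
    response = client.chat.completions.create(
        model="zai-org/GLM-4.1V-9B-Thinking",  # assumed model ID
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.png"}},
            ],
        }],
    )
    print(response.choices[0].message.content)

As for the pipeline parallelism side, the Ray back-end maps to vLLM's standard --distributed-executor-backend ray and --pipeline-parallel-size options for splitting a model across multiple GPUs.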
Downloads and more details on the new Intel LLM-Scaler vLLM release are available via GitHub.