Earlier this month Intel released LLM-Scaler 1.0 as part of its Project Battlematrix initiative. This Docker container effort aims to deliver fast AI inference with multi-GPU scaling, PCIe P2P support, and more.
Following that v1.0 announcement, yesterday Intel software engineers released “0.9.0-b3” as a new beta of the llm-scaler-vllm Docker build.
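For those wanting to experiment with it, deployment is container-based. Below is a minimal sketch of launching the container from Python; the image name, tag, and flags are illustrative assumptions rather than values confirmed by Intel, so consult the GitHub release notes for the exact invocation.

    # Minimal sketch: launching the llm-scaler-vllm container from Python.
    # The image name/tag and flags below are assumptions for illustration.
    import subprocess

    IMAGE = "intel/llm-scaler-vllm:0.9.0-b3"  # assumed image name and tag

    subprocess.run(
        [
            "docker", "run", "--rm", "-it",
            "--device", "/dev/dri",   # expose Intel GPUs to the container
            "--net", "host",          # publish the vLLM API on the host network
            "--shm-size", "16g",      # shared memory for multi-GPU workers
            IMAGE,
        ],
        check=True,
    )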
The updated LLM-Scaler vLLM beta adds support for Whisper models and GLM-4.5-Air, enables image input for GLM-4.1V-9B-Thinking, and enables the dots.ocr model. Beyond the additional models, yesterday’s beta also optimizes vLLM memory usage and enables the Ray back-end for pipeline parallelism.
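To make the model additions concrete, here is a rough sketch of exercising the new Whisper and image-input support through vLLM's OpenAI-compatible API. The endpoint address and model identifiers are assumptions for illustration, and in practice each model runs behind its own server instance.

    # Rough sketch: querying a vLLM server's OpenAI-compatible endpoints.
    # base_url, port, and model IDs are assumed, not taken from Intel's notes;
    # each model would normally be served by a separate vLLM instance.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

    # Whisper speech-to-text via the /v1/audio/transcriptions endpoint.
    with open("sample.wav", "rb") as audio:
        transcript = client.audio.transcriptions.create(
            model="openai/whisper-large-v3",  # assumed model ID
            file=audio,
        )
    print(transcript.text)

    # Image input for a vision model such as GLM-4.1V-9B-Thinking.
    response = client.chat.completions.create(
        model="zai-org/GLM-4.1V-9B-Thinking",  # assumed model ID
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.png"}},
            ],
        }],
    )
    print(response.choices[0].message.content)

As for the pipeline parallelism side, the Ray back-end maps to vLLM's standard --distributed-executor-backend ray and --pipeline-parallel-size options for splitting a model across multiple GPUs.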
Downloads and more details on the new Intel LLM-Scaler vLLM release are available via GitHub.