Intel today released the LLM-Scaler-vLLM 1.3 update, expanding the array of large language models that can run on Intel Arc Battlemage graphics cards with this Docker-based stack for deploying vLLM.
The new Intel llm-scaler-vllm 1.3 release via Docker and GitHub adds support for eight new models on capable Intel Arc Graphics hardware: Qwen3-Next-80B-A3B-Instruct, Qwen3-Next-80B-A3B-Thinking, InternVL3.5-30B-A3B, DeepSeek-OCR, PaddleOCR-VL, Seed-OSS-36B-Instruct, Qwen3-30B-A3B-Instruct-2507, and openai/whisper-large-v3.
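For those wanting to try it out, deployment is Docker-based. Below is a minimal sketch of pulling and starting the container; the image name and tag here are assumptions modeled on the project's naming, so confirm the exact coordinates on the GitHub releases page before use.

```bash
# Hypothetical image name/tag -- confirm against the project's GitHub
# releases or container registry before pulling.
docker pull intel/llm-scaler-vllm:1.3

# Start the container with the Intel Arc GPUs passed through via the
# /dev/dri render nodes; host networking keeps the vLLM API reachable.
docker run -it --rm \
    --device /dev/dri \
    --net host \
    -v "$HOME/models:/llm/models" \
    intel/llm-scaler-vllm:1.3 bash
```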
In addition to those models, support for PaddleOCR models and GLM-4.6v-Flash is noted separately. There is also now sym_int4 quantization support for Qwen3-30B-A3B at tensor parallel sizes 4 and 8 (TP 4/8) and for Qwen3-235B-A22B at TP 16.
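As a rough sketch of what launching one of those sym_int4 configurations could look like, assuming the stack exposes vLLM's standard serving entry point and an IPEX-LLM-style low-bit flag (both assumptions; the exact flag name may differ in this release):

```bash
# Sketch only: --load-in-low-bit follows the IPEX-LLM vLLM convention and
# may be named differently in llm-scaler-vllm 1.3. --tensor-parallel-size
# is the standard upstream vLLM option for TP sharding.
vllm serve Qwen/Qwen3-30B-A3B \
    --load-in-low-bit sym_int4 \
    --tensor-parallel-size 4
```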
The LLM-Scaler-vLLM stack has been upgraded to vLLM 0.11.1 and PyTorch 2.9. With the vLLM upgrade they have also enabled CPU KV cache offload, speculative decoding with two additional methods, an experimental FP8 KV cache, and other enhancements.
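For context on what those vLLM 0.11-era features look like at the command line, here is a hedged sketch using upstream vLLM option names; whether the Intel stack wires them up identically is an assumption, and the CPU KV cache offload knob is omitted since its exact flag isn't documented here.

```bash
# Experimental FP8 KV cache plus n-gram speculative decoding, using
# upstream vLLM option names as of the 0.11 series.
vllm serve Qwen/Qwen3-30B-A3B-Instruct-2507 \
    --kv-cache-dtype fp8 \
    --speculative-config '{"method": "ngram", "num_speculative_tokens": 3, "prompt_lookup_max": 4}'
```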
Plus there are more bug fixes and other improvements with Intel LLM-Scaler-vLLM 1.3. Downloads and all the details are available via GitHub.
