Intel’s software engineers working on the OpenVINO AI toolkit today released OpenVINO 2025.0, which brings support for the much-talked-about DeepSeek models along with other large language models (LLMs), performance improvements for some of the existing model support, and other changes.
New model support with Intel’s OpenVINO 2025.0 open-source AI toolkit includes Qwen 2.5, DeepSeek-R1-Distill-Llama-8B, DeepSeek-R1-Distill-Qwen-7B, DeepSeek-R1-Distill-Qwen-1.5B, FLUX.1 Schnell, and FLUX.1 Dev.
OpenVINO 2025.0 also delivers better Whisper model performance on CPUs, integrated GPUs, and discrete GPUs via OpenVINO’s GenAI API. Plus there is initial Intel NPU support for torch.compile, allowing the PyTorch compilation API to target Intel NPUs.
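For those curious what that torch.compile integration looks like in practice, below is a minimal sketch of routing a PyTorch model through the OpenVINO backend with the NPU as the target device. The toy model and tensor shapes are arbitrary placeholders, and it assumes an OpenVINO install with NPU plugin support; on machines without an NPU, the device option could be swapped for "CPU" or "GPU".

```python
import torch
import openvino.torch  # noqa: F401  (registers the "openvino" backend for torch.compile)

# Toy model purely for illustration.
model = torch.nn.Sequential(
    torch.nn.Linear(16, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 4),
).eval()

# Ask torch.compile to use the OpenVINO backend and run on the Intel NPU,
# the new target supported as of OpenVINO 2025.0.
compiled_model = torch.compile(model, backend="openvino", options={"device": "NPU"})

with torch.no_grad():
    output = compiled_model(torch.randn(1, 16))
print(output.shape)
```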
OpenVINO 2025.0 also brings improved second-token latency for LLMs, INT8 KV cache compression now enabled on CPUs, support for Core Ultra 200H “Arrow Lake H” processors, OpenVINO backend support for the Triton Inference Server, and the ability for the OpenVINO Model Server to run natively on Windows Server deployments.
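As a rough idea of how the CPU KV cache compression surfaces to users, here is a hedged sketch using the OpenVINO GenAI LLMPipeline. The model directory is a placeholder, and passing the KV_CACHE_PRECISION property explicitly, as shown, is an assumption about the configuration knob based on OpenVINO's plugin-property mechanism rather than something spelled out in the release announcement.

```python
import openvino as ov
import openvino_genai as ov_genai

# Placeholder path: an LLM already exported to OpenVINO IR, e.g. with
#   optimum-cli export openvino --model deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B ./model-ov
model_dir = "./model-ov"

# OpenVINO 2025.0 enables INT8 (u8) KV cache compression on CPUs; setting the
# KV_CACHE_PRECISION plugin property here makes that choice explicit (assumed
# knob, not a documented requirement for this release).
pipe = ov_genai.LLMPipeline(model_dir, "CPU", **{"KV_CACHE_PRECISION": ov.Type.u8})

print(pipe.generate("What's new in OpenVINO 2025.0?", max_new_tokens=64))
```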
Downloads and more details on the just-released OpenVINO 2025.0 are available via GitHub. I’ll have new OpenVINO benchmarks and OpenVINO GenAI benchmarks soon on Phoronix.