Intel software engineers closed out the week by releasing oneDNN 3.8, bringing a variety of new performance optimizations and other improvements.
The Intel-led oneDNN library, now part of the UXL Foundation, provides the basic building blocks for AI / deep learning applications. It is aggressively optimized for Intel's hardware offerings but over time has also developed robust support for competing hardware platforms.
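For those unfamiliar with what those building blocks look like in practice, here is a minimal sketch using oneDNN's C++ API to run a ReLU activation on the CPU. The tensor shapes and values are purely illustrative, and the primitive-descriptor construction shown follows the oneDNN 3.x style:

```cpp
#include <vector>
#include <oneapi/dnnl/dnnl.hpp>

int main() {
    using namespace dnnl;

    // CPU engine and an execution stream on it.
    engine eng(engine::kind::cpu, 0);
    stream strm(eng);

    // A small fp32 tensor in NCHW layout (shape is illustrative).
    memory::desc md({1, 3, 13, 13}, memory::data_type::f32,
            memory::format_tag::nchw);
    std::vector<float> src_data(1 * 3 * 13 * 13, -1.0f);
    memory src(md, eng, src_data.data());
    memory dst(md, eng);

    // oneDNN 3.x creates primitive descriptors directly from the engine.
    eltwise_forward::primitive_desc relu_pd(eng, prop_kind::forward_inference,
            algorithm::eltwise_relu, md, md, /*alpha=*/0.f, /*beta=*/0.f);
    eltwise_forward relu(relu_pd);

    // Execute and wait; dst now holds max(src, 0) element-wise.
    relu.execute(strm, {{DNNL_ARG_SRC, src}, {DNNL_ARG_DST, dst}});
    strm.wait();
    return 0;
}
```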
With oneDNN 3.8 there are continued Intel AMX enhancements, better Panther Lake Xe3 integrated graphics performance, refinements for existing Xe2 graphics support, and other optimizations to benefit Intel’s recent and upcoming CPU and GPU products.
“Intel Architecture Processors
– Improved matmul and inner product primitives performance on processors with Intel AMX instruction set support.
– Improved performance of convolution and inner product primitives on processors with Intel AVX2 instruction set support.
– Improved performance of int8 convolution support with zero points.
– Improved fp32 convolution performance with fp16 and bf16 compressed weights on processors with Intel AVX2 or Intel AVX-512 instruction set support.
– Improved fp16/bf16 depthwise convolution performance with fp32 bias or sum post-ops or dilation.
– Improved bf16 pooling backpropagation performance.
– Improved binary post-ops performance with per_w broadcast.
Intel Graphics Products
– Improved performance on Intel Arc graphics for future Intel Core Ultra processors (code name Panther Lake).
– Improved convolution performance on:
Intel Arc Graphics for Intel Core Ultra processor series 2 (formerly Lunar Lake).
Intel Arc B-series discrete graphics (formerly Battlemage).
– Improved int8 matmul performance with zero-points support for source and weight tensors.
– Improved f4_e2m1 and f4_e3m0 matmul and reorder performance.
– Improved performance of the following subgraphs with Graph API:
Scaled Dot Product Attention (SDPA) with int4 and int8 compressed key and value.
fp16/bf16 SDPA with fp32 intermediate data types. Using fp32 intermediate data types is recommended.
SDPA with head size 512 and 576.
Grouped Query Attention (GQA) with 5D input tensors.”
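To give a sense of how those SDPA subgraphs are expressed, below is a minimal, hypothetical sketch using the oneDNN Graph API that builds the core MatMul → SoftMax → MatMul pattern and asks the library to partition it. The shapes and tensor IDs are made up, the scaling and masking steps of a full SDPA are omitted for brevity, and a CPU engine kind is used here purely for simplicity:

```cpp
#include <oneapi/dnnl/dnnl_graph.hpp>
using namespace dnnl::graph;
using dt = logical_tensor::data_type;
using lt = logical_tensor::layout_type;

int main() {
    // Illustrative shapes: batch=1, heads=16, seq=128, head_size=64.
    const logical_tensor::dims qv_shape = {1, 16, 128, 64};
    const logical_tensor::dims score_shape = {1, 16, 128, 128};

    logical_tensor q {0, dt::f16, qv_shape, lt::strided};
    logical_tensor k {1, dt::f16, {1, 16, 64, 128}, lt::strided}; // pre-transposed K
    logical_tensor score {2, dt::f16, score_shape, lt::strided};
    logical_tensor prob  {3, dt::f16, score_shape, lt::strided};
    logical_tensor v {4, dt::f16, qv_shape, lt::strided};
    logical_tensor out {5, dt::f16, qv_shape, lt::strided};

    // Q*K^T -> SoftMax -> *V, the skeleton of an SDPA subgraph.
    op qk {0, op::kind::MatMul, {q, k}, {score}, "qk"};
    op softmax {1, op::kind::SoftMax, {score}, {prob}, "softmax"};
    softmax.set_attr<int64_t>(op::attr::axis, -1);
    op pv {2, op::kind::MatMul, {prob, v}, {out}, "pv"};

    graph g(dnnl::engine::kind::cpu);
    g.add_op(qk);
    g.add_op(softmax);
    g.add_op(pv);
    g.finalize();

    // The backend decides whether the whole pattern fuses into one partition.
    auto parts = g.get_partitions();
    return parts.empty() ? 1 : 0;
}
```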
The oneDNN 3.8 release also brings FP16, INT8, and BF16 optimizations for AArch64 processors, Graph API support for NVIDIA GPUs, ROCm 6 support for AMD GPUs, and a variety of other smaller enhancements.
Downloads and more information on the oneDNN 3.8 release for building out deep learning applications are available via GitHub. Look for new oneDNN benchmarks soon around upcoming hardware releases.