Intel software engineers closed out the week by releasing oneDNN 3.8, bringing a variety of new performance optimizations and other improvements.
The Intel-led oneDNN library, now part of the UXL Foundation, provides the basic building blocks for AI / deep learning applications. It is aggressively optimized for Intel's hardware offerings but over time has also developed robust support for competing hardware platforms.
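For those unfamiliar with what those building blocks look like in practice, here is a minimal sketch using oneDNN's C++ API to run a ReLU activation on the CPU. The tensor shapes and values are purely illustrative, and the primitive-descriptor construction shown follows the oneDNN 3.x style:

```cpp
#include <vector>
#include <oneapi/dnnl/dnnl.hpp>

int main() {
    using namespace dnnl;

    // CPU engine and an execution stream on it.
    engine eng(engine::kind::cpu, 0);
    stream strm(eng);

    // A small fp32 tensor in NCHW layout (shape is illustrative).
    memory::desc md({1, 3, 13, 13}, memory::data_type::f32,
            memory::format_tag::nchw);
    std::vector<float> src_data(1 * 3 * 13 * 13, -1.0f);
    memory src(md, eng, src_data.data());
    memory dst(md, eng);

    // oneDNN 3.x creates primitive descriptors directly from the engine.
    eltwise_forward::primitive_desc relu_pd(eng, prop_kind::forward_inference,
            algorithm::eltwise_relu, md, md, /*alpha=*/0.f, /*beta=*/0.f);
    eltwise_forward relu(relu_pd);

    // Execute and wait; dst now holds max(src, 0) element-wise.
    relu.execute(strm, {{DNNL_ARG_SRC, src}, {DNNL_ARG_DST, dst}});
    strm.wait();
    return 0;
}
```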
With oneDNN 3.8 there are continued Intel AMX enhancements, better Panther Lake Xe3 integrated graphics performance, refinements for existing Xe2 graphics support, and other optimizations to benefit Intel’s recent and upcoming CPU and GPU products.
“Intel Architecture Processors
– Improved matmul and inner product primitives performance on processors with Intel AMX instruction set support.
– Improved performance of convolution and inner product primitives on processors with Intel AVX2 instruction set support.
– Improved performance of int8 convolution support with zero points.
– Improved fp32 convolution performance with fp16 and bf16 compressed weights on processors with Intel AVX2 or Intel AVX-512 instruction set support.
– Improved fp16/bf16 depthwise convolution performance with fp32 bias or sum post-ops or dilation.
– Improved bf16 pooling backpropagation performance.
– Improved binary post-ops performance with per_w broadcast.
Intel Graphics Products
– Improved performance on Intel Arc graphics for future Intel Core Ultra processors (code name Panther Lake).
– Improved convolution performance on:
Intel Arc Graphics for Intel Core Ultra processor series 2 (formerly Lunar Lake).
Intel Arc B-series discrete graphics (formerly Battlemage).
– Improved int8 matmul performance with zero-points support for source and weight tensors.
– Improved f4_e2m1 and f4_e3m0 matmul and reorder performance.
– Improved performance of the following subgraphs with Graph API:
Scaled Dot Product Attention (SDPA) with int4 and int8 compressed key and value.
fp16/bf16 SDPA with fp32 intermediate data types. Using fp32 intermediate data types is recommended.
SDPA with head size 512 and 576.
Grouped Query Attention (GQA) with 5D input tensors.”
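To give a sense of how those SDPA subgraphs are expressed, below is a minimal, hypothetical sketch using the oneDNN Graph API that builds the core MatMul → SoftMax → MatMul pattern and asks the library to partition it. The shapes and tensor IDs are made up, the scaling and masking steps of a full SDPA are omitted for brevity, and a CPU engine kind is used here purely for simplicity:

```cpp
#include <oneapi/dnnl/dnnl_graph.hpp>
using namespace dnnl::graph;
using dt = logical_tensor::data_type;
using lt = logical_tensor::layout_type;

int main() {
    // Illustrative shapes: batch=1, heads=16, seq=128, head_size=64.
    const logical_tensor::dims qv_shape = {1, 16, 128, 64};
    const logical_tensor::dims score_shape = {1, 16, 128, 128};

    logical_tensor q {0, dt::f16, qv_shape, lt::strided};
    logical_tensor k {1, dt::f16, {1, 16, 64, 128}, lt::strided}; // pre-transposed K
    logical_tensor score {2, dt::f16, score_shape, lt::strided};
    logical_tensor prob  {3, dt::f16, score_shape, lt::strided};
    logical_tensor v {4, dt::f16, qv_shape, lt::strided};
    logical_tensor out {5, dt::f16, qv_shape, lt::strided};

    // Q*K^T -> SoftMax -> *V, the skeleton of an SDPA subgraph.
    op qk {0, op::kind::MatMul, {q, k}, {score}, "qk"};
    op softmax {1, op::kind::SoftMax, {score}, {prob}, "softmax"};
    softmax.set_attr<int64_t>(op::attr::axis, -1);
    op pv {2, op::kind::MatMul, {prob, v}, {out}, "pv"};

    graph g(dnnl::engine::kind::cpu);
    g.add_op(qk);
    g.add_op(softmax);
    g.add_op(pv);
    g.finalize();

    // The backend decides whether the whole pattern fuses into one partition.
    auto parts = g.get_partitions();
    return parts.empty() ? 1 : 0;
}
```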
The oneDNN 3.8 release also brings FP16, INT8, and BF16 optimizations for AArch64 processors, Graph API support for NVIDIA GPUs, ROCm 6 support for AMD GPUs, and a variety of other smaller enhancements.
Downloads and more information on the oneDNN 3.8 release for building out deep learning applications are available via GitHub. Look for new oneDNN benchmarks soon around upcoming hardware releases.