Computing

PyTorch 2.8 Released With Better Intel CPU Performance For LLM Inference

News Room
Published 6 August 2025 (last updated 9:29 PM)

PyTorch 2.8 was released today as the newest feature update to the widely used machine learning library that has become a crucial piece of the deep learning and broader AI ecosystem. There are a few interesting changes worth highlighting in the new PyTorch 2.8 release.

Piquing my interest in PyTorch 2.8 is the improved Intel CPU performance, in particular a focus on high-performance quantized large language model (LLM) inference on Intel CPUs using native PyTorch. The work, tracked in an upstream issue ticket, outlines the LLM quantization effort by Intel engineers to enhance x86_64 CPU performance with native PyTorch; A16W8, DA8W8, and A16W4 are among the supported modes. That issue ticket noted:

“With this feature, the performance with PyTorch native stack can reach the same level or even better in some cases as comparing with popular LLM serving frameworks like vLLM when running offline mode on a single x86_64 CPU device, which enables PyTorch users to run LLM quantization with native experience and good performance.”
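
As a rough illustration of what quantized LLM-style inference looks like with native PyTorch on a CPU, here is a minimal sketch using the long-standing dynamic quantization API, which int8-quantizes Linear weights with activations quantized on the fly (conceptually close to the DA8W8 mode named above). The exact entry points for the new 2.8 paths are documented in the release notes; the toy model below is purely hypothetical.

    import torch
    import torch.nn as nn

    # Toy stand-in for an LLM's feed-forward layers (hypothetical model).
    model = nn.Sequential(
        nn.Linear(4096, 4096),
        nn.ReLU(),
        nn.Linear(4096, 4096),
    ).eval()

    # Dynamically quantize Linear weights to int8; activations are
    # quantized at runtime, similar in spirit to the DA8W8 mode.
    qmodel = torch.ao.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )

    with torch.inference_mode():
        out = qmodel(torch.randn(1, 4096))
    print(out.shape)  # torch.Size([1, 4096])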

There have been a lot of Intel CPU commits this cycle, such as FP8 QCONV, FP8 QLINEAR, and the use of AMX-based micro-kernels in more cases. The AMX micro-kernel improvement can be quite beneficial:

“GEMM templates for INT4 weights are used for lowering `aten._weight_int4pack_mm_for_cpu` with Inductor when max-autotune is on. Currently, AMX-based microkernels are used only when M >= 16 if input tensor has shape [M, K]. However, we find that AMX kernel brings performance benefit when 4 < M < 16. For example, on a 6th gen of Intel(R) Xeon(R) platform, E2E latency can be improved by up to > 20% when running Llama-3.1-8B on 32 cores for M = 8. So, this PR changes the threshold so that AMX is used when M > 4.”
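
To make that concrete, below is a minimal sketch of the compile knob involved: Inductor's GEMM templates (including the AMX-backed ones on capable Xeons) are only considered under max-autotune. Note that the quoted change specifically concerns lowering aten._weight_int4pack_mm_for_cpu with int4-packed weights; this plain fp32 example only demonstrates where the mode is set and how the M dimension in question arises.

    import torch

    def mlp(x, w1, w2):
        return torch.relu(x @ w1) @ w2

    # mode="max-autotune" lets Inductor benchmark template-based GEMM
    # kernels (AMX micro-kernels among them on supported Xeons).
    compiled = torch.compile(mlp, mode="max-autotune")

    x = torch.randn(8, 1024)      # M = 8: inside the newly AMX-enabled 4 < M < 16 range
    w1 = torch.randn(1024, 4096)
    w2 = torch.randn(4096, 1024)
    out = compiled(x, w1, w2)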

Too bad, though, that my AvenueCity reference server remains non-operational, leaving me unable to test the newest PyTorch release (and other Intel open-source improvements from recent months) on the flagship Xeon 6980P Granite Rapids processors. So, unfortunately, no new Xeon 6900P benchmarks on Phoronix at this time.

[Image: Intel Xeon 6980P]

Also on the Intel side for PyTorch 2.8 is experimental support for the Intel XCCL GPU distributed back-end. XCCL is a distributed back-end for Intel discrete GPUs that supports various distributed training paradigms.
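
For those wanting to experiment, here is a minimal sketch of bringing up the new back-end, assuming an XPU-enabled PyTorch build, a torchrun launch (which sets RANK/LOCAL_RANK), and that the back-end is registered under the name "xccl" with the "xpu" device type:

    import os
    import torch
    import torch.distributed as dist

    # Experimental Intel GPU back-end; name "xccl" assumed per the
    # release notes, requires an Intel GPU (XPU) build of PyTorch.
    dist.init_process_group(backend="xccl")
    device = torch.device("xpu", int(os.environ["LOCAL_RANK"]))

    t = torch.ones(4, device=device)
    dist.all_reduce(t)  # element-wise sum across all ranks over XCCL
    print(t)

    dist.destroy_process_group()

Launched with, e.g., torchrun --nproc-per-node=2 script.py on a multi-GPU host.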

PyTorch 2.8 also brings SYCL support to the PyTorch CPP Extension API, A16W4 support for XPU devices, experimental wheel variant support, and other enhancements.
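
On the SYCL side, the extension mechanism presumably mirrors the existing CppExtension/CUDAExtension pattern; the setup.py sketch below is assumption-laden, with the SyclExtension entry point, the source file names, and the package name all unverified against the 2.8 documentation:

    from setuptools import setup
    from torch.utils.cpp_extension import BuildExtension, SyclExtension  # SyclExtension: assumed name

    setup(
        name="my_sycl_op",  # hypothetical package
        ext_modules=[
            # Hypothetical sources: a C++ binding file plus a SYCL kernel.
            SyclExtension("my_sycl_op", ["my_op.cpp", "my_kernel.sycl"]),
        ],
        cmdclass={"build_ext": BuildExtension},
    )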

Downloads and more details on the PyTorch 2.8 release via the PyTorch.org blog and GitHub.
