PyTorch 2.8 Released With Better Intel CPU Performance For LLM Inference

By News Room | Published 6 August 2025, last updated 9:29 PM
PyTorch 2.8 was released today as the newest feature update to this widely used machine learning library, which has become a crucial piece of infrastructure for deep learning and other AI workloads. There are a few interesting changes worth highlighting in the new PyTorch 2.8 release.

What piqued my interest most in PyTorch 2.8 is the improved Intel CPU performance, in particular a focus on high-performance quantized large language model (LLM) inference on Intel CPUs using native PyTorch. The change tracks the LLM quantization work done by Intel engineers to enhance x86_64 CPU performance with the native PyTorch stack; A16W8, DA8W8 and A16W4 are among the supported modes. The issue ticket noted:

“With this feature, the performance with PyTorch native stack can reach the same level or even better in some cases as comparing with popular LLM serving frameworks like vLLM when running offline mode on a single x86_64 CPU device, which enables PyTorch users to run LLM quantization with native experience and good performance.”
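As a rough illustration (not PyTorch's actual kernels), a weight-only scheme in the spirit of the A16W8 mode stores each weight row as int8 values plus a per-row floating-point scale, keeping activations in higher precision and dequantizing at matmul time:

```python
# Illustrative sketch only -- not PyTorch's internal implementation.
# Weight-only quantization in the spirit of A16W8: weights become int8
# with a per-row scale, while activations stay in floating point.

def quantize_row(w):
    """Symmetric int8 quantization of one weight row -> (ints, scale)."""
    scale = (max(abs(x) for x in w) / 127.0) or 1.0  # avoid scale == 0
    q = [max(-127, min(127, round(x / scale))) for x in w]
    return q, scale

def dequantize_row(q, scale):
    """Reconstruct approximate floating-point weights from int8 + scale."""
    return [x * scale for x in q]

row = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_row(row)
approx = dequantize_row(q, scale)
# Each reconstructed value lands within half a quantization step of the
# original, at a quarter of the storage cost of float32 weights.
```

The point of doing this natively, as the quote above stresses, is that the quantized path reaches serving-framework-level performance without leaving the stock PyTorch stack.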

There have been a lot of Intel CPU commits this cycle, such as FP8 QCONV, FP8 QLINEAR, and the use of AMX-based micro-kernels in more cases. The AMX micro-kernel improvement can be quite beneficial:

“GEMM templates for INT4 weights are used for lowering `aten._weight_int4pack_mm_for_cpu` with Inductor when max-autotune is on. Currently, AMX-based microkernels are used only when M >= 16 if input tensor has shape [M, K]. However, we find that AMX kernel brings performance benefit when 4 < M < 16. For example, on a 6th gen of Intel(R) Xeon(R) platform, E2E latency can be improved by up to > 20% when running Llama-3.1-8B on 32 cores for M = 8. So, this PR changes the threshold so that AMX is used when M > 4.”
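In other words, for `aten._weight_int4pack_mm_for_cpu` the kernel choice is keyed off M in the input shape [M, K]. A toy model of that dispatch rule, using the threshold values from the quote above (the function itself is purely illustrative):

```python
# Toy model of the dispatch decision described in the pull request:
# with an input tensor of shape [M, K], pick the AMX-based INT4 GEMM
# microkernel once M is above the threshold, else the reference path.

def pick_int4_gemm_kernel(m, threshold=4):
    return "amx" if m > threshold else "reference"

# The old behavior effectively required M >= 16; the new rule is M > 4,
# so shapes like M = 8 (the quoted Llama-3.1-8B case) now hit AMX.
old = [pick_int4_gemm_kernel(m, threshold=15) for m in (2, 8, 32)]
new = [pick_int4_gemm_kernel(m) for m in (2, 8, 32)]
```

The practical effect is that small-batch decode shapes, which previously fell back to the slower reference kernel, now benefit from AMX as well.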

Too bad, though, that my AvenueCity reference server remains non-operational, so I am unable to test the newest PyTorch release (and other Intel open-source improvements from recent months) on the flagship Xeon 6980P Granite Rapids processors. So, unfortunately, no new Xeon 6900P benchmarks on Phoronix at this time.

[Image: Intel Xeon 6980P]

Also on the Intel side, PyTorch 2.8 adds experimental support for the XCCL distributed back-end, which targets Intel discrete GPUs across various distributed training paradigms.
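As a hedged sketch of how this could be used, assuming the back-end name `xccl` as given in the release notes and a working Intel GPU runtime, selecting the distributed back-end by device type might look like:

```python
# Sketch: choose a torch.distributed back-end name by device type.
# "xccl" for Intel XPU devices is experimental in PyTorch 2.8; this
# mapping is an assumption for illustration, not an official API.

def pick_backend(device_type):
    return {"cuda": "nccl", "xpu": "xccl", "cpu": "gloo"}[device_type]

def init_distributed(device_type, rank, world_size):
    # Actually initializing a process group requires a torch install
    # plus the matching runtime (e.g. the Intel GPU stack for "xccl"),
    # so the import is kept local to this function.
    import torch.distributed as dist
    dist.init_process_group(backend=pick_backend(device_type),
                            rank=rank, world_size=world_size)
```

Beyond the back-end name, the `init_process_group` call sequence is the same as for NCCL or Gloo, which is what makes the new back-end a drop-in option for existing distributed training scripts.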

PyTorch 2.8 also brings SYCL support to the PyTorch CPP Extension API, A16W4 support for XPU devices, experimental wheel variant support, and other enhancements.

Downloads and more details on the PyTorch 2.8 release are available via the PyTorch.org blog and GitHub.

Copyright © All Rights Reserved. World of Software.