World of Software
Copyright © All Rights Reserved. World of Software.
Computing

AMD Instinct Accelerators With So Much vRAM Have Exposed Linux Hibernation Issues

News Room
Last updated: 2025/06/30 at 7:40 AM
News Room Published 30 June 2025

Too much vRAM and too many Instinct accelerators per server are causing system hibernation to fail on some high-end AMD AI servers running Linux. Eight accelerators, each with 192GB of device memory, can cause hibernation to fail if the server has only 2TB of system RAM. A new patch series posted today aims to address this problem in the Linux kernel. A related issue is that thawing the system can take nearly one hour due to the amount of memory involved.

AMD engineer Samuel Zhang explained that Linux servers with very large amounts of vRAM can currently fail to hibernate because the hibernation process evicts that memory to GTT or shared memory. In the worst case, two copies of the vRAM contents end up in system RAM, which exhausts all of the system memory.

Samuel Zhang explained in today’s Linux patch series addressing the hibernation issue within the Linux kernel:

“Modern data center dGPUs are usually equipped with very large VRAM. On server with such dGPUs(192GB VRAM * 8) and 2TB system memory, hibernate will fail due to no enough free memory.

The root cause is that during hibernation all VRAM memory get evicted to GTT or shmem. In both case, it is in system memory and kernel will try to copy the pages to hibernation image. In the worst case, this causes 2 copies of VRAM memory in system memory, 2TB is not enough for the hibernation image. 192GB * 8 * 2 = 3TB > 2TB.

The fix includes following 2 changes. With 2 changes, there’s much less pages needed to be copied to hibernate image and hibernation can succeed.
1. move GTT to shmem after evicting VRAM. then the GTT pages can be freed.
2. force write shmem pages to swap disk and free shmem pages.

After swapout GTT to shmem in hibernation prepare stage, swapin and restore BOs in thaw stage takes lots of time (50 mintues observed for 8 dGPUs). And it’s not necessary since the follow-up hibernate stages do not use GPU for hibernation successful case. The third patch is just skip the BOs restore in thaw stage to reduce the hibernation time.”
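
The worst-case arithmetic in the quoted cover letter can be checked with a short sketch. It assumes binary units (1TB = 1024GB), which is what makes 192GB * 8 * 2 come out to exactly 3TB; the function and variable names are illustrative and are not taken from the patch series.

```python
# Illustrative check of the worst-case memory math from the cover letter.
# Assumption: binary units (1 TiB = 1024 GiB); all names are hypothetical.
GIB = 1024 ** 3
TIB = 1024 ** 4


def worst_case_hibernation_bytes(vram_per_gpu: int, gpu_count: int, copies: int = 2) -> int:
    """System memory consumed if every byte of VRAM is held `copies` times."""
    return vram_per_gpu * gpu_count * copies


vram_per_gpu = 192 * GIB   # per-accelerator VRAM from the article
gpu_count = 8              # accelerators per server
system_ram = 2 * TIB       # system memory in the failing configuration

total_vram = vram_per_gpu * gpu_count
worst_case = worst_case_hibernation_bytes(vram_per_gpu, gpu_count)
fits = worst_case <= system_ram

print(f"Total VRAM:         {total_vram / TIB:.2f} TiB")
print(f"Worst-case copies:  {worst_case / TIB:.2f} TiB")
print(f"Fits in system RAM: {fits}")
```

Even a single copy of the evicted VRAM (1.5TB) takes up most of the 2TB of system RAM, which is why the patches free the GTT pages and push the shmem pages out to swap rather than copying them into the hibernation image.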

Granted, most high-end accelerator-powered AI servers are in constant use, but for those wanting to hibernate them during downtime to reduce power consumption, this is apparently a real problem. Besides exhausting system memory, the other issue is that swapping in and restoring buffer objects in GPU memory can take nearly an hour (50 minutes observed with eight dGPUs) during the thaw stage of the hibernation process.

AMD Instinct MI350 series hardware

These patches, which touch the Linux power management code as well as the AMDGPU kernel driver, are now under review and will hopefully make it into the mainline kernel in a future cycle.
