If your AMD Ryzen or EPYC system randomly resets or unexpectedly reboots under Linux, the kernel will soon be able to tell you why: with the upcoming Linux 6.16 cycle, it is gaining the ability to report the reason for the previous reset. This makes use of a hardware capability that goes back to the AMD Zen 1 processors and that the Linux kernel is only now tapping into.
AMD CPUs going back to Family 17h (Zen 1) have a register that indicates the cause of the previous reset. By decoding that register, the Linux kernel will now report the cause of the previous reset on AMD systems to the kernel log (dmesg) at boot time.
The register can indicate whether the system was reset because a thermal pin was tripped, the power button was pressed or the shutdown pin was shorted, an internal CPU thermal limit was tripped, software issued a PCI reset, an internal CPU shutdown event occurred, a parity error occurred, or one of various other recognized events took place that could cause a “random reboot”.
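To make the idea of decoding such a register concrete, here is a minimal Python sketch of turning a status value into readable reasons. The bit positions below are purely hypothetical and chosen for illustration; the real layout is defined by AMD's documentation and the kernel patch, neither of which is reproduced here. The names `RESET_REASONS` and `decode_reset_reasons` are likewise made up for this example.

```python
# HYPOTHETICAL bit assignments, for illustration only. The real register
# layout comes from AMD's documentation and the kernel patch.
RESET_REASONS = {
    0: "thermal pin tripped",
    1: "power button pressed or shutdown pin shorted",
    2: "internal CPU thermal limit tripped",
    3: "software issued a PCI reset",
    4: "internal CPU shutdown event",
    5: "parity error",
}


def decode_reset_reasons(value: int) -> list[str]:
    """Return the description of every known bit set in `value`."""
    reasons = [
        text
        for bit, text in sorted(RESET_REASONS.items())
        if value & (1 << bit)  # test each known bit independently
    ]
    # No known bit set: report that rather than returning an empty list.
    return reasons or ["unknown"]
```

A lookup table keyed by bit number keeps the decoding logic separate from the descriptions, so supporting another recognized event only means adding one more entry.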
On boot, the kernel log will contain a line beginning with “x86/amd: Previous system reset reason ” followed by the decoded reason for the previous system reset. It is quite a useful addition, albeit surprising that it wasn’t added before given that the register has been around since Zen 1.
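For anyone wanting to pick that line out of the kernel log programmatically, a small Python sketch could look like the following. It relies only on the message prefix quoted above; everything after the prefix is returned as-is, since the exact format of the remainder is not assumed here. The function names `read_kernel_log` and `find_reset_reason` are hypothetical, and reading `dmesg` may require elevated privileges on some systems.

```python
import subprocess

# Prefix of the kernel log line, as quoted in the article.
PREFIX = "x86/amd: Previous system reset reason"


def read_kernel_log():
    """Return the output of `dmesg` as text (may need root on some systems)."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=False)
    return result.stdout


def find_reset_reason(log_text):
    """Return whatever follows the prefix on the first matching line, else None."""
    for line in log_text.splitlines():
        idx = line.find(PREFIX)
        if idx != -1:
            # Drop the prefix plus any separating spaces or colons around it.
            return line[idx + len(PREFIX):].strip(" :")
    return None
```

Calling `find_reset_reason(read_kernel_log())` on a kernel without the patch, or on non-AMD hardware, should simply return `None`.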
This patch adding the capability to report the reason for the last reset on AMD hardware was queued into tip/tip.git’s x86/platform branch and is thus material for the upcoming Linux 6.16 merge window. Also queued by way of tip/tip.git is the AMD Zen debugging documentation guide that has been worked on over the past number of weeks.