Eugen Hristev of Linaro sent out a “request for comments” (RFC) patch series today proposing kmemdump for the Linux kernel, a new way to help debug driver and system problems by making it easier to dump specific regions of memory.
Kmemdump is infrastructure that lets kernel drivers register specific chunks of memory so that those flagged areas can be easily dumped when the system hits problems or for other debugging purposes. Drivers or other producers mark the regions of memory that matter for debugging, and those regions can be dumped later without having to dump and archive the entire contents of RAM. With appropriate system capabilities and hardware, it could be possible to save those kmemdump-registered regions even when the kernel is frozen, crashed, or in some other problematic state, since only those specific memory areas need to be tracked.
This experimental kmemdump code also allows assembling the memory regions into a coredump format that debuggers can read. The marked memory regions can be put into an ELF core file along with associated data structures so that the file can then be loaded into the GNU Debugger (GDB) or other crash analysis tools.
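To make the idea of packing selected regions into an ELF core file more concrete, here is a minimal user-space sketch in Python. It is not the kmemdump code: the `build_core` function, the choice of AArch64 as the machine type, and the sample addresses are all assumptions made for illustration. It only lays out an ELF64 header followed by one `PT_LOAD` program header per region, and it leaves out the note segment and the kernel data structures that a real core image would carry for tools like GDB or crash.

```python
import struct

# Little-endian ELF64 layouts, per the ELF specification.
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr, 64 bytes
PHDR = struct.Struct("<IIQQQQQQ")          # Elf64_Phdr, 56 bytes

ET_CORE = 4        # e_type for core files
EM_AARCH64 = 183   # assumed target machine for this sketch
PT_LOAD = 1        # loadable segment
PF_R, PF_W = 4, 2  # segment permission flags


def build_core(regions):
    """Build a bare ELF64 core image from (vaddr, bytes) pairs.

    Hypothetical helper: one PT_LOAD segment per region, no PT_NOTE.
    """
    # e_ident: magic, ELFCLASS64, ELFDATA2LSB, EV_CURRENT, OS ABI 0, padding.
    ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
    ehdr = EHDR.pack(
        ident, ET_CORE, EM_AARCH64, 1,
        0,                 # e_entry (unused for a core file)
        EHDR.size,         # e_phoff: program headers follow the ELF header
        0, 0,              # e_shoff, e_flags
        EHDR.size, PHDR.size, len(regions),
        0, 0, 0,           # no section headers
    )

    # Region data starts right after the program header table.
    offset = EHDR.size + PHDR.size * len(regions)
    phdrs, payload = [], []
    for vaddr, data in regions:
        phdrs.append(PHDR.pack(
            PT_LOAD, PF_R | PF_W,
            offset,        # p_offset: where the bytes live in the file
            vaddr,         # p_vaddr: address the region had in memory
            0,             # p_paddr (left unset in this sketch)
            len(data), len(data),
            1,             # p_align: no alignment constraint
        ))
        payload.append(data)
        offset += len(data)

    return ehdr + b"".join(phdrs) + b"".join(payload)


# Two made-up regions at made-up kernel virtual addresses.
core = build_core([
    (0xFFFF800010000000, b"driver state...."),
    (0xFFFF800010002000, b"log"),
])
```

The resulting file contains only the bytes of the listed regions plus a small header table, which is the size advantage over archiving all of RAM.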
With Linaro’s involvement, the initial focus seems to be on making use of kmemdump for Qualcomm hardware. The RFC patch series includes Qualcomm Minidump as a back-end for kmemdump.
Eugen Hristev sums up kmemdump in the RFC patch series:
“kmemdump is a mechanism which allows the kernel to mark specific memory areas for dumping or specific backend usage.
Once regions are marked, kmemdump keeps an internal list with the regions and registers them in the backend.
Further, depending on the backend driver, these regions can be dumped using firmware or different hardware block.
Regions being marked beforehand, when the system is up and running, there is no need nor dependency on a panic handler, or a working kernel that can dump the debug information.
The kmemdump approach works when pstore, kdump, or another mechanism do not. Pstore relies on persistent storage, a dedicated RAM area or flash, which has the disadvantage of having the memory reserved all the time, or another specific non volatile memory. Some devices cannot keep the RAM contents on reboot so ramoops does not work. Some devices do not allow kexec to run another kernel to debug the crashed one. For such devices, that have another mechanism to help debugging, like firmware, kmemdump is a viable solution.
kmemdump can create a core image, similar with /proc/vmcore, with only the registered regions included. This can be loaded into crash tool/gdb and analyzed.
To have this working, specific information from the kernel is registered, and this is done at kmemdump init time, no need for the kmemdump user to do anything.”
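The flow described in the quote (mark regions ahead of time, keep an internal list, hand each region to a backend that can dump it on its own) can be modeled with a small Python sketch. This is a conceptual model only, not the kernel API from the patch series: `RegionRegistry`, `FakeFirmwareBackend`, and their method names are hypothetical, and a `bytearray` stands in for system memory.

```python
class FakeFirmwareBackend:
    """Hypothetical backend that keeps its own region table, as firmware might."""

    def __init__(self):
        self.table = {}  # name -> (offset, size)

    def register_region(self, name, offset, size):
        self.table[name] = (offset, size)

    def unregister_region(self, name):
        self.table.pop(name, None)

    def dump(self, memory):
        # Uses only the backend's own table, so it does not depend on the
        # registry (the "kernel" side) still being in a working state.
        return {
            name: bytes(memory[off:off + size])
            for name, (off, size) in self.table.items()
        }


class RegionRegistry:
    """Hypothetical stand-in for the internal list of marked regions."""

    def __init__(self, backend):
        self.backend = backend
        self.regions = {}  # name -> (offset, size)

    def register(self, name, offset, size):
        if name in self.regions:
            raise ValueError(f"region {name!r} already registered")
        self.regions[name] = (offset, size)
        # Forward the region to the backend while the system is healthy.
        self.backend.register_region(name, offset, size)

    def unregister(self, name):
        del self.regions[name]
        self.backend.unregister_region(name)


# A tiny fake "RAM" with two interesting areas.
memory = bytearray(64)
memory[8:12] = b"LOGS"
memory[40:45] = b"STATE"

backend = FakeFirmwareBackend()
registry = RegionRegistry(backend)
registry.register("log_buffer", 8, 4)
registry.register("driver_state", 40, 5)

# Later, the backend alone produces the dump of just those regions.
snapshot = backend.dump(memory)
```

The design point the sketch tries to capture is that registration happens up front, so the dump step needs nothing from the registering side at failure time.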