Oracle today announced Oracle Linux Enhanced Diagnostics (OLED), its newest project, which aims to improve the debuggability of the Linux kernel.
Oracle Linux Enhanced Diagnostics is a set of tools developed in-house at Oracle, originally for Oracle Linux, to improve the debugging experience, especially within cloud environments.
In today’s blog post announcing OLED, the company explains:
“All the tools/scripts included in this rpm were developed in-house due to real issues that we had to debug, analyze and resolve for our customers. So the need for these rose organically, for which existing debug methodologies fell short. For instance, we’ve seen issues where a lot of processes were stuck in the uninterruptible (D) state for a long time, driving up the load average and leading to device timeouts or soft lockups. We wrote kstack to capture kernel stacktraces of D state processes. We have seen memory growth bugs where all available memory on the system was being slowly used up over weeks or months until either the OOM-killer was invoked or the system crashed. oled memstate was written to debug those issues, to keep an eye on what category of memory was growing and how fast. We have used memstate to debug a host of other issues, including memory fragmentation, incorrect hugepage or DB PGA configurations, detect kernel memory leaks, etc. We have a handful of dtrace scripts that were written to debug one specific problem in one specific environment, but they have been included in this rpm because we think those corner cases aren’t so rare – maybe someone else in the Oracle Linux community will find them useful. Each of these tools and scripts is discussed in the following two sections.
The tools are primarily written in python and C, and the scripts are mostly dtrace. We will continue to add other tools and scripts to this toolset in future releases.”
OLED's helpers include a tool for gathering the kernel stack of a given process/PID, a Linux kernel core extractor, and utilities for capturing and analyzing memory use, scanning KVM images for corruption, and more. Oracle is planning further improvements to OLED, as well as to the Performance Co-Pilot and Drgn projects, to help enhance Linux debugging.
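To give a rough idea of what capturing kernel stacks of D-state processes involves, here is a minimal Python sketch built only on the standard /proc interface. It is not OLED's kstack implementation, and the function names (parse_state, d_state_pids, kernel_stack) are hypothetical. It assumes a Linux system where /proc/<pid>/stack is available, which typically requires root and a kernel built with stack trace support.

```python
import os


def parse_state(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST ')' to find the state field.
    head, _, tail = stat_text.rpartition(")")
    pid_str, _, comm = head.partition("(")
    return int(pid_str), comm, tail.split()[0]


def d_state_pids(proc_root="/proc"):
    """List (pid, comm) for processes in uninterruptible sleep (state D)."""
    found = []
    if not os.path.isdir(proc_root):
        return found  # not a Linux-style procfs; nothing to scan
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'meminfo'
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_state(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan or stat was unreadable
        if state == "D":
            found.append((pid, comm))
    return found


def kernel_stack(pid, proc_root="/proc"):
    """Read the kernel stack of a process, or a placeholder if unavailable."""
    try:
        with open(os.path.join(proc_root, str(pid), "stack")) as f:
            return f.read()
    except OSError as exc:
        return f"<unavailable: {exc}>"


if __name__ == "__main__":
    for pid, comm in d_state_pids():
        print(f"{pid} ({comm})")
        print(kernel_stack(pid))
```

Run as root on an affected machine, this prints each stuck process followed by its kernel stack. A production tool would presumably add repeated sampling and logging on top of this basic loop.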
More details on Oracle OLED are available via the Oracle Linux blog.