Google Cloud has released a new open-source tool that visualises cluster logs chronologically to simplify troubleshooting in Kubernetes environments.
Kubernetes History Inspector (KHI) is intended to help administrators debug problems inside Kubernetes clusters and identify root causes. According to Kakeru Ishii and Takeie Torinomi in the launch blog post for KHI, assembling a comprehensive view of a problem in a Kubernetes deployment can be complex and daunting, especially given the huge volume of log data that even a moderately sized Kubernetes cluster can generate:
The real challenge lies in analysing [logs] effectively. Many issues you’ll encounter in a Kubernetes deployment are not revealed by a single, obvious error message. Instead, they manifest as a chain of events, requiring a deep understanding of the causal relationships between numerous log entries across multiple components.
– Kakeru Ishii and Takeie Torinomi (Google Cloud)
While managed Kubernetes services like Google Kubernetes Engine (GKE) and AWS’s Elastic Kubernetes Service (EKS) simplify log collection, administrators generally have to look to other tools for effective analysis of those logs. The Kubernetes History Inspector addresses this problem by analysing logs collected through Cloud Logging, extracting state information for each component, and presenting this data in a visual timeline. It also links this timeline back to the raw log data, enabling users to track component usage over time. KHI enables administrators to use an interactive GUI instead of writing complex queries.
The interface attempts to give both a macroscopic and a microscopic view of a cluster's history: state changes of individual components are shown on the left-hand side, while raw logs, manifests and historical changes for a selected component appear in a micro view on the right-hand side. Beyond visualising component states, KHI aims to illustrate the relationships between components at any given point in the past, presenting the complex interdependencies within a Kubernetes cluster in a clear, understandable format.
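The general idea of turning raw log entries into per-component timelines can be illustrated with a minimal Python sketch. This is not KHI's implementation: the `LogEntry` type, the `build_timelines` function, the resource identifiers and the sample data are all hypothetical, chosen only to show how entries might be grouped by component, ordered chronologically and reduced to state changes.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LogEntry:
    """A simplified, hypothetical log record (not KHI's data model)."""
    timestamp: datetime
    resource: str  # e.g. "pod/web-1" (made-up identifier format)
    state: str     # e.g. "Pending", "Running"


def build_timelines(entries):
    """Group entries by resource and keep a chronological list of state changes."""
    timelines = defaultdict(list)
    for entry in sorted(entries, key=lambda e: e.timestamp):
        history = timelines[entry.resource]
        # Record only genuine transitions, not repeated reports of one state.
        if not history or history[-1][1] != entry.state:
            history.append((entry.timestamp, entry.state))
    return dict(timelines)


# Made-up sample data, deliberately out of order.
sample = [
    LogEntry(datetime(2025, 1, 1, 12, 0, 5), "pod/web-1", "Running"),
    LogEntry(datetime(2025, 1, 1, 12, 0, 0), "pod/web-1", "Pending"),
    LogEntry(datetime(2025, 1, 1, 12, 0, 7), "pod/web-1", "Running"),
    LogEntry(datetime(2025, 1, 1, 12, 0, 3), "node/n1", "Ready"),
]

for resource, history in build_timelines(sample).items():
    for ts, state in history:
        print(f"{resource} {ts.isoformat()} {state}")
```

Each timeline keeps the timestamp of every transition, which is the kind of information a tool could use to link a point on a visual timeline back to the raw log entry behind it.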
In an episode of Google’s Kubernetes Podcast, Ishii discusses how the distributed nature of running applications in Kubernetes makes log management quite tricky. In conversation with host Abdel Sghiouar, Ishii explains how KHI was built to make sense of logs from multiple components without overwhelming users with raw data. Ishii also explains that KHI has an Angular JS frontend and uses WebGL to render visualisations.
Ishii also considers potential integrations with AI Large Language Models (LLMs) to enhance troubleshooting capabilities. However, he suggests that, at this stage, the understanding a good visualisation in KHI provides may be more beneficial than what a text-based analysis can offer.
In a blog post, William Denniss reviews KHI and provides a step-by-step guide to getting started with it. Denniss is particularly impressed with the UI:
What I love about this UI is that it’s information dense. The recent trend in UI design is to create simple, clean, plain designs focused on usability and simplicity. But sometimes, as a practitioner you just want to see all the data on one screen. KHI does that, and it feels like you’re more in command of the whole setup. At least to me.
– William Denniss
Currently, KHI works only with GKE and Kubernetes on Google Cloud in combination with Cloud Logging. However, plans exist to extend its capabilities to vanilla open-source Kubernetes setups in the future. Other tools are available for Kubernetes administrators running on different clouds, such as Sloop from Salesforce. Sloop records event history and resource state changes and provides visualisations to help administrators debug events. It can also track resources that no longer exist and shows timelines of how components such as Deployments and StatefulSets have changed.
For users of EKS (Elastic Kubernetes Service), AWS offers an option for exporting log events to OpenSearch, which provides visualisations through an OpenSearch dashboard. Similarly, AKS Periscope aims to detect Kubernetes cluster problems on AKS (Azure Kubernetes Service). Neither, however, currently appears to match the visual troubleshooting abilities of KHI.
The KHI GitHub page provides detailed information about KHI’s specifications, visual elements, and deployment instructions. KHI is packaged as a container image that requires no prior setup and can be launched with a single command.