Today’s cloud-native applications are complex systems with many interconnected parts. Without end-to-end observability, we can’t understand WHY our system is not working the way it is intended to. With holistic observability in place, we can monitor how each component performs and how the components interact, identify potential problems before they erupt, and resolve them proactively.
In the ever-changing cloud-native landscape, a one-size-fits-all approach to observability is no longer viable. Traditional monitoring paradigms struggle to keep pace with the scale of cloud-native development and fall short of ensuring reliability, performance, and security. What is needed is an end-to-end solution that traces causal relationships from the smallest component of our system, the code, all the way up to the systems abstracted at the business level.
Before we delve into the 4C framework and how to use it for better observability in the cloud-native world, let’s first understand what constitutes a cloud-native environment.
Cloud Native Software Development
Cloud-native is a software paradigm that uses cloud computing in public, private, and hybrid clouds to build, run, and scale applications in rapidly changing environments. It relies on cloud provider offerings such as object stores, managed services (e.g., Kafka, Kibana), container orchestration, auto-scaling, and burstable computing to speed up development while ensuring reliability and scalability. Cloud-native systems include:
- Multiple cloud providers and environments (AWS, Azure, Google Cloud, on-premises)
- A plethora of cloud services (e.g., AWS Lambda, Azure Cosmos DB, Google Cloud Pub/Sub)
- Complex architectures (microservices or monoliths; synchronous or event-based communication)
The surface area to observe and monitor is vast. Because every component in the system needs to be observable, it’s essential to understand how to design and set up observability effectively.
The 4C framework is proposed to meet this challenge by providing a comprehensive structure for building observability across complex systems.
4C Framework for Cloud Native Observability
To understand this complexity and build cloud-native observability for our systems, we will use the 4C framework. The 4Cs are the four layers present in the cloud-native ecosystem: Cloud, Cluster, Container, and Code. The framework helps us build an end-to-end observability system because it covers:
- The different cloud/PaaS services we might consume to build our application
- Kubernetes cluster (infrastructure) metrics, since most cloud-native workloads run on Kubernetes, to understand how the infrastructure interacts with our workloads
- The containers that run our workloads, including their health and resource usage
- The application code itself, including its performance, errors, and transaction flows
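One way to make the framework actionable is to treat it as a matrix of the four layers against the three telemetry signals (metrics, logs, traces) used in the per-layer tables. The Python sketch below is illustrative only: `Layer`, `Signal`, and `coverage_gaps` are hypothetical names, and the function simply reports which layer/signal cells have no instrumentation yet.

```python
from enum import Enum


class Layer(Enum):
    CLOUD = "cloud"
    CLUSTER = "cluster"
    CONTAINER = "container"
    CODE = "code"


class Signal(Enum):
    METRICS = "metrics"
    LOGS = "logs"
    TRACES = "traces"


def coverage_gaps(instrumented: set[tuple[Layer, Signal]]) -> list[tuple[Layer, Signal]]:
    """List every (layer, signal) cell of the 4C matrix with no instrumentation yet."""
    return [
        (layer, signal)
        for layer in Layer
        for signal in Signal
        if (layer, signal) not in instrumented
    ]


if __name__ == "__main__":
    # Example: only cloud metrics and application logs are wired up so far.
    have = {(Layer.CLOUD, Signal.METRICS), (Layer.CODE, Signal.LOGS)}
    for layer, signal in coverage_gaps(have):
        print(f"no {signal.value} for the {layer.value} layer")
```

Walking the matrix layer by layer keeps the gaps in the same top-down order as the framework, from cloud services down to code.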
1. Cloud Layer
This layer examines various cloud services utilized by our apps. This includes understanding service availability, performance metrics, and potential bottlenecks that could impact application delivery.
Metrics | Logs | Traces
---|---|---
Resource Utilization: Monitor CPU, memory, and disk usage. | Service Logs: Collect service logs from AWS, GCP, Azure, or other cloud providers to understand infrastructure behavior and errors. | Request Tracing: Use distributed tracing tools (e.g., AWS X-Ray) to follow requests across the cloud services that run a given workload.
Service Latency: Track response times for API calls and service interactions. | Audit Logs: Monitor resource configuration changes and resource access for security and compliance. |
Error Rates: Measure the frequency of failed requests or service errors at load balancers and similar entry points. | |
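Error rates and service latency are usually read from the cloud provider's own monitoring service, but the arithmetic behind them is simple. Below is a minimal sketch, assuming a hypothetical `RequestRecord` shape (status code plus latency) exported from load balancer access logs. It counts 5xx responses as errors and uses the nearest-rank method for latency percentiles.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestRecord:
    """One request as seen by a load balancer (hypothetical, simplified shape)."""
    status_code: int
    latency_ms: float


def error_rate(records: list[RequestRecord]) -> float:
    """Fraction of requests that ended in a 5xx server error."""
    if not records:
        return 0.0
    errors = sum(1 for r in records if 500 <= r.status_code <= 599)
    return errors / len(records)


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile, e.g. pct=95 for p95 latency."""
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    # Multiply before dividing so integer inputs give an exact 1-based rank.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]
```

For example, `percentile([r.latency_ms for r in records], 95)` gives p95 latency for a batch of records. Counting only 5xx responses is a design choice: 4xx responses usually point to client mistakes rather than service failures, though you may still want to track them separately.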
2. Cluster Layer
Monitoring Kubernetes clusters is key to ensuring that workloads are managed effectively. Key metrics include CPU and memory usage, pod status, and network performance.
Metrics | Logs | Traces
---|---|---
Cluster Health Metrics: Monitor node status, pod status, node pressure conditions (MemoryPressure, DiskPressure, PIDPressure), and resource allocation. | Kubernetes Logs: Collect logs from Kubernetes components (kubelet, kube-apiserver) and application pods using tools like Fluentd or the Elastic Stack. | Service Mesh Integration: If you use a service mesh (e.g., Istio), enable its tracing capabilities to monitor service-to-service communication.
Network Traffic Metrics: Track ingress and egress traffic, traffic type, and traffic rate. | Audit Logs: Monitor resource configuration changes and resource access for security and compliance. |
Control Plane Metrics: Track metrics from Kubernetes control plane components (API server, etc.). | |
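Node health can be read directly from the Kubernetes API. The sketch below assumes the JSON shape returned by `kubectl get nodes -o json`: an object with an `items` list, where each node carries `status.conditions` entries with `type` and `status` fields. The function name `unhealthy_nodes` is hypothetical. It flags nodes that are not `Ready` or that report a pressure condition.

```python
# Node conditions that signal resource pressure on the node.
PRESSURE_CONDITIONS = ("MemoryPressure", "DiskPressure", "PIDPressure")


def unhealthy_nodes(node_list: dict) -> dict[str, list[str]]:
    """Map each problematic node name to the issues found in its status conditions.

    `node_list` is the parsed output of `kubectl get nodes -o json`.
    """
    problems: dict[str, list[str]] = {}
    for node in node_list.get("items", []):
        name = node["metadata"]["name"]
        # Each condition looks like {"type": "Ready", "status": "True", ...}.
        conditions = {
            c["type"]: c["status"]
            for c in node.get("status", {}).get("conditions", [])
        }
        issues = []
        ready = conditions.get("Ready", "Unknown")
        if ready != "True":
            issues.append(f"Ready={ready}")
        issues.extend(c for c in PRESSURE_CONDITIONS if conditions.get(c) == "True")
        if issues:
            problems[name] = issues
    return problems
```

A small wrapper could call `unhealthy_nodes(json.load(sys.stdin))` on piped `kubectl` output. In a real cluster these same conditions are typically scraped continuously as metrics (for example by kube-state-metrics) rather than polled by hand.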
3. Container Layer
Container observability is challenging due to the dynamic nature of ephemeral containerized applications. Implementing logging frameworks and telemetry tools helps capture data related to container health, resource usage, and operational anomalies.
Metrics | Logs | Traces
---|---|---
Container Resource Metrics: Monitor CPU, memory, PVC, and GPU usage per container instance. | Application Logs: Collect logs from application containers and their init containers to understand whether a pod has gone into CrashLoopBackOff or cannot pull its image from the registry. | End-to-End Request Tracing: Use OpenTelemetry (the successor to OpenTracing) or Jaeger to capture traces as requests move through various containers.
Health Checks: Implement readiness and liveness probes to ensure container health. | |
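Readiness and liveness probes need endpoints to call. Here is a minimal, standard-library-only sketch; the `/healthz` and `/readyz` paths, port 8080, and the `READY`, `probe_response`, and `serve_probes` names are assumptions rather than requirements. The readiness endpoint returns 503 until the application signals that startup has finished.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Set by the application once startup work (config, connections, caches) is done.
READY = threading.Event()


def probe_response(path: str, ready: bool) -> tuple[int, bytes]:
    """Decide the HTTP status and body for a probe request."""
    if path == "/healthz":
        # Liveness: the process is running and can answer HTTP at all.
        return 200, b"ok"
    if path == "/readyz":
        # Readiness: only receive Service traffic once startup is complete.
        return (200, b"ready") if ready else (503, b"not ready")
    return 404, b"not found"


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        code, body = probe_response(self.path, READY.is_set())
        self.send_response(code)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # keep frequent probe hits out of the application logs


def serve_probes(port: int = 8080, host: str = "0.0.0.0") -> ThreadingHTTPServer:
    """Serve the probe endpoints on a background daemon thread."""
    server = ThreadingHTTPServer((host, port), ProbeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The application would call `serve_probes()` at startup and `READY.set()` once initialization finishes, while the pod spec's `livenessProbe.httpGet` and `readinessProbe.httpGet` point at these paths and port. The liveness check deliberately stays trivial: if it depended on downstream services, an outage in one dependency could make Kubernetes restart otherwise healthy containers.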
4. Code Layer
At this level, observability focuses on application performance monitoring (APM) tools that track user interactions and application errors. By analyzing logs and traces from the code layer, developers can pinpoint areas for optimization or refactoring. This proactive approach helps enhance user experience by ensuring that applications perform reliably under varying loads.
Metrics | Logs | Traces
---|---|---
Application Performance Metrics: Track response times, request counts, and throughput. | Detailed Application Logs: Capture structured logs that provide context around application events, including errors and warnings. | Transaction Tracing: Enable tracing within your application to track the flow of transactions through various functions or services.
Error Rates: Monitor the frequency of exceptions or errors in the application logic. | |
Conclusion
The 4C framework gives us a mental model for observability that answers the “WHY”, “WHAT”, and “HOW” of observability in the cloud-native world, irrespective of the tools you use to build it in your environment. It helps you stitch observability together across the boundaries between layers, bringing a holistic view of what is happening in the entire system, simplifying troubleshooting, and enabling a proactive stance instead of a reactive one.