This is the first in a series of blogs on monitoring Kubernetes. Part 1 (this blog) begins with Kubernetes architecture, various sources for monitoring data within Kubernetes, and open source monitoring approaches. Part 2 explains how to monitor Kubernetes using Splunk Infrastructure Monitoring.
As organizations adopt microservices architectures to gain speed, resilience, and scalability, they are increasingly turning to containers to package, distribute, and run distributed microservices. According to Gartner, by 2022, more than 75% of global organizations will be running containerized applications in production, a significant increase from fewer than 30% today.
Kubernetes has emerged as the de facto standard for container orchestration. With over 51K stars on GitHub and 2,100 contributors spanning every time zone across the globe, Kubernetes enjoys a vibrant community. According to the latest survey by Cloud Native Computing Foundation (CNCF), Kubernetes adoption has been consistently increasing and 83% of respondents cite Kubernetes as their choice for container management.
In this blog, we will discuss various monitoring approaches and best practices to design an end-to-end observability strategy for Kubernetes workloads.
Kubernetes is a system that assembles a group of machines into a single unit that can be consumed via an API. Kubernetes further segments those compute resources into two groups: worker nodes and master node(s).
Kubernetes consists of persistent entities called Kubernetes Objects, which represent the state of the Kubernetes cluster. These objects include Pods, Services, Volumes, Namespaces, and higher-level abstractions such as Deployments and ReplicaSets.
The biannual CNCF survey cites monitoring as one of the top challenges in successfully adopting Kubernetes. However, traditional monitoring tools are unsuitable for measuring the health of cloud-native applications and infrastructure. As Gartner notes:
"Historically, monitoring tools have focused on host-level metrics, such as CPU utilization, memory utilization, input-output (I/O) per second, latency and network bandwidth. Although these metrics are still important for operations teams, by themselves they won’t paint a full picture, because they lack granular detail at the container or service level."
The monitoring strategies of yore do not work in the cloud-native era primarily because:
In the monolithic world, there were only two components to monitor — applications and the hosts on which they were deployed. In the cloud-native world, containerized applications orchestrated by Kubernetes have multiple components that require monitoring:
Unlike the traditional long-running host model, modern microservices-based applications are typically deployed on containers, which are dynamic and ephemeral. Kubernetes ensures that the desired number of application replicas are running. Kubernetes will place the pods on whichever nodes it deems fit, unless specifically constrained via mechanisms such as node affinity or taints. In fact, letting Kubernetes schedule pods is the key design goal for this self-adjusting system.
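As a simple illustration of constraining the scheduler, a pod can be pinned to a class of nodes with a nodeSelector. The sketch below assumes the cluster operator has labeled suitable nodes with a hypothetical disktype=ssd label:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cache-pod          # hypothetical pod name
spec:
  nodeSelector:
    disktype: ssd          # assumes nodes were labeled disktype=ssd
  containers:
    - name: cache
      image: redis:6
```

Without such constraints, the scheduler is free to place (and re-place) the pod anywhere, which is exactly what makes host-centric monitoring assumptions break down.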
Traditional monitoring approaches do not work in these highly dynamic environments because they typically follow long-running hosts using host names or IP addresses. For containerized environments, monitoring tools must provide immediate service discovery and automatically detect the lifecycle events of containers, while also adjusting metric collection as containers are spun up or restarted in seconds.
Pinpointing issues in a microservices environment is more challenging than in a monolithic one, as requests traverse both between different layers of the stack and across multiple services. Modern monitoring tools must monitor these interrelated layers while also efficiently correlating application and infrastructure behavior to streamline troubleshooting.
DevOps teams typically start with the built-in monitoring options that come with a standard Kubernetes deployment. These options include:
Using a liveness probe, Kubernetes executes a liveness command within the pod at a predefined frequency. Below is a simple example from the Kubernetes docs of a liveness probe that runs the cat command.
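The sketch below is adapted from the exec liveness probe example in the Kubernetes documentation: the container creates /tmp/healthy at startup and deletes it after 30 seconds, so the kubelet's periodic `cat /tmp/healthy` check eventually fails and the container is restarted.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-exec
spec:
  containers:
  - name: liveness
    image: busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -f /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:          # kubelet runs this inside the container
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5   # wait before the first probe
      periodSeconds: 5         # probe every 5 seconds
```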
While probes provide simple point-in-time checks, they lack sophisticated performance analytics capabilities as well as persistence for historical trends.
Readiness probes are conducted by Kubernetes to know when a container is ready to start accepting traffic.
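A readiness probe is declared the same way as a liveness probe and supports the same mechanisms (exec, httpGet, tcpSocket). A minimal sketch, where the /healthz path and port 8080 are assumptions about the application being probed:

```yaml
readinessProbe:
  httpGet:
    path: /healthz       # assumed health endpoint
    port: 8080           # assumed container port
  initialDelaySeconds: 5
  periodSeconds: 10
```

Until this probe succeeds, Kubernetes keeps the pod out of Service endpoints so it receives no traffic.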
cAdvisor is an open source container resource usage and performance analysis agent. It is purpose-built for containers and supports Docker containers natively. In Kubernetes, cAdvisor is integrated into the kubelet binary, so it runs on every node rather than at the pod level. cAdvisor auto-discovers all containers on the machine and collects CPU, memory, filesystem, and network usage statistics.
Although cAdvisor provides basic machine-level performance characteristics, it too lacks the analytics and persistence needed to understand trends.
Starting with Kubernetes 1.8, resource usage metrics, such as container CPU, memory, and disk usage, are available in Kubernetes through the Metrics API.
The Metrics API does not store values over time: calling the API tells you the current resource utilization, but it cannot tell you what the value was, say, 10 minutes ago.
Inspired by Heapster, which is now deprecated, Metrics Server exposes node- and pod-level resource usage via standard APIs, accessed the same way as other Kubernetes APIs.
Metrics Server provides performance data via APIs and can be configured to persist the data over time; however, it lacks analytics and visualization capabilities.
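Assuming Metrics Server is installed in the cluster, the Metrics API can be exercised directly from kubectl:

```
# Point-in-time CPU/memory for nodes and pods
$ kubectl top nodes
$ kubectl top pods --all-namespaces

# The same data via the raw Metrics API
$ kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
```

Running the same command twice returns only the latest readings — there is no way to ask for historical values, which is the persistence gap described above.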
Kubernetes Dashboard provides a basic UI to view resource utilization, manage applications running in the cluster, and manage the cluster itself.
Kubernetes Dashboard can be deployed with the following command:
$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v1.10.1/src/deploy/recommended/kubernetes-dashboard.yaml
To access the Dashboard from your local workstation, you must create a secure channel to your Kubernetes cluster. Run the following command:
$ kubectl proxy
Kubernetes Dashboard can now be accessed at http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/.
While Kubernetes Dashboard provides basic visualization, it depends on Heapster, which is now deprecated.
Typically used in conjunction with Heapster, kube-state-metrics is a simple service that listens to the Kubernetes API server and generates metrics about the state of objects such as Deployments, Nodes, PersistentVolumes, Pods, and Services. A full list of all exposed metrics is available in the kube-state-metrics documentation.
kube-state-metrics exposes this information in raw plain-text form and provides an endpoint to scrape the metrics. DevOps teams need to bring their own metrics storage, visualization, and alerting tools to make data from kube-state-metrics actionable.
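To make "raw plain text" concrete: kube-state-metrics serves its /metrics endpoint in the Prometheus text exposition format. The sketch below parses a few illustrative sample lines (the metric names are real kube-state-metrics series; the namespace/pod values are made up) to show why extra tooling is needed before this data becomes actionable:

```python
import re

# Illustrative /metrics output in the Prometheus text exposition format.
SAMPLE = """\
# HELP kube_pod_status_phase The pods current phase.
# TYPE kube_pod_status_phase gauge
kube_pod_status_phase{namespace="default",pod="web-0",phase="Running"} 1
kube_pod_status_phase{namespace="default",pod="web-0",phase="Pending"} 0
kube_deployment_status_replicas{namespace="default",deployment="web"} 3
"""

LINE_RE = re.compile(r'^(\w+)\{([^}]*)\}\s+(\S+)$')

def parse_metrics(text):
    """Parse metric lines into (name, labels, value) tuples, skipping comments."""
    samples = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        m = LINE_RE.match(line)
        if not m:
            continue
        name, raw_labels, value = m.groups()
        labels = dict(re.findall(r'(\w+)="([^"]*)"', raw_labels))
        samples.append((name, labels, float(value)))
    return samples

# Count pods currently reported as Running.
running = [s for s in parse_metrics(SAMPLE)
           if s[0] == "kube_pod_status_phase"
           and s[1]["phase"] == "Running" and s[2] == 1.0]
print(len(running))  # prints 1
```

Every scrape yields only a snapshot like this; storing the samples, graphing trends, and alerting on them is left entirely to whatever system does the scraping.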
Prometheus provides a way to get end-to-end visibility into your Kubernetes environments. The Prometheus monitoring system, however, is much more than the Prometheus time-series database: to get complete visibility, you will need to install and maintain the entire stack, including exporters, Alertmanager, and a visualization layer such as Grafana.
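As a sketch of the configuration involved, a minimal prometheus.yml scrape job using Kubernetes service discovery might look like this (the prometheus.io/scrape annotation convention is a common pattern, not a Kubernetes built-in, so it is an assumption about how your pods are annotated):

```yaml
scrape_configs:
  - job_name: "kubernetes-pods"
    kubernetes_sd_configs:
      - role: pod              # discover every pod via the Kubernetes API
    relabel_configs:
      # Keep only pods annotated prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Carry namespace and pod name through as metric labels
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
```

This handles discovery and scraping only; long-term storage, high availability, and alert routing each require additional components and operational effort.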
While it is easy to get started with Prometheus in dev and test environments, teams quickly realize that, by default, Prometheus lacks the enterprise-grade features needed to monitor mission-critical workloads in production and at scale. The primary challenges are:
The next blog in this series will focus on how to get end-to-end visibility into Kubernetes environments as well as directed troubleshooting with Splunk Infrastructure Monitoring. If you are new to Splunk Infrastructure Monitoring, get started by signing up for a free trial.
The Splunk platform removes the barriers between data and action, empowering observability, IT and security teams to ensure their organizations are secure, resilient and innovative.
Founded in 2003, Splunk is a global company — with over 7,500 employees, more than 1,020 patents to date, and availability in 21 regions around the world — and offers an open, extensible data platform that supports shared data across any environment so that all teams in an organization can get end-to-end visibility, with context, for every interaction and business process. Build a strong data foundation with Splunk.