(This post can also be viewed on the SignalFx blog.)
In just five years since its first release, Kubernetes has become so popular that its name is synonymous with container orchestration. According to StackRox’s Kubernetes and Container Security and Adoption Trends 2019, more than 86% of organizations have adopted Kubernetes, a staggering 51% increase from a year ago.1 Without question, Kubernetes gives DevOps teams the flexibility, agility, and speed they need to refactor their applications as they shift to microservices and reap the benefits of cloud-native technologies. Operating a Kubernetes environment at scale is, however, easier said than done. The layers of abstraction and constant churn that Kubernetes brings make it very difficult to maintain a high-fidelity view of what is going on across clusters, identify problems as soon as they occur, and understand the root causes without getting lost in a sea of data. So it was not surprising to read in last year’s CNCF survey that complexity and monitoring were top challenges for 40% and 34% of enterprises, respectively.2
"Monitoring containers and microservices infrastructure requires a different approach than monitoring servers and VMs, as 'Assessing Monitoring Tools for a Container-Ready Infrastructure' attests. To make things worse, Kubernetes does not come with a production-grade end-to-end monitoring system.3"
— Gartner, Achieving Kubernetes Operational Readiness to Run Containers in Production
Traditional Solutions Don’t Cut It
Do-it-yourself and traditional commercial monitoring solutions are extremely limited in functionality when it comes to monitoring Kubernetes. Working with some of the most advanced K8s users in the industry, we learned a ton about what isn’t working.
Gap #1: Traditional solutions have poor visualizations and are slow at scale
Solutions based on batch TSDB architectures (most of what’s out there) can’t keep up with the scale and churn caused by Kubernetes and are therefore severely constrained. For example, they either need to visualize all containers at once or are limited to supporting only a very small number of containers, rendering them ineffective for large-scale production deployments. In many cases, containers may come and go completely undetected because analytics jobs become too slow. This results in inaccurate alerts and blind troubleshooting.
Gap #2: Traditional solutions are hard to use
Current industry practices based on running kubectl get pods to retrieve simple status information and kubectl describe pod to fetch details about pods have their challenges. Running commands in this manner is insecure and inefficient, and it lacks the relevant context DevOps teams need to effectively troubleshoot and investigate errors. Navigating across multiple layers of abstraction becomes a complete nightmare.
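To give a sense of the manual work involved, here is a minimal Python sketch that rolls up pod phases per namespace from data shaped like the output of kubectl get pods --all-namespaces -o json. The function name summarize_pod_phases and the sample data are hypothetical; the sketch assumes only the items, metadata.namespace, and status.phase fields of the Kubernetes pod list format.

```python
from collections import Counter, defaultdict


def summarize_pod_phases(pod_list):
    """Count pods per phase within each namespace.

    `pod_list` is assumed to be a dict shaped like the JSON printed by
    `kubectl get pods --all-namespaces -o json` (a list object with "items").
    """
    summary = defaultdict(Counter)
    for pod in pod_list.get("items", []):
        namespace = pod.get("metadata", {}).get("namespace", "default")
        phase = pod.get("status", {}).get("phase", "Unknown")
        summary[namespace][phase] += 1
    # Convert to plain dicts so the result is easy to print or serialize.
    return {ns: dict(counts) for ns, counts in summary.items()}


# Hypothetical sample data standing in for real kubectl output.
SAMPLE = {
    "items": [
        {"metadata": {"name": "web-1", "namespace": "shop"},
         "status": {"phase": "Running"}},
        {"metadata": {"name": "web-2", "namespace": "shop"},
         "status": {"phase": "Pending"}},
        {"metadata": {"name": "etl-1", "namespace": "batch"},
         "status": {"phase": "Running"}},
    ]
}

if __name__ == "__main__":
    print(summarize_pod_phases(SAMPLE))
```

Even this small roll-up has to be scripted, scheduled, and given cluster credentials, and it still says nothing about why a pod is pending, which is the missing context described above.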
Gap #3: Traditional solutions create vulnerabilities
In enterprise environments, it can be challenging to provide the right level of access and visibility to every user, while also maintaining a secure system. On one hand, developers need visibility into their workloads so the platform team doesn't need to get involved in every bug fix. On the other hand, excessively permissive access is a vector for serious security threats, as shown by the ill-fated Kubernetes Dashboard. Traditional solutions don’t approach the problem with DevOps and enterprise in mind. As a result, they lack the administration capabilities to provide agile self-service consumption with centralized control and can’t easily answer simple questions related to inventory, health, and usage that are impacted when multiple DevOps teams use shared container infrastructure.
Say Hello To SignalFx Kubernetes Navigator
Delivered with SignalFx Infrastructure Monitoring, SignalFx Kubernetes Navigator represents another leap forward by SignalFx in providing the ideal Observability platform to monitor, analyze, and triage containerized environments. To provide a great out-of-the-box experience, we designed SignalFx Kubernetes Navigator in collaboration with leading experts who have years of collective experience operating Kubernetes in production at scale. We are extremely excited about what we were able to accomplish!
Kubernetes Navigator provides a turnkey solution that combines the power of our streaming architecture with purpose-built visualizations and analytics that address critical operational use cases without requiring any customization. The new pre-built dashboards of Kubernetes Navigator are natively aware of Kubernetes objects and characteristics, which allows the operator to navigate across abstraction layers and maintain hierarchical context out of the box. This enables highly efficient exploration and investigation of issues. To make triaging and problem resolution even faster and more proactive, the new AI-driven analytics that come with Kubernetes Navigator automatically surface insights and recommendations for further investigation.
Automatic discovery and instant visualization of the inventory, health, and performance of container resources with dynamic cluster maps and pre-built dashboards
✓ Faster time to value
✓ Reduce mean time to clue
Automatic insights and recommendations into potential infrastructure problems
✓ Expedite troubleshooting
Real-time, accurate alerts
✓ Reduce mean time to detect
Full-stack visibility that correlates service workloads to containers and links directly to SignalFx Microservices APM service dashboards
✓ Expedite root cause analysis
The right information and capabilities delivered specifically to those who need them, without providing direct access to the cluster
✓ Maintain a secure and enterprise-grade infrastructure
✓ Centralized management with self-service access
“SignalFx Kubernetes Navigator gives operations and engineering teams the insight they need to monitor and manage the health of our containerized environments in real-time. Our engineers use SignalFx to oversee the services we're migrating to Kubernetes, and our operations team is able to quickly diagnose any issues with underlying infrastructure or the orchestration platform itself. The ability to jump from an under-performing pod directly into the metrics for that service and its neighbors is instrumental in resolving issues quickly and keeping Care.com running smoothly.” — Matt Coddington, Senior Director, Production Operations Engineering at Care.com
Auto-Discovery & Instant Visualization
Using our open standards-based Smart Agent, SignalFx Kubernetes Navigator automatically discovers the full hierarchy of objects and their associated metadata — clusters, nodes, pods, containers, and namespaces — as well as the workloads running in them. As that information is streamed through the SignalFx platform, SignalFx Kubernetes Navigator dynamically produces interactive cluster maps, builds detailed node and workload lists, and populates pre-built performance dashboards.
Dynamic and interactive cluster map
AI-Driven Insights & Recommendations
SignalFx Kubernetes Navigator also analyzes the data in flight to instantly surface erroneous conditions grouped by cluster and other metadata tags, such as kubernetes_namespace and container_image, and captures critical context that can be used to correlate workloads and services running on infrastructure objects. With the benefits of real-time visibility, an interactive user experience, and AI-assisted troubleshooting, users can effortlessly zoom in, filter, and explore the most complex containerized environments to quickly spot what until now have been difficult-to-find problems.
AI-driven insights and recommendations
Furthermore, with the knowledge of which workloads are running in specific containers, users are also able to jump straight from SignalFx Kubernetes Navigator to SignalFx Microservices APM to view, understand, and explore the relationship between various infrastructure objects and the services running on them. Users can leverage various metadata, such as the container ID, workload ID, or service ID to correlate how the behavior of infrastructure is impacting service interactions and end user transactions or vice versa. This is particularly helpful during troubleshooting when DevOps teams need to quickly pinpoint which service is causing sudden spikes in latency or error rate and why.
Workload drill-down to container
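The idea of grouping erroneous conditions by metadata tags, described earlier in this section, can be pictured with a small Python sketch. This is not SignalFx’s implementation: the function group_unhealthy, the record layout, and the restart threshold are all hypothetical, and only the tag names kubernetes_namespace and container_image come from the text.

```python
from collections import defaultdict


def group_unhealthy(records, tag, max_restarts=3):
    """Group containers whose restart count exceeds `max_restarts` by `tag`.

    Each record is a hypothetical dict with a "container" name, a "restarts"
    count, and a "tags" dict of metadata dimensions.
    """
    groups = defaultdict(list)
    for record in records:
        if record["restarts"] > max_restarts:
            key = record["tags"].get(tag, "unknown")
            groups[key].append(record["container"])
    return dict(groups)


# Hypothetical sample records; a real system would receive these as a stream.
RECORDS = [
    {"container": "web-1", "restarts": 7,
     "tags": {"kubernetes_namespace": "shop", "container_image": "web:1.4"}},
    {"container": "web-2", "restarts": 0,
     "tags": {"kubernetes_namespace": "shop", "container_image": "web:1.4"}},
    {"container": "cart-1", "restarts": 5,
     "tags": {"kubernetes_namespace": "shop", "container_image": "cart:2.0"}},
    {"container": "etl-1", "restarts": 9,
     "tags": {"kubernetes_namespace": "batch", "container_image": "etl:0.9"}},
]

if __name__ == "__main__":
    print(group_unhealthy(RECORDS, "kubernetes_namespace"))
    print(group_unhealthy(RECORDS, "container_image"))
```

Switching the tag argument regroups the same unhealthy containers by namespace or by image, which is the kind of pivot that helps narrow a problem down to one team’s workloads or one bad release.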
Answering Those Hard-To-Answer Kubernetes Questions
SignalFx Kubernetes Navigator is purpose-built to provide answers that would otherwise be difficult and, in many cases, impossible to find using current monitoring solutions. Some of the most common questions we hear from customers, such as those noted below, are addressed out-of-the-box by SignalFx Kubernetes Navigator.
What infrastructure do I have? Keeping track of infrastructure inventory, capacity, and cost used to be easy in bare metal and virtualized environments. Now that infrastructure is orchestrated based on the ever-changing resource requirements of service workloads, it’s extremely difficult for infrastructure operators to understand what resources are available at any point in time. Kubernetes Navigator addresses this with a dynamic and interactive cluster map that visually orients users to their containerized infrastructure, which is especially helpful if they are new to Kubernetes and just starting to move workloads into containers.
Node- and pod-level inventory within a selected cluster
Who’s using which resources? In addition to understanding what resources are available, infrastructure operators often find it difficult to know who is using a specific set of resources, which is particularly vexing when those resources are spread across multiple containers, pods, and nodes. Similarly, service owners don’t always have clear visibility into precisely where their workloads are running. Now with SignalFx Kubernetes Navigator, infrastructure operators have granular visibility into each individual container, and service owners have an easy-to-use and easy-to-understand interface that doesn’t require any special access permissions or query language. This high-resolution and self-service visibility is especially important in large, distributed enterprises where infrastructure is provided as a service by centralized IT teams to internal service teams.
Detailed list of workloads running in selected cluster
Why do I have a problem? Why are worker nodes failing? Why is this particular pod stuck in the Pending state? Why are some pods running hotter than others? What is causing resource contention, and is this preventing additional pods from being deployed? SignalFx Kubernetes Navigator answers these everyday questions out of the box. With AI-driven analytics, Kubernetes Navigator quickly detects patterns, correlates from individual containers to services, and surfaces meaningful insights that assist infrastructure operators in answering why problems exist.
Cluster map side panel with AI-driven insights and recommendations
What’s changed? In dynamic and ephemeral Kubernetes environments, service owners often face problems starting or scheduling workloads. They also have to deal with infrastructure-related failures or performance issues during normal operations. In response to these issues, service owners and infrastructure operators need to understand what — if anything — has changed. For example, was a new controller or version of a controller rolled out? Was there a new network subsystem upgrade? Or are the containers, pods, and nodes suddenly experiencing high CPU, memory, or network utilization due to too many co-located workloads or a noisy neighbor? Changes to Kubernetes infrastructure, whether intentional or unintentional, and additional workloads from other users often impact one another. Kubernetes Navigator helps service owners and infrastructure operators drill down to the container level to instantly spot erroneous conditions to see what has changed.
Node-level performance dashboard
Get Started with SignalFx Kubernetes Navigator
Customers have for years relied on SignalFx Infrastructure Monitoring to detect and solve issues in distributed environments. With Kubernetes adoption reaching widespread proportions, more enterprises understand the critical need for a monitoring capability that has the scale, speed, and intelligence to solve problems in massively distributed containerized environments. SignalFx Kubernetes Navigator delivers real-time visibility and AI-driven insights into Kubernetes environments and the workloads they support, giving service owners and infrastructure operators the turnkey, enterprise-grade Observability they need to spot and respond to issues before they impact customers at any scale.
If you’re an existing SignalFx customer, sign up to try SignalFx Kubernetes Navigator here.
3 Gartner, Achieving Kubernetes Operational Readiness to Run Containers in Production, Simon Richard, 7 June 2018 (report available to Gartner subscribers)