Effective Monitoring Is Taking Responsibility to Decide What Matters

By Splunk

Twenty years ago, monitoring was relatively simple: a sysadmin ran a check to confirm that the host was up and kept watching to make sure everything stayed OK. Today the environment is changing rapidly: hosted infrastructure and open-source middleware are modernizing the datacenter, and monitoring strategies must keep pace.

The Arrested DevOps Podcast recently chatted with Jason Dixon, Founder of Monitorama and author of “Monitoring with Graphite,” and Aneel Lakhani, Marketing Director at SignalFx, and discussed how monitoring has evolved over the years. Their combined experience of over thirty years in and around the monitoring space made for a fascinating discussion of where monitoring was, where it is today and where it’s going.

Here are some highlights from the podcast, “Finding Signal in the Noise”:

Context is critical. Individual health checks answered simple yes/no questions like “is the host up?” or “am I getting the right ping at the right time?” and ignored every other component in the environment, such as the cluster, the service and especially the customer experience. Without the historical context of how the service has behaved compared with how it is behaving now, how can you determine whether there is actually an issue that requires your attention?
A “single pane of glass” needs to be a “single pane of glass specific to your role.” Self-service monitoring should give each user the flexibility to access any and all data and the simplicity to compose the view that is relevant to their specific use case.
If a metric is important to monitor in production, then it is important to monitor in testing. Monitoring should be about whether a service is performing and should be instrumented into code from Day 0. How do you know whether modifications to the code have the intended effect if you don’t follow performance from code to test to deployment to production?
No one is going to tell you what is most important to monitor. That is why DevOps is important: you need to think like an engineer, understand what to put in your system and know what to expect. Effective monitoring requires continuous iteration and refinement.
There is no monitoring without alerting. When it comes to operations, monitoring helps you determine which alerts will tell you about the capacity, availability and performance of your environment. You have the responsibility to determine when a metric is outside your performance envelope, decide what warning level is sufficient to notify you, and leave everything else for one-off investigations.
Don’t let tools be a distraction. Start with the basics of monitoring: how to get data in, how to instrument code and which collection agents you’re comfortable with. If one tool isn’t working for you, move on to the next one.
The hardest part of monitoring is figuring out what you care about. Take the responsibility to determine which metrics reflect your priorities, regardless of what tools, architecture, culture or organizational model you’re operating in.
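To make the points about context and alerting a little more concrete, here is a minimal Python sketch. It is not from the podcast; it only illustrates one way to judge a current reading against a service's own history and to choose where the warning and alert levels sit. The function name `classify`, the sample latency values and the two sigma thresholds are all hypothetical choices made for illustration, and a real performance envelope would come from your own priorities.

```python
from statistics import mean, stdev


def classify(history, current, warn_sigma=2.0, alert_sigma=3.0):
    """Compare a current reading with the metric's own history.

    The thresholds are illustrative assumptions: you decide what counts
    as "outside the performance envelope" for your service.
    """
    if len(history) < 2:
        raise ValueError("need at least two historical samples")

    baseline = mean(history)   # how the service has usually behaved
    spread = stdev(history)    # how much it normally varies

    if spread == 0:
        # A perfectly flat history: any change at all is unusual.
        return "ok" if current == baseline else "alert"

    # Distance from the baseline, measured in standard deviations.
    deviation = abs(current - baseline) / spread
    if deviation >= alert_sigma:
        return "alert"
    if deviation >= warn_sigma:
        return "warning"
    return "ok"


# Hypothetical request latencies in milliseconds (mean 100, stdev 2).
latency_ms = [100, 102, 98, 101, 99, 100, 103, 97]

print(classify(latency_ms, 101))  # ok
print(classify(latency_ms, 105))  # warning
print(classify(latency_ms, 120))  # alert
```

The same reading of 105 ms could be fine for one service and a problem for another. The history supplies the context, and the thresholds are where you take responsibility for deciding what matters.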

Learn more about alerting for modern architectures in Clever’s Journey to Real-Time Metrics.

Thanks,
Jessica Feng
