Effective Monitoring Is Taking Responsibility to Decide What Matters

Twenty years ago, monitoring was relatively simple: a sysadmin ran a check to make sure the host was up and kept watching to make sure everything stayed OK. Today’s world is changing rapidly: hosted infrastructure and open-source middleware are modernizing the datacenter, and monitoring strategies must keep pace.

The Arrested DevOps Podcast recently talked with Jason Dixon, Founder of Monitorama and author of “Monitoring with Graphite,” and Aneel Lakhani, Marketing Director at SignalFx, about how monitoring has evolved over the years. Their combined experience of over thirty years in and around the monitoring space made for a fascinating discussion of where monitoring was, where it is today and where it’s going.

Here are some highlights from the podcast, “Finding Signal in the Noise”:

  • Context is critical. Individual health checks provided simple answers to simple yes/no questions like “is the host up?” or “am I getting the right ping at the right time?” and ignored all the other components in the environment such as the cluster, the service and especially the customer experience. However, without understanding the historical context of how the service has behaved versus how the service is behaving now, how can you determine whether there is actually an issue that requires your attention?
  • A “single pane of glass” needs to be a “single pane of glass specific to your role.” Self-service monitoring should give each user the flexibility to access any and all data and the simplicity to compose the view that is relevant to their specific use case. If a metric is important to monitor in production, then it is important to monitor in testing. Monitoring should be about whether a service is performing and should be instrumented into code from Day 0. How do you know whether modifications in the code have the intended effect if you don’t follow performance from code to test to deployment to production?
  • No one is going to tell you what is most important to monitor. And that’s why DevOps is important — you need to think like an engineer, understand what to put in your system and know what to expect. Effective monitoring will require continuous iteration and refinement.
  • There is no monitoring without alerting. When it comes to operations, monitoring is used to help you determine which alerts will help you understand the capacity, availability and performance of your environment. You have the responsibility to determine when a metric is outside your performance envelope, decide what warning level is sufficient to notify you, and leave everything else for one-off investigations.
  • Don’t let tools be a distraction. Start with the basics of monitoring: how to get data in, how to instrument code and which collection agents you’re comfortable with. If one tool isn’t working for you, move on to the next one.
  • The hardest part of monitoring is figuring out what you care about. Take the responsibility to determine which metrics reflect your priorities, regardless of what tools, architecture, culture or organizational model you’re operating in.
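
To make the points about historical context and warning levels concrete, here is a minimal Python sketch. It was not discussed on the podcast; the function name `classify`, the sample latency numbers and the two- and three-standard-deviation thresholds are all illustrative assumptions. It compares a current reading with the metric's own history and alerts only when the reading falls outside that envelope.

```python
from statistics import mean, stdev


def classify(history, current, warn_sigmas=2.0, crit_sigmas=3.0):
    """Classify a reading against its own historical baseline.

    The sigma thresholds are hypothetical defaults; deciding what
    they should be for your service is the point of the exercise.
    """
    if len(history) < 2:
        raise ValueError("need at least two historical samples")

    baseline = mean(history)
    spread = stdev(history)

    # A perfectly flat history has no spread, so any change is notable.
    if spread == 0:
        return "ok" if current == baseline else "critical"

    # Distance from the baseline, measured in standard deviations.
    deviation = abs(current - baseline) / spread
    if deviation >= crit_sigmas:
        return "critical"
    if deviation >= warn_sigmas:
        return "warning"
    return "ok"


# Hypothetical request latencies in milliseconds.
recent_latency_ms = [100, 102, 98, 101, 99]
print(classify(recent_latency_ms, 101))
print(classify(recent_latency_ms, 104))
print(classify(recent_latency_ms, 110))
```

A fixed yes/no check would treat all three readings the same as long as the host answered. Judging each reading against how the service has behaved before is one simple way to separate normal variation from something that deserves a notification.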

Learn more about alerting for modern architectures in Clever’s Journey to Real-Time Metrics »

Posted by Jessica Feng

