Monitoring has been a critical IT practice for decades. Observability has been around for a long time, too: it dates to academic research begun in the 1960s. Yet only within the past few years has observability exploded into the world of IT. As a result, IT practitioners today are asking: is there a difference between monitoring and observability? And if so, which one do I need?
This article unpacks those questions by explaining how monitoring and observability work, what they have to do with each other, and how to make the leap from traditional monitoring to full-scale observability.
What is monitoring?
In IT, monitoring is the collection and analysis of data for the purpose of assessing the health of IT systems. There are many types of monitoring and many resources you can monitor. You could use monitoring to:
- Check whether your website is available
- Evaluate the responsiveness of your APIs
- Identify security risks
- Determine whether your infrastructure has sufficient capacity for your workloads
- And so on…
No matter which type of monitoring use case you are dealing with, however, monitoring boils down to collecting and analyzing predefined types of data — such as network bandwidth or basic CPU and memory utilization rates — in order to detect potential problems.
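The "predefined data plus detection" pattern above can be sketched in a few lines. This is a minimal illustration, not a real monitoring agent: the metric names and threshold values are hypothetical examples.

```python
# Minimal sketch of threshold-based monitoring: collect predefined
# metrics and flag any sample that crosses a fixed limit.
# Metric names and thresholds are illustrative, not prescriptive.

THRESHOLDS = {"cpu_percent": 90.0, "memory_percent": 85.0}

def check_sample(sample: dict) -> list:
    """Return alert messages for any metric above its threshold."""
    alerts = []
    for metric, limit in THRESHOLDS.items():
        value = sample.get(metric)
        if value is not None and value > limit:
            alerts.append(f"{metric}={value} exceeds threshold {limit}")
    return alerts

# A healthy sample produces no alerts; a CPU spike produces one.
print(check_sample({"cpu_percent": 42.0, "memory_percent": 60.0}))  # []
print(check_sample({"cpu_percent": 97.5, "memory_percent": 60.0}))
```

Everything a classic monitoring tool can tell you is bounded by the metrics and thresholds you defined up front, which is exactly the limitation observability addresses.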
(Get to know the four golden signals of monitoring.)
What is observability?
Observability is a strategy wherein teams rely on the external outputs of a system in order to infer the internal state of the system.
In other words, when you observe a system, you do more than just collect and interpret predefined data. Instead, you analyze all data that the system exposes – including data points that you may not even have realized were available – to make sense of what is happening inside the system.
Why do IT teams suddenly care about observability?
As noted above, observability has a long history that originates in the context of academic work on signal processing and control systems. It was only in the late 2010s, however, that IT teams began embracing the concept of observability to help manage system performance and availability.
Why did IT practitioners suddenly start caring about observability? It's certainly not a newfound interest in control theory. Rather, over the past 6-7 years, IT systems have exploded in complexity:
- Distributed, microservices-based applications have replaced monoliths.
- Workloads have moved from simple on-premises hosting environments to multi-cloud architectures.
- Orchestration tools, service meshes, overlay networks and the like have introduced layers to hosting stacks that simply didn't exist a decade ago at the typical organization.
In this context, observability has become critical for understanding what is happening deep within complex systems. Simple, predefined metrics can't always identify a complex problem, let alone pinpoint its source.
For example, knowing that your server's CPU has spiked doesn't mean you can track down the Pod and container that triggered the spike, or determine whether it's something to worry about. Nor does tracking overall network bandwidth tell you whether that bandwidth is being shared by multiple VPCs, load balancers and so on. You need more granularity to identify and investigate performance issues in that context.
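The difference granularity makes can be shown with a toy example. Assuming each CPU sample carries pod and container labels, as Kubernetes-style telemetry typically does, a host-level spike can be attributed to the workload that caused it. The pod and container names below are invented for illustration.

```python
# Hedged sketch: labeled, per-container telemetry lets you attribute
# a host-level CPU spike to a specific pod and container, which a
# single host-wide CPU metric cannot do. All names are hypothetical.

samples = [
    {"pod": "checkout-7d9", "container": "app",   "cpu_percent": 12.0},
    {"pod": "checkout-7d9", "container": "proxy", "cpu_percent": 3.0},
    {"pod": "payments-x2f", "container": "app",   "cpu_percent": 81.0},
]

def top_consumer(samples):
    """Return the (pod, container) pair consuming the most CPU."""
    worst = max(samples, key=lambda s: s["cpu_percent"])
    return worst["pod"], worst["container"]

print(top_consumer(samples))  # ('payments-x2f', 'app')
```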
By allowing teams to track all available data points that their systems expose, then trace the data back to its source within the system, observability provides a much deeper level of visibility than monitoring alone can.
How does observability work?
To glean these insights, observability hinges on several practices that aren't part of conventional monitoring:
- Instrumentation. Observability requires the ability to collect telemetry data from the various objects (containers, services, applications, etc.) that exist within complex systems. This telemetry data must be exposed to observability tools using a framework like OpenTelemetry.
- Data correlation. Instead of analyzing disparate sets of data in isolation, observability requires the correlation and interrelation of multiple data points. It's only by linking and collectively analyzing data from multiple layers of your system that you gain complete context on complex issues.
- Root cause analysis. Whereas monitoring focuses mostly on looking for anomalies or patterns, observability requires the ability to trace potential issues back to their root causes by determining which specific part of a system triggered an issue.
- Automation. It almost goes without saying that observability only works well with the help of automation. In particular, teams need machine learning-driven automation to parse the diverse, voluminous data sets that drive observability.
Any tool that promises observability must be able to deliver all of these functions.
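Of the practices above, data correlation is the easiest to illustrate concretely. The sketch below links a latency anomaly in metrics to log entries from the same request via a shared trace ID, so related events can be analyzed together rather than in isolation. The records and field names are hypothetical; real pipelines would pull these from telemetry backends.

```python
# Sketch of data correlation: join metrics and logs on a shared
# trace ID so a slow request can be explained by its own log lines.
# Records and field names are illustrative.

metrics = [
    {"trace_id": "a1", "endpoint": "/checkout", "latency_ms": 2300},
    {"trace_id": "b2", "endpoint": "/health",   "latency_ms": 8},
]
logs = [
    {"trace_id": "a1", "message": "db connection pool exhausted"},
    {"trace_id": "b2", "message": "ok"},
]

def correlate_slow_requests(metrics, logs, threshold_ms=1000):
    """Pair each slow request with its log messages, joined on trace ID."""
    by_trace = {}
    for entry in logs:
        by_trace.setdefault(entry["trace_id"], []).append(entry["message"])
    return [
        (m["endpoint"], by_trace.get(m["trace_id"], []))
        for m in metrics
        if m["latency_ms"] > threshold_ms
    ]

print(correlate_slow_requests(metrics, logs))
# [('/checkout', ['db connection pool exhausted'])]
```

This is the root-cause-analysis step in miniature: the correlated log line points at the component (here, the database connection pool) behind the anomaly, rather than leaving you with an unexplained latency number.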
Monitoring vs. observability: Leveling up
If you've read this far, it should be clear that monitoring and observability are fundamentally distinct. Monitoring plays a role in collecting the data that drives observability, but monitoring alone falls far short of delivering all of the insights that observability can provide.
So, instead of choosing between monitoring and observability, any team responsible for managing complex systems should be looking for ways to "level up" from monitoring to observability. In the context of modern IT, monitoring is simply no longer enough on its own.
If evolving a monitoring strategy into an observability strategy sounds daunting, keep in mind that it doesn't have to involve rebuilding everything from scratch. If you already have monitoring tools and processes in place that allow you to track generic data from various conventional resources (like servers and applications), you can scale up to observability in your workflows by adding:
- Data correlation
- Automated data analysis
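The second addition, automated data analysis, can be as simple as statistical outlier detection. The sketch below flags anomalies in a metric series with a z-score test, a deliberately simple stand-in for the machine-learning-driven analysis observability platforms apply at scale; the latency values are made up.

```python
# Minimal sketch of automated data analysis: flag statistical
# outliers in a metric series. A z-score test is a simple stand-in
# for the ML-based anomaly detection real platforms provide.
from statistics import mean, stdev

def find_outliers(values, z_threshold=3.0):
    """Return (index, value) pairs more than z_threshold stdevs from the mean."""
    mu, sigma = mean(values), stdev(values)
    return [(i, v) for i, v in enumerate(values)
            if sigma and abs(v - mu) / sigma > z_threshold]

# One request is ~10x slower than the rest and gets flagged.
latencies = [100, 102, 99, 101, 98, 100, 980, 103]
print(find_outliers(latencies, z_threshold=2.0))  # [(6, 980)]
```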
In many cases, the monitoring tools you've already been using are capable of supporting observability, too, if you leverage their advanced features.
Observability is for (almost) everyone
To put it simply, monitoring is a practice that predates today's highly complex applications and architectures. To thrive in the modern, cloud-native world, IT teams need to correlate and analyze data in ways that deliver actionable insights into the internal operations of complex systems, and only observability can do that.
This article was written by Chris Tozzi. Chris has worked as a Linux systems administrator and freelance writer with more than ten years of experience covering the tech industry, especially open source, DevOps, cloud native and security. He also teaches courses on the history and culture of technology at a major university in upstate New York.
This posting does not necessarily represent Splunk's position, strategies or opinion.