
How to Build a Culture of Observability

At the core of observability is data: metrics, logs and traces. These data sources are the oft-hyped “pillars of observability.” Talking about the pillars of observability is all well and good. But the fact is that observability is about more than just data (or, for that matter, the tools that collect and analyze it).

It’s a practice. Indeed, you might even say it’s a shift in mindset: a new way of grappling with the unknown unknowns. To make the most of observability, then, organizations need to embrace it as a practice defined by a relentless focus on visibility. That’s how you operationalize observability and bridge the gap between:

  • Mere observability data tooling on the one hand
  • Actionable observability on the other

Let’s explain.

Overview of observability

First, let’s define what observability means. There’s a lot to say on this topic, but we’ll sum it up by saying there are two core ideas behind observability:

  1. Observability means looking from the outside in: Observing external outputs allows you to understand a system’s internal state. In other words, if you can collect data that is available on the surface of a system, you can understand what’s happening inside it.
  2. Observability is holistic: Observability means understanding the internal state of the entire system, not just one part of it.

To make a practice out of observability, then, organizations need to focus on using external data to understand the total internal state of the applications, infrastructure or other systems that they manage.

(Read our full observability explainer and explore an illustrated guide to observability.)
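To make the “outside-in” idea concrete, here’s a minimal sketch in Python. Everything in it is hypothetical (a made-up service with a work queue and an error counter); the point is that an observer never inspects the internal variables directly, but infers the internal state from an external output the service exposes.

```python
# Minimal sketch of "outside-in" observability: internal state is never read
# directly; an observer collects an external output (a JSON health snapshot)
# and infers what is happening inside. All names here are hypothetical.
import json
import queue
import time

work_queue: "queue.Queue[str]" = queue.Queue()  # internal state: pending jobs
errors_seen = 0                                 # internal state: failure count

def health_snapshot() -> str:
    """External output: the surface-level data an observer can collect."""
    return json.dumps({
        "queue_depth": work_queue.qsize(),
        "errors_seen": errors_seen,
        "checked_at": time.time(),
    })

if __name__ == "__main__":
    work_queue.put("resize-image-42")
    # An observer who sees {"queue_depth": 1, "errors_seen": 0, ...} can
    # conclude the service is accepting work and not failing -- without
    # ever opening up the process.
    print(health_snapshot())
```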

Observability and complex systems

It’s important to note, too, that although observability is important when monitoring any type of application or environment, it’s especially critical when dealing with complex distributed systems.

These systems have so many moving parts, and they change state so rapidly and continuously, that end-to-end observability of the entire environment is the only way to understand what is happening within it and interpret how its various pieces fit together and depend on one another.

(Understand the four golden signals of monitoring.)

Building an observability culture

It’s easy to talk about the types of data and tools that help translate observability into practice. And indeed, there is no shortage of blog posts out there about using logs, traces, and metrics to make systems observable, or about the types of observability tools that help you collect and analyze this data.
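As a rough illustration (not a prescription for any particular tool or backend), here’s what emitting all three data types around a single operation might look like using the OpenTelemetry Python API plus the standard logging module. The service, function and attribute names are invented for the example, and exporter/backend configuration is assumed to happen elsewhere.

```python
# A hedged sketch of instrumenting one code path with metrics, a trace span
# and a log line via the OpenTelemetry API. Without an SDK/exporter configured
# elsewhere, the API calls are no-ops, so this runs harmlessly as-is.
import logging

from opentelemetry import metrics, trace

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("checkout")

tracer = trace.get_tracer("checkout")
meter = metrics.get_meter("checkout")
orders_processed = meter.create_counter(
    "orders_processed", description="Number of orders handled"
)

def process_order(order_id: str) -> None:
    # Trace: how long this unit of work took and where it sits in a request.
    with tracer.start_as_current_span("process_order") as span:
        span.set_attribute("order.id", order_id)
        # Metric: how many orders flow through over time.
        orders_processed.add(1, {"status": "ok"})
        # Log: a human-readable record of what happened and why.
        log.info("processed order %s", order_id)

process_order("A-1001")
```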

But, again, observability in practice is about more than just setting up certain tools to collect and analyze certain types of data. To practice observability, organizations must also build observability into their culture.

Exactly how you build observability into your culture depends on what your culture looks like and which types of systems you maintain. An observability culture within a team that manages cloud-native applications will be different from one within a team that handles legacy systems, for example.

In general, however, an observability culture is defined by tenets like these:

  • Breadth is as important as depth: Because observability is about holistic understanding, teams that embrace it should focus on observing their systems in their entirety. Don’t monitor just your applications or just your infrastructure. Monitor every data source – from ticketing systems and CI/CD pipelines down to the tiniest serverless function – to gain total breadth of visibility.
  • Focus on why, not what: To deliver real value, observability must help teams solve real problems. Toward that end, focus on being able to explain not just what is happening in a system, but why it is happening. In other words, don’t just collect metrics, logs, and traces that show what the state of the system is; analyze and correlate that data to understand why the system is in that state (see the sketch after this list). This correlation and mapping of relationships is especially important in complex distributed environments, where understanding dependencies between services requires deep insight into how everything fits together, and how an issue in one part of the application impacts performance elsewhere.
  • Tie observability to incident response: Along similar lines, observability should directly support incident response processes and incident management. If you want observability to result in actionable change, it needs to be integral to the incident response processes, tools, and teams that can resolve the problems revealed by observability. Otherwise, you’re just observing for observability’s own sake, which doesn’t translate into tangible value.
  • Share observability data and tools across the team: Observability should not be the realm of just SREs or IT engineers. Every stakeholder in the performance management and optimization process should have access to observability tools and systems, and consider themselves owners of observability processes.
  • Use observability to drive continuous improvement: When observability reveals problems, make sure you actually fix them in a way that leads to permanent, continuous improvement. Temporary resolutions (like rolling back a problematic application release without taking the time to find the root cause of the problem) paralyze your ability to tie observability to a culture of positive, ongoing change.
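To illustrate the correlation point from the “focus on why, not what” tenet above, here is a minimal sketch, assuming OpenTelemetry tracing is in use, that stamps every log line with the active trace ID. The logger names and messages are invented, and exporter/backend setup is assumed to live elsewhere; the idea is simply that logs and traces recorded for the same request share an ID you can pivot on later.

```python
# Correlation sketch: attach the current trace ID (or "-" if none) to every
# log record so logs and traces for the same request can be joined later.
import logging

from opentelemetry import trace

class TraceIdFilter(logging.Filter):
    """Inject the active trace ID into each log record."""
    def filter(self, record: logging.LogRecord) -> bool:
        ctx = trace.get_current_span().get_span_context()
        record.trace_id = format(ctx.trace_id, "032x") if ctx.is_valid else "-"
        return True

handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter("%(levelname)s trace_id=%(trace_id)s %(message)s")
)
handler.addFilter(TraceIdFilter())
logging.basicConfig(level=logging.INFO, handlers=[handler])
log = logging.getLogger("payments")

tracer = trace.get_tracer("payments")

with tracer.start_as_current_span("charge_card"):
    # This log line and the surrounding span now share a trace ID, so an
    # engineer can move from "what failed" to "why it failed" in one pivot.
    log.warning("card declined, retrying with backup provider")
```

That shared ID is what turns three separate data streams into a single explanation of why an incident happened, rather than three disconnected views of what happened.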

When you bake ideas like these into your organizational culture, you close the gap between observability as a nice-sounding idea and observability as a practice that delivers tangible, real-world value for your organization.

That, of course, should be the ultimate goal of any organization that embraces observability. Tools and data are meaningless if they are not integrated into a practice that marries the technical side of observability to the culture of your team. That includes giving your teams a single, consistent user experience across all of their metric, trace and log data.


This is a guest blog post from Chris Tozzi, Senior Editor of content and a DevOps Analyst at Fixate IO. Chris has worked as a journalist and Linux systems administrator, and has particular interests in open source, agile infrastructure, and networking.

This article does not necessarily represent Splunk's position, strategies, or opinion.

Posted by Stephen Watts

Stephen Watts works in growth marketing at Splunk. Stephen holds a degree in Philosophy from Auburn University and is an MSIS candidate at UC Denver. He contributes to a variety of publications including CIO.com, Search Engine Journal, ITSM.Tools, IT Chronicles, DZone, and CompTIA.