Measuring & Improving Observability as a Service (OaaS) with KPIs and OKRs

Welcome to the third blog of the Observability Center of Excellence (O11y CoE) series! If you’ve been following along, we’ve discussed the why behind an O11y CoE, and we explored how to assemble and structure the team to make it a reality.

Now, we’re ready to dive deeper into one of the CoE’s critical functions: defining and measuring Observability as a Service (OaaS).

In the context of an Observability CoE, OaaS is the operating model for delivering observability capabilities to the organization. Much like other "as a service" models, OaaS focuses on providing observability as a scalable, measurable, and value-driven practice that supports teams across the business.

To determine whether OaaS is effective, you must instrument it, just like the systems it aims to monitor.

Is your observability practice positioned to help teams resolve incidents faster, reduce downtime, and optimize performance? Defining some base KPIs early in your journey not only helps the CoE answer these questions but also enables it to leverage data to understand what’s working (and what’s not).

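These KPIs provide visibility into the CoE’s value, empowering it to continuously refine and improve its delivery of observability services. In this blog, we’ll explore the difference between KPIs and OKRs, what makes a good KPI, the core categories of observability KPIs, and the next steps for putting them into practice.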

By the end, you’ll have the tools and insights to ensure your Observability CoE is delivering measurable value through OaaS, setting the stage for future enhancements like maturity assessments and tactical implementations.

KPIs vs. OKRs: understanding the difference

A fellow Splunker created a great article on KPIs, OKRs, and metrics, breaking down their distinctions and how they complement each other. The gist is simple:

Key performance indicators (KPIs) are like the operational pulse of your observability practice. They answer questions like, “What’s happening right now?” and “What trends have emerged over time?”

These indicators provide a near-real-time and historical view into the health of your OaaS, helping you identify trends, measure effectiveness, and take action.

Objectives and key results (OKRs) are about where you want to go. They combine a clear objective (the goal) with measurable results to ensure progress.

While KPIs tell you what’s happening, OKRs drive strategic alignment and improvements.

How OKRs and KPIs work together for observability

Imagine your Observability CoE tracks a KPI called Agent Saturation, which measures the percentage of available resources instrumented with observability agents. This KPI shows how comprehensively your environment is covered.

The KPI tells you: "We currently have 75% saturation across Tier 0 and Tier 1 applications." In response to this, the related OKR might be:

In this case, the KPI provides the current state and historical context, while the OKR establishes the target state and timeframe for improvement. Together, they ensure the CoE can monitor progress while driving a strategic outcome.
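As a rough illustration of how the KPI and an OKR target relate, the sketch below computes Agent Saturation as the share of resources running an observability agent, then reports the gap to a target. The `TierInventory` structure, the resource counts, and the 90% target are assumptions made for illustration only: the counts were chosen to reproduce the 75% figure above, and the target is hypothetical rather than taken from a real OKR.

```python
from dataclasses import dataclass


@dataclass
class TierInventory:
    """Resource counts for one application tier (hypothetical structure)."""
    tier: str
    total_resources: int
    instrumented_resources: int


def agent_saturation(inventories: list[TierInventory]) -> float:
    """Return the percentage of resources that run an observability agent."""
    total = sum(i.total_resources for i in inventories)
    if total == 0:
        return 0.0  # nothing to instrument, so report 0% rather than divide by zero
    instrumented = sum(i.instrumented_resources for i in inventories)
    return 100.0 * instrumented / total


def gap_to_target(current_pct: float, target_pct: float) -> float:
    """Return the percentage points still needed to reach the target."""
    return max(0.0, target_pct - current_pct)


# Illustrative numbers only: 600 of 800 resources instrumented.
inventories = [
    TierInventory("Tier 0", total_resources=200, instrumented_resources=170),
    TierInventory("Tier 1", total_resources=600, instrumented_resources=430),
]

current = agent_saturation(inventories)
remaining = gap_to_target(current, target_pct=90.0)  # hypothetical OKR target
print(f"Agent Saturation: {current:.1f}% (gap to target: {remaining:.1f} points)")
```

In practice, the inventory would come from whatever asset or CMDB source your organization treats as authoritative, and the calculation would be rerun on a schedule so the trend is visible over time.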

Why both matter

KPIs and OKRs complement each other by ensuring your OaaS practice is operationally effective and strategically aligned:

Together, they create a feedback loop: KPIs inform how close you are to achieving OKRs, while OKRs ensure you’re focusing on initiatives that deliver meaningful value. By distinguishing between KPIs and OKRs, your Observability CoE can build a framework that:

What makes a good KPI?

Any service offering thrives on actionable, meaningful, and relevant KPIs that provide insights into what’s working — and what isn’t. A well-chosen KPI doesn’t just measure performance; it also drives continuous service improvement and supports broader objectives, such as enabling the Observability CoE (O11y CoE) to achieve its OKRs.

(Learn more about KPI management, including how to identify impactful KPIs, avoid common mistakes, and set up KPI management frameworks.)

Common pitfalls to avoid

Defining KPIs is as much about knowing what to avoid as it is about selecting the right metrics. Some common pitfalls include:

The role of the O11y CoE in KPI success

The Observability CoE is central to ensuring success with both KPIs and OKRs. By defining actionable KPIs early and aligning them with clear OKRs, the CoE can:

Defining KPIs isn't just about tracking progress; it's about laying the foundation for a successful Observability-as-a-Service (OaaS) model.

By explicitly integrating OKRs, your O11y CoE gains the ability to continuously adapt, refine, and enhance its value proposition. This alignment ensures that observability practices deliver value to the business iteratively and continuously, keeping the organization responsive and competitive.

Categories of observability KPIs

When identifying KPIs for your Observability CoE, it’s useful to group them into categories based on their focus and purpose. To quickly recap, OaaS KPIs should help assess whether your OaaS operating model is effectively delivering, or is positioned to deliver, observability capabilities to the organization.

Organizing KPIs into these categories ensures your measurements are actionable and aligned with the outcomes your Observability as a Service (OaaS) practice strives to achieve.

Later in this blog, I’ll provide specific examples of O11y KPIs, including their descriptions, purposes, calculations, potential data sources, and which category they fall under. For now, let’s explore the core KPI categories:

1. Availability

Focus: Ensuring observability tools and platforms are operational and accessible.

This type of KPI tracks the reliability of your observability ecosystem, helping you answer questions like:

2. Utilization

Focus: Monitoring the deployment and use of observability tools and resources.

Utilization KPIs measure things like license usage, tool versioning, and deployment coverage, ensuring you’re getting the most out of your investments. Key questions include:

3. Adoption

Focus: Measuring engagement with observability tools and practices across teams and environments. Adoption KPIs cover two key dimensions:

4. Optimization

Focus: Enhancing efficiency and reducing noise.

Optimization KPIs evaluate how well your observability practice reduces unnecessary alerts, improves workflows, and minimizes manual effort. These KPIs tackle questions like:

By organizing KPIs into these types, you can align your measurements with the strategic goals of your CoE and your organization.
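One way to keep these categories actionable is to record each KPI alongside its category, description, calculation, and data source in a small catalog. The following is a minimal sketch rather than a prescribed schema: the class names, the three sample KPIs, and their data sources are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class KpiCategory(Enum):
    """The four KPI categories described in this section."""
    AVAILABILITY = "Availability"
    UTILIZATION = "Utilization"
    ADOPTION = "Adoption"
    OPTIMIZATION = "Optimization"


@dataclass(frozen=True)
class KpiDefinition:
    """Catalog entry for one KPI (field names are illustrative)."""
    name: str
    category: KpiCategory
    description: str
    calculation: str
    data_source: str


def group_by_category(kpis):
    """Return a dict mapping each category to the names of its KPIs."""
    grouped = defaultdict(list)
    for kpi in kpis:
        grouped[kpi.category].append(kpi.name)
    return dict(grouped)


# Hypothetical catalog entries, for illustration only.
catalog = [
    KpiDefinition(
        name="Platform Uptime",
        category=KpiCategory.AVAILABILITY,
        description="Share of time the observability platform is reachable.",
        calculation="100 * available_minutes / total_minutes",
        data_source="synthetic health checks (assumed)",
    ),
    KpiDefinition(
        name="License Usage",
        category=KpiCategory.UTILIZATION,
        description="Share of purchased license capacity in use.",
        calculation="100 * licenses_in_use / licenses_purchased",
        data_source="vendor license report (assumed)",
    ),
    KpiDefinition(
        name="Alert Noise Ratio",
        category=KpiCategory.OPTIMIZATION,
        description="Share of alerts that required no action.",
        calculation="100 * non_actionable_alerts / total_alerts",
        data_source="incident management export (assumed)",
    ),
]

for category, names in group_by_category(catalog).items():
    print(f"{category.value}: {', '.join(names)}")
```

Keeping the calculation and data source next to each definition makes it easier to spot categories with no KPIs yet, which in this sample catalog is Adoption.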

Examples of KPIs for observability

Now, let's take a look at some specific examples of OaaS KPIs, explaining their purpose, how to calculate them, and some practical “pro-tips” based on my experience.

Taking the next steps

Now that you’ve explored the critical role KPIs play in defining and measuring Observability as a Service (OaaS), it’s time to put these ideas into action. Here's your call to action:

Start collecting metrics

Begin gathering data for the KPIs we’ve discussed, even if it’s as simple as plugging them into a spreadsheet. This initial step will help your tools administration teams to:

  1. Understand the type of information you’ll be requesting.
  2. Think of systematic, programmatic ways to retrieve this data using APIs, automated reports, or other integrations.
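If a spreadsheet is your starting point, a small script can append each KPI reading to a CSV file that opens directly in any spreadsheet tool. The sketch below is one possible approach under stated assumptions: the function name, the column layout, and the file name in the usage note are hypothetical, and the values are supplied by hand rather than pulled from any real API.

```python
import csv
from datetime import date
from pathlib import Path

# Hypothetical column layout for the KPI log.
FIELDNAMES = ["date", "kpi", "value", "unit", "source"]


def append_kpi_snapshot(path, kpi, value, unit, source, on=None):
    """Append one KPI reading to a CSV file, writing the header on first use."""
    path = Path(path)
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "date": (on or date.today()).isoformat(),  # defaults to today
            "kpi": kpi,
            "value": value,
            "unit": unit,
            "source": source,
        })
```

Calling `append_kpi_snapshot("oaas_kpis.csv", "Agent Saturation", 75.0, "%", "manual export")` once per reporting period builds up the historical view a KPI needs. When an API or automated report becomes available, only the code that supplies `value` has to change.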

Set your first CoE OKR

Make your initial objective simple and actionable. For example:

Leverage metrics in executive updates

Use the outcomes from this exercise to enhance your Observability CoE’s monthly updates with your executive champion. Highlight early wins, gaps, and actionable insights to build momentum and alignment.

Create achievable goals based on data

Once you’ve established baseline data, use it to define meaningful and attainable goals. For example:

Stay tuned for what’s next

In upcoming blogs, we’ll explore deeper aspects of creating a leading observability practice, including tools inventory, rationalization, and strategies for streamlining your observability ecosystem.

Observability resources, from experts

If you’re passionate about learning about observability, I’d encourage you to:

Series: Splunk for Observability Engineers
