From Chaos to Clarity: Managing Metrics at Scale in Splunk Observability Cloud
Before joining Splunk, I had the opportunity to lead observability operations for several Fortune 500 companies. During that time, we shifted from traditional monitoring vendors of the time to more agile, best-of-breed, often niche observability solutions. This shift gave our teams flexibility to move fast and instrument what mattered, via self-service.
One of those tools (not to be named) became our core metrics engine. Self-service adoption took off. Developers and SREs had visibility like never before.
But then the telemetry sprawl hit!
Our metric volume exploded. Cardinality was through the roof. And worse, we had no insight into which metrics were actually useful. When we asked for help, the vendor’s response was an “opportunity” to renew our annual contract early. We knew we had a problem, but we had no way to clearly identify it — let alone fix it — without risking critical coverage.
That’s exactly the kind of problem Splunk Observability Cloud’s Metrics Management capabilities are built to solve.
What and Why: Metrics Management in Splunk Observability Cloud
In modern observability, more data isn’t always better. As SRE, DevOps, and platform teams adopt self-service instrumentation, metric volume tends to grow unchecked. Add in the growing complexity of today’s IT environments — distributed architectures, ephemeral infrastructure, and multi-cloud sprawl — and it’s easy for telemetry to become overwhelming.
The result? Noise, confusion, and surprise overages that are tough to trace back to specific sources. Most organizations struggle to answer basic questions like:
- What metrics are we collecting?
- Who’s sending them?
- Are they still being used in alerts or dashboards?
- Can we clean them up without breaking something?
Splunk Observability Cloud’s Metrics Management capabilities help you answer these questions and take back control. And the best part? You can do it all centrally, without editing collector configs or backend systems.
With Metrics Management, you can:
- See which metrics are driving time series volume across your deployment
- Understand how those metrics are used, who owns them, and where they show up
- Act through a simple point-and-click interface to drop, archive, or aggregate metrics with confidence
Overview: Metrics Management at a Glance
In this article, we’ll take a closer look at each of these capabilities, with a quick overview and some practical guidance on how you might put them to work.
See the Invisible with Usage Analytics
The Usage Analytics view shows:
- Trends in average Metric Time Series (MTS) count over time, letting you visually spot spikes and shifts in your metric volume.
- Top metrics by usage volume.
- A searchable table that includes utilization status, utility score, billing class, and % of total usage.
This is your source of truth for what’s being ingested, used, or just wasting space.
[Screenshot: a detailed overview of the fields in the Usage Analytics table]
Pro Tips:
- Filter by “Unused,” then sort by “percentage over total” to find metrics that are generating volume but aren’t used in any dashboards or alerts.
- Sort by utility score to separate high-value metrics from low-impact ones.
- Filter by “Billing Class = custom” to wrangle your custom metrics.
- Export the table and share it with platform owners to kick off a telemetry review cycle (or pull the inventory programmatically, as sketched below).
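If you’d rather pull this inventory programmatically (say, to attach it to a review ticket), here’s a minimal sketch in Python. It assumes the /v2/metric metadata endpoint, a realm of us1, and an org token with read access; verify the endpoint and response fields against the current Splunk Observability Cloud API docs. Note that utilization status and utility score are computed in the Usage Analytics UI and aren’t part of this basic metadata.

```python
# Minimal sketch: export a metric inventory to CSV for a telemetry review.
# Assumptions to verify against the API docs: the /v2/metric metadata endpoint,
# the X-SF-TOKEN header, and "results" entries carrying "name" and "type".
import csv

import requests

REALM = "us1"          # your realm, e.g. us0, us1, eu0
TOKEN = "<API_TOKEN>"  # an org access token with read permissions

resp = requests.get(
    f"https://api.{REALM}.signalfx.com/v2/metric",
    headers={"X-SF-TOKEN": TOKEN},
    params={"query": "*", "limit": 100},  # narrow the query for a real review
    timeout=30,
)
resp.raise_for_status()

with open("metrics_inventory.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "type"])
    for metric in resp.json().get("results", []):
        writer.writerow([metric.get("name"), metric.get("type")])
```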
Understand: Get the Full Picture with Metric Profile View
Clicking into a metric within the Usage Analytics view opens its Metric Profile, where you’ll find:
- Dimensions contributing to the metric time series count (great for spotting high-cardinality fields).
- Tokens sending the metric, which is helpful for establishing ownership.
- Charts & Detectors that depend on the metric.
The Metric Profile view helps you understand what is contributing to the MTS count (often dimensions) and where the metric is used across your deployment. That context lets you make data-driven optimizations, such as using Metrics Pipeline Management to archive, aggregate, or drop metrics. For a programmatic angle on the cardinality question, see the sketch below.
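To make that concrete, here’s a rough sketch that tallies unique values per dimension for a single metric. The /v2/metrictimeseries endpoint, the sf_metric:<name> query syntax, and the shape of the response are assumptions to double-check against the API docs, and the metric name is hypothetical.

```python
# Rough sketch: find which dimensions drive cardinality for one metric.
# Assumptions to verify: the /v2/metrictimeseries metadata endpoint, the
# "sf_metric:<name>" query syntax, and a "dimensions" map on each result.
from collections import defaultdict

import requests

REALM = "us1"                          # your realm
TOKEN = "<API_TOKEN>"                  # an org access token with read permissions
METRIC = "container.cpu.utilization"   # hypothetical metric name

resp = requests.get(
    f"https://api.{REALM}.signalfx.com/v2/metrictimeseries",
    headers={"X-SF-TOKEN": TOKEN},
    params={"query": f"sf_metric:{METRIC}", "limit": 1000},
    timeout=30,
)
resp.raise_for_status()

unique_values = defaultdict(set)
for mts in resp.json().get("results", []):
    for dim, value in (mts.get("dimensions") or {}).items():
        unique_values[dim].add(value)

# Dimensions with the most unique values are the likeliest cardinality drivers.
for dim, values in sorted(unique_values.items(), key=lambda kv: -len(kv[1])):
    print(f"{dim}: {len(values)} unique values")
```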
[Screenshot: a detailed overview of the fields in the Metric Profile view]
Pro Tips:
- Target noisy dimensions like container_id, host, or other “id” dimensions; they are often prime candidates for aggregation.
- Leverage Token Insights to trace metrics back to environments, apps, or teams. Ideally, you are assigning tokens to specific applications, users, or use cases; this lets you engage the right teams or disable tokens to eliminate unnecessary metric sprawl (if needed).
- Check Dashboards/Alerts before changing anything to avoid breaking critical monitors or visualizations.
Again, this is the context you need to make confident, data-driven decisions about what to optimize or keep.
Act: Reduce Waste Without Breaking Things with Metrics Pipeline Management
Once you know what needs cleanup, Metrics Pipeline Management (MPM) gives you the tools to do it easily. To get started quickly, click the blue “Create Rule” button on the Metric Profile page.
With MPM’s point-and-click interface, you can:
- Ingest and keep metrics in real time (default). These metrics remain available for use in charts and detectors.
- Send data to archived metrics. Archived metrics are not available in charts or detectors, but you can restore them for up to 31 days or reroute specific subsets back to real time when needed.
- Drop metrics entirely. Dropped metrics are not stored or available for monitoring, but aggregated MTS derived from them can still be retained.
- Aggregate metrics to reduce high-cardinality volume. Use aggregation rules to group and roll up MTS by meaningful dimensions (like region or service), dropping noisy ones (like container_id). This dramatically reduces MTS volume while preserving insight.
In the screenshot, a simple rule reduces raw MTS by 66% just by removing an overly verbose dimension.
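To see why dropping a single dimension can have such an outsized effect, consider some back-of-the-envelope math with made-up numbers: every unique combination of dimension values is its own MTS, so cardinality multiplies.

```python
# Back-of-the-envelope cardinality math (illustrative numbers, not real data):
# each unique combination of dimension values produces one metric time series.
services = 100    # unique values of the "service" dimension
hosts = 50        # unique values of the "host" dimension
containers = 20   # unique "container_id" values per service/host pair

raw_mts = services * hosts * containers   # 100,000 MTS ingested
rolled_up = services                      # aggregate away host and container_id

reduction = 100 * (1 - rolled_up / raw_mts)
print(f"{raw_mts:,} MTS -> {rolled_up:,} MTS ({reduction:.1f}% reduction)")
```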
Pro Tips:
- Start with unused metrics. Optimization here is typically low risk and high reward.
- Archive before you drop; archived metrics are restorable for up to 8 days.
- Use routing exception rules to keep critical MTS in real time while you clean up the rest; this is especially useful for high-cardinality metrics.
- Preview and validate the impact within the UI before saving, so you can confirm the expected reductions.
- Use aggregation to keep what matters: Start by identifying which dimensions actually provide value (e.g., region, endpoint, service) and aggregate away noisy or high-cardinality ones like container_id or host. You’ll retain meaningful insights while dramatically reducing your MTS footprint. (The toy example below spells out the rollup semantics.)
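If it helps to see the rollup semantics spelled out, here’s a toy illustration of what an aggregation rule effectively does: group by the dimensions you keep and roll up across the ones you drop. The data is invented, and sum is chosen as the rollup function purely for illustration.

```python
# Toy illustration of aggregation-rule semantics: group MTS by the dimension
# you keep ("service") and roll up across the one you drop ("container_id").
from collections import defaultdict

datapoints = [
    {"service": "cart", "container_id": "c1", "value": 10.0},
    {"service": "cart", "container_id": "c2", "value": 15.0},
    {"service": "auth", "container_id": "c3", "value": 7.0},
]

rollup = defaultdict(float)
for point in datapoints:
    rollup[point["service"]] += point["value"]  # container_id is dropped

print(dict(rollup))  # {'cart': 25.0, 'auth': 7.0} -> 3 MTS reduced to 2
```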
This is where savings happen. MPM gives you the ability to optimize metrics on your terms, with full visibility and control.
Wrapping Things Up
Metric sprawl can sneak up on even the most mature teams, especially when SREs, DevOps, and platform teams fully embrace self-service. What starts as healthy adoption can quickly turn into a tangle of unused data, rising bills, and unclear ownership.
And with the ongoing explosion of tools, services, and telemetry sources across increasingly complex environments, there’s only more data coming. The challenge isn’t just scale; it’s (re)gaining control before things spiral.
Splunk’s Metrics Management gives you the tools to fight back. These capabilities provide visibility into what’s being collected, clarity on what matters, and a simple interface to take action when things get out of hand or when opportunities for optimization arise.
Ready to Get Started? To see your current metric utilization, understand how (and if) metrics are being used, and centrally optimize them via pipeline management, follow these simple steps:
- Go to Settings > Metrics Management in Splunk Observability Cloud
- Use Usage Analytics to identify unused or high-volume metrics
- Drill into any metric, then use Pipeline Management to drop, archive, or aggregate with confidence
Need additional help? Check out the official docs or connect with your Splunk account team. We're happy to guide you through it.
Buried in Metrics Sprawl and Struggling with Overages?
If you're facing growing costs, unclear metric usage, and no easy way to optimize, you're not alone. Start your 14-day free trial of Splunk Observability Cloud today and experience how easy it is to take back control of your metrics.
Splunk offers modern observability solutions
Looking for a platform that delivers these must-have features? Splunk Observability Cloud is a leading modern observability platform that supports end-to-end visibility and enables self-service observability across the enterprise.
- Learn more about Splunk Observability Cloud
- Try Observability Cloud for free
- Explore technical docs for observability and the developer portal
Check out this Splunk Tech Talk that shows these concepts in action: https://www.youtube.com/embed/Ewdkp2lYhzA?si=FiCy9_e_NtHKujB1