Observability

January 31, 2025

9 Minute Read

Rationalize and Realize: Simplifying Tools for a Leading Observability Practice

By Mike Simon

If you’re familiar with my background, you know I’ve had the opportunity to lead and mature observability and monitoring strategies for several Fortune 500 organizations. A constant challenge in these roles has been managing the growing number of tools — often driven by acquisitions, evolving business needs, or siloed decision-making.

Simply put, tools rationalization is about reducing the number of tools in play when necessary but, more importantly, optimizing the value those tools provide. By aligning observability practices with business goals and eliminating inefficiencies, rationalization becomes the foundation for building a successful and sustainable observability strategy.

Recently, my colleague Craig Robin and I hosted a webinar titled "Simplify to Scale," where we touched on the broader topic of tools consolidation and rationalization. If you haven't had the chance to watch it yet, you should check it out here. In addition, Craig has recently dropped an article on why tool sprawl is an executive crisis and how bold leadership, strategic alignment, and a powerhouse steering committee can turn consolidation into a driver of innovation, efficiency, and resilience.

In this blog, we’ll focus specifically on observability tools rationalization. We’ll clarify how Total Cost of Ownership (TCO) reveals the impact of tool sprawl, introduce a three-part framework for rationalization, and discuss how to identify your golden set of tools to align with business goals and streamline operations. Let’s dive in!

Breaking down total cost of ownership (TCO) for tools

When it comes to the impact of tool sprawl, it boils down to cost. Whether the conversation is about inefficiencies, redundancies, or operational challenges, these issues ultimately translate into expenses that affect your bottom line.

Understanding the TCO provides critical insights into these costs and how they compound across your toolset. TCO is made up of three primary components:

Acquisition costs
Operational costs
Indirect (or hidden) costs

Each category of TCO reveals inefficiencies and highlights opportunities for improvement, making it an essential framework for tools rationalization.

Acquisition costs

Acquisition costs are the upfront, visible expenses of bringing tools into your environment. Licensing fees often get the most attention, but they’re only part of the equation. Infrastructure needs — particularly in on-premises or hybrid setups — can quickly add up, especially when you consider the immense amount of data generated by observability tools.

To support observability and monitoring solutions, I've seen IT teams in many organizations maintain:

Large storage arrays
Virtualization infrastructure
Vendor-specific databases like Microsoft SQL or Oracle

The scale of this infrastructure became a significant cost factor, especially when we accounted for high availability (HA) and disaster recovery (DR) requirements. Meeting these requirements often doubled acquisition costs, as redundancy and failover systems had to be built into the architecture.

Operational costs

Once tools are deployed, the ongoing operational expenses begin to accumulate. These include maintenance, updates, and ensuring the tools remain effective for the teams using them.

Keeping solutions up to date is essential, as updates and patches often include critical features, bug fixes, and, most importantly, security enhancements. (Whether your teams are actually performing these updates is part of the next section.)

People and labor resources are a major driver of operational costs, with IT, engineering, or platform teams dedicating significant time to activities like:

Applying updates
Integrating tools into workflows
Creating dashboards/alerts

For organizations early in their observability or platform engineering journey, these tasks can be challenging, particularly when dealing with fragmented or overly complex toolsets. Understanding the labor costs tied to these operational activities is critical. I’ve seen how distributed tools management, often neglected or treated as a secondary responsibility, can:

Dilute confidence in solutions
Increase complexity and indirect costs, such as prolonged MTTR

Indirect (or hidden) costs

Indirect costs are some of the hardest to quantify but often have the most significant long-term impact. These hidden costs — downtime, increased MTTR, integration complexity, productivity loss — amplify the challenges of tool sprawl and directly tie back to acquisition and operational costs.

Fragmented tools delay incident resolution, as teams spend valuable time switching between platforms and manually piecing together data. This leads to prolonged downtime and higher MTTR, directly affecting revenue, productivity, and customer trust. Poorly maintained or outdated tools (as highlighted above) only increase the likelihood of these disruptions.

Integration complexity adds further challenges. Here are some examples:

Managing disparate tools often requires significant effort to ensure workflows, such as incident response and troubleshooting, function smoothly.
Transferring data between tools can be especially problematic, adding friction during critical workflows like diagnosing an outage or identifying the root cause of an issue.
In some cases, the tools themselves may not even support exporting or integrating their data, further complicating efforts to create a cohesive observability strategy.

These inefficiencies increase operational overhead, delay resolution times, and place additional strain on teams already managing complex environments.

By addressing these hidden costs through rationalization, organizations can reduce downtime, streamline workflows, and free up resources to focus on strategic, high-value initiatives.

Tools rationalization framework: How to rationalize tools

A structured framework is critical to making tools rationalization effective and sustainable. Without a clear approach, efforts can stall or lead to unintended gaps in functionality. A practical framework for success includes the following three key steps:

Step 1. Inventory and baseline

Start by creating a comprehensive inventory of all observability tools in use across your organization. This inventory serves as the foundation for identifying opportunities to simplify and rationalize your toolset. At a minimum, capture:

Basic details: Tool name, acronym, URL, tool/agent versioning, owner, and observability capability (e.g., infrastructure monitoring, synthetics, APM, RUM).
MELT coverage: Metrics, Events, Logs, and Traces — detailing which pillars each tool supports.
Resource information: Number of trained resources using, supporting/maintaining the tool and integrations with other tools or systems.
Contract/licensing details: Contract type, renewal date, etc
Metrics to support OaaS KPI/OKRs: Leverage the right KPI/OKRs to measure your Observability-as-a-Service offering.

Building this inventory can be challenging, especially with the prevalence of shadow IT, where tools may be adopted without formal approval. Start with the obvious or widely visible tools in your environment and expand from there:

Engage teams directly: As you speak with teams, ask targeted questions such as, "What do you use for Metrics, Logs, and/or Traces?" This often surfaces tools or agents that might not be officially documented.
Work with procurement: Identify vendors commonly associated with observability tools by reviewing procurement records and payment histories.
Monitor alerts: Review email sender analytics for common vendor domains or alerting systems. Noisy tools with poorly managed alerts are often a strong indicator of distributed or fragmented tools.
Technical discovery: Leverage host data to identify agents running in process lists. This can reveal tools that are active in the environment but may not be well-known to your broader organization.

This inventory provides a clear baseline of your current toolset and also highlights redundancies, underutilized tools, and areas of misalignment. This directly informs rationalization strategies, allowing you to prioritize tools that align with business goals and identify candidates for decommissioning or consolidation.

Step 2. Requirements gathering and alignment

Engage stakeholders to ensure your rationalization efforts align with business and operational goals. Stakeholder input helps identify the tools and capabilities that truly matter while highlighting gaps or redundancies. Key areas to focus on include:

Retention requirements: Understand the organization's data retention policies for metrics, logs, and traces. Tools must support these policies to avoid compliance gaps or operational risks.
Business workflows: Map workflows and processes to determine which tools directly support critical business operations, such as customer-facing applications or high-priority services.
Functional and non-functional needs: Define the required capabilities (e.g., APM, RUM) and ensure tools meet performance requirements like scalability, reliability, and latency thresholds.
Tactical and strategic needs: Assess where your organization stands in its modernization journey. Are you migrating critical workflows and systems to cloud-native and microservices architectures? Do you still rely on traditional three-tier monolith applications?

Taking the time to gather and align these requirements ensures that rationalization efforts remain strategic, eliminating unnecessary tools while focusing on those that deliver measurable value. This step also builds stakeholder buy-in by addressing both technical and business needs upfront.

Step 3. Prioritize and execute

Start by ranking tools based on their:

Alignment with organizational goals
Cost-effectiveness
Ease of integration

Use this prioritization to develop a phased roadmap, ensuring progress is iterative and minimally disruptive to day-to-day operations.

Tools rationalization strategies

While there isn’t a one-size-fits-all solution, I’ve ordered these common rationalization strategies based on what I’ve seen work best in practice. Many organizations implement a combination of these approaches to optimize their observability tooling effectively:

Low-hanging fruit: Begin with tools that are redundant, outdated, or underutilized. Getting rid of these tools are the easiest wins because they demonstrate immediate value and help build momentum for the broader rationalization effort.
Overlap reduction: Consolidate tools with overlapping capabilities to simplify workflows and reduce resource demands. This is a natural next step once the low-hanging fruit is addressed.
Strategic alignment: Eliminate tools that do not align with long-term business goals to ensure focus remains on critical initiatives. This is particularly important when shifting to new architectures, such as cloud-native or microservices.
Business value focus: Prioritize tools that deliver measurable business value and rationalize those that contribute little or no value. This requires a thorough understanding of how each tool supports key workflows and operational needs.
Cost-benefit analysis: Evaluate tools based on their return on investment (ROI), balancing cost against their operational and strategic contributions. This approach can help refine decisions when budget constraints are a significant factor.

By applying these rationalization approaches, either sequentially or in parallel, you can streamline your toolset, reduce unnecessary complexity, and position your observability practice for long-term success. A phased execution plan minimizes disruption, maintains operational continuity, and demonstrates measurable value to stakeholders throughout the process.

How to identify your golden set of tools

A critical outcome of tools rationalization is defining your “golden set” of tools — a streamlined, cohesive set of solutions that meets your organization’s unique business and operational requirements. This golden simplifies your toolset and positions your team to consolidate and migrate data and insights from rationalized tools effectively.

When identifying your golden set, consider these key criteria:

Unified observability capabilities

Select tools that cover the full spectrum of observability needs, including infrastructure monitoring, APM, RUM, synthetic monitoring, and log aggregation, while addressing the core observability pillars — Metrics, Events, Logs, and Traces (MELT).

Unified observability suites that integrate these capabilities into a single platform can significantly reduce operational overhead and hidden costs by streamlining workflows and providing a common framework for teams.

Support for business requirements

Ensure tools meet the unique functional and non-functional requirements identified during the framework process. This includes critical capabilities like:

Self-service observability
Disaster recovery (DR)
High availability (HA)

Tools should empower teams to act independently while maintaining reliability and compliance with organizational policies.

Integration readiness

Prioritize tools that seamlessly integrate with your broader ecosystem and are capable of consolidating data and workflows from rationalized tools. Strong integration support reduces friction, accelerates adoption, and minimizes the complexity of transitioning away from redundant solutions.

Open standards and scalability

Choose tools that support open standards like OpenTelemetry (OTel) to enable vendor-neutral observability and future-proof your stack. Tools should also be scalable to handle current and future workloads without sacrificing performance or reliability.

The golden set is more than just a collection of tools — it’s the backbone of your observability strategy. By focusing on consolidated, business-aligned, and unified solutions, you can simplify operations, reduce complexity, and ensure your observability practice delivers measurable value across the organization.

From cost cutting to value building: Your next steps in observability

If you’re considering leveling up your observability practice, tools rationalization is a powerful way to drive efficiency, reduce costs, and align your strategy with business goals. While rationalization often starts as an exercise to eliminate or reduce spending, it’s equally important to recognize that some of those savings will need to be reinvested.

Optimizing your toolset and creating a leading observability practice requires strategic reinvestment to ensure your golden tools deliver the maximum value. For more insights into building a leading observability practice, check out these related blogs:

Let’s keep the conversation going — streamline your tools, align your strategy, and take your observability practice to the next level!

Observability resources, from experts

If you’re passionate about learning about observability, I’d encourage you to:

Check out our team's observability articles and tutorials on Splunk Community.
Watch our Splunk Observability for Engineers video series. Check out the entire series for more tutorials, insights, and new features and capabilities.

Series: Splunk for Observability Engineers

How to Build a Winning Observability Strategy

Observability Center of Excellence

How To Build O11y CoE

KPIs & OKRs for OaaS

Rationalizing Tools

Tiered Observability

Self-Service Observability

Metadata & Tags for Context

Embed Observability into IT

Monitoring User Journeys

Mike Simon

Mike Simon is a seasoned observability leader and Developer Evangelist at Splunk, with over 16 years of experience in IT operations. Passionate about driving best practices in observability, he has a track record of optimizing monitoring frameworks for several Fortune 500 companies. With expertise spanning AIOps, cloud-native technologies, and digital experience monitoring, Mike is dedicated to empowering organizations to achieve comprehensive observability.

Observability 5 Min Read

A Deep Dive Into Built-In Anomaly Detection: How the Algorithm Works

Discover how Built-in Alert Conditions and Alert Preview in Splunk Infrastructure Monitoring allow cloud operations to exploit the full power of our real-time analytics engine in a way that is both intuitive and flexible.

Observability 4 Min Read

Data Drivers: Hands-on The Wheel and The Data

The race is on – discover how Splunk's powerful capabilities for IT and AIOps can be applied to motorsports with our new interactive series, Data Drivers, a fun and enlightening journey into bringing Splunk's Data-to-Everything Platform to racing.

Observability 8 Min Read

Monitoring with Logs: Metrics from AWS FireLens, Splunk and Logstash

SignalFx is advancing its observability capabilities with the introduction of log metricization by way of an official integration with FireLens, the new log aggregation service from AWS.

About Splunk

The world’s leading organizations rely on Splunk, a Cisco company, to continuously strengthen digital resilience with our unified security and observability platform, powered by industry-leading AI.

Our customers trust Splunk’s award-winning security and observability solutions to secure and improve the reliability of their complex digital environments, at any scale.

Learn more about Splunk

Subscribe to our blog

Get the latest articles from Splunk straight to your inbox.

Connect with Splunk on X

Follow @Splunk

Connect with Splunk on Instagram