Monitor Cloud-Native & Hybrid Apps and Business Transactions With Observability Cloud APM

This blog was co-authored by Deena Shanghavi and Bob Ni.

As organizations modernize, most applications don’t fit neatly into one category—they span both traditional three-tier architectures and cloud-native microservices. To monitor these hybrid environments effectively, teams need APM tools that can seamlessly connect the two worlds.

That’s why at .conf25, we’re introducing new capabilities in Splunk Observability Cloud to strengthen APM for cloud-native applications and extend support for hybrid environments, building on AppDynamics’ proven expertise in monitoring traditional three-tier applications. These updates bring greater visibility and precision to monitoring, from business transactions down to code-level execution.

Highlights include Business Transactions, Call Graphs, service map grouping, and a new Instances tab in APM.

Together, these capabilities deliver a unified APM solution for organizations running hybrid or microservices-centric applications.

Monitor Business Transactions With Precision

Monitoring individual services in isolation makes it difficult to understand how application performance impacts the business or to prioritize where to focus your efforts. You need visibility into the end-to-end business transactions that span multiple services and directly impact your customers.

In Splunk Observability Cloud, a Business Transaction groups a set of related traces that track a discrete transaction or user flow of interest. We’ve evolved the former Business Workflow feature in Splunk APM into Business Transactions, introducing a dedicated view and giving you more flexibility in how you define and monitor them. This makes it easier to monitor and troubleshoot key operations such as checkout.
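
As a rough illustration of the idea, the sketch below groups traces by a business transaction name and computes RED (rate, errors, duration) metrics for each group. The `Trace` record, its field names, and the sample values are hypothetical; this is not the Splunk APM data model or API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Trace:
    # Hypothetical, simplified trace record used only for illustration.
    business_transaction: str  # e.g. "checkout"
    duration_ms: float
    is_error: bool


def red_by_transaction(traces, window_seconds):
    """Return request rate, error rate, and mean duration per business transaction."""
    groups = defaultdict(list)
    for trace in traces:
        groups[trace.business_transaction].append(trace)

    summary = {}
    for name, group in groups.items():
        errors = sum(1 for t in group if t.is_error)
        summary[name] = {
            "requests_per_second": len(group) / window_seconds,
            "error_rate": errors / len(group),
            "mean_duration_ms": sum(t.duration_ms for t in group) / len(group),
        }
    return summary


traces = [
    Trace("checkout", 120.0, False),
    Trace("checkout", 480.0, True),
    Trace("order-confirmation", 900.0, False),
]
summary = red_by_transaction(traces, window_seconds=60)
print(summary["checkout"])
```

Alerting on a summary like this, rather than on each service separately, is what lets a single detector cover a whole user flow.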

Example: Imagine you’re an SRE for an e-commerce application. Instead of setting up alerts for individual services—which creates a lot of noise—you configure detectors for critical business transactions such as order confirmation. One day, you receive a critical alert about high latency in your order confirmation transaction. Clicking the link in the alert email takes you directly to the overview page for that business transaction.

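On the overview page, you see transaction-level RED metrics, including a latency comparison across the services in the transaction.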

By quickly comparing latency across services in the transaction, you notice the most upstream service, ecommerce-green-svc, shows the highest latency. You can now narrow the issue down to that service for deeper troubleshooting.

The Business Transaction overview page tracks transaction-level RED metrics, including latency comparisons across services.

Gain Code-Level Insights With Call Graphs

In APM, tracing helps you see where latency or errors occur across distributed services, but it doesn’t always reveal what’s happening inside the code itself. That’s where Call Graphs come in. A Call Graph provides a detailed, code-level breakdown of the execution path for a single span of a trace.

We’re introducing Call Graphs in Splunk APM to help you go beyond a slow trace and pinpoint the exact function or method inside a service that’s causing the slowdown.

Example (continuing from above): You already know the ecommerce-green-svc service is the source of the problem. You drill into the service-centric view of ecommerce-green-svc to investigate further. After confirming the service is unhealthy and showing elevated latency, you analyze relevant traces. The Traces view automatically surfaces problematic traces for the service, so you click on the longest trace to understand what happened.

The Traces view automatically surfaces problematic traces for the selected service.

In the Trace Waterfall View, you see that most of the time was spent in the root span, which contains a call graph (marked with a blue notebook icon). Reviewing the call graph, you find a single method, sun.nio.ch.Net.connect0(Net.java:0), causing the majority of the latency. With this insight, you can work directly with the team that owns this code to resolve the issue.

The call graph provides a detailed breakdown of the execution path.
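
One common way to build a view like this is to aggregate stack samples captured while a span is running. The sketch below assumes that approach purely for illustration; it does not describe how Splunk APM builds call graphs, and every frame name other than `sun.nio.ch.Net.connect0` is made up.

```python
from collections import Counter


def build_call_tree(stack_samples):
    """Merge samples into a nested dict: frame -> {"samples": n, "children": {...}}."""
    root = {}
    for sample in stack_samples:
        level = root
        for frame in sample:
            node = level.setdefault(frame, {"samples": 0, "children": {}})
            node["samples"] += 1
            level = node["children"]
    return root


def hottest_frame(stack_samples):
    """Return (frame, share) for the innermost frame seen in the most samples.

    Each sample lists frames from the outermost caller to the frame that
    was executing when the sample was taken.
    """
    leaf_counts = Counter(sample[-1] for sample in stack_samples if sample)
    if not leaf_counts:
        return None, 0.0
    frame, count = leaf_counts.most_common(1)[0]
    return frame, count / sum(leaf_counts.values())


# Hypothetical samples for one slow span.
samples = [
    ["OrderController.confirm", "PaymentClient.charge", "sun.nio.ch.Net.connect0"],
    ["OrderController.confirm", "PaymentClient.charge", "sun.nio.ch.Net.connect0"],
    ["OrderController.confirm", "PaymentClient.charge", "sun.nio.ch.Net.connect0"],
    ["OrderController.confirm", "OrderRepository.save"],
]
tree = build_call_tree(samples)
frame, share = hottest_frame(samples)
print(frame, share)
```

The share of samples is only a proxy for time spent, and its accuracy depends on how often samples are taken.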

See Complex Architectures Organized the Way Your Teams Work

In large, distributed environments, hundreds of services often appear as an unstructured list, making it difficult to quickly understand relationships, identify ownership, or connect technical issues back to business domains. Without logical grouping, troubleshooting is slower, dashboards become cluttered, and teams waste time sifting through noise.

To address this, we’re introducing the ability to use indexed span tags such as service.namespace to group related services on the Service Map. This adds structure and context to your observability data, transforming a messy list of services into a clear, business-aligned view of your system.

With service map grouping, you can create conceptual views that align with how your teams work. For example, you might group all checkout-related services together to monitor performance as a unit and see how they affect one another. Because different teams or business units often own different namespaces, grouping also clarifies ownership and reduces noise from unrelated services.
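
`service.namespace` is a standard OpenTelemetry resource attribute, commonly set through the `OTEL_RESOURCE_ATTRIBUTES` environment variable (for example, `service.namespace=checkout`). The sketch below shows the grouping idea on a plain list of services. The service names, namespaces, and the `group_by_tag` helper are hypothetical, and the code does not call any Splunk API.

```python
from collections import defaultdict


def group_by_tag(services, tag="service.namespace", fallback="ungrouped"):
    """Group service names by the value of one tag; untagged services share a fallback group."""
    groups = defaultdict(list)
    for service in services:
        groups[service["tags"].get(tag, fallback)].append(service["name"])
    return {group: sorted(names) for group, names in groups.items()}


# Hypothetical services and tags.
services = [
    {"name": "cart-svc", "tags": {"service.namespace": "checkout"}},
    {"name": "payment-svc", "tags": {"service.namespace": "checkout"}},
    {"name": "search-svc", "tags": {"service.namespace": "catalog"}},
    {"name": "legacy-batch", "tags": {}},
]
groups = group_by_tag(services)
print(groups)
```

Services without the tag end up in a catch-all group here, which is one reason consistent tagging across teams matters.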

Once services are grouped, you can monitor their health at a glance. A multicolored ring around each group shows the percentage of services in red (critical), orange (warning), or gray (normal) health states. Selecting a service group reveals aggregated request and error metrics for the group, along with a list of the services it contains.
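
The ring is essentially a percentage breakdown of health states within a group. A minimal sketch of that calculation follows, assuming each service reports exactly one of three states; the service names and states are hypothetical.

```python
from collections import Counter

STATES = ("critical", "warning", "normal")


def health_breakdown(service_states):
    """Return the percentage of services in each health state."""
    total = len(service_states)
    if total == 0:
        return {state: 0.0 for state in STATES}
    counts = Counter(service_states.values())
    return {state: 100.0 * counts[state] / total for state in STATES}


# Hypothetical health states for one service group.
checkout_group = {
    "cart-svc": "normal",
    "payment-svc": "critical",
    "inventory-svc": "warning",
    "shipping-svc": "normal",
}
breakdown = health_breakdown(checkout_group)
print(breakdown)
```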

Service map grouping lets you group related services by indexed span tags to align with how your teams work.

Correlate App and Infra Data With Service Instance Visibility

In modern distributed environments, applications don’t run on a single server—they run across hundreds or even thousands of service instances, often in containers or ephemeral infrastructure. Looking only at aggregate service-level metrics can mask critical issues that affect just a subset of instances (for example, a single pod in Kubernetes or a VM in a cluster).

To solve this, we’re introducing a new Instances tab in APM that brings together service instance and infrastructure metrics in a single view so you can pinpoint issues faster.

The Instances tab provides a searchable table of all service instances, showing their request, error, and duration (RED) metrics alongside infrastructure data such as CPU, memory, and host ID. With direct correlation to infrastructure monitoring, you can quickly confirm or rule out whether infrastructure is the cause of application issues.
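
To see why per-instance data matters, the sketch below compares each instance's error rate with the service-wide rate and flags outliers. The instance names, metric values, and the threshold factor are hypothetical and chosen only to illustrate how an aggregate can hide a single bad instance.

```python
from dataclasses import dataclass


@dataclass
class InstanceStats:
    # Hypothetical per-instance record combining app and infrastructure data.
    instance_id: str
    requests: int
    errors: int
    cpu_percent: float


def find_outliers(instances, factor=3.0):
    """Return instances whose error rate exceeds `factor` times the service-wide rate."""
    total_requests = sum(i.requests for i in instances)
    total_errors = sum(i.errors for i in instances)
    if total_requests == 0:
        return []
    service_rate = total_errors / total_requests
    return [
        i for i in instances
        if i.requests > 0 and i.errors / i.requests > factor * service_rate
    ]


instances = [
    InstanceStats("pod-a", 1000, 2, 35.0),
    InstanceStats("pod-b", 1000, 3, 38.0),
    InstanceStats("pod-c", 1000, 1, 33.0),
    InstanceStats("pod-d", 1000, 94, 97.0),
]
outliers = find_outliers(instances)
for instance in outliers:
    print(instance.instance_id, instance.errors / instance.requests, instance.cpu_percent)
```

In this made-up data the service-wide error rate is 2.5 percent, while `pod-d` alone is at 9.4 percent with CPU near saturation, which points toward an infrastructure cause for that one instance.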

Monitor Any Application With Splunk APM Now

With these innovations, Splunk APM delivers best-in-class support for both cloud-native and traditional three-tier applications—giving you one unified platform to monitor any app, in any environment.

Start a free trial or schedule a demo today.

Follow all the conversations coming out of #splunkconf25!
