By Alicia Dale
When thinking about observability, it’s important to remember that applications produce enormous amounts of data in the form of logs, traces, and metrics. All that data can make it difficult to identify what you actually need in order to troubleshoot an ongoing issue or start an investigation. When you’re setting up tracing for your applications, tagging can help tremendously. Splunk APM’s NoSample™ trace retention and tag analysis take that a step further by enabling you to break down traces by their tags, which helps you troubleshoot issues faster.
Splunk APM (Application Performance Monitoring) is an essential tool that helps you monitor and optimize the performance of your applications. APM has become a standard tool since the adoption of microservices. When the shift to microservices began, and organizations decided to break up their monolithic architectures into smaller components, it became apparent that all applications needed to have detailed monitoring and tracing to allow for better troubleshooting practices.
What Makes Splunk APM Stand Out Above the Rest?
Splunk APM has a unique architecture called NoSample™, which leverages OpenTelemetry-based instrumentation to ingest all of your observability data: metrics and traces today, with logs to come. OpenTelemetry is an open source observability framework that offers vendor-agnostic APIs and SDKs (software development kits), as well as the OpenTelemetry Collector, for collecting telemetry data from cloud-native applications.
Traditional APM products only provide you with a sample of your tracing data, which is a huge problem because the trace you need to troubleshoot an issue may not be among the ones that were kept. Splunk APM outshines these traditional tools due to its NoSample™ architecture, which analyzes 100% of the transactions from your microservices.
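To put a rough number on the sampling problem, here is a back-of-the-envelope sketch. It assumes uniform random sampling, where each trace is kept independently with a fixed probability. That is a simplification for illustration, not a description of how any particular product samples, and the function name and numbers are made up.

```python
def chance_all_missed(sample_rate: float, failing_traces: int) -> float:
    """Probability that none of the failing traces is retained.

    Assumption: each trace is kept independently with probability sample_rate.
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    return (1.0 - sample_rate) ** failing_traces


if __name__ == "__main__":
    # Hypothetical incident: 1% of traces kept, 20 failing requests.
    print(f"{chance_all_missed(0.01, 20):.1%}")
    # Keeping every trace (sample_rate = 1.0) leaves no chance of missing them.
    print(f"{chance_all_missed(1.0, 20):.1%}")
```

Under these assumptions, a 1% sample would miss all 20 failing traces roughly four times out of five, while keeping everything never misses them.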
Splunk APM will fully capture the following from your systems:
- Traces and Errors
- Anomalies and Outliers that are determined by establishing baselines against your data
Limitations of Other APM Tools
If you’re using Splunk APM, then you have all the tracing and information you’ll ever need to troubleshoot your applications. But wait... there’s more! With Splunk APM, you have infinite cardinality, which means that you have millions of dimensions available to you. This allows for an unprecedented level of visibility into your application that you simply cannot get with any other APM tool on the market. Since you have infinite cardinality for your applications, you can add all the tags you want.
Application Performance Monitoring
Splunk APM comes equipped with a default Troubleshooting MetricSet, in which APM automatically indexes span tags for the following (these defaults can’t be modified):
- HTTP Method
With the default span tags in place, Splunk APM will run a cardinality analysis to calculate the approximate total cardinality that indexing a span tag contributes.
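If "cardinality" feels abstract, think of it as the number of distinct values a tag takes on. The sketch below is a minimal illustration of that idea using plain Python dictionaries. It is not Splunk's actual analysis, and the span layout and the `customer_id` tag are assumptions made up for the example.

```python
from collections import defaultdict


def tag_cardinality(spans):
    """Count the distinct values seen for each tag key across all spans."""
    values = defaultdict(set)
    for span in spans:
        for key, value in span.get("tags", {}).items():
            values[key].add(value)
    return {key: len(seen) for key, seen in values.items()}


if __name__ == "__main__":
    # Hypothetical spans; the tag names and values are illustrative only.
    spans = [
        {"tags": {"http.method": "GET", "customer_id": "c-1"}},
        {"tags": {"http.method": "POST", "customer_id": "c-2"}},
        {"tags": {"http.method": "GET", "customer_id": "c-3"}},
    ]
    print(tag_cardinality(spans))
```

A tag such as an HTTP method stays small no matter how much traffic you have, while a per-customer tag grows with your user base, which is why it helps to estimate the contribution of each tag.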
To better represent monitoring for microservices, you can employ the RED method. Using this method, you can monitor the following for each resource:
- Rate (the number of requests per second)
- Errors (the number of those requests that are failing)
- Duration (the amount of time that it takes to complete those requests)
If you’ve read Google’s SRE book or you’re familiar with observability as an SRE or DevOps engineer, then this observability standard may sound familiar to you; indeed, the RED method is quite similar to Google’s four golden signals (latency, traffic, errors, and saturation). These metrics are very important, since they’re clear indicators of customer happiness. If your service has a high error rate, then you can expect to have unhappy customers who are receiving page load errors, which can lead them to consider using a different platform for their needs. It can also be problematic if a page takes too long to load, since customers expect a fast load time. Therefore, you shouldn’t overlook RED metrics when you’re setting up APM for your microservices.
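To make the three RED numbers concrete, here is a minimal sketch that computes them for one time window from a list of request records. The `Request` type and `red_metrics` function are hypothetical names for this example, and using the mean for duration is a simplification; real dashboards often show percentiles instead.

```python
from dataclasses import dataclass


@dataclass
class Request:
    duration_ms: float  # how long the request took
    failed: bool        # whether the request ended in an error


def red_metrics(requests, window_seconds):
    """Return rate (requests/s), error count and ratio, and mean duration."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    total = len(requests)
    if total == 0:
        return {"rate": 0.0, "errors": 0, "error_ratio": 0.0, "avg_duration_ms": 0.0}
    errors = sum(1 for r in requests if r.failed)
    return {
        "rate": total / window_seconds,
        "errors": errors,
        "error_ratio": errors / total,
        "avg_duration_ms": sum(r.duration_ms for r in requests) / total,
    }


if __name__ == "__main__":
    # Four hypothetical requests observed over a two-second window.
    window = [
        Request(100, False),
        Request(200, False),
        Request(300, True),
        Request(400, False),
    ]
    print(red_metrics(window, window_seconds=2))
```

For this sample window, the result is a rate of 2 requests per second, one error (a 25% error ratio), and an average duration of 250 ms.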
Utilizing Tag Spotlight
Root Cause Analysis with Tag Spotlight
Splunk’s Tag Spotlight features a RED metrics time-series chart that displays the total number of requests, errors, root-cause errors, and latency for a specified time range. This can help you tremendously with root-cause analysis for your services by highlighting these critical metrics.
For example, if your service is spewing errors left and right, you’ll want to get to the bottom of the issue as quickly as possible. You can pinpoint the root cause by following these steps:
- Go to the APM page in the application.
- Select your service of interest from the Troubleshooting tab.
- Click Tag Spotlight in the Requests and Errors service card.
- Using the RED metrics chart, click and drag the cursor over the spike in errors. This will allow you to view only the data related to the incident. From here, you can begin to draw correlations with the errors that you see to help you find the root cause.
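The idea behind that last step, narrowing to the incident window and then comparing tag values, can be sketched in a few lines. This is only an illustration of the concept, not how Tag Spotlight is implemented; the span layout, the `version` tag, and the `errors_by_tag` helper are all assumptions for the example.

```python
from collections import Counter


def errors_by_tag(spans, tag, start, end):
    """Count (requests, errors) per value of `tag` for spans in [start, end)."""
    requests = Counter()
    errors = Counter()
    for span in spans:
        # Keep only the spans inside the selected time window.
        if not (start <= span["timestamp"] < end):
            continue
        value = span.get("tags", {}).get(tag, "(untagged)")
        requests[value] += 1
        if span.get("error", False):
            errors[value] += 1
    return {value: (requests[value], errors[value]) for value in requests}


if __name__ == "__main__":
    # Hypothetical spans; timestamps are seconds since some reference point.
    spans = [
        {"timestamp": 10, "error": False, "tags": {"version": "v1"}},
        {"timestamp": 61, "error": True, "tags": {"version": "v2"}},
        {"timestamp": 62, "error": True, "tags": {"version": "v2"}},
        {"timestamp": 63, "error": False, "tags": {"version": "v1"}},
    ]
    # Select the window around the error spike, like dragging over the chart.
    print(errors_by_tag(spans, "version", start=60, end=120))
```

In this made-up data, every request tagged `v2` inside the window failed while `v1` stayed healthy, which is the kind of correlation that points you toward a bad release.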
When it comes to APM platforms, Splunk has it all, from the ability to analyze 100% of your transactions using NoSample™ to the option to create as many tags as you would like with infinite cardinality. The process of troubleshooting and pinpointing root cause is simplified by using RED metrics along with Tag Spotlight, which allows for easy correlation between your application’s rate, error, and duration metrics. Splunk APM will give you a better understanding of the impact of performance issues on your customers. Check out the Splunk documentation to learn more about Tag Spotlight, NoSample™, and infinite cardinality, as well as how they can help you and your organization. Watch this demo to learn more about why Splunk APM is an integral part of the Splunk Observability Cloud. Next, sign up for your free trial to see firsthand what enterprise-grade, analytics-powered troubleshooting means when providing full-stack, end-to-end visibility for your teams.