APM: Not an Infrastructure Monitoring Strategy

Note: On November 14th, 2018, we announced Splunk APM, the first APM solution that combines NoSample™ tail-based distributed tracing with streaming analytics. Splunk APM™ extends our real-time metrics platform by adding distributed tracing capabilities that enable Splunk Infrastructure Monitoring users to monitor the flow of transactions through service-oriented and microservices-based applications, a need that is not addressed by traditional APM solutions. 

For every modern application, infrastructure monitoring that aggregates metrics and focuses on time series analytics is essential to ensuring availability and performance in production. Infrastructure monitoring fills a large gap not previously addressed by APM (or log management): intelligent and timely alerting on service-wide issues and trends across the environment (whether in the cloud or on-prem, or a mix of legacy and new architectures).

Ultimately, the best DevOps strategy requires full visibility not only up and down the stack, but also across all stages of the application lifecycle. APM alone does not provide that. An effective infrastructure monitoring solution should integrate APM data, correlate it in time with other infrastructure metrics, and apply robust analytics to give the most meaningful view of the entire environment.

Developers use an APM solution like New Relic or AppDynamics to instrument their applications and trace performance issues across transactions. However, APM data represents just one subset of information that a modern approach to infrastructure monitoring needs to process. By combining data from APM and several other element managers, a modern infrastructure monitoring solution can aggregate and alert on the metrics flowing directly from the constantly changing population that makes up most elastic, distributed architectures.
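To make the idea of aggregating and alerting across a changing population concrete, here is a minimal sketch in Python. The node names, latency values, and the 200 ms threshold are all hypothetical and chosen only for illustration; this is not how any particular product implements alerting.

```python
from statistics import mean

# Hypothetical per-interval reports: each maps a node id to request latency in ms.
# The set of nodes changes between intervals, as it would under autoscaling.
intervals = [
    {"web-1": 120, "web-2": 130, "web-3": 125},
    {"web-2": 140, "web-3": 135, "web-4": 150, "web-5": 145},
    {"web-4": 310, "web-5": 290, "web-6": 305},
]

LATENCY_THRESHOLD_MS = 200  # assumed alert threshold, not from the article


def service_wide_latency(report):
    """Aggregate across whichever nodes reported in this interval."""
    return mean(report.values())


def evaluate(reports, threshold):
    """Return (interval index, aggregate latency, alert flag) per interval."""
    results = []
    for i, report in enumerate(reports):
        agg = service_wide_latency(report)
        results.append((i, agg, agg > threshold))
    return results


results = evaluate(intervals, LATENCY_THRESHOLD_MS)
for i, agg, alert in results:
    print(f"interval {i}: mean latency {agg:.1f} ms, alert={alert}")
```

The alert is keyed to the service-wide aggregate rather than to any named host, so it keeps working as individual nodes appear and disappear.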

The primary objective of APM is to test code before deployment for downstream performance issues. Performance engineering with APM allows developers to deploy an agent that simulates the various transactions the code will perform in production.

By tracing through all the steps across the application stack for a single programming language, the team can approximate the time required to complete API calls and component behaviors against a battery of web, mobile, and desktop scenarios. Developers are then able to detect operational problems, bottlenecks, or inconsistencies before pushing the code live.

APM solutions should be used for what they are exceptional at doing: providing transaction traces and identifying bottlenecks in code. They were not designed for monitoring the service-level operations of today’s diverse environments, where several factors outside of your code can create real issues.

Moreover, most APM solutions require proprietary agents that perform byte-code injection. Though such a heavyweight approach might be acceptable in a development environment, most organizations prefer not to bear the expense of running a proprietary agent across the production fleet and choose instead to sample data from selected nodes for infrastructure monitoring.

However, sampling doesn’t provide a reliable view of the production environment’s changing population or of the performance of specific nodes, and it is therefore an insufficient source of data to drive effective alerts.
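A small Python sketch illustrates the blind spot that sampling creates. The fleet size, error rates, threshold, and the every-fifth-node sampling scheme are hypothetical assumptions made for this example; real sampling strategies vary.

```python
# Hypothetical fleet of 20 nodes, each reporting an error rate in percent.
fleet = {f"node-{i:02d}": 0.5 for i in range(20)}
fleet["node-07"] = 42.0  # one node is failing badly

ERROR_THRESHOLD = 5.0  # assumed alert threshold, in percent


def unhealthy(nodes, threshold):
    """Return the sorted names of nodes whose error rate exceeds the threshold."""
    return sorted(name for name, rate in nodes.items() if rate > threshold)


# Monitor only every fifth node: node-00, node-05, node-10, node-15.
sampled = {name: fleet[name] for name in sorted(fleet)[::5]}

full_view = unhealthy(fleet, ERROR_THRESHOLD)
sampled_view = unhealthy(sampled, ERROR_THRESHOLD)

print("full fleet flags:", full_view)
print("sampled nodes flag:", sampled_view)
```

In this sketch the failing node is outside the sample, so an alert driven by the sampled data never fires, while the same check over the full fleet flags it immediately.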

APM tools help organizations easily instrument and identify bottlenecks in their code. APM vendors focus most of their development resources on the instrumentation part of the problem (e.g., providing the best tracing for Java applications), but have not invested in the downstream analytics, correlation, and alerting required of a general-purpose monitoring solution. Ultimately, they provide another source of insight that is tremendously valuable when combined with other operational data in a complete, modern infrastructure monitoring solution.

With the real-time insight introduced by modern infrastructure monitoring, application developers, infrastructure engineers, and operations teams can collaborate across the entire application lifecycle for the first time, from pre-production performance engineering through real-time service-level monitoring in production to post-mortem investigation of past issues.

Learn more about Splunk APM and get a free demo.
