Developing modern applications is harder than ever: microservices and cloud deployment models add moving parts that make it difficult just to get things working. However, anyone who’s deployed an application knows that deployment is just the beginning of the work. The biggest part comes later: ensuring the application works correctly, efficiently, and with great performance. Most importantly, when things go wrong (and I assure you they will), you must detect and fix the issues as quickly as humanly possible. All of that has a name: Application Performance Monitoring, or APM. Let’s dive into what APM actually means.
APM is the process of using specialized tooling to monitor how your application performs in production. With the right set of features, you can ensure problems are detected and fixed ASAP, resulting in less downtime and more satisfied users.
Fortunately, there has never been a better time to invest in APM. Besides the cornucopia of APM tools to choose from, the barrier to entry has been lowered by the arrival of OpenTelemetry, the vendor-neutral, open source observability framework.
In this post, we’ll walk through some of the main APM tools out there, highlighting the features and advantages of each one.
What We Mean by “Healthy”
As you’ve seen, APM tools are essential to ensuring application health. But what exactly do we mean when we say an application is healthy? Let’s explain that now.
Know When There Are Failures
In the simplest terms, a healthy application is one that never fails. That isn’t realistic, so in practice, healthy applications are instrumented to tell you when (and how) they are failing.
Make Availability Visible
Your app can’t be healthy unless you have constant, up-to-date data about its availability. That goes far beyond a simple ping to check that the app is up: availability monitoring must include checks that validate that the app’s crucial workflows are working as intended.
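To make the idea of a workflow-level availability check concrete, here’s a minimal Python sketch of a synthetic check. It assumes each step of a crucial workflow is wrapped in a function that returns True on success or raises on failure. The step names below are hypothetical. In a real setup, those functions would make HTTP requests against your app.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class StepResult:
    name: str
    ok: bool
    error: str = ""


def run_synthetic_check(
    steps: List[Tuple[str, Callable[[], bool]]],
) -> Tuple[bool, List[StepResult]]:
    """Run workflow steps in order and report whether the whole workflow works.

    A step fails if it returns a falsy value or raises. Later steps usually
    depend on earlier ones (you can't check out without logging in), so the
    check stops at the first failure.
    """
    results: List[StepResult] = []
    for name, step in steps:
        try:
            ok = bool(step())
            error = "" if ok else "step reported failure"
        except Exception as exc:  # any exception counts as a failed step
            ok, error = False, repr(exc)
        results.append(StepResult(name, ok, error))
        if not ok:
            return False, results
    return True, results


def failing_checkout() -> bool:
    # Hypothetical step that simulates a checkout timing out.
    raise TimeoutError("checkout took too long")


available, results = run_synthetic_check([
    ("load home page", lambda: True),
    ("log in", lambda: True),
    ("checkout", failing_checkout),
])
print(available)                            # False
print(results[-1].name, results[-1].error)  # checkout TimeoutError('checkout took too long')
```

A simple ping would report this app as up. The synthetic check reports it as unavailable, because one of its crucial workflows is broken.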
Most importantly, you shouldn’t have to go looking for availability problems or other issues yourself. This takes us to the next point.
Get Notified by Your APM When Something Is Wrong
If a service fails in the woods, and nobody’s around to answer the page, did it actually fail? Yes, and if you don’t notice, your customers certainly will. Your APM tool must be able to tell you when and where things have failed, rather than simply updating a dashboard. That’s why alerting is a critical part of APM.
So, to have a healthy application, you need an APM tool that does the heavy lifting and proactively tells you when things go wrong through notifications and alerts. Relying only on email isn’t enough; nowadays, functionality like Slack integration is a must. Incident response collaboration tools also help reduce the mean time to acknowledge (MTTA) an issue, speed up troubleshooting, and shrink war rooms.
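The following Python sketch gives a rough idea of what an alert rule plus a chat notification involve. It fires when the error rate in a time window crosses a threshold. It then builds a POST request for a Slack incoming webhook, which accepts a JSON body with a `text` field. The 5% threshold, the service name, and the webhook URL are placeholders. APM tools normally handle all of this through their own alerting configuration.

```python
import json
import urllib.request


def should_alert(errors: int, requests: int, threshold: float = 0.05) -> bool:
    """Fire when the error rate over the evaluation window exceeds the threshold."""
    if requests == 0:
        return False  # no traffic in the window; nothing to evaluate
    return errors / requests > threshold


def build_slack_alert(webhook_url: str, service: str,
                      errors: int, requests: int) -> urllib.request.Request:
    """Build (but don't send) a POST request for a Slack incoming webhook."""
    rate = errors / requests
    payload = {"text": f"ALERT {service}: error rate {rate:.1%} ({errors}/{requests} requests)"}
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL; a real one comes from your Slack app configuration.
WEBHOOK_URL = "https://hooks.slack.com/services/T000/B000/XXXX"

if should_alert(errors=12, requests=100):
    req = build_slack_alert(WEBHOOK_URL, "checkout-service", 12, 100)
    # urllib.request.urlopen(req, timeout=5) would actually send it.
    print(req.data.decode("utf-8"))
```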
Finding Needles in Haystacks at 3 a.m.
We live in the era of cloud computing and distributed systems. Organizations can achieve a degree of availability and performance that older companies wouldn’t dare to dream of.
All of that comes at the price of added complexity, which can make debugging today’s systems feel like finding a needle in a haystack.
This is where distributed tracing can help. Distributed tracing follows a request (transaction) as it moves between multiple services. This lets engineers see where the request originates (for example, a user-facing frontend application) and follow it through its journey across the other services.
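Under the hood, distributed tracing works by passing a trace context from service to service, typically in an HTTP header. Here’s a simplified Python sketch of that propagation. It uses the W3C Trace Context `traceparent` header format (`version-traceid-parentid-flags`). The service names are hypothetical. In practice, you’d let an OpenTelemetry SDK generate and propagate these headers rather than hand-rolling them.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    service: str
    trace_id: str             # shared by every span in the same request
    span_id: str              # unique to this unit of work
    parent_id: Optional[str]  # span_id of the caller, None at the entry point


def start_span(service: str, traceparent: Optional[str] = None) -> Span:
    """Start a span, continuing the caller's trace if a traceparent header arrived."""
    if traceparent is None:
        # Entry point (e.g. the frontend): start a brand-new trace.
        return Span(service, secrets.token_hex(16), secrets.token_hex(8), None)
    _version, trace_id, parent_id, _flags = traceparent.split("-")
    return Span(service, trace_id, secrets.token_hex(8), parent_id)


def traceparent_for(span: Span) -> str:
    """Header value to send on outgoing calls: version 00, sampled flag 01."""
    return f"00-{span.trace_id}-{span.span_id}-01"


# Hypothetical services: a request flows frontend -> orders -> payments.
frontend = start_span("frontend")
orders = start_span("orders", traceparent_for(frontend))
payments = start_span("payments", traceparent_for(orders))

for span in (frontend, orders, payments):
    print(span.service, span.trace_id[:8], span.parent_id)
```

Every span carries the same trace ID and points at its parent. That lets a tracing backend stitch the spans back into a single end-to-end view of the request.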
Opsview
Here are some of the main features of Opsview.
- Dashboards: Monitoring dashboards provide a concise, visual, real-time view of how key aspects of your system are performing.
- Autodiscovery: With autodiscovery, Opsview automatically profiles the hosts in your environment and generates information it can use to monitor them.
- High Availability: With its high-availability server, Opsview ensures that, if something happens to your Opsview agent, an automatic switch happens and the secondary agent takes its place so you don’t lose observability.
- Business Service Monitoring: Business service monitoring (BSM) is an Opsview feature that lets you group the hosts supporting a single business function and monitor them together. That way, you can see how the failure of a single component affects the whole system.
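The idea behind business service monitoring can be sketched as a status rollup. The Python below is a generic illustration, not Opsview’s configuration model. It assumes each component of a hypothetical online store is a group of redundant hosts, with a minimum number that must be up. The business service then takes the worst status of its components.

```python
from enum import IntEnum
from typing import Dict, Tuple


class Status(IntEnum):
    OK = 0
    DEGRADED = 1  # lost redundancy but still serving
    DOWN = 2


def component_status(hosts_up: Dict[str, bool], min_up: int) -> Status:
    up = sum(hosts_up.values())  # True counts as 1
    if up < min_up:
        return Status.DOWN
    if up < len(hosts_up):
        return Status.DEGRADED
    return Status.OK


def business_service_status(components: Dict[str, Tuple[Dict[str, bool], int]]) -> Status:
    """A business service is only as healthy as its worst component."""
    return max(
        (component_status(hosts, min_up) for hosts, min_up in components.values()),
        default=Status.OK,
    )


# Hypothetical online store: component -> (host statuses, minimum hosts up).
store = {
    "web": ({"web1": True, "web2": False}, 1),  # one of two web servers is down
    "database": ({"db1": True}, 1),
}
print(business_service_status(store).name)  # DEGRADED

store["database"] = ({"db1": False}, 1)
print(business_service_status(store).name)  # DOWN
```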
Where It Shines
Opsview is easy to set up and maintain. Its extensibility and simple-to-use UI make it easy to find exactly what you’re looking for.
Loupe
Loupe is an APM solution, available on-premises or cloud-hosted, that targets organizations building .NET and Java applications. It’s free to try for 30 days and offers three plans: basic, professional, and enterprise.
The basic plan supports up to five users and includes 2 GB of data per month, with additional data charged on top of that. Its features include centralized logging and metrics, as well as web and desktop log viewing.
The professional plan starts at 10 GB per month, and you can buy additional ingestion. It allows unlimited users and includes everything in the basic plan plus additional analytics and error management features.
The most advanced plan is enterprise. It includes 50 GB per month, unlimited users, and everything in the professional plan, plus priority support, real-time remote log viewing, and Active Directory integration.
Loupe combines log management with automatic error analysis to help organizations quickly discover the root cause of application issues. Here are some of its main features:
- Integrations Out of the Box: Loupe integrates easily with many popular tools and services that organizations already use, such as Slack, Jira, HipChat, and more.
- Support for the Main Logging Frameworks: Loupe works with popular logging frameworks such as NLog, log4net, Serilog, and log4j2.
- Support for Many .NET Technologies: These include ASP.NET Web Forms, Entity Framework, and ASP.NET Web API.
Where It Shines
Loupe is an interesting option for organizations that work primarily with the .NET stack, though Java support was recently introduced. The tool offers a quick setup and plenty of integrations, allowing you to get started without much overhead.
Unlike other items on our list, though, Loupe doesn’t support a large variety of tech stacks, which might be a deal breaker for many organizations.
Stackify Retrace
Stackify Retrace is a solution that integrates code profiling, error tracking, and production monitoring in a single tool. Here are some of its main features:
- Tech Stacks Supported: Stackify Retrace works natively with major programming languages and tech stacks, such as .NET, Java, PHP, Python, and Ruby.
- Track Deployments: With Stackify Retrace, you can easily track deployments and verify whether the app quality changed after each deployment.
- AppScore: AppScore is Stackify’s proprietary index for assessing user satisfaction.
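Retrace’s deployment tracking is built into the product, but the underlying idea is easy to sketch: compare a health signal in equal windows before and after a deploy. The Python below is a generic illustration, not Retrace’s implementation. It uses error rate as that signal over a hypothetical list of `(timestamp, succeeded)` request records. The one-percentage-point tolerance is an arbitrary assumption.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Event = Tuple[datetime, bool]  # (request time, succeeded?)


def error_rate(events: List[Event]) -> float:
    if not events:
        return 0.0
    return sum(1 for _, ok in events if not ok) / len(events)


def compare_deploy(events: List[Event], deployed_at: datetime,
                   window: timedelta) -> Tuple[float, float]:
    """Error rate in the window just before the deploy vs. the window just after."""
    before = [e for e in events if deployed_at - window <= e[0] < deployed_at]
    after = [e for e in events if deployed_at <= e[0] < deployed_at + window]
    return error_rate(before), error_rate(after)


def is_regression(before: float, after: float, tolerance: float = 0.01) -> bool:
    # Assumed rule: flag the deploy if errors rose by more than `tolerance`.
    return after > before + tolerance


deployed_at = datetime(2024, 1, 1, 12, 0)
# Simulated traffic: 1% errors before the deploy, 10% after.
events = [(deployed_at - timedelta(minutes=i + 1), i != 0) for i in range(100)]
events += [(deployed_at + timedelta(minutes=i), i >= 10) for i in range(100)]

before, after = compare_deploy(events, deployed_at, timedelta(hours=2))
print(f"before={before:.1%} after={after:.1%} regression={is_regression(before, after)}")
```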
Where It Shines
Stackify Retrace sets itself apart from other APM solutions by offering developers live code profiling, a more integrated profiling experience. As developers write code, live profiling helps them understand how to make it more efficient and better performing.
Application Insights
Application Insights, part of Azure Monitor, is Microsoft’s APM solution, aimed at helping developers and DevOps professionals get the most out of their applications on Azure.
Here are some main features of Application Insights:
- Smart Detection: Smart Detection provides automatic alerts that adapt to your application’s patterns; when it detects something abnormal, it triggers an alert.
- Profiler: With the profiler enabled, you can capture performance traces for requests to your apps.
- Dashboards: Application Insights provides an overview dashboard to allow quick assessment of your application’s health.
- Live Metrics Stream: With the live metrics stream, you can query and filter indicators from your production system and watch them in real time.
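We won’t guess at how Smart Detection works internally. The general idea of an alert that adapts to your application’s patterns can still be illustrated with a different, generic technique: a rolling z-score detector. It flags values that sit far from the recent mean, instead of comparing them against a fixed threshold. The window size, minimum history, and 3-standard-deviation cutoff in this Python sketch are assumptions. It is not how Application Insights implements the feature.

```python
import statistics
from collections import deque


class AdaptiveThreshold:
    """Flag values more than `z` standard deviations from the recent mean."""

    def __init__(self, window: int = 30, z: float = 3.0, min_samples: int = 10):
        # Assumed defaults; tune for your own traffic patterns.
        self.history = deque(maxlen=window)  # only the most recent values count
        self.z = z
        self.min_samples = min_samples

    def observe(self, value: float) -> bool:
        """Return True if `value` is anomalous, then add it to the history."""
        anomalous = False
        if len(self.history) >= self.min_samples:
            mean = statistics.fmean(self.history)
            # Floor the deviation so a perfectly flat history doesn't divide by zero.
            stdev = max(statistics.pstdev(self.history), 1e-9)
            anomalous = abs(value - mean) / stdev > self.z
        self.history.append(value)
        return anomalous


detector = AdaptiveThreshold()
latencies_ms = [100, 102, 98, 101, 99] * 4  # normal traffic
flags = [detector.observe(v) for v in latencies_ms]
print(any(flags))             # False
print(detector.observe(400))  # True: a sudden latency spike
```

The threshold moves with the data. A 400 ms response is flagged here because this service normally answers in about 100 ms, not because 400 is a magic number.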
Where It Shines
Being part of Azure Monitor, Application Insights shines by natively integrating with Azure services.
Splunk
Here’s a nonexhaustive list of Splunk’s features:
- OpenTelemetry Native: Splunk supports OpenTelemetry natively, which reduces the risk of vendor lock-in.
- Predictive Analytics: Splunk uses the power of AI to provide analytics that can predict and prevent future issues and uncover the root cause of current ones.
- Real-Time Visibility: Splunk enables you to see how your users experience your application and how each code change affects performance.
- Full-Fidelity Distributed Tracing: Splunk uses what it calls a NoSample™ full-fidelity approach, capturing and visualizing all trace data, in context, so that no anomalies get missed.
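Full fidelity matters most when compared with head-based sampling, where a tracer keeps only a fixed fraction of traces. The Python sketch below keeps a trace when the low 64 bits of its ID fall below `rate * 2**64`. This is similar in spirit to ratio-based samplers such as OpenTelemetry’s `TraceIdRatioBased`, but it is a simplified illustration. The two traces and their IDs are made up so that the rare slow request happens to be dropped.

```python
from typing import Callable, Dict, List

Trace = Dict[str, object]


def ratio_sampler(rate: float) -> Callable[[str], bool]:
    """Head-based sampler: keep a trace iff the low 64 bits of its ID < rate * 2**64.

    The decision depends only on the trace ID, so every service makes the same call.
    """
    bound = rate * 2**64
    return lambda trace_id: int(trace_id[-16:], 16) < bound


def keep_all(trace_id: str) -> bool:
    """Full fidelity: every trace is retained."""
    return True


def retained(traces: List[Trace], sampler: Callable[[str], bool]) -> List[Trace]:
    return [t for t in traces if sampler(str(t["trace_id"]))]


# Made-up trace IDs chosen to illustrate the point.
traces: List[Trace] = [
    {"trace_id": "0" * 31 + "1", "duration_ms": 120},
    {"trace_id": "f" * 32, "duration_ms": 9500},  # the rare, very slow request
]

sampled = retained(traces, ratio_sampler(0.10))
full = retained(traces, keep_all)
print([t["duration_ms"] for t in sampled])  # [120]
print([t["duration_ms"] for t in full])     # [120, 9500]
```

With 10% sampling, the 9.5-second request never reaches the backend, even though it’s exactly the anomaly you’d want to investigate.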
Where It Shines
Splunk has been described by users as “the only observability tool that could support their modern applications.” The solution particularly shines when it comes to its powerful analytics capabilities, which include both real-time streaming and full-fidelity data, essential for accurate analysis and for improving MTTR.
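MTTR comes up often in APM discussions. One common way to compute it is the average time from when an incident is detected to when it’s resolved. This is a minimal Python sketch with made-up incident timestamps. Teams define the start and end points of MTTR differently, so treat this as one common variant.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def mean_time_to_resolve(incidents: List[Tuple[datetime, datetime]]) -> timedelta:
    """Average of (resolved_at - detected_at) across incidents."""
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum((resolved - detected for detected, resolved in incidents), timedelta())
    return total / len(incidents)


# Made-up incidents: (detected_at, resolved_at).
incidents = [
    (datetime(2024, 3, 1, 2, 0), datetime(2024, 3, 1, 2, 30)),    # 30 minutes
    (datetime(2024, 3, 8, 14, 0), datetime(2024, 3, 8, 15, 30)),  # 90 minutes
]
print(mean_time_to_resolve(incidents))  # 1:00:00
```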
The alerting and monitoring capabilities have also been praised, along with how easy it is to create dashboards and get started. Finally, since it’s a cloud-based solution, Splunk removes the need for organizations to manage their own monitoring infrastructure, which is certainly a big win.
Best Match for the Best Health
In this post, we’ve walked you through some of the main APM tools at your disposal. Of course, as we’ve mentioned, there are plenty of APM tools on the market, but the ones on this list will give you a good sense of the variety you might find.
This post was written by Carlos Schults. Carlos is a consultant and software engineer with experience in desktop, web, and mobile development. Though his primary language is C#, he has experience with a number of languages and platforms. His main interests include automated testing, version control, and code quality.