Network performance monitoring (NPM) and application performance monitoring (APM) are both key pillars of an overall performance and reliability management strategy, especially when dealing with complex, distributed infrastructure across cloud-native environments. NPM and APM also complement each other, in the sense that NPM can serve as an additional source of truth and observability for application performance.
Although there is some overlap in the tools and methodologies behind NPM and APM, they’re distinct processes that focus on different data sources and metrics.
Let’s take a look at how NPM and APM compare and where they both fit within a performance management strategy tailored to cloud-native environments.
What is network performance monitoring?
Network performance monitoring means monitoring networks for performance trends or signs of problems. Using techniques like packet capture and streaming telemetry, NPM can measure information like:
- Bandwidth utilization
- How hosts are distributed across the network
- How loads are being balanced across different application instances running on the same network
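As a rough illustration of the first item, bandwidth utilization can be derived from two readings of an interface byte counter. The sketch below is a minimal example, not a production collector: the function name, the sample counter values, and the 1 Gbps link capacity are all assumptions made for illustration, and it ignores counter wraparound.

```python
def bandwidth_utilization(bytes_start, bytes_end, interval_seconds, link_capacity_bps):
    """Return link utilization as a fraction of capacity over one sampling interval.

    bytes_start / bytes_end are two readings of a cumulative byte counter
    (hypothetical values here); link_capacity_bps is the link speed in bits per second.
    """
    if interval_seconds <= 0 or link_capacity_bps <= 0:
        raise ValueError("interval_seconds and link_capacity_bps must be positive")
    bits_transferred = (bytes_end - bytes_start) * 8
    return bits_transferred / (interval_seconds * link_capacity_bps)


# Hypothetical sample: 750 MB transferred in 60 seconds on a 1 Gbps link.
utilization = bandwidth_utilization(0, 750_000_000, 60, 1_000_000_000)
print(f"Link utilization: {utilization:.1%}")
```

Tracking this value per interface over time is one way to spot the trends mentioned above before they turn into saturation.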
Traditional on-premises network performance monitoring was relatively straightforward in that most environments included just one network to monitor. In cloud-native environments, however, there are often multiple internal networks to monitor, as well as at least one public-facing network interface that is accessible to any host on the network.
In addition, because network configurations change constantly as container IP addresses are updated, load balancers redirect packets, and traffic flows change, cloud-native network data lacks the contextual information that comes with traditional, packet-focused network monitoring. This makes the ability to observe data flows between source and destination services especially critical in distributed environments.
In other words, teams need fine-grained, low-level visibility into all network traffic — internal as well as public-facing — that flows within or between parts of an application, containers, microservices, processes, and users. Capturing information from every connection and every process is the only way to understand how the complex traffic flows within a distributed environment add up to overall network performance.
In all these ways, NPM for distributed environments differs substantially from NPM for monoliths.
What is application performance monitoring?
Application performance monitoring refers to the monitoring of applications for signs of performance issues. APM typically focuses on data such as:
- Error rates
- Response times
- Uptime for applications
APM may also include monitoring of how much CPU, memory, and other resources an application consumes, and how those metrics change over time.
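To make two of these metrics concrete, here is a minimal sketch that computes an error rate and a 95th-percentile response time from a batch of request records. The `RequestRecord` type, the choice to count HTTP 5xx responses as errors, and the sample data are assumptions for illustration; real APM tools typically derive these figures from trace data instead.

```python
from dataclasses import dataclass


@dataclass
class RequestRecord:
    """One handled request (hypothetical shape, for illustration only)."""
    duration_ms: float
    status_code: int


def error_rate(records):
    """Fraction of requests that ended in a server error (HTTP 5xx)."""
    if not records:
        return 0.0
    failures = sum(1 for r in records if r.status_code >= 500)
    return failures / len(records)


def p95_response_time(records):
    """95th-percentile response time using the nearest-rank method."""
    if not records:
        raise ValueError("at least one record is required")
    durations = sorted(r.duration_ms for r in records)
    rank = (95 * len(durations) + 99) // 100  # integer ceil(0.95 * n)
    return durations[rank - 1]


# Hypothetical sample: 18 successful requests and 2 server errors.
sample = [RequestRecord(duration_ms=10.0 * i, status_code=200) for i in range(1, 19)]
sample += [RequestRecord(190.0, 500), RequestRecord(200.0, 503)]

print(f"error rate: {error_rate(sample):.0%}")
print(f"p95 response time: {p95_response_time(sample)} ms")
```

A percentile is used here rather than an average because a small number of very slow requests can hide behind a healthy-looking mean.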
Like NPM, APM was simpler in the days of monolithic applications. In the cloud-native, microservices-oriented world, APM is more challenging not only because there are more services to monitor, but also because effective management requires the ability to correlate data from each individual service in ways that deliver visibility into the overall state of the system.
Additionally, in a microservices environment, it’s critical to be able to analyze all transactions, rather than sampling some and extrapolating from there. Complete trace and span data enables high-cardinality, AI-directed troubleshooting. And to understand how performance trends impact the end user, modern APM must be able to use no-sample, full-fidelity ingestion of all frontend traces to track how backend application components interact with frontend services for every transaction.
Similarities in NPM and APM
The similarities between NPM and APM are relatively obvious: both types of monitoring provide insights that can help teams anticipate problems that may have negative consequences for the user experience. An application that becomes unavailable due to a load-balancing problem on the network is just as bad from the user’s perspective as one that fails because it runs out of memory or becomes overloaded with requests.
There is also some overlap in the type of metrics that NPM and APM focus on. For example, the golden signals of monitoring (latency, errors, traffic and saturation) are important data points for understanding the health of both your network and your application.
NPM vs. APM: Differences
Beyond this, however, there is little common ground between NPM and APM. Despite the partial overlap in metrics described above, most metrics are unique to one type of monitoring or the other.
- APM focuses on metrics such as request rates, error rates and round-trip latency.
- NPM in a cloud-native environment measures data like packet retransmission rates and connection errors.
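As an example of the network-side metric, a packet retransmission rate can be estimated from two snapshots of cumulative TCP counters. This is a minimal sketch under stated assumptions: the counter names `OutSegs` and `RetransSegs` follow the TCP counters Linux exposes in /proc/net/snmp, the snapshot values are made up, and a host-wide counter like this is far coarser than the per-connection data an eBPF-based tool would collect.

```python
def retransmission_rate(before, after):
    """Retransmitted TCP segments per segment sent between two counter snapshots.

    Each snapshot is a dict of cumulative counters. The keys used here
    ("OutSegs", "RetransSegs") are assumed to match the host's TCP statistics.
    """
    sent = after["OutSegs"] - before["OutSegs"]
    retransmitted = after["RetransSegs"] - before["RetransSegs"]
    if sent <= 0:
        return 0.0  # no traffic in the interval, so no meaningful rate
    return retransmitted / sent


# Hypothetical snapshots taken one minute apart.
snapshot_before = {"OutSegs": 1_000_000, "RetransSegs": 1_000}
snapshot_after = {"OutSegs": 1_050_000, "RetransSegs": 1_500}

rate = retransmission_rate(snapshot_before, snapshot_after)
print(f"retransmission rate: {rate:.2%}")
```

A sustained rise in this ratio suggests packet loss or congestion somewhere on the path, which is exactly the kind of signal that does not show up in application-level metrics.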
The way you collect NPM and APM data is also different. Modern approaches to NPM rely on techniques like using eBPF at the operating system level in order to collect network data that would otherwise be impossible to “see,” especially in cloud environments. In this way, site reliability engineers (SREs) can trace traffic flow between microservices within a distributed system in order to determine whether the network is the cause of a degradation in application performance.
In contrast, APM focuses on traces and spans, especially transaction traces and other data collected directly from a running environment using tools that peer inside the application as it processes requests.
Achieving end-to-end visibility
Because NPM and APM deliver visibility into different parts of your environment, they are not an either/or proposition. You need both in order to gain a full understanding of what is happening in your environment.
As noted above, critical disruptions or degradations to the end-user experience can be caused by faults in either the network or the application. Both types of monitoring are necessary to safeguard against these risks.
What’s more, the ability to correlate network data and application performance data is often crucial for understanding the root cause of an issue. For example:
- If your application’s response rate takes a dive, and you also notice problems in the network that connects two microservices within the application, then it’s likely that the network is the root cause of the issue.
- Or, if you detect an application performance issue but your network performance monitoring reveals no problems, you’ll know that the root cause almost certainly lies with the application — not the network.
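The two cases above amount to a simple triage rule, which can be sketched in a few lines. Everything in this example is hypothetical: the `Snapshot` type, the metric choices, and the threshold values are assumptions for illustration, and a real correlation engine would weigh many more signals.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """One point-in-time reading of application and network health (hypothetical)."""
    app_error_rate: float        # fraction of failed requests, from APM
    retransmission_rate: float   # fraction of TCP segments retransmitted, from NPM


# Illustrative thresholds only; tune them to your own environment.
APP_ERROR_THRESHOLD = 0.05
RETRANSMIT_THRESHOLD = 0.02


def likely_root_cause(snapshot):
    """Apply the triage rule: blame the network only if both layers look unhealthy."""
    app_degraded = snapshot.app_error_rate > APP_ERROR_THRESHOLD
    network_degraded = snapshot.retransmission_rate > RETRANSMIT_THRESHOLD
    if not app_degraded:
        return "no application issue detected"
    if network_degraded:
        return "network (likely)"
    return "application (likely)"


print(likely_root_cause(Snapshot(app_error_rate=0.12, retransmission_rate=0.08)))
print(likely_root_cause(Snapshot(app_error_rate=0.12, retransmission_rate=0.001)))
```

The rule only works when both data sources cover the same time window and the same services, which is why correlating the two matters more than collecting either one alone.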
Network monitoring supports observability
NPM and APM serve as crucial complements to each other, with network monitoring offering an additional source of observability that can help to contextualize application performance trends and pinpoint the source of problems, even if those problems don’t stem from the network itself.
So, while NPM and APM may be somewhat useful individually, they deliver the greatest value when they are used in tandem as part of an end-to-end observability strategy that takes data from all layers and resources within your environment and allows you to compare and correlate relevant trends within it. That’s how you achieve true visibility, especially into complex cloud-native environments where the root cause of surface-level problems is rarely obvious from one type of monitoring alone.
Chris Tozzi has worked as a journalist and Linux systems administrator. He has particular interests in open source, agile infrastructure, and networking. He is Senior Editor of Content and a DevOps Analyst at Fixate IO. His book For Fun and Profit: A History of the Free and Open Source Software Revolution was published in 2017.
This posting does not necessarily represent Splunk's position, strategies or opinion.