VM Monitoring: A Beginner's Guide

Virtual machines, or VMs, often feel like old news. Your team may think it mastered VM monitoring long ago, and that the challenge today is figuring out how to manage flashier “cloud-native” technologies like serverless functions and containers.

It’s true that VMs have been in widespread production use much longer than most of the technologies we associate with cloud-native computing. But the fact is that even VMs have been transformed by the cloud-native revolution. VMs may still work in the same way that they did in the past, but they serve different purposes and present new challenges.

That’s why it’s important to factor VMs into the equation as you update your performance management and monitoring strategy for the cloud-native era. To explain how, this article walks through the monitoring priorities for VMs today, and the new types of metrics that teams should be measuring to track VM health and performance in cloud-native environments.

Traditional VM Monitoring

Traditionally, VM monitoring was relatively straightforward. It focused mostly on standard resource consumption data, such as CPU usage, disk usage and throughput, and memory usage – the same types of metrics that you would use to measure the health and performance of a bare-metal server.
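As a rough illustration of this traditional approach, the sketch below checks a single resource sample against fixed alert thresholds. The `ResourceSample` type, the `breached_metrics` helper, and the threshold values are all hypothetical and chosen only for illustration. In practice, a monitoring agent would supply the readings.

```python
from dataclasses import dataclass


@dataclass
class ResourceSample:
    """One point-in-time reading from a server or VM (values in percent)."""
    cpu_percent: float
    memory_percent: float
    disk_percent: float


# Hypothetical alert thresholds; tune these for your own workloads.
THRESHOLDS = {"cpu_percent": 90.0, "memory_percent": 85.0, "disk_percent": 80.0}


def breached_metrics(sample, thresholds=THRESHOLDS):
    """Return the names of metrics whose value meets or exceeds its threshold."""
    return [
        name
        for name, limit in thresholds.items()
        if getattr(sample, name) >= limit
    ]


if __name__ == "__main__":
    sample = ResourceSample(cpu_percent=95.0, memory_percent=60.0, disk_percent=82.0)
    print(breached_metrics(sample))
```

The same check applies equally to a bare-metal server, which is the point: at this level, a VM is monitored like any other machine.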

Beyond this data, the only big difference between managing VMs and managing bare-metal servers was that you also had to think about tracking the performance of the hypervisor in the case of VMs. You wanted to collect data such as how many virtual CPUs were idle, how the hypervisor was allocating memory to different VM instances (especially if you were using memory-ballooning features), and how much data storage was available to the VMs from the host. Metrics like these helped you ensure that your hypervisor was operating properly, which in turn helped manage the health and performance of the VMs hosted by the hypervisor.

VM Monitoring in a Cloud-Native World

All of the above remains important for VM monitoring in today’s cloud-native environment, at least in some cases. However, as the cloud-native age has taken hold, the approach your team takes to VM monitoring, and the types of metrics it tracks, have changed as a result of several factors.

Hypervisors Are Managed by Cloud Providers

In many cases today, you no longer run VMs on your own bare-metal servers. Instead, you deploy them through a public cloud service, such as Azure Virtual Machines or Amazon EC2.

On these services, the cloud provider manages the hypervisor for you. It’s no longer up to you to make sure the hypervisor is allocating memory or virtual CPUs efficiently, for example. Those jobs are handled by the cloud provider.

This means that hypervisor monitoring is comparatively less important in cloud-native environments. It still matters if you deploy VMs on-premises, where you remain responsible for the hypervisor, but that setup is increasingly rare.

Cost Matters

Another consequence of the move to cloud-based VMs is that cost matters more than it used to.

Every engineer has always wanted to avoid having more VMs than needed. But when VMs ran on-premises, it wasn’t a big deal if you forgot to turn one off, or if you allocated more memory or vCPUs than it strictly needed.

These things do matter in the cloud. There, you typically pay for every second (or, on some billing plans, every hour) that a VM is running. You also pay more for VM instances that are provisioned with more resources.

To avoid cost overruns, then, it’s critical to ensure that your VMs run when they are needed and are shut off when they are not. You also must ensure that they have enough capacity to handle their workloads with a comfortable buffer, but that they are not over-provisioned to the point of wasting money.

As a result, VM monitoring in the cloud is not just about tracking health and performance for their own sake, but also about monitoring and optimizing costs.
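To make the cost angle concrete, here is a minimal sketch that estimates what a VM cost over a period and flags instances that look idle or over-provisioned. It assumes per-second billing at a flat hourly rate. The `VmUsage` type, the example rate, and the CPU cutoffs (5% and 20%) are hypothetical values for illustration, not figures from any cloud provider.

```python
from dataclasses import dataclass


@dataclass
class VmUsage:
    name: str
    hourly_rate_usd: float   # hypothetical on-demand price for the instance
    seconds_running: int     # billed runtime during the period
    avg_cpu_percent: float   # average CPU utilization while running


def running_cost(vm):
    """Cost for the period, assuming per-second billing at a flat hourly rate."""
    return vm.hourly_rate_usd / 3600 * vm.seconds_running


def cost_finding(vm, idle_below=5.0, oversized_below=20.0):
    """Classify a VM by average CPU use; the cutoffs are illustrative only."""
    if vm.avg_cpu_percent < idle_below:
        return "possibly idle: consider shutting it down"
    if vm.avg_cpu_percent < oversized_below:
        return "possibly over-provisioned: consider a smaller instance"
    return "ok"


if __name__ == "__main__":
    fleet = [
        VmUsage("batch-01", 0.10, 86_400, 3.0),
        VmUsage("web-01", 0.20, 86_400, 12.0),
        VmUsage("db-01", 0.40, 86_400, 55.0),
    ]
    for vm in fleet:
        print(f"{vm.name}: ${running_cost(vm):.2f} -> {cost_finding(vm)}")
```

CPU alone is a crude signal. A real right-sizing check would also look at memory, disk, and network usage, and at peaks as well as averages.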

Diverse VM Platforms

For the first decade or so of the VM era (which is to say, approximately the years 2000-2010), the market was dominated by VMware, whose platform was arguably the only enterprise-grade VM solution that enjoyed widespread adoption. (Other production-quality solutions existed, but they targeted desktop virtualization more than server workloads.) That changed by around 2010, when new solutions like Microsoft Hyper-V and Linux’s KVM hypervisor (to say nothing of Xen) had matured.

From a monitoring perspective, the main consequence of this change was that it became more difficult to rely on a single vendor’s monitoring tooling for tracking the health and performance of VMs. Third-party tools that could work with a variety of platforms have become more important in the cloud-native age, when engineers continue to have a wide selection of VM platforms to choose from.

VMs Run in Clusters

In the days before cloud-native computing, each of your applications typically ran on a single VM. It was sweet and simple.

Today, your applications are typically composed of multiple microservices, with multiple instances running for each one. The instances are spread around a cluster of VMs, and they move constantly. This approach enables hyper-scalability, but it comes at the expense of sweetness and simplicity.

What this means for VM monitoring is that it has become crucial to track the health and performance not just of individual VMs, but of the entire cluster. You need to collect metrics such as how many VMs are up, what the load on each one is, and how long it takes a new VM to start.

In other words, what matters most in a cloud-native environment is not the health of individual VMs, but rather the health of the overall cluster, which is determined by the collective performance of all of the VMs. You still need to pull metrics from each individual VM to understand cluster health, but you also need to correlate all of that data to get cluster-wide visibility.
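The cluster-level metrics described above (how many VMs are up, the load on each, and how long new VMs take to start) can be derived by rolling up per-VM data. The following is a simplified sketch. The `VmStatus` type and the `summarize_cluster` helper are hypothetical names, and the per-VM readings would come from whatever monitoring agent or API you use.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class VmStatus:
    name: str
    is_up: bool
    load_percent: float      # current load on this VM
    startup_seconds: float   # how long this VM took to start


def summarize_cluster(vms):
    """Roll per-VM readings up into a cluster-wide summary."""
    up = [vm for vm in vms if vm.is_up]
    return {
        "total": len(vms),
        "up": len(up),
        # Load is averaged over running VMs only.
        "avg_load_percent": mean([vm.load_percent for vm in up]) if up else 0.0,
        "busiest_vm": max(up, key=lambda vm: vm.load_percent).name if up else None,
        "avg_startup_seconds": mean([vm.startup_seconds for vm in vms]) if vms else 0.0,
    }


if __name__ == "__main__":
    cluster = [
        VmStatus("node-a", True, 40.0, 30.0),
        VmStatus("node-b", True, 80.0, 50.0),
        VmStatus("node-c", False, 0.0, 40.0),
    ]
    print(summarize_cluster(cluster))
```

A summary like this makes it easier to see that one node is down and another is carrying most of the load, which is hard to spot when each VM is viewed in isolation.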


In short, although the core technology behind virtual machines has not changed much in two decades, the role that VMs play has evolved. Gone are the days of monolithic applications running on individual VMs. They have been replaced by cloud-based, scale-out clusters hosting a complex web of microservices, which require a new approach to VM monitoring.

This is a guest blog post from Chris Tozzi, Senior Editor of content and a DevOps Analyst at Fixate IO. Chris Tozzi has worked as a journalist and Linux systems administrator. He has particular interests in open source, agile infrastructure, and networking. This posting does not necessarily represent Splunk's position, strategies, or opinion.

Posted by Stephen Watts

Stephen Watts works in growth marketing at Splunk. Stephen holds a degree in Philosophy from Auburn University and is an MSIS candidate at UC Denver. He contributes to a variety of publications including CIO.com, Search Engine Journal, ITSM.Tools, IT Chronicles, DZone, and CompTIA.