
September 14, 2014


Identifying Zombie, Chatty and Orphan VMs using Splunk App for VMware

By Splunk

Virtualization is difficult to manage given its many moving parts, from storage to networking to hardware. In a dynamic VMware environment with Distributed Resource Scheduler (DRS) and High Availability (HA) enabled, virtual machines (VMs) can move across multiple hosts and clusters and can potentially become unregistered. This can cause a VMware administrator to lose visibility into these VMs. In addition, each VM in a datacenter can cost anywhere from a couple of hundred dollars to several thousand (http://roitco.vmware.com), depending on your environment and infrastructure costs.

In this blog post I will cover three types of VMs that can exist in your VMware infrastructure and require additional attention. Definitions of these VMs vary, but I’m sure you will recognize them regardless of the names I give them.

Zombie VM: A virtual machine that uses less than a certain amount of CPU over a period of time (for example, a VM using less than 5% CPU over a thirty-day period). Because zombie VMs run at very low CPU usage, they could be repurposed to run other applications when needed.

Chatty VM (the opposite of a zombie): A virtual machine that uses more than a certain amount of CPU over a period of time (for example, a VM using more than 80% CPU over a week). Chatty VMs are likely the ones being moved from one ESXi host to another with vMotion based on utilization.
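The zombie and chatty definitions above boil down to a threshold check on CPU usage over an observation window. The Python sketch below assumes you already have a list of CPU-usage percentages for each VM covering the chosen window (thirty days for zombies, a week for chatty VMs), and that the window's average is what gets compared against the example thresholds of 5% and 80%. The function and constant names are illustrative only, not part of Splunk App for VMware.

```python
from statistics import mean

# Example thresholds from the definitions above; tune them for your environment.
ZOMBIE_MAX_CPU = 5.0   # percent: below this average, the VM is a zombie
CHATTY_MIN_CPU = 80.0  # percent: above this average, the VM is chatty


def classify_vm(cpu_samples: list[float]) -> str:
    """Classify a VM from CPU-usage samples (percent) over one window.

    Assumption: the average over the window is compared to the thresholds.
    """
    if not cpu_samples:
        return "unknown"  # no performance data for this VM in the window
    avg = mean(cpu_samples)
    if avg < ZOMBIE_MAX_CPU:
        return "zombie"
    if avg > CHATTY_MIN_CPU:
        return "chatty"
    return "normal"
```

For example, `classify_vm([2.1, 3.4, 1.8])` returns `"zombie"` because the average is about 2.4%, while `classify_vm([85.0, 92.5, 88.0])` returns `"chatty"`. Using the average is one choice; requiring every sample to stay under the threshold would be a stricter reading of "less than 5% CPU over thirty days".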

Orphan VM: There are multiple definitions for this type of VM. Here are just some examples of what an orphaned VM can look like:

A virtual machine that was unregistered from vCenter Server but is still running, unmanaged, within the environment.
A virtual machine that exists in the vCenter database but is no longer present on the ESXi host.
A virtual machine that exists on a different ESXi host than the one vCenter Server expects.
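All three orphan definitions come down to comparing two inventories: where vCenter Server believes each VM is running, and which VMs the ESXi hosts actually report. A minimal Python sketch, assuming both inventories have already been gathered into dictionaries keyed by a VM identifier. How you collect them is out of scope here, and the `find_orphans` function, the dictionary layout and the sample names are hypothetical.

```python
def find_orphans(vcenter: dict[str, str], hosts: dict[str, str]) -> dict[str, list[str]]:
    """Compare vCenter's view of VM placement with what the ESXi hosts report.

    vcenter: VM id -> ESXi host that vCenter Server expects the VM on.
    hosts:   VM id -> ESXi host actually reporting the VM.
    """
    return {
        # Running on a host but unknown to vCenter Server.
        "unregistered": sorted(hosts.keys() - vcenter.keys()),
        # In the vCenter database but not present on any host.
        "missing_on_host": sorted(vcenter.keys() - hosts.keys()),
        # Known to both, but on a different host than vCenter expects.
        "wrong_host": sorted(
            vm for vm in vcenter.keys() & hosts.keys() if vcenter[vm] != hosts[vm]
        ),
    }
```

With `vcenter = {"vm-1": "esx-a", "vm-2": "esx-a", "vm-3": "esx-b"}` and `hosts = {"vm-1": "esx-a", "vm-3": "esx-a", "vm-4": "esx-b"}`, the result flags `vm-4` as unregistered, `vm-2` as missing on its host and `vm-3` as running on the wrong host. Dictionary key views support set operations directly, which keeps each check to a single expression.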

In many cases, actively running orphan VMs are a security concern: they are not visible to vCenter Server, so the VMware administrator is unaware of them as well. These VMs will not be patched and can escape compliance and operational audits.

Orphan VMs can occur for reasons such as the following:

After a vMotion or VMware DRS migration event.
After a VMware HA host failure occurs, or after the ESX host comes out of maintenance mode.
A virtual machine is deleted outside of vCenter Server.
vCenter Server is restarted while a migration is in progress.
Too many virtual machines are scheduled to be relocated at the same time.
Attempting to delete virtual machines when an ESX/ESXi host local disk (particularly the root partition) has become full.
Rebooting the host within 1 hour of moving or powering on virtual machines.
A .vmx file contains special characters or incomplete line item entries.

To gather information from a complex environment like VMware, we need to collect performance, log and configuration data from vCenter Server and ESXi hosts.

Splunk App for VMware provides deep operational visibility into granular performance metrics, logs, tasks and events, and topology from hosts, virtual machines and vCenter Servers.

Splunk App for VMware provides:

Proactive monitoring of your virtual infrastructure.
A visual interactive topology map of your virtual environment, highlighting problems and statistical comparisons based on predefined customizable thresholds.
Views that provide insight into how your environment performs, with details on performance, availability, security, capacity and change tracking.
Capacity Planning and Capacity Forecasting dashboards.
Correlation of VMware virtualization data with NetApp NFS datastores.
Views that show the operational health of your environment, identifying underperforming or distressed hosts, virtual machines, and datastores.
A security view that provides visibility into potential security breaches and non-compliant usage patterns.
The collection of granular performance metrics and log data all in one place, collected directly from VMware vCenter Servers, plus ESXi and vCenter logs (collected via syslog).
The ability to explore very large data volumes, at speed, with access to fast queries on performance data.
Change tracking, with visibility into VMware vCenter Server tasks and events in the context of your virtual environment.

Going back to the basics of core Splunk, we can create our own searches, reports, alerts and dashboards on top of any Splunk app. With these additional dashboards, we can identify, validate and repurpose the VMs mentioned above.

Let’s go ahead and identify Zombie, Chatty and Orphan VMs with a custom search.

(sourcetype=vmware:perf:cpu source=VMPerf:VirtualMachine) OR (sourcetype=vmware:inv:vm changeSet.name=*)
| eval detect = case(p_average_cpu_usage_percent < 5.00, "zombie", p_average_cpu_usage_percent > 80.00, "chatty", isnotnull(p_average_cpu_usage_percent), "normal")
| stats first(detect) as "CPU Status" by moid

We can put together a very cool dashboard to show all the Zombie, Orphan and Chatty VMs.

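Since zombie and orphan VMs could be repurposed for other uses, we can calculate the total cost tied up in these troubled VMs, which is what removing or repurposing them could save. In the dashboard, the per-VM price comes from the $price$ input token.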

This can help you show management the real savings you delivered to the business!
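The savings estimate is simple arithmetic: count the VMs you could reclaim and multiply by a per-VM cost. In the Python sketch below, the per-VM cost is an assumed input (for example, a figure derived from the VMware ROI/TCO calculator linked earlier), playing the same role as the `$price$` dashboard token. `repurposing_savings` and the status labels are illustrative names, not part of Splunk App for VMware.

```python
def repurposing_savings(
    statuses: dict[str, str],
    cost_per_vm: float,
    reclaimable: frozenset[str] = frozenset({"zombie", "orphan"}),
) -> float:
    """Estimate the cost tied up in VMs whose status marks them as reclaimable.

    statuses:    VM id -> status label (e.g. "zombie", "chatty", "orphan", "normal").
    cost_per_vm: assumed cost of running one VM, in your currency of choice.
    """
    # Count only the VMs that could be removed or repurposed.
    count = sum(1 for status in statuses.values() if status in reclaimable)
    return count * cost_per_vm
```

For example, with one zombie, one orphan, one chatty and one normal VM at an assumed 500.0 per VM, `repurposing_savings(...)` returns `1000.0`: only the zombie and the orphan count toward the savings.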

(sourcetype=vmware:perf:cpu source=VMPerf:VirtualMachine) OR (sourcetype=vmware:inv:vm changeSet.name=*)
| stats first(changeSet.name) as vm_name first(p_average_cpu_usage_percent) as avg_cpu_usage by moid
| stats count(moid) as moids, count(vm_name) as vms
| eval cost = (moids - vms) * $price$
| table cost

Splunk can help your organization repurpose zombie and orphan VMs to make full use of your virtualization investment and keep it secure. Splunk can also help identify chatty VMs so they can be moved to properly sized ESXi hosts.

Happy Splunking.

This blog post was jointly written by Tolga Tohumcu and Kam Amir…

----------------------------------------------------
Thanks!
Tolga Tohumcu
