Love it or hate it, many organizations have Microsoft Windows as part of their infrastructure. They usually operate a series of Windows services like:
- Internet Information Services (IIS)
- Microsoft SQL Server
- Other .NET applications on top of Windows Server
Although surveys report that fewer businesses run Windows than Linux, many organizations still operate private Windows servers that are not accessible over the internet.
Organizations that run Windows infrastructure components therefore need proper observability and alerting agents to monitor their performance. Because many leading observability and monitoring tools primarily target Linux, it can be difficult to identify an effective and reliable way to monitor Windows infrastructure.
In this article, we will explain the best ways to monitor Windows infrastructure, including:
- The most popular tools and applications that help you collect important system metrics from Windows infrastructure components
- The top metrics to monitor and what to look for when analyzing them
Monitoring, of course, is only one part of a successful infrastructure resiliency strategy.
How to monitor Windows infrastructure
The key to monitoring any OS infrastructure (whether Linux or Windows) is to use an instrumentation agent that runs alongside the kernel without slowing the system down under load. You don’t want to insert a clunky piece of software that sabotages performance or creates memory leaks.
Before installing an open source tool to capture system information, you should explore the Windows server toolkit that came bundled with your purchase. For example, you can use the following tools to capture system metrics:
- PowerShell and WMI: PowerShell is the bread and butter of Windows task automation, and it now runs on Linux as well. It can be used as both a scripting language and a configuration management tool. You can also use it to query OS and system information, relevant performance counters and hardware information. With the Windows Management Instrumentation (WMI) extensions, you can query performance counter classes using the WmiPerfInst provider.
- Typeperf: Typeperf is a command line tool that you can use to quickly write performance metrics to the console or a file. Because of its simplicity and flexibility, you can use this tool to capture performance data for useful services like IIS servers or SQL Server.
- AppCmd: AppCmd is the default tool for managing IIS 7+ servers. It offers a variety of commands that allow you to view information about the worker processes and requests that are running on the server.
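As a rough illustration of how the output of these built-in tools can be fed into a monitoring pipeline, the Python sketch below wraps Typeperf. It assumes Typeperf's `-si` (sample interval) and `-sc` (sample count) options and its default CSV console output, which starts with a `(PDH-CSV ...)` header row. The function names are hypothetical, and only `collect` needs a Windows host.

```python
import csv
import io
import subprocess


def build_typeperf_command(counters, interval_s=1, samples=5):
    """Return the argv list for a typeperf run that prints CSV to the console."""
    return ["typeperf", *counters, "-si", str(interval_s), "-sc", str(samples)]


def parse_typeperf_csv(text):
    """Turn typeperf CSV console output into a list of {counter: value} dicts."""
    rows = [row for row in csv.reader(io.StringIO(text)) if len(row) >= 2]
    header = next((r for r in rows if r[0].startswith("(PDH-CSV")), None)
    if header is None:
        return []
    samples = []
    for row in rows:
        # Data rows start with a timestamp; this skips the header and status lines.
        if len(row) != len(header) or not row[0][:1].isdigit():
            continue
        sample = {"timestamp": row[0]}
        for counter, raw in zip(header[1:], row[1:]):
            try:
                sample[counter] = float(raw)
            except ValueError:
                sample[counter] = None  # unparsable or missing reading
        samples.append(sample)
    return samples


def collect(counters, interval_s=1, samples=5):
    """Run typeperf (Windows only) and return the parsed samples."""
    cmd = build_typeperf_command(counters, interval_s, samples)
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_typeperf_csv(out)
```

On a Windows server, `collect([r"\Processor(_Total)\% Processor Time"])` would return one dictionary per sample, which you can then forward to whichever agent or remote service you use.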
Other notable tools
In addition, there are several other noteworthy tools that monitor and expose useful OS metrics and performance counters which can then be exported into an agent or a remote service:
Sysinternals Suite is a collection of advanced system utilities and technical information. It was originally written by Mark Russinovich and, because of its high quality and comprehensiveness, is now offered by Microsoft as a separate download. These tools can dissect your Windows performance metrics and give you a detailed view into each one. Some of the most notable tools in this suite include:
- DiskMon, which monitors disk usage
- Process Explorer, which monitors processes
- Registry Usage (RU), which reports registry space usage
Psutil is like a Swiss Army Knife for retrieving system information and utilization counters. It’s a cross-platform Python library, so it can be used on both Linux and Windows machines, and it takes full advantage of the language’s flexibility.
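To give a feel for what psutil offers, here is a minimal sketch that gathers a few basic counters. It assumes psutil has been installed (for example with `pip install psutil`). The function name `system_snapshot` and the `ps` parameter are hypothetical: passing the module in as an argument simply makes the function easy to exercise without a real host.

```python
try:
    import psutil  # third-party package, not part of the standard library
except ImportError:  # keeps this module importable where psutil is absent
    psutil = None


def system_snapshot(ps=psutil, interval_s=1.0):
    """Collect per-core CPU, memory and network counters in one dictionary."""
    if ps is None:
        raise RuntimeError("psutil is not installed")
    memory = ps.virtual_memory()
    network = ps.net_io_counters()
    return {
        # Blocks for interval_s seconds to measure utilization per core.
        "cpu_percent_per_core": ps.cpu_percent(interval=interval_s, percpu=True),
        "memory_total_bytes": memory.total,
        "memory_available_bytes": memory.available,
        "memory_percent": memory.percent,
        # Cumulative totals since boot, summed across all interfaces.
        "net_bytes_sent": network.bytes_sent,
        "net_bytes_recv": network.bytes_recv,
    }
```

Calling `system_snapshot()` on a machine with psutil installed returns a plain dictionary that is straightforward to serialize and ship to a collector.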
Metrics to monitor in Windows environments
Once you have identified the best ways to collect and monitor your Windows infrastructure metrics, you want to create monitors that display them — ideally in a single pane of glass.
CPU processor, memory & network counters
The most basic counters are the ones that map to actual hardware or available resources. At a minimum, you want visibility into every available CPU core, memory statistics and network bandwidth counters.
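One simple way to act on these counters is to compare each reading against a threshold. The sketch below uses three standard Windows performance counter paths, but the threshold values and the `breached` helper are hypothetical and only for illustration. It also assumes the readings have already been collected and keyed by counter path, with network traffic summed across interfaces.

```python
# Hypothetical thresholds for illustration; tune them to your own baseline.
THRESHOLDS = {
    r"\Processor(_Total)\% Processor Time": ("above", 90.0),
    r"\Memory\Available MBytes": ("below", 512.0),
    r"\Network Interface(*)\Bytes Total/sec": ("above", 100_000_000.0),
}


def breached(readings, thresholds=THRESHOLDS):
    """Return (counter, value, limit) for every reading that crosses its limit."""
    alerts = []
    for counter, value in readings.items():
        if counter not in thresholds or value is None:
            continue  # unknown counter or missing sample
        direction, limit = thresholds[counter]
        too_high = direction == "above" and value > limit
        too_low = direction == "below" and value < limit
        if too_high or too_low:
            alerts.append((counter, value, limit))
    return alerts
```

Memory is checked in the opposite direction from CPU and network because the counter reports what is still available, not what is in use.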
Windows events
Windows events are detailed records about the system, security and application notifications that are stored by the Windows OS. These are useful for tracing reliability issues within infrastructure environments. When monitoring these events, you want to be able to filter them based on their severity and schema.
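As one possible way to filter events by severity, the sketch below parses event XML such as the output of `wevtutil qe System /c:50 /rd:true /f:xml`. It assumes that output is a sequence of `<Event>` elements in the standard Windows event schema namespace with no enclosing root element, so it wraps them before parsing. The function name and the returned fields are hypothetical choices.

```python
import xml.etree.ElementTree as ET

NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}
LEVELS = {1: "Critical", 2: "Error", 3: "Warning", 4: "Information", 5: "Verbose"}


def parse_events(xml_fragments, max_level=3):
    """Keep events whose level is between 1 and max_level.

    Lower numbers are more severe, so the default keeps Critical, Error
    and Warning events and drops informational ones.
    """
    # The fragments have no single root, so wrap them to get valid XML.
    root = ET.fromstring(f"<Events>{xml_fragments}</Events>")
    selected = []
    for event in root.findall("e:Event", NS):
        system = event.find("e:System", NS)
        level = int(system.findtext("e:Level", default="0", namespaces=NS))
        if not 1 <= level <= max_level:
            continue
        selected.append({
            "level": LEVELS.get(level, str(level)),
            "event_id": int(system.findtext("e:EventID", default="0", namespaces=NS)),
            "provider": system.find("e:Provider", NS).get("Name"),
            "time": system.find("e:TimeCreated", NS).get("SystemTime"),
        })
    return selected
```

Filtering after collection keeps the sketch simple. For busy logs it is usually cheaper to filter at the source so that fewer events are exported in the first place.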
Detailed process information
You should collect information about:
- Per-process resource usage
- Open handles
- Page faults caused by inadequate memory
- The number of context switches at a per-process level
This information is useful for troubleshooting concurrency issues with your apps.
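The per-process details listed above can be gathered with psutil, as in the following sketch. It assumes psutil is installed. `num_handles()` and the `num_page_faults` field of `memory_info()` are only available on Windows, so the code falls back to `None` elsewhere. The `process_stats` name and the choice to rank by handle count are hypothetical.

```python
try:
    import psutil  # third-party package, not part of the standard library
except ImportError:  # keeps this module importable where psutil is absent
    psutil = None


def process_stats(ps=psutil, top=5):
    """Return the `top` processes ranked by open handle count."""
    if ps is None:
        raise RuntimeError("psutil is not installed")
    stats = []
    for proc in ps.process_iter(["pid", "name"]):
        try:
            memory = proc.memory_info()
            switches = proc.num_ctx_switches()
            stats.append({
                "pid": proc.info["pid"],
                "name": proc.info["name"],
                "rss_bytes": memory.rss,
                # Only present in memory_info() on Windows.
                "page_faults": getattr(memory, "num_page_faults", None),
                # num_handles() only exists on Windows.
                "handles": proc.num_handles() if hasattr(proc, "num_handles") else None,
                "ctx_switches": switches.voluntary + switches.involuntary,
            })
        except (ps.NoSuchProcess, ps.AccessDenied):
            continue  # process exited or is protected; skip it
    stats.sort(key=lambda s: s["handles"] or 0, reverse=True)
    return stats[:top]
```

A handle count that keeps growing for the same process between runs is a common hint of a resource leak worth investigating.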
Disk information
In certain cases, it’s critical that you store data on disk and perform disk I/O operations. You want to make sure that your storage is healthy and that you detect storage failures or disk fragmentation issues before they become problems. Useful metrics to monitor in this category include:
- I/O activity
- Read/write ops
- Idle times
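Disk counters are often exposed as cumulative totals, so a monitor typically takes two samples and converts the difference into a rate. The sketch below shows that arithmetic. The `DiskSample` tuple is a hypothetical stand-in that mirrors some of the fields returned by psutil's `disk_io_counters()`, which lets the example run without psutil. Idle time is not derived here; on Windows it could instead be read from a counter such as `\PhysicalDisk(_Total)\% Idle Time`.

```python
from collections import namedtuple

# Hypothetical container mirroring cumulative fields of psutil.disk_io_counters().
DiskSample = namedtuple("DiskSample", "read_count write_count read_bytes write_bytes")


def io_rates(previous, current, elapsed_s):
    """Convert two cumulative samples into per-second rates."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return {
        "reads_per_sec": (current.read_count - previous.read_count) / elapsed_s,
        "writes_per_sec": (current.write_count - previous.write_count) / elapsed_s,
        "read_bytes_per_sec": (current.read_bytes - previous.read_bytes) / elapsed_s,
        "write_bytes_per_sec": (current.write_bytes - previous.write_bytes) / elapsed_s,
    }
```

For example, going from 100 to 160 completed reads over 10 seconds gives 6 reads per second.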
Windows services
Windows services run as background processes with no direct user interface (the equivalent of daemons on Unix-like systems). They are critical because, if they fail, most of the services that depend on them will fail too. Be sure to monitor and check the status of these services, as well as the corresponding event logs in case there is a failure.
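A basic status check can be sketched by parsing the output of the built-in `sc query` command. The example below assumes the usual English-language layout, where each service has a `SERVICE_NAME:` line followed by a `STATE` line such as `STATE : 4  RUNNING`. The function names are hypothetical, and only `query_services` needs a Windows host.

```python
import re
import subprocess


def parse_sc_query(text):
    """Map each service name to its state name from `sc query` output."""
    services, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("SERVICE_NAME:"):
            current = line.split(":", 1)[1].strip()
        elif current and line.startswith("STATE"):
            match = re.match(r"STATE\s*:\s*\d+\s+(\w+)", line)
            if match:
                services[current] = match.group(1)
    return services


def not_running(services, expected):
    """Return the expected services that are missing or not RUNNING."""
    return [name for name in expected if services.get(name) != "RUNNING"]


def query_services():
    """Run sc.exe (Windows only) and return the parsed service states."""
    cmd = ["sc", "query", "type=", "service", "state=", "all"]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_sc_query(out)
```

For instance, `not_running(query_services(), ["W3SVC", "MSSQLSERVER"])` would list whichever of the IIS and default SQL Server services are not currently running, which is a natural trigger for an alert.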
Next steps with Windows monitoring
If you are operating Windows servers for cloud-native workloads, it’s critical that you set up observability agents as well, so that you can measure and examine the internals of the system proactively. Splunk Enterprise will help you take this to the next level by providing complete visibility into what’s happening in your business and utilizing advanced AI and machine learning models to provide intuitive visualizations.
This article was written by Theo Despoudis, a senior software engineer, consultant and experienced mentor. He has a keen interest in open source architectures, cloud computing, best practices and functional programming. He occasionally blogs on several publishing platforms and enjoys creating projects from inspiration. Follow him on Twitter @nerdokto.
This posting does not necessarily represent Splunk's position, strategies or opinion.