Server monitoring is the process of gaining visibility into the activity on your servers — whether physical or virtual. Servers are devices (or increasingly, applications) that store and process information that is provided to other devices, applications or users on-demand. A single server can support hundreds or even thousands of requests simultaneously. As such, ensuring that all of an organization’s servers are operating according to expectations is a critical part of managing your IT infrastructure.
The term “server monitoring” is complex because of the exceptionally wide range of servers that exist. A web server can be a physical device, but it increasingly refers to a virtual server housed on a physical machine shared by dozens of other clients, each running their own independent web server system. Mail servers, print servers and database servers are just a few types of server devices and software.
Monitoring and alerting on each of these server types requires a specific kind of technological oversight, and the typical “off the shelf” server monitoring tool is unlikely to be appropriate for every one of them. In this article, we’ll explain how various server monitoring tools and monitoring services work, the value they bring to the enterprise, and how to go about selecting the right system for your organization.
The Important Role of Server Monitoring
Servers are some of the most critical pieces of your IT infrastructure, so it stands to reason that monitoring their performance and uptime is vital to the health of your IT environment. If a web server is offline, running slowly, experiencing outages or other performance issues, you may lose customers who decide to visit elsewhere. If an internal file server is generating errors, key business data such as accounting files or customer records could be corrupted.
Server monitoring is designed to observe your systems and provide a number of key metrics to IT management about their operation. In general, a server monitor tests for accessibility (ensuring the server is alive and reachable) and measures response time (testing that it is fast enough to keep users happy), while alerting for errors (missing or corrupt files, security violations, and other problems). Server monitoring is also predictive: Is the disk going to reach capacity soon? Is memory or CPU utilization about to be throttled? Server monitoring is most often used for processing data in real time, but it also has value when evaluating historical data. By looking at previous weeks or months, an analyst can determine if a server’s performance is degrading over time — and may even be able to predict when a complete crash is likely to occur.
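The accessibility and response-time checks described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular monitoring product works: the function name `probe_tcp`, the 500 ms "slow" threshold, and the status labels are all assumptions made for the example.

```python
import socket
import time


def probe_tcp(host, port, timeout=2.0, slow_ms=500.0):
    """Try to open a TCP connection and classify the result.

    Returns a (status, elapsed_ms) tuple where status is "down",
    "slow", or "ok". The slow_ms threshold is an illustrative default.
    """
    start = time.perf_counter()
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError:
        return "down", None
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    status = "slow" if elapsed_ms > slow_ms else "ok"
    return status, elapsed_ms
```

A scheduler could call `probe_tcp` at a fixed interval and raise an alert after several consecutive "down" or "slow" results. A successful TCP connection only shows that the port is accepting connections; it does not confirm that the application behind it is returning correct responses.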
The Basics of Server Management
Server management is the ongoing process of operating a server in order to ensure uptime and reliability, high performance, and error-free operation. It represents the day-to-day activities required to administer and keep a server running, with a key focus on ensuring uninterrupted availability required for optimal user experience.
Server management can comprise a wide range of specific functions, depending on the organization, its IT structure, and the types and number of servers it operates. At a typical organization, server management includes daily monitoring, installing software updates, installing and setting up new equipment, and troubleshooting and triaging problems. Server management also typically includes provisioning and capacity planning to ensure there are enough system resources to meet the organization’s needs. For example, if a firm needs enough web server power to support 10,000 simultaneous users, with bursts of up to 12,000 users, a server manager would ensure this capacity is available on demand.
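The capacity example above comes down to simple arithmetic once a per-server capacity is known. The sketch below assumes a hypothetical figure of 500 simultaneous users per server; that number, the function name `servers_needed`, and the optional spare server are illustrative assumptions, not values from a real benchmark.

```python
import math


def servers_needed(peak_users, users_per_server, spare=0):
    """Number of servers required to carry peak_users, plus optional spares."""
    if users_per_server <= 0:
        raise ValueError("users_per_server must be positive")
    return math.ceil(peak_users / users_per_server) + spare


# Figures from the example: 10,000 steady users, bursts of 12,000.
# 500 users per server is a hypothetical capacity, not a benchmark.
baseline = servers_needed(10_000, 500)       # 20
burst = servers_needed(12_000, 500)          # 24
with_spare = servers_needed(12_000, 500, 1)  # 25
```

In practice the per-server figure would come from load testing, and the gap between the baseline and burst counts is the capacity that must be available on demand.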
Server management presents its own set of challenges in a virtual environment, as an IT manager can’t physically walk to the server hardware and check if there are any problems. A different set of challenges arises, however, if the servers are physical hardware devices. Servers in both environments need to be managed from a software and hardware perspective, which includes ensuring there is enough space, electrical power, network bandwidth and even cooling capacity to handle all of them.
Server Management Systems
A server management system is a software tool that allows an IT professional to administer a server — or, more typically, multiple servers. A server management system will typically collect operational data — CPU usage, memory, disk space and other disk utilization metrics, log files, OS monitoring statistics, and user access/security information — and display it in real time on a management dashboard. The system is also capable of collecting historical data, allowing IT managers to monitor these metrics over time.
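As a rough illustration of the kind of operational data such a system collects, the following sketch gathers a few host metrics using only the Python standard library. The function name `collect_snapshot` and the dictionary keys are assumptions for the example; a real agent would gather far more (memory, per-process statistics, logs), often through a third-party library such as psutil.

```python
import os
import shutil
import time


def collect_snapshot(path="/"):
    """Gather a few host metrics using only the standard library."""
    usage = shutil.disk_usage(path)
    try:
        # 1-, 5-, and 15-minute load averages; not available on every platform
        load_1m = os.getloadavg()[0]
    except (AttributeError, OSError):
        load_1m = None
    used_pct = 100.0 * usage.used / usage.total if usage.total else 0.0
    return {
        "timestamp": time.time(),
        "disk_total_bytes": usage.total,
        "disk_used_pct": used_pct,
        "load_1m": load_1m,
        "cpu_count": os.cpu_count(),
    }
```

Storing each snapshot with its timestamp is what makes the historical view possible: the same records that feed a real-time dashboard can later be charted over weeks or months.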
In virtual environments, a server management system should not be confused with a hypervisor (also known as a virtual machine monitor). A hypervisor creates and operates virtual machines (or virtual servers); its function is to keep multiple virtual machines running according to the operator’s specifications, not necessarily to monitor their performance profile.
Server Monitoring vs Server Performance Monitoring
While server monitoring is a broad term that concerns the overall health of a server, server performance monitoring is concerned strictly with performance metrics. For a physical server, metrics primarily include memory and CPU utilization, as well as disk I/O and network performance. For a virtual server, performance metrics may include database or web server response time, network bandwidth utilization, and other measures of resource utilization, depending on the specific type of server.
Server performance monitoring is important for a variety of reasons. First, it is often predictive in nature: slowdowns and other performance issues can help IT pinpoint problems that are developing. Bottlenecks can show where component or service upgrades are needed, and capacity management tools can be used to project what resources may be needed to support a new application or other workloads.
Compliance is another big issue that informs server performance monitoring. Many enterprises are committed to providing a certain level of uptime or performance, which can be critical in high-stress environments such as financial trading, SaaS offerings, and streaming media. If performance falls below certain thresholds, compliance penalties can be severe.
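An uptime commitment translates directly into a downtime budget, which is what a monitoring system would track against. The sketch below shows the arithmetic, assuming a 30-day reporting period; the function names and the period length are assumptions for illustration, and real service-level agreements define their own measurement windows and exclusions.

```python
def downtime_budget_minutes(uptime_pct, period_days=30):
    """Minutes of downtime permitted by an uptime target over a period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100.0 - uptime_pct) / 100.0


def is_compliant(observed_downtime_minutes, uptime_pct, period_days=30):
    """True if observed downtime stays within the budget for the target."""
    budget = downtime_budget_minutes(uptime_pct, period_days)
    return observed_downtime_minutes <= budget


# A 99.9% target over 30 days allows roughly 43.2 minutes of downtime;
# a 99.99% target allows roughly 4.32 minutes.
```

Under these assumptions, a single hour-long outage would breach a 99.9% monthly target, which is why early warning matters more than after-the-fact reporting.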
Types of Server Monitoring Systems
Server monitoring systems come in three basic varieties: on-premises/traditional software-based systems, cloud-based/SaaS systems and mobile systems. Additionally, a few hybrid systems combine both on-premises and cloud technologies into a unique, custom solution. Here are the pros and cons of each approach.
On-premises/traditional software-based systems are built around software that is installed on your own, in-house hardware. This is a traditional software model that is generally priced with a large up-front fee and a maintenance plan that enables ongoing support from the vendor. Because every installation environment differs, on-premises software installations can be complex, time-consuming and prone to difficulties. However, on-premises software can offer more customization options and may allow more control over where data is stored, which can be useful when reporting to regulatory agencies. In general, on-premises software is also more expensive than cloud-based options.
Cloud/SaaS systems are monitoring systems that are installed and managed entirely via the web. Because no software needs to be installed directly within the user’s infrastructure, systems can be rapidly launched and installed, sometimes in a matter of hours. While cloud services provide ample flexibility, they can often offer less direct control over customization and personalization. Cloud-based monitoring software is sold as a subscription, and many cloud monitoring providers do not require long-term contracts, facilitating easier entry and creating less risk than on-premises solutions.
Mobile systems are not a primary type of server monitoring system, but many on-premises and cloud providers also support a mobile implementation of their systems as an option. As the name implies, these systems run on a smartphone or tablet and provide on-the-go access to server monitoring data. Sometimes mobile functionality is limited in comparison to what can be performed via a traditional PC. Most cloud-based systems and a few on-premises systems offer a mobile monitoring option.
Best Practices for Server Monitoring
While every environment is different, a few key best practices can help ensure your IT department gets the most out of its investment in a server monitoring solution.
- Ensure hardware is operating according to appropriate tolerance levels: File servers are often pushed to their operational limits, and very few ever get a break, running 24/7 with no room for any downtime. Pay careful attention to key metrics like CPU temperature, CPU and RAM utilization, and storage capacity utilization to ensure every server is always running at peak physical performance. These checks, called “heartbeat” checks, should be configured at regular intervals.
- Proactively monitor software for failures: Use your server monitoring tools to watch for software problems as well as hardware issues. For example, server monitoring tools can help alert you to errors that arise if a database has become corrupted, if a security event has disabled key services, or if a backup has failed.
- Consider your history: Server problems rarely emerge in a vacuum. Consider the historical context of any problems that arise by charting metrics over time — generally 30 days or 90 days. For example, has CPU temperature abruptly risen in the last few days? This could indicate a server fan is failing.
- Keep tabs on alerts: Alerts should be monitored in real time as they arise, then triaged and assigned to an analyst for resolution. This is the most common way in which an analyst can determine that something has gone wrong. Find a reliable way to surface and prioritize the most critical alerts amid the noise. When incidents are escalated, make sure they reach the right person at the right time to ensure better team collaboration.
- Use server monitor data to plan short-term cloud capacity: In a virtual server scenario, your server monitoring system can be instrumental in helping to plan how much computing power you need at any given moment. If services begin to slow down for users or experience other performance issues, IT management can use the server monitor to assess the situation and quickly spin up additional resources — or take them offline, if demand is low.
- Get a jump on capacity planning: Datacenter workloads have roughly doubled over the last five years, and servers have had to keep up. By monitoring long-term trends in server utilization, you can be better prepared for future server needs (both online and off).
- Expand asset management and tracking: Server monitoring can give you insight into when systems are approaching end of life — or tell you if assets have vanished from the network altogether (often indicating either failure or theft). Instead of relying on spreadsheets to track physical hardware in the enterprise, let your server monitoring tool do the work for you.
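The advice above about charting metrics over time and planning capacity from long-term trends can be made concrete with a simple forecast. The sketch below fits a straight line through daily disk-usage percentages and extrapolates to the day the disk would fill. It is a deliberately simple model under stated assumptions: samples are evenly spaced one day apart, growth is roughly linear, and the function name `days_until_full` is hypothetical.

```python
def days_until_full(daily_used_pct, capacity_pct=100.0):
    """Estimate days until capacity from evenly spaced daily samples.

    Fits a least-squares line through the samples and extrapolates.
    Returns None when usage is flat or falling.
    """
    n = len(daily_used_pct)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_used_pct) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_used_pct))
    slope = cov_xy / var_x
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    day_full = (capacity_pct - intercept) / slope
    # Days remaining counted from the most recent sample (index n - 1)
    return max(0.0, day_full - (n - 1))
```

For example, samples of 50, 52, 54 and 56 percent grow by two points a day, so the estimate is 22 days from the last sample. Real usage is rarely this regular, so a forecast like this is best treated as an early prompt to investigate rather than a precise deadline.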
What To Look For in Server Monitoring Tools
When considering a server monitoring tool, you’ll want to assess these key server monitoring capabilities:
- Breadth of coverage: Does the tool support all the server types (hardware and software; on-premises and cloud) that your enterprise uses? Is it prepared for future types of servers your enterprise may implement down the road?
- Intelligent alert management: Is it easy to set up alerts via the configuration of thresholds that trigger them? How are alerts delivered? Are mobile users a consideration?
- Root cause investigation intelligence: Does the tool include logic or AI algorithms to help you determine why a problem has occurred, rather than telling you that something has gone wrong without context?
- Ease of use: Does the system include an intuitive dashboard that makes it easy to monitor events, perform triage, and react to problems quickly?
- Support policy: How easy is it to get in touch with technical support if you need help?
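To make the idea of threshold-based alerting concrete, here is a minimal sketch of how thresholds might be configured and evaluated, with the most urgent alerts listed first. The `Rule` class, the metric names, the threshold values, and the severity scale are all hypothetical; commercial tools expose their own configuration formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    metric: str
    threshold: float
    severity: int  # 1 = most urgent; the scale is illustrative


def evaluate(rules, metrics):
    """Return (severity, message) pairs for breached rules, most urgent first."""
    alerts = []
    for rule in rules:
        value = metrics.get(rule.metric)
        if value is not None and value > rule.threshold:
            message = f"{rule.metric}={value} exceeds {rule.threshold}"
            alerts.append((rule.severity, message))
    return sorted(alerts)


rules = [
    Rule("cpu_pct", 90, 2),
    Rule("disk_pct", 95, 1),
    Rule("cpu_temp_c", 80, 1),
]
alerts = evaluate(rules, {"cpu_pct": 97, "disk_pct": 96, "cpu_temp_c": 60})
```

When comparing tools, it is worth checking how much of this they handle for you: how thresholds are defined, whether severities drive routing and escalation, and how alerts reach mobile users.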
Bottom Line: Server monitoring is a key function of any IT operation
Because servers are the technical lifeblood of any enterprise, it makes sense that IT managers would want to take every step possible to ensure that they are performing at their maximum potential. A smart server monitoring and management system is key to making that possible. But remember, the best server monitoring tools aren’t just reactive, informing you about problems only after they have emerged. They’re also proactive, giving you a heads-up about potential problems before they become catastrophes, and putting you ahead of the game when it comes to creating a solution.
This posting does not necessarily represent Splunk's position, strategies or opinion.