What Is IT Monitoring?
What are the basic types of IT monitoring?
The basic types of IT monitoring include availability monitoring, web performance monitoring, application management and application performance management, API monitoring, real user monitoring, security monitoring and business activity monitoring.
There is no canonical list of the types of IT monitoring tools, and many terms cover more than one type of monitoring, which blurs the lines that define this market. Here is a look at the general types of tools that comprise IT monitoring:
- Availability monitoring: Often referred to as system monitoring, availability monitoring is arguably the most mature type of IT monitoring tool. It covers categories such as server management, infrastructure monitoring/management and network monitoring/management, and is designed to provide users with information about the uptime and performance of whatever is being monitored.
- Web performance monitoring: A subset of availability monitoring, web performance monitoring is designed to monitor the availability of a web server or service, but also adds more fine-grained detail to the system. These tools can capture information such as page loading time, the location of errors that are generated, and individual load times of various web elements, helping analysts to fine-tune a website or a web-based app’s performance.
- Application management/application performance management (APM): APM tools are similar to web performance monitoring tools, but they’re designed with customer-facing applications in mind, allowing analysts to track the performance of an application and spot any issues before they become too severe for the user base. More modern APM tools can include automated routines to troubleshoot these issues without the intervention of a human developer.
- API monitoring: Enterprises that offer APIs to third-party developers will find it crucial to ensure the uptime of these services. API monitoring tools provide insight into whether an API is working properly, helping to minimize downtime.
- Real user monitoring (RUM): Real user monitoring is designed to record actual end-user interactions with a website or application. By monitoring real-world load times and user behavior, it can pinpoint problems based on “real” user experience challenges, as opposed to simulations. This type of monitoring is designed to be backward-looking, not predictive, allowing analysts to spot problems only after they have occurred.
- Security monitoring: Security monitoring is designed to observe a network for breaches or other unusual activity. It is a broad, high-level category that includes numerous subsets of security analysis tools.
- Business activity monitoring (BAM): This type of monitoring tool takes key business performance metrics and tracks them over time. For example, these metrics could include information about retail sales, application downloads or the volume of financial transfers.
Note that any of these tools may be tasked with monitoring on-premises equipment or applications, cloud-based systems, or both.
What types of tools are used in IT monitoring?
IT infrastructure monitoring tools can be broken down into three general categories (observational, analysis and engagement) based on how they’re used:
- Observational tools: These are the most basic types of IT monitoring tools, used to observe hardware, software or services and report back on their operational effectiveness. Most availability monitoring tools, including infrastructure monitoring and management tools, application performance monitoring tools, and web performance monitoring tools fall into this category.
- Analysis tools: This type of IT monitoring tool is tasked with taking observational data and analyzing it further. This data may be analyzed to determine where problems are originating or more critically, to determine why those problems might be occurring. Advanced analysis tools, such as AIOps systems, are tasked with forecasting where problems are likely to arise based on historical trends and patterns.
- Engagement tools: As the final tier of IT monitoring tools, engagement tools are designed to act upon information created by both analysis and observational tools. Engagement may take a simple form, such as service tickets or alerts that are intelligently delivered to the appropriate analyst or business manager, or, more commonly, an automated action such as spinning up additional services, rebooting troublesome hardware or software, or running backups.
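The three tiers above can be sketched as a small pipeline. This is a minimal illustration in Python, not how any particular product works: the `Observation` type, the `analyze` and `engage` functions, the host name and the 90% CPU threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional, Sequence


@dataclass
class Observation:
    """One reading reported by an observational tool (hypothetical shape)."""
    host: str
    cpu_percent: float


def analyze(history: Sequence[Observation], threshold: float = 90.0) -> Optional[str]:
    """Analysis tier: turn raw readings into a finding, or None if all is well.

    The 90% threshold is an assumed value for illustration.
    """
    if not history:
        return None
    avg = mean(o.cpu_percent for o in history)
    if avg > threshold:
        return f"{history[0].host}: sustained high CPU (avg {avg:.0f}%)"
    return None


def engage(finding: Optional[str]) -> str:
    """Engagement tier: act on a finding, here by opening a ticket."""
    if finding is None:
        return "no action"
    return f"ticket opened: {finding}"


if __name__ == "__main__":
    # Observational tier: readings that a collector might have gathered.
    readings = [Observation("web-01", 95.0), Observation("web-01", 97.0)]
    print(engage(analyze(readings)))
```

In a real deployment each tier would typically be a separate system; keeping them as separate functions here simply mirrors that division of responsibility.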
How do IT monitoring and management work together?
IT monitoring tools provide the information upon which management can act. IT monitoring is a part of management, collecting and delivering performance information so it can be leveraged for tactical and business decision making.
Information delivered by IT monitoring tools lets business managers delve more deeply into the impact of IT infrastructure on the top and bottom line. For example, 0.11% downtime translates to about 11 minutes of unavailability per week. During prime business hours, 11 minutes in which the system is unable to process payments may carry a significant cost. How does this compare to the cost of replacing a memory card in the server or upgrading the network to avert that downtime? Or is there a process issue that should be addressed to resolve the problem? If downtime is increasing, a savvy manager may deduce that even greater trouble is on the horizon, and may use the IT monitoring data to make the case for replacing or upgrading existing hardware.
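A downtime percentage can be converted into minutes per week with simple arithmetic. The sketch below, in Python, shows the calculation; the function names and the cost-per-minute figure are assumptions made up for illustration, not values from the article or from any monitoring product.

```python
MINUTES_PER_WEEK = 7 * 24 * 60  # 10,080 minutes in a week


def downtime_minutes_per_week(downtime_percent: float) -> float:
    """Convert a downtime percentage into minutes of unavailability per week."""
    return MINUTES_PER_WEEK * downtime_percent / 100


def weekly_downtime_cost(downtime_percent: float, cost_per_minute: float) -> float:
    """Estimate the weekly cost of downtime from an assumed cost per minute."""
    return downtime_minutes_per_week(downtime_percent) * cost_per_minute


if __name__ == "__main__":
    minutes = downtime_minutes_per_week(0.11)
    print(f"{minutes:.1f} minutes of downtime per week")

    # Hypothetical figure: assume each minute without payment processing costs $500.
    print(f"${weekly_downtime_cost(0.11, 500):,.0f} estimated weekly cost")
```

Comparing an estimate like this against the price of a hardware or network upgrade is one way to frame the decision described above.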
IT monitoring has an increasingly important role in the realm of DevOps, mainly because DevOps revolves around the concept of multiple-team collaboration, particularly development and operations. But more and more, enterprises have found even greater benefits when other departments are drawn into this mix, including security and QA/testing teams. Only when all of these groups work together as a cohesive team can a software or service product launch be successful.
IT monitoring is a natural complement to this concept, particularly relevant for products that rely on high availability, such as a cloud-based service or an app that relies on your company’s API. When these services slow down or crash altogether, customer satisfaction, and possibly revenue, can drop to zero. As such, it’s critical for DevOps teams to work to ensure that critical systems remain operational and responsive, and to build these measurements of performance directly into the development process from the start.
Another place DevOps and IT monitoring overlap is the increasing pace of product updates, as applications are sometimes updated several times a day. Monitoring is essential in these types of environments, as the breakneck pace of development often provides minimal time for quality assurance before a new update goes live. In some cases, an undiscovered bug makes it into production, causing a key system to experience an unacceptable slowdown or crash. With a solid, real-time IT monitoring solution in place, these errors can be detected quickly, often within seconds, allowing the DevOps team to remedy the problem immediately or roll back the code to a known working state, minimizing downtime.
That said, in the world of DevOps, IT monitoring is also forward-looking. DevOps monitoring systems can be tasked to monitor the very tools that developers use in their own work, helping managers spot areas that are inefficient or that could benefit from automation.
How do IT monitoring and automation work together?
IT monitoring comes to bear on automation primarily with engagement tools. As noted, automation can take the form of automated service tickets or alerts, or it can perform a complex series of actions that remedy a problem detected by the monitoring tool without human intervention.
The more complex the infrastructure, the more necessary automation becomes. In enterprises of even modest size, there are simply too many moving parts for humans to manage, which becomes even more complicated with hybrid systems that combine both cloud and on-premises networks.
IT monitoring tools that incorporate automation are designed to simplify all of this. If a server is slowing down in response to a sudden burst of customer activity, the tool may diagnose the problem as an overloaded CPU, and could automatically instruct another server (real or virtual) to take over. When network traffic decreases, it may then decide to spin down that second server. The tool also has the ability to issue a root cause report about the incident so that management can decide whether an upgrade is in order.
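The scale-up and scale-down behavior described above can be sketched as a simple decision rule. This is a hedged illustration in Python: `desired_server_count` and its thresholds (85% and 30% CPU) are hypothetical, and a real tool would act through its own orchestration or cloud APIs rather than just returning a number.

```python
def desired_server_count(
    current_servers: int,
    cpu_percent: float,
    scale_up_at: float = 85.0,    # assumed threshold for adding a server
    scale_down_at: float = 30.0,  # assumed threshold for removing a server
    min_servers: int = 1,
) -> int:
    """Return how many servers should be running given the latest CPU reading."""
    if cpu_percent >= scale_up_at:
        # Overloaded CPU: bring another server (real or virtual) online.
        return current_servers + 1
    if cpu_percent <= scale_down_at and current_servers > min_servers:
        # Load has dropped: spin the extra server back down.
        return current_servers - 1
    return current_servers


if __name__ == "__main__":
    print(desired_server_count(1, 95.0))  # burst of activity
    print(desired_server_count(2, 10.0))  # traffic has decreased
    print(desired_server_count(2, 50.0))  # normal load, no change
```

Keeping a gap between the two thresholds is a deliberate choice in this sketch: it prevents the rule from adding and removing a server repeatedly when load hovers around a single cutoff.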
IT monitoring tools are used in a wide variety of ways by analysts, and there’s no canonical guidance for exactly how they should be utilized. That said, in broad terms, analysts use IT monitoring tools to execute a plethora of critical functions, such as:
- Monitoring and troubleshooting physical and virtual infrastructure nodes, including servers, network hardware and cloud-based systems, allowing issues to be quickly resolved.
- Monitoring running applications in real time to ensure uptime and speed development in a DevOps environment.
- Improving the IT decision-making process by making it easier to identify bottlenecks, bandwidth hogs, and other potential trouble spots in the network environment.
- Improving visibility into cloud-based systems and integrating monitoring with on-premises systems.
- Predicting and analyzing the impact of IT operations on the business, including financial impact.
- Automating incident management to reduce the need for human oversight, quickly repair problems, and avoid alert fatigue.
- Tracking end-user behaviors within an application to identify opportunities for improvement.
How do you choose an IT monitoring strategy?
If you’re ready to launch your own IT monitoring strategy, here’s a step-by-step guide to getting started.
- Determine your goals. Do you merely want to be alerted if a single server goes down, or do you need to keep tabs on a hybrid environment that involves on-premises hardware and cloud services? Do you want to integrate your monitoring tool with other services? Do you want visibility into specific performance data? Do you want to use machine learning technology to automate corrective actions? The answers to these questions will greatly impact the complexity of monitoring tools you should consider.
- Bring business leaders on board. While determining your goals, involve stakeholders outside the IT organization so that their IT monitoring goals are captured and you have their buy-in. Consolidate these needs with IT’s monitoring needs to create a single list of goals.
- Identify key features you need. Most monitoring tools offer basic features like reporting and dashboards, but they vary in sophistication. If you have a special need for data retention, or want real-time, machine learning-driven insights, these types of features will also point the way to their own particular solutions.
- Identify data sources that can be used. These data sources can range from server logs to machine data to third-party data sources. Whatever you’re trying to monitor, there should be at least one relevant data source that relates to it. Enumerate all of these sources so you can ensure that any tool you consider supports the desired information.
- Evaluate tools on a trial basis. Armed with all of this, you needn’t jump in whole hog with the first IT monitoring tool that sounds like a good fit. Most of these tools are available on a trial basis, so you can see how well they will work in your environment before you pull the trigger. This is particularly true for tools that are offered as a service, on a subscription basis.
What are the best practices for IT monitoring?
How you use the tool is just as important as which tool you choose, and some solid best practices to consider include being savvy with alerts, considering alert level and medium, refining dashboards, creating an escalation plan, embracing redundancy and watching for outliers.
- Be savvy with alerts. Too many alerts will quickly lead to fatigue and, even worse, ignored alerts. Take care to craft alert logic that is tripped when humans really need to get involved.
- Consider levels of alerts. Basic crashes or limited downtime can be routed to low-level analysts, but more serious problems need to be escalated to managers, and quickly. Assign problems along various severity levels to make this type of categorization and escalation easier.
- Also consider the medium. When is an emailed alert acceptable, and when does a text message or other mobile notification need to be used? Remember that too many texts can quickly lead to alert fatigue and missed alerts.
- Refine your dashboards. The dashboard is where most analysts will spend the bulk of their work day, so it makes sense to expend effort to ensure the dashboard has the most critical information front and center, and secondary information within easy reach.
- Create an escalation plan separate from the alerts system. Your alerts may be designed with rudimentary escalation routines, but a seemingly simple problem with a server can quickly escalate into a major one. For example, your IT monitoring tool may only report that an offsite server is offline, not knowing that a Category 5 hurricane is bearing down on the data center. These are vastly different levels of problems that merit very different responses.
- Remember that redundancy is good. When possible, avoid relying on a single source of data to monitor the health of a particular node. If your monitoring tool loses access to a server log, does that mean the server is down, or that a network cable has been cut? You won’t know unless you have a secondary data source that can monitor network traffic, which can help to more quickly troubleshoot these kinds of issues.
- Watch for outliers. An average web page response time of 0.3 seconds is great, as long as that doesn’t mean that a small percentage of your users are actually seeing response times of 30 seconds or more and slipping through the cracks. A smart monitoring strategy needs to look at all the data, not just averages, and troubleshooting often needs to address the unique set of variables that might be causing trouble for a small portion of the end-user base.
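The point about outliers can be made concrete with a small calculation. The following Python sketch uses made-up response times (995 fast requests and 5 very slow ones) and a hypothetical 5-second cutoff for "slow"; the `slow_share` helper is an illustrative name, not part of any monitoring tool.

```python
from statistics import mean, median


def slow_share(response_times, slow_threshold_seconds):
    """Fraction of requests that took longer than the given threshold."""
    slow = sum(1 for t in response_times if t > slow_threshold_seconds)
    return slow / len(response_times)


# Hypothetical sample: 995 requests at 0.15 s and 5 requests at 30 s.
response_times = [0.15] * 995 + [30.0] * 5

if __name__ == "__main__":
    print(f"mean:   {mean(response_times):.2f} s")
    print(f"median: {median(response_times):.2f} s")
    print(f"max:    {max(response_times):.2f} s")
    print(f"slower than 5 s: {slow_share(response_times, 5.0):.1%}")
```

In this sample the mean comes out at roughly 0.3 seconds and the median at 0.15 seconds, so both look healthy, while the maximum and the share of slow requests reveal the users who waited 30 seconds.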
Bottom line: IT monitoring can make or break your business
IT monitoring is not just about telling a technician when a server crashes. It’s also about intelligently predicting these problems in advance and, increasingly, automating a response to remedy them before users are actually impacted.
As IT infrastructures have become increasingly complex, it has become essential for IT managers to implement systems that allow them to keep pace. By formally integrating IT monitoring into your entire ecosystem, you can dramatically improve operations along a wide variety of metrics, ranging from simple service availability to high performance and the overall profitability of the business.