IT Operations Analytics: An Introduction

Information Technology Operations Analytics (ITOA) is an analytics technology that uses datasets generated by IT systems to improve their efficiency and effectiveness as part of the practice known as IT operations management (ITOM). The primary goal of ITOA is to make IT operations more effective, efficient, faster and more proactive through the use of an organization’s own machine data.

By analyzing the historical and real-time machine data produced by all elements of an organization’s IT infrastructure, IT teams can not only make sure their systems are running at peak performance, but can also predict and prevent outages by learning from previous events. In other words, ITOA is the application of big data to the many challenges faced by the IT department, with better decision-making as a result.

In many ways, ITOA is an umbrella term that covers a large number of IT operations activities, including reporting, querying and data analytics. In addition to providing operations analytics, many ITOA solutions bundle application performance management (APM) capabilities and configuration management tools to support a broad range of business, compliance and resource allocation requirements.

In this article, we’ll discuss how ITOA fits into modern IT operations; how it compares to observability, AIOps and capacity management; its benefits and challenges; and best practices for implementing ITOA in your organization.

IT Operations Analytics (ITOA) Basics

IT operations are all the activities carried out by the IT department of a company or organization to maintain and optimize the performance of its technology. This includes everything from the individual workstations used by members of the organization to the overall network infrastructure on which the organization’s systems run. As the complexity of network infrastructure has grown over the years, including the increasing use of microservices and cloud computing, the role, complexity and importance of IT operations have also increased.

Some specific examples of IT operations in the enterprise include:

  • Incident management: Resolving an IT-related incident as quickly as possible and minimizing its effect on business processes.
  • Problem management: Analyzing the cause of IT incidents and preventing them from reoccurring.
  • Access management: Determining who can access an organization’s IT network, including usernames and passwords, creating groups and setting different permissions based on role and need.
  • IT operations control: Managing day-to-day operations of the IT network, including monitoring and controlling IT services and the infrastructure on which they run.
  • Facilities management: Overseeing the physical plant of the IT network, from buildings to server rooms to the actual infrastructure components.
  • Technical management: Planning, deploying, maintaining, staffing and leading an IT organization.

ITOA is a key element of modern IT operations, and is the logical extension of the data revolution into the practice of organizational IT. Before ITOA, IT operations, other than scheduled maintenance, were almost completely reactive — fixing something when it stopped working. As IT systems became more complex and downtime became increasingly expensive in terms of organizational reputation and downtime penalties, a proactive approach became necessary. By using historical machine data and operational data to predict likely outages and prevent them from happening, ITOA gives IT operations teams an invaluable tool to:

  • Help them maintain performance and uptime.
  • Prevent and resolve outages.
  • Make planning and deployment decisions based on business need and historical usage data.
  • Support the overall efficient operation of the IT organization.

Using Root Cause Analysis with ITOA

Root cause analysis in IT operations is the practice of using all available data and information pertaining to an issue, event or outage to determine the core cause of the problem. Before the advent of sophisticated data analysis capabilities, root cause analysis often required a trial-and-error approach, in which each potential source of failure was isolated and investigated. These approaches were labor intensive, time consuming and expensive.

Thanks to ITOA’s machine-learning-driven analytics capabilities, root cause analysis is now quicker and more effective. ITOA tools use the system’s own machine data to correlate the event in question with historical data from similar events, which allows them to find the root cause of an issue significantly faster.

Root cause analysis looks at logs and diagnostic data from applications, tracks changes in code, monitors capacity and usage and can be configured by users to monitor for specific key performance indicators (KPIs) they wish to track.
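
As a rough illustration of correlating an event with historical data from similar events, the sketch below ranks past incidents by how many observed signals they share with the current one. It uses plain Jaccard similarity rather than machine learning, and the signal names, incident records and the rank_similar_incidents helper are all hypothetical, not taken from any particular ITOA product.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection divided by size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_similar_incidents(current_signals, history, top_n=3):
    """Return (similarity, root_cause) pairs for the most similar past incidents."""
    scored = [(jaccard(current_signals, inc["signals"]), inc) for inc in history]
    # Drop incidents that share no signals with the current event.
    scored = [(score, inc) for score, inc in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(round(score, 2), inc["root_cause"]) for score, inc in scored[:top_n]]


# Hypothetical incident history: signals seen during each event and the
# root cause that was eventually confirmed.
HISTORY = [
    {"signals": {"db_timeout", "high_latency", "conn_pool_exhausted"},
     "root_cause": "database connection leak"},
    {"signals": {"disk_full", "write_errors"},
     "root_cause": "log volume out of space"},
    {"signals": {"high_latency", "cpu_saturation"},
     "root_cause": "undersized application tier"},
]

current = {"high_latency", "db_timeout", "http_500"}
ranked = rank_similar_incidents(current, HISTORY)
print(ranked)
# [(0.5, 'database connection leak'), (0.25, 'undersized application tier')]
```

A ranking like this only suggests where to look first; it does not prove the cause of the current event.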

ITOA vs. AIOps, Observability and Capacity Management

AIOps is the practice of applying analytics, business intelligence and machine learning to big data, including real-time data, to automate and improve IT operations and streamline workflows. AI can automatically analyze massive amounts of network and machine data to find patterns, both to identify the cause of existing problems and to predict and prevent future ones.

The term AIOps was coined by Gartner in 2016. In the Market Guide for AIOps Platforms, Gartner describes AIOps platforms as “software systems that combine big data and artificial intelligence (AI) or machine learning functionality to enhance and partially replace a broad range of IT operations processes and tasks, including availability and performance monitoring, event correlation and analysis, IT service management and automation.”

ITOA generally focuses on monitoring collected data to analyze events that occurred in the past. AIOps platforms use artificial intelligence (in the form of machine learning) not only to analyze issues and events that have already occurred, but also to predict future events and prevent them from happening. In that regard, AIOps is generally considered to make more significant and practical use of artificial intelligence than basic ITOA functionality. Many people consider AIOps to be a further evolution of ITOA. You might hear the two terms used synonymously, but they are not strictly interchangeable.

ITOA vs. Observability

In the same way that ITOA can contain elements of AIOps, the overall functions of ITOA and ITOM are increasingly coming under the umbrella of observability.

Observability is the ability to measure the internal states of a system by examining its outputs. A system is considered “observable” if its current state can be estimated using only information from its outputs, namely sensor data. The term originated decades ago in control theory (which is about describing and understanding self-regulating systems). However, it has increasingly been applied to improving the performance of distributed systems. Three types of telemetry data (metrics, logs and traces) make a system observable, providing deep visibility into distributed systems and allowing teams to get to the root cause of a multitude of issues and improve the system’s performance.

Observability allows teams to monitor modern systems more effectively and helps them to find and connect effects in a complex chain and trace them back to their cause. Further, it gives system administrators, IT operations analysts and developers visibility into their entire architecture.
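
To make the idea of tracing an effect back to its cause more concrete, here is a minimal sketch that walks a single distributed trace. It assumes a simplified, hypothetical span format (span_id, parent_id, name, error) rather than any real tracing library’s data model, and it treats an erroring span with no erroring child as the likely origin of the failure.

```python
def likely_origin_spans(spans):
    """Return names of erroring spans that have no erroring child.

    These are the deepest points each failure reached in the call chain,
    which makes them reasonable starting points for an investigation.
    """
    # IDs of spans that have at least one erroring child.
    errored_parents = {
        s["parent_id"] for s in spans
        if s["error"] and s["parent_id"] is not None
    }
    return [
        s["name"] for s in spans
        if s["error"] and s["span_id"] not in errored_parents
    ]


# Hypothetical trace: a checkout request that fails because of a database call.
TRACE = [
    {"span_id": "a", "parent_id": None, "name": "GET /checkout", "error": True},
    {"span_id": "b", "parent_id": "a", "name": "cart-service", "error": False},
    {"span_id": "c", "parent_id": "a", "name": "payment-service", "error": True},
    {"span_id": "d", "parent_id": "c", "name": "payments-db query", "error": True},
]

origins = likely_origin_spans(TRACE)
print(origins)  # ['payments-db query']
```

The user-facing error appears on the top-level request, but the parent and child links in the trace point to the database query as the place where the failure began.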

Observability and ITOA have the same fundamental goal: using data generated by IT systems to improve their efficiency and effectiveness. In common usage, observability defines a philosophy of action and ITOA defines a day-to-day role and practice within the IT organization. The distinctions between the two terms are not clearly defined and continue to evolve. The principles and practices of observability support the ITOA function, but observability is not a replacement for ITOA, nor is ITOA an alternative to observability. One perceived difference relates to the persona of the person using ITOA or observability tools. It could also be argued that observability is the latest iteration and, in fact, the evolution of the practice known as ITOA.

ITOA vs. Capacity Management

IT capacity management is the practice of ensuring that an organization’s IT systems and infrastructure are sufficient for the tasks required of them. IT capacity management broadly incorporates three main elements: business capacity management, service capacity management and component capacity management. Capacity management is also used to forecast future needs and to justify pricing and expenditure on the additional IT equipment, services and personnel required to meet them, in an effort to make smarter and more cost-effective business decisions.

Capacity management is not a specific function of typical standalone ITOA tools, but many vendors are moving toward more integrated ITOA platforms that combine related functionality that includes capacity management.
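
The forecasting side of capacity management can be sketched with a simple trend line. The example below fits an ordinary least-squares line to daily storage usage and estimates how many days remain before a volume fills up. The usage figures, the 1,000 GB capacity and the days_until_full helper are illustrative assumptions, and real platforms typically use richer models that account for seasonality and growth spurts.

```python
def fit_line(ys):
    """Least-squares fit of y = slope * x + intercept for x = 0, 1, 2, ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_full(daily_usage_gb, capacity_gb):
    """Estimate days from the last sample until usage reaches capacity.

    Returns None when usage is flat or shrinking, because the trend line
    then never reaches the capacity limit.
    """
    slope, intercept = fit_line(daily_usage_gb)
    if slope <= 0:
        return None
    last_day = len(daily_usage_gb) - 1
    day_full = (capacity_gb - intercept) / slope
    return max(0.0, day_full - last_day)


# Hypothetical daily usage samples (GB) for a 1,000 GB volume.
usage = [500, 510, 520, 530, 540]
remaining = days_until_full(usage, 1000)
print(remaining)  # 46.0
```

An estimate like this gives planners a date to work backward from when budgeting for additional equipment.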

Benefits of IT Operations Analytics

ITOA provides IT teams the ability to perform big data capture, indexing, management and search, all of which lend themselves to practical applications in an IT environment.

By using machine learning capabilities to collect and analyze large amounts of data, ITOA can enhance and accelerate IT log management, log search and analysis, and root cause analysis. It can also make performance predictions based on past performance data.

By performing the above-mentioned tasks automatically without requiring the involvement of the IT team, ITOA automates a wide variety of functions and leads to a number of benefits, including:

  • Avoidance of service interruptions, slowdowns and outages.
  • Faster root cause analysis and problem recovery times.
  • Enhanced system and application performance.
  • Improved end-user experience.
  • Increased operational efficiency.
  • Improved computer resource utilization.

[Figure: the six benefits of ITOA listed above. Caption: ITOA makes IT operations more effective, efficient, faster and more proactive through the use of an organization’s own machine data.]

What are the challenges of implementing ITOA?

ITOA can be challenging for an organization that is used to manual processes and has not attempted to automate core IT functions, and knowing where to begin is often the first hurdle. The term ITOA and related terms are used and combined in a wide variety of ways by end users, analysts and vendors, so it can be difficult for an organization to identify what it needs to prepare (and what it needs to purchase) to implement an ITOA platform. The following steps can help address these challenges:

  • Understand your IT environment: In order for an ITOA solution to be effective, it needs to take into account an organization’s complete IT infrastructure, which may have changed significantly over time. Mapping out and understanding the components and functions is essential to effective ITOA implementation.
  • Plan based on business needs: An ITOA solution can provide a significant number of benefits across the organization, but it is essential to plan for the outcomes you want to achieve. What are the primary business problems that need to be addressed with a new ITOA solution or an upgrade to an existing ITOA solution? Going into the process without an understanding of priorities can lead to unsatisfactory outcomes.
  • Quantify the business impact: ITOA is an essential function, but it is still valuable to have an understanding of the business value it can provide in order to justify the expenditure. What are the tangible costs to the organization of outages and issues, for example? What are the intangible costs in terms of customer satisfaction and reputation? The more completely the business impact can be estimated, the more effectively the value can be calculated.
  • Ensure successful integration with legacy systems: When planning an ITOA implementation or upgrade, it is important to consider your installed base of hardware and software to ensure interoperability. With multi-vendor legacy systems, this can be a complex and difficult task. Many organizations use consultants, service providers or vendors to help them understand the integration issues before making decisions.
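
For the step on quantifying business impact, a back-of-the-envelope calculation is often a useful starting point. The sketch below estimates the tangible annual cost of outages and the share of that cost an ITOA platform would need to eliminate to pay for itself. Every figure and name here (OutageProfile, the hourly rates, the platform price) is a made-up assumption for illustration, and intangible costs such as reputation are not modeled.

```python
from dataclasses import dataclass


@dataclass
class OutageProfile:
    """Hypothetical inputs describing an organization's outage history."""
    outages_per_year: float
    avg_duration_hours: float
    revenue_loss_per_hour: float
    responders: int
    responder_cost_per_hour: float


def annual_outage_cost(p: OutageProfile) -> float:
    """Tangible yearly cost: lost revenue plus responder labor during outages."""
    outage_hours = p.outages_per_year * p.avg_duration_hours
    cost_per_hour = p.revenue_loss_per_hour + p.responders * p.responder_cost_per_hour
    return outage_hours * cost_per_hour


def break_even_reduction(p: OutageProfile, platform_cost_per_year: float) -> float:
    """Fraction of outage cost that must be avoided to cover the platform cost."""
    cost = annual_outage_cost(p)
    if cost == 0:
        raise ValueError("no outage cost to offset")
    return platform_cost_per_year / cost


profile = OutageProfile(
    outages_per_year=12,
    avg_duration_hours=1.5,
    revenue_loss_per_hour=20_000,
    responders=4,
    responder_cost_per_hour=100,
)

yearly_cost = annual_outage_cost(profile)            # 18 hours * 20,400 per hour
needed = break_even_reduction(profile, 91_800)       # hypothetical platform price
print(yearly_cost, needed)  # 367200.0 0.25
```

With these assumed numbers, the platform breaks even if it prevents or shortens outages enough to avoid a quarter of the yearly outage cost.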

Getting Started With ITOA Tools

Vendors generally sell IT operations analytics solutions and tools as complete packages built on an operational analytics platform. The components within an ITOA framework perform a variety of functions, including:

  • Data capture, storage, indexing and transformation.
  • Business analytics including automated baselines, pattern discovery, anomaly detection, statistics and recommendations.
  • Search and visualization capabilities.
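
Automated baselines and anomaly detection, two of the analytics functions listed above, can be sketched in a few lines. The example below builds a rolling baseline from the previous few samples of a metric and flags any value that deviates from it by more than a chosen number of standard deviations. The window size, threshold and latency values are illustrative assumptions, and commercial tools generally use more sophisticated models.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, threshold=3.0):
    """Return indexes of values that deviate sharply from a rolling baseline.

    The baseline for each point is the mean and sample standard deviation
    of the `window` values immediately before it.
    """
    anomalies = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # A perfectly flat baseline: treat any change as anomalous.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical response-time samples in milliseconds, with one spike.
latency_ms = [100, 102, 98, 101, 99, 100, 103, 97, 250, 101]
spikes = find_anomalies(latency_ms)
print(spikes)  # [8]
```

Because the baseline is recomputed at every step, it adapts as normal behavior drifts, although a large spike will temporarily widen the baseline for the samples that follow it.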

Implementing ITOA is similar to selecting any type of major application in an organization and starts with your organization’s established request for proposal (RFP) process. A few best practices for implementing these new processes include:

  1. Communicating the intent to implement ITOA with all relevant stakeholders.
  2. Holding information-gathering sessions with stakeholders to discuss their needs and wants, plus any concerns that arise.
  3. Documenting the discussions and making the findings available to all stakeholders.
  4. Creating and communicating an implementation strategy.
  5. Designating a team to lead the process and develop and issue an RFP.
  6. Evaluating vendor responses.
  7. Selecting and implementing the best available vendor.

[Figure: the seven steps for getting started with ITOA listed above. Caption: Implementing ITOA follows a similar process to selecting any type of major application.]

The Future of ITOA

It’s possible that the future of ITOA is already here, in the form of AIOps. Others would say that the future of ITOA is as an evolving component of observability. There isn’t yet a clear distinction among the terms ITOA, AIOps and observability, and they may be used interchangeably or in combination to describe a particular use case or a software or hardware implementation. Regardless, both AIOps and observability represent an increased reliance on data, machine learning and artificial intelligence to perform IT analytics and maintain optimum efficiency of IT systems. ITOA and its related disciplines can only grow as the platforms evolve to make better use of machine learning and artificial intelligence. The more data an ITOA platform has available to it, and the more AI capabilities it incorporates, the better it will be able to predict future IT events, prevent issues and outages from occurring and create a better customer experience.

The predictive nature of AI as applied to ITOA also provides the opportunity to use ITOA as a predictive analytics planning tool, predicting the potential business impact of the IT functionalities that it monitors. Teams who are responsible for planning IT infrastructure advancements can use ITOA for capacity planning, to understand the ramifications, both positive and negative, of future growth. The future of ITOA lies squarely in the ability of ITOA vendors to take advantage of AI and data capabilities to provide additional predictive functionality.

The Bottom Line: ITOA is evolving, and essential

ITOA is a discipline and methodology that brings data and analytics to the process of managing an organization’s IT infrastructure. There is no doubt that it is the future of ITOM. Any organization that wants to get the most value from its data, use it to maximize its IT investment and turn the combination into a distinct business and competitive advantage needs to investigate and implement an ITOA plan.


This posting does not necessarily represent Splunk's position, strategies or opinion.

Posted by Stephen Watts

Stephen Watts works in growth marketing at Splunk. Stephen holds a degree in Philosophy from Auburn University and is an MSIS candidate at UC Denver. He contributes to a variety of publications including CIO.com, Search Engine Journal, ITSM.Tools, IT Chronicles, DZone, and CompTIA.