What Is Artificial Intelligence for IT Operations (AIOps)?

AIOps is the practice of applying analytics and machine learning to big data to automate and improve IT operations. AI can automatically analyze massive amounts of network and machine data to find patterns, both to identify the cause of existing problems and to predict and prevent future ones.

The term AIOps was coined by Gartner in 2016. In the Market Guide for AIOps Platforms, Gartner describes AIOps platforms as “software systems that combine big data and artificial intelligence (AI) or machine learning functionality to enhance and partially replace a broad range of IT operations processes and tasks, including availability and performance monitoring, event correlation and analysis, IT service management and automation.”

AIOps Basics

How do you use artificial intelligence in operations management?

AIOps is designed to bring the speed and accuracy of AI to IT operations. IT operations management has become increasingly challenging as networks have become larger and more complex. Traditional operations management tools and practices struggle to keep up with the ever-growing volumes of data from many sources within complex and varied network environments. To combat these challenges, AIOps:

  • Brings together data from multiple sources: Conventional approaches, tools and solutions weren’t designed for the volume, variety and velocity of data generated by today’s complex and connected IT environments. Instead, they consolidate and aggregate data and roll it up into averages, compromising data fidelity. A fundamental tenet of an AIOps platform is its ability to capture large data sets of any type, from across the environment, while maintaining data fidelity for comprehensive analysis.
  • Simplifies data analysis: One of the big differentiators for AIOps platforms is their ability to collect data in any format, at varying velocity and volume. The platform then applies automated analysis to that data to predict and prevent future issues and identify the cause of existing issues.

Using machine learning and big data, an AI platform helps IT deliver greater business value.
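To make the data-fidelity point concrete, here is a minimal sketch of how rolling raw measurements up into an average can hide a short outage. The numbers are invented for illustration and the variable names are hypothetical.

```python
from statistics import mean

# Hypothetical per-second response times (ms) for one minute.
# 57 healthy seconds followed by a 3-second spike; values are illustrative only.
latencies_ms = [20] * 57 + [900, 950, 1000]

# What a rolled-up, one-minute summary would keep.
minute_average = mean(latencies_ms)

# What the raw data still shows.
worst_second = max(latencies_ms)
slow_seconds = sum(1 for v in latencies_ms if v > 500)

print(f"1-minute average: {minute_average:.1f} ms")
print(f"worst second: {worst_second} ms, seconds over 500 ms: {slow_seconds}")
```

The average looks only mildly elevated, while the raw series shows three seconds of severe slowdown. That is the kind of detail an averaged roll-up discards.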

What is an AIOps platform?

According to Gartner, an AIOps platform combines big data and machine learning to support IT operations through the scalable ingestion and analysis of the data that IT generates. The platform enables the concurrent use of multiple data sources, data collection methods, and analytical and presentation technologies.

In its 2018 Market Guide for AIOps Platforms, Gartner notes that “AIOps platforms add important capabilities beyond what a monitoring tool with embedded AIOps can provide.” A true AIOps platform is “able to combine big data and machine learning functionality to support all primary IT operations functions through the scalable ingestion and analysis of the ever-increasing volume, variety and velocity of data generated by IT.”

An AIOps platform needs to be able to both analyze stored data and provide real-time analytics at the point of ingestion.

The central functions of an AIOps platform, as defined by Gartner, are:

  • Ingesting data from multiple sources agnostic to source or vendor.
  • Performing real-time analysis at the point of ingestion.
  • Performing historical analysis of stored data.
  • Leveraging machine learning.
  • Initiating an action or next step based on insights and analytics.
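The functions listed above can be sketched as a toy pipeline. This is only an illustration under stated assumptions, not how any particular AIOps product works: the two record formats, the `Event` and `Pipeline` names and the fixed threshold are all hypothetical, and the machine learning function is not shown (a static threshold stands in for a learned model).

```python
import json
from dataclasses import dataclass
from statistics import mean
from typing import Callable


@dataclass
class Event:
    """Common shape that every record is normalized into."""
    source: str
    metric: str
    value: float


def parse_json_line(line: str) -> Event:
    """Normalize a JSON record (hypothetical format from one vendor)."""
    rec = json.loads(line)
    return Event(rec["host"], rec["metric"], float(rec["value"]))


def parse_csv_line(line: str) -> Event:
    """Normalize a 'source,metric,value' record (hypothetical format from another vendor)."""
    source, metric, value = line.strip().split(",")
    return Event(source, metric, float(value))


class Pipeline:
    def __init__(self, threshold: float, on_alert: Callable[[Event], None]):
        self.threshold = threshold  # stand-in for a learned model
        self.on_alert = on_alert    # action to initiate on an insight
        self.store = []             # retained events for historical analysis

    def ingest(self, event: Event) -> None:
        # Real-time analysis at the point of ingestion.
        if event.value > self.threshold:
            self.on_alert(event)
        self.store.append(event)

    def historical_mean(self, metric: str) -> float:
        # Historical analysis of stored data.
        return mean(e.value for e in self.store if e.metric == metric)


alerts = []
pipeline = Pipeline(threshold=90.0, on_alert=alerts.append)
pipeline.ingest(parse_json_line('{"host": "web-1", "metric": "cpu_pct", "value": 42}'))
pipeline.ingest(parse_csv_line("db-1,cpu_pct,97.5"))

print(alerts)
print(pipeline.historical_mean("cpu_pct"))
```

Normalizing every record into one `Event` shape is what keeps the rest of the pipeline agnostic to source and vendor; supporting a new source means adding a parser, not changing the analysis.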

Inside AIOps

What are key AIOps use cases?

According to Gartner, there are five primary use cases for AIOps, which we’ll look at in depth:

  1. Big data management (volume, variety, variability and velocity)
  2. Performance analysis
  3. Anomaly detection
  4. Event correlation and analysis
  5. IT service management
Application of AI powered by Machine Learning
  1. Performance analysis: Performance analysis is a key use case for AIOps, which uses AI and machine learning to rapidly gather and analyze vast amounts of event data and identify the root cause of an issue. A key IT function, performance analysis has become more complex as the volume and variety of data have increased. It has become increasingly difficult for IT professionals to analyze their data using traditional methods, even as those methods have incorporated machine learning technology. AIOps addresses the growing volume and complexity of data by applying more sophisticated AI techniques to larger data sets. It can predict likely issues and quickly perform root-cause analysis, often preventing problems before they happen.
  2. Anomaly detection: Anomaly detection in IT (also "outlier detection") is the identification of data outliers — that is, events and activities in a data set that stand out enough from historical data to suggest a potential problem. These outliers are called anomalous events.

    Anomaly detection relies on algorithms. A trending algorithm monitors a single KPI by comparing its current behavior to its past behavior and scoring the difference. If that score grows anomalously large, the algorithm raises an alert. A cohesive algorithm looks at a group of KPIs expected to behave similarly and raises alerts if the behavior of one or more changes.

    AIOps makes anomaly detection faster and more effective. Once a behavior has been identified, AIOps can monitor the difference between the actual value of the KPI of interest and the value the machine learning model predicts, and watch for significant deviations.
  3. Event correlation and analysis: Event correlation and analysis is the ability to see through an “event storm” of multiple, related warnings to the underlying cause and to determine how to fix it. Traditional IT tools, however, don’t provide insight into the problem, just the storm of warnings.

    AIOps uses AI algorithms to automatically group notable events based on their similarity. This reduces the burden on IT teams of managing events continuously and cuts unnecessary (and annoying) event traffic and noise. It can then focus on key event groups and perform rule-based actions such as consolidating duplicate events, suppressing alerts or closing notable events when an event is received.
  4. IT service management: IT service management (ITSM) is a general term for everything involved in designing, building, delivering, supporting and managing IT services within an organization. ITSM encompasses the policies, processes and procedures of delivering IT services to end users within an organization.

    AIOps provides benefits to ITSM in the same ways it helps other IT disciplines: by applying AI to data to identify issues and help fix them quickly, thereby helping IT departments be more efficient and effective. AIOps for ITSM can be applied to data from many sources, from service desk monitoring to devices and more.

    AIOps for ITSM can help IT departments to:
    • Manage infrastructure performance in a multicloud environment.
    • Make more accurate predictions for capacity planning.
    • Maximize storage resources by automatically adjusting capacity.
    • Improve resource utilization based on historical data and predictions.
    • Identify, predict and prevent IT service issues.
    • Manage connected devices across a network.
  5. Automation: Legacy tools often require manually cobbling information together from multiple sources before it’s possible to understand, troubleshoot and resolve incidents. AIOps provides a significant advantage in the ability to automatically collect and correlate data from multiple sources, greatly increasing speed and accuracy. The AIOps approach automates these functions across an organization’s IT operations, including:
    • Servers, OS and networks: Collect all logs, metrics, configurations, messages and traps to search, correlate, alert and report across multiple servers.
    • Containers: Collect, search and correlate container data with other infrastructure data for better service context, monitoring and reporting.
    • Cloud monitoring: Monitor performance, usage and availability of cloud infrastructure.
    • Virtualization monitoring: Gain visibility across the virtual stack, make faster event correlations, and search transactions spanning virtual and physical components.
    • Storage monitoring: Understand storage systems in context with corresponding app performance, server response times and virtualization overhead.
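Returning to the anomaly detection use case above, the trending and cohesive algorithms can be sketched in a few lines. This is a simplified illustration, not the method any specific platform uses: the function names, the z-score for the trending check, the group-median comparison for the cohesive check and all sample values are assumptions made for the example.

```python
from statistics import mean, median, stdev


def trending_score(history, current):
    """Score one KPI against its own past: absolute z-score of the current value."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat history: any change at all is maximally surprising.
        return 0.0 if current == mu else float("inf")
    return abs(current - mu) / sigma


def cohesive_outliers(current_by_kpi, tolerance):
    """Return the KPIs that stray from the group median by more than `tolerance`."""
    center = median(current_by_kpi.values())
    return sorted(
        name for name, value in current_by_kpi.items()
        if abs(value - center) > tolerance
    )


# Trending: requests per minute for one service (illustrative values).
history = [100, 102, 98, 101, 99, 100, 103, 97]
ALERT_THRESHOLD = 3.0  # hypothetical cutoff for "anomalously large"
for current in (101, 130):
    score = trending_score(history, current)
    status = "ALERT" if score > ALERT_THRESHOLD else "ok"
    print(f"value={current} score={score:.1f} {status}")

# Cohesive: CPU % on hosts that are expected to behave alike (illustrative values).
cpu_now = {"web-1": 41.0, "web-2": 43.0, "web-3": 40.0, "web-4": 88.0}
print(cohesive_outliers(cpu_now, tolerance=15.0))
```

The trending check needs only the KPI's own history, while the cohesive check needs no history at all, just peers to compare against. The two therefore catch different kinds of problems.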

What are the key business benefits of AIOps?

By automating IT operations functions and using the power of AI to enhance and improve system performance, AIOps can provide significant business benefits to an organization. For example:

By improving performance of IT infrastructure and applications, AIOps elevates KPIs that define business success.

  • Avoiding downtime improves customer satisfaction.
  • Bringing together data sources that had previously been siloed allows more complete analysis and insight.
  • Accelerating root-cause analysis and remediation saves time, money and resources.
  • Shortening response time and making responses more consistent improves service delivery.
  • Finding and fixing errors that would be tedious and time-consuming for people to address increases job satisfaction and lets IT teams focus on higher-value analysis and optimization.
  • Giving IT leadership more time to collaborate with business peers demonstrates the strategic value of the IT organization.

Many of the challenges of IT operations are common across all industries, and AIOps can help solve them. There are, however, issues that are more prevalent or more threatening in particular industries, including healthcare, retail, manufacturing and financial services.

How AIOps can be used in healthcare IT (HIT):

  • Keeping electronic personal healthcare information (ePHI) safe in compliance with the Health Insurance Portability and Accountability Act (HIPAA).
  • Reducing the hazards of mobile networking and bring-your-own-device (BYOD) practices by medical professionals.
  • Preventing ransomware attacks, which disproportionately target healthcare organizations.
  • Making big data, both internal and external, available for research and diagnostic use.

How AIOps can be used in IT for retail:

  • Mobile point of sale (POS) and mobile payment in brick-and-mortar stores.
  • Syncing data across all retail channels and platforms including stores, mobile and desktop.
  • Securing customer data and information while making it available to create a personalized customer experience.
  • Ensuring a flexible infrastructure where it’s easy to add new technologies as the business grows and changes.
  • Maintaining effective operations while reducing cost in the face of financial pressure on the retail industry in general.
  • Maintaining the increasing number of connected devices in a retail store.
  • Implementing new smart technologies (e.g., smart home devices), augmented/virtual reality tools and checkout-free tools that let a customer pay by scanning a barcode with a smartphone app.

How AIOps can be used in IT for manufacturing:

  • Automating the collection and analysis of disparate data sources created by the integration of supply chain, plant operations and product and service life-cycle management.
  • Using real-time monitoring to track every machine on the factory floor, bringing together such data as manufacturing cycle times, quality yields by machine and production run, capacity utilization and supplier quality levels.
  • Preventing production slowdowns using historical data combined with AI-driven predictive analytics, thereby protecting revenue streams and increasing customer satisfaction.
  • Using machine data to enable predictive maintenance, fixing machines before they break.
  • Better utilizing data to create more efficient supply chain management systems.

How AIOps can be used in IT for financial services:

  • Preventing increasingly sophisticated security breaches and cybercrime.
  • Making customer data available to drive marketing and growth opportunities.
  • Analyzing historical customer data to create more accurate revenue growth predictions.
  • Ensuring data security and regulatory compliance.
  • Providing a framework for integrating multiple, large data sets to allow emerging technologies like blockchain.
  • Keeping up with consumer expectations of mobile and digital banking experiences.
  • Improving network speed and performance.

How do you choose the best AIOps tools and products?

As interest in AIOps has grown, some vendors are packaging traditional IT operations tools together, adding basic AI features and calling the result an AIOps “platform.” But a true AIOps platform isn’t just a collection of tools. This is important to understand as you get started, because the platform you choose will determine your success. Gartner recommends that enterprises “prioritize those vendors that allow for the deployment of data ingestion, storage and access, independent from the remaining AIOps components.”

Look at feature sets, and also review customer case studies and AIOps use cases. The easiest way to know if an AIOps platform will meet your needs is to find customer case studies that show how a company similar to yours applied AIOps to their business challenges. Look for vendors who showcase their customers online and ask for customer references. If an AIOps tool or platform promises great results but the company can’t provide evidence, that should be a clue to look elsewhere.

Getting Started

How do you get started with AIOps?

The best way to get started with AIOps is an incremental approach. Best practice is to start small by reorganizing your IT domains by data source. Learn how to work with large persistent data sets from a variety of sources. Let your IT operations team become familiar with the big data aspects of AIOps. Start with a data set of historical data, and gradually add new data sources as you improve your practice.

Focus on ingesting data first: Enabling AIOps requires access to all types of data: unstructured machine data and metrics, as well as relational data for enrichment. These different data types allow you to construct a holistic perspective across all silos and take actions meaningful to the situation and data type.

Ingesting and analyzing all of the data effectively and quickly can be daunting. Instead, start by accessing and analyzing raw historical machine and metric data to establish a base understanding, and use clustering algorithms and analytics to identify trends and patterns. Raw data is the best type of data if you truly want real-time detection. Then you can begin to analyze streaming data to see how it fits those patterns, applying AI powered by machine learning to introduce automation and, eventually, predictive analytics.
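As a rough sketch of that progression, the example below first learns a pattern from historical data and then checks new values against it. It is deliberately simpler than the clustering algorithms mentioned above: it just groups history by hour of day and uses the median and median absolute deviation (MAD) as the baseline. The function names, the `k` and `floor` parameters and the sample data are all hypothetical.

```python
from collections import defaultdict
from statistics import median


def build_baseline(history):
    """Learn a per-hour pattern from (hour_of_day, value) pairs.

    Returns {hour: (median, median_absolute_deviation)}.
    """
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)

    baseline = {}
    for hour, values in by_hour.items():
        center = median(values)
        mad = median(abs(v - center) for v in values)
        baseline[hour] = (center, mad)
    return baseline


def deviates(baseline, hour, value, k=5.0, floor=1.0):
    """True if a new value falls outside k * MAD of that hour's usual level.

    `floor` keeps the band from collapsing to zero when history is very flat.
    """
    center, mad = baseline[hour]
    return abs(value - center) > k * max(mad, floor)


# Historical requests per minute at 09:00 and 03:00 (illustrative values).
history = [
    (9, 200), (9, 210), (9, 190), (9, 205), (9, 195),
    (3, 20), (3, 22), (3, 18), (3, 21), (3, 19),
]
baseline = build_baseline(history)

# The same streaming value is normal at 09:00 and anomalous at 03:00.
print(deviates(baseline, hour=9, value=200))
print(deviates(baseline, hour=3, value=200))
```

The point of starting with historical data is visible here: without the learned per-hour pattern, there is no way to tell that 200 requests per minute is routine in the morning and suspicious overnight.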

Ingest and analyze as many data types as you can: Historical data is extremely valuable as you get started with AIOps. If you start by analyzing and understanding past states of your systems, you will be able to correlate what you learn with the present.

To achieve this, organizations must ingest and provide access to a vast range of historical and streaming data types. The data type that you select — be it log, metric, text, wire or social media data — depends on the problem you’re solving. For example, you can use metric data from your infrastructure to monitor capacity, or application logs to ensure that you are providing an outstanding experience to your customers.
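As one illustration of using infrastructure metrics to monitor capacity, the sketch below fits a straight line to daily disk-usage readings and estimates when the disk would fill. A linear trend is an assumption chosen for simplicity, and the function names and readings are hypothetical; real usage often grows unevenly.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    return slope, mean_y - slope * mean_x


def days_until_full(days, used_pct, limit=100.0):
    """Estimate days remaining after the last reading until usage hits `limit`.

    Returns None when usage is flat or shrinking, since no fill date exists.
    """
    slope, intercept = fit_line(days, used_pct)
    if slope <= 0:
        return None
    return (limit - intercept) / slope - days[-1]


# Disk usage (%) sampled once a day (illustrative values).
days = [0, 1, 2, 3, 4]
used_pct = [60, 62, 64, 66, 68]

print(days_until_full(days, used_pct))
```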

Many AIOps platforms have historically only focused on a single data source. Restriction to a single data type limits your insights into system behavior — regardless of whether those insights come from an IT admin or an algorithm. Hence, enterprises should select those platforms that are capable of ingesting and analyzing data from multiple sources.

Don’t try to do it all at once: Focus on finding the root cause of your highest-priority problem. Then progress to monitoring data. Only after that should you approach AI. Even then, take it step by step:

  • Start by implementing an AIOps platform that gives you both an effective foundation for organizing large volumes of data, making it easy to take action, and monitoring capabilities that reveal patterns.
  • Next, explore the degree to which those patterns enable you to predict incidents and make IT more proactive, decreasing not only your MTTR but also the number of business-impacting incidents you face.
  • Finally, work with machine-learning-powered root-cause analysis to reach a predictive state in which you can identify an incident and its impact before it affects your key business services and customer experience.

The bottom line: Go for it

If you’re an IT or networking professional, you’ve been told over and over that data is your company’s most important asset, and that big data will transform your world forever. AI is a revolution and it’s here to stay, and AIOps provides a concrete way to turn the hype about AI and big data into reality. From improving security to streamlining operations to increasing productivity, AIOps is a practical, readily available way to help you grow and scale your IT operations to meet future challenges, solidifying IT’s role as a strategic enabler of business growth.