Levelling up your ITSI Deployment using Machine Learning

Platform February 10, 2021 Greg Ainslie-Malik

Here at Splunk we’re passionate about helping our customers get as much value from their data as possible. Lila Fridley recently wrote about how to select the best workflow for applying machine learning, and Vinay Sridhar provided an example of anomaly detection in SMLE. Here we’d like to build on that content with some details about the Smart ITSI Insights app for Splunk, which is designed to help IT operations teams gain additional insights from ITSI using machine learning, all without having to be a data scientist!

I often get asked how we can help our customers extract the most value from their IT Service Intelligence (ITSI) deployments. In this blog series, I want to present a number of machine learning techniques that have been used to get the most out of ITSI.

Most of these techniques are wrapped up as repeatable content in the Smart ITSI Insights app for Splunk. I’d encourage you to check the app out and test the capabilities yourself as you read the blogs linked below.

Can I predict potential outages?

Many of you will be familiar with the predictive analytics in ITSI, which is described in detail here. While this can be a powerful capability, we often hear from customers who are unsure which algorithm to apply, or whose data shows unpredictable relationships between the service they want to predict and the KPIs that are used to generate the service health score.

For these reasons, we have been working on a new workflow for generating predictions in ITSI. This workflow lets users inspect the relationships between the service health score and its KPIs, and run statistical analysis to determine whether there is a good degree of correlation in the data. This correlation is really important: strongly coupled data makes for more accurate predictions!
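As a rough illustration of the correlation check described above, here is a minimal Python sketch that ranks KPIs by how strongly they track a service health score. It uses the Pearson correlation coefficient as one possible statistic; the KPI names and all values are made up for illustration, and this is not the app's actual implementation.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no linear signal
    return cov / sqrt(var_x * var_y)

# Hypothetical, made-up samples taken at the same timestamps
health_score = [100, 95, 90, 70, 60, 85, 100]
kpis = {
    "cpu_load_pct": [20, 30, 40, 80, 95, 45, 15],
    "disk_free_pct": [50, 52, 49, 51, 50, 48, 50],
}

# Rank KPIs by the strength of their linear relationship with the health score
ranked = sorted(
    ((name, pearson(values, health_score)) for name, values in kpis.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)
for name, r in ranked:
    print(f"{name}: r = {r:+.2f}")
```

KPIs near the top of the ranking are the ones most likely to be useful inputs to a prediction. A weak score only rules out a linear relationship, so it is a first filter rather than a final verdict.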

I will talk through this in more detail in the blog about making smarter predictions in ITSI.

Figure: ITSI predictive models

What intelligent analytics can I apply to group my alerts?

While ITSI has an awesome way of grouping alerts with machine learning in Smart Mode, many customers would like a similar approach that gives them more flexibility in how an episode is defined. Currently, Smart Mode determines not just the patterns in the data, but the episode aggregation policies too.

Graph analytics is something we have been talking about with increasing frequency at Splunk, and for ITSI it presents a great way of creating ‘smart’ episodes through the use of unsupervised community detection. We talk about this more in the Smarter ITSI Episodes Powered by Community Detection Algorithms blog.
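To make the idea of unsupervised community detection concrete, here is a minimal Python sketch that groups alerts into candidate episodes. It assumes alerts are nodes and that edge weights count how often two alerts fire together. It uses label propagation, one simple community detection method chosen here purely for illustration, and the alert names and weights are hypothetical.

```python
from collections import defaultdict

def label_propagation(edges, max_rounds=20):
    """Group nodes by weighted label propagation.

    edges: iterable of (node_a, node_b, weight) tuples, treated as undirected.
    Returns a sorted list of communities, each a sorted list of node names.
    """
    neighbours = defaultdict(dict)
    for a, b, weight in edges:
        neighbours[a][b] = neighbours[a].get(b, 0) + weight
        neighbours[b][a] = neighbours[b].get(a, 0) + weight

    labels = {node: node for node in neighbours}  # every node starts alone
    for _ in range(max_rounds):
        changed = False
        for node in sorted(neighbours):  # fixed order keeps the sketch deterministic
            votes = defaultdict(float)
            for other, weight in neighbours[node].items():
                votes[labels[other]] += weight
            best = max(votes.values())
            candidates = sorted(lbl for lbl, v in votes.items() if v == best)
            # Keep the current label on a tie, otherwise take the smallest candidate
            new_label = labels[node] if labels[node] in candidates else candidates[0]
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break

    groups = defaultdict(list)
    for node, label in labels.items():
        groups[label].append(node)
    return sorted(sorted(members) for members in groups.values())

# Hypothetical co-occurrence counts between alert types
edges = [
    ("db_cpu", "db_disk", 5), ("db_cpu", "db_latency", 5), ("db_disk", "db_latency", 5),
    ("web_5xx", "web_latency", 5), ("web_5xx", "web_timeouts", 5),
    ("web_latency", "web_timeouts", 5),
    ("db_latency", "web_latency", 1),  # weak link between the two clusters
]
episodes = label_propagation(edges)
for members in episodes:
    print(members)
```

Each returned group is a candidate episode. The fixed visiting order and tie-breaking rule are simplifications to keep the example repeatable; production implementations typically randomise the order.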

Figure: ITSI graph visualization

How do I identify root cause from an alert?

ITSI has some awesome ways of understanding root cause through episode reviews, deep dive analysis and even the service analyser. More recently we have been doing some work around causal inference – a technique to identify causal relationships between data points – and in the blog on Smarter Root Cause Analysis: Determining Causality from your ITSI KPIs we outline how you can use causal inference to identify root cause from your KPIs.

How can I spot unusual patterns of alerts?

The final topic I will be covering in this series is around how to spot unusual activity in your environment.

Alerts and episodes are great for identifying known patterns of behaviour, such as poor network latency or a hard drive filling up, but they often struggle to flag truly unusual patterns of alerts generated across the environment. In the final blog post (Smarter Noise Reduction in ITSI) we walk through how you can identify unusual event storms using anomaly detection and text analysis.
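As a simple illustration of the anomaly detection half of that idea, the Python sketch below flags time intervals whose alert volume is unusually high. It assumes you already have alert counts per interval and uses a modified z-score based on the median and median absolute deviation (MAD). The counts and the 3.5 threshold are illustrative assumptions, not values taken from the app.

```python
from statistics import median

def find_event_storms(counts, threshold=3.5):
    """Return indices of intervals whose alert count is unusually high.

    Uses the modified z-score (median and MAD), which is less distorted
    by the storm itself than a mean and standard deviation would be.
    """
    centre = median(counts)
    mad = median(abs(c - centre) for c in counts)
    if mad == 0:
        # More than half the intervals share the same count, so the MAD
        # gives no scale to score against; report nothing rather than guess.
        return []
    return [
        i for i, c in enumerate(counts)
        if 0.6745 * (c - centre) / mad > threshold
    ]

# Hypothetical alert counts per 5-minute interval
counts = [12, 15, 11, 14, 13, 96, 12, 16, 14, 13]
storms = find_event_storms(counts)
print("intervals flagged as event storms:", storms)
```

Only spikes above the usual level are flagged, because a quiet interval is rarely a problem for noise reduction. Flagged intervals could then be handed to a text analysis step to see which alert messages dominate the storm.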

Summary

Hopefully you will be able to gain some additional insight from your ITSI deployment using the Smart ITSI Insights app for Splunk and some of the content in this blog series. Keep an eye out for future blogs detailing how you can use SMLE to further improve some of the techniques we’ve outlined here.

For now it’s over to you to keep your IT systems ticking over smoothly with machine learning!

Happy Splunking!
