Levelling up your ITSI Deployment using Machine Learning

Here at Splunk we’re passionate about helping our customers get as much value from their data as possible. Recently, Lila Fridley wrote about how to select the best workflow for applying machine learning, and Vinay Sridhar provided an example of anomaly detection in SMLE. Here we’d like to build on that content by providing some details about the Smart ITSI Insights app for Splunk, which is designed to help IT operations teams gain additional insights from ITSI using machine learning - all without having to be a data scientist!

I often get asked how we can help our customers extract the most value from their IT Service Intelligence (ITSI) deployments. In this blog series, I want to present a number of techniques that have been used to get the most out of ITSI using machine learning.

Most of these techniques are wrapped up as repeatable content in the Smart ITSI Insights app for Splunk. I’d encourage you to check the app out and test the capabilities yourself as you read the blogs linked below.

Can I predict potential outages?

Many of you will be familiar with the predictive analytics in ITSI, which is described in detail here. While this can be a powerful capability, we often hear from customers who are unsure which algorithm to apply, or who see seemingly unpredictable relationships between the service they want to predict and the KPIs that are used to generate the service health score.

For these reasons, we have been working on a new workflow for generating the predictions in ITSI. This workflow allows users to inspect the relationships between the service health score and its KPIs, and to run statistical analysis to determine whether there is a good degree of correlation in the data. This correlation is really important – strongly coupled data makes for good prediction accuracy!

I will talk through this in more detail in the blog about making smarter predictions in ITSI.
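To make the correlation check a little more concrete, here is a minimal Python sketch. It is not the app's actual workflow; it simply ranks KPIs by how strongly they correlate (Pearson) with a service health score. The KPI names and sample values are hypothetical, and in practice you would use series from your own ITSI data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no linear signal
    return cov / sqrt(var_x * var_y)

# Hypothetical sample data: one health score and two KPIs over six intervals.
health_score = [100, 95, 90, 70, 60, 40]
kpis = {
    "cpu_load_pct": [20, 30, 35, 60, 70, 90],
    "queue_depth": [5, 3, 6, 4, 5, 4],
}

# Rank KPIs by the absolute strength of their correlation with the score.
ranked = sorted(
    ((name, pearson(values, health_score)) for name, values in kpis.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)
for name, r in ranked:
    print(f"{name}: r = {r:+.2f}")
```

A KPI whose coefficient is close to +1 or -1 moves closely with the health score, so it is likely to be a more useful input to a prediction than one whose coefficient is near 0.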

ITSI predictive models

What intelligent analytics can I apply to group my alerts?

While ITSI has an awesome way of grouping alerts with machine learning in Smart Mode, many customers would like a similar approach that gives them more flexibility in how an episode is defined. Currently, Smart Mode defines not just the patterns in the data, but the episode aggregation policies too.

Graph analytics is something we have been talking about with increasing frequency at Splunk, and for ITSI it presents a great way of creating ‘smart’ episodes through the use of unsupervised community detection. We talk about this more in the Smarter ITSI Episodes Powered by Community Detection Algorithms blog.
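As a rough illustration of unsupervised community detection, the sketch below runs a simple weighted label propagation over a small alert graph. This is one basic community detection technique chosen for brevity, not necessarily the algorithm used in the app. The alert names and co-occurrence counts are hypothetical: each edge weight stands for how often two alerts fired in the same time window.

```python
from collections import defaultdict

def label_propagation(edges, max_rounds=20):
    """Group nodes into communities with weighted label propagation.

    edges: iterable of (node_a, node_b, weight) tuples.
    Returns a dict mapping each node to a community label.
    """
    neighbours = defaultdict(dict)
    for a, b, weight in edges:
        neighbours[a][b] = weight
        neighbours[b][a] = weight

    labels = {node: node for node in neighbours}  # every node starts alone
    for _ in range(max_rounds):
        changed = False
        for node in sorted(neighbours):  # fixed order keeps runs repeatable
            votes = defaultdict(float)
            for other, weight in neighbours[node].items():
                votes[labels[other]] += weight
            best = max(votes.values())
            winners = sorted(lbl for lbl, v in votes.items() if v == best)
            # Keep the current label on a tie, otherwise take the first winner.
            new_label = labels[node] if labels[node] in winners else winners[0]
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    return labels

# Hypothetical co-occurrence graph: two tight groups joined by one weak link.
alert_edges = [
    ("db_cpu", "db_disk", 5), ("db_cpu", "db_latency", 5),
    ("db_disk", "db_latency", 5),
    ("web_5xx", "web_latency", 5), ("web_5xx", "web_queue", 5),
    ("web_latency", "web_queue", 5),
    ("db_latency", "web_latency", 1),
]
communities = label_propagation(alert_edges)

# Collect the members of each community, which could map to one episode each.
episodes = defaultdict(list)
for alert, label in sorted(communities.items()):
    episodes[label].append(alert)
for members in episodes.values():
    print(members)
```

Because the grouping comes from the structure of the graph rather than from a fixed aggregation policy, you stay free to decide separately how each community should be turned into an episode.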

ITSI Graph Visualization

How do I identify root cause from an alert?

ITSI has some awesome ways of understanding root cause through episode reviews, deep dive analysis and even the service analyser. More recently we have been doing some work around causal inference, a technique for identifying causal relationships between data points. In the blog on Smarter Root Cause Analysis: Determining Causality from your ITSI KPIs we outline how you can use causal inference to identify root cause from your KPIs.
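Full causal inference is beyond a short snippet, but one ingredient is easy to sketch: temporal precedence. The Python example below is a crude lagged-correlation check, not a causal inference method in its own right and not the approach from the linked blog. It asks whether one KPI tends to move a few intervals before another. The KPI names and values are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def lagged_correlation(leader, follower, lag):
    """Correlate leader[t] with follower[t + lag]."""
    if lag == 0:
        return pearson(leader, follower)
    return pearson(leader[:-lag], follower[lag:])

# Hypothetical KPIs: web errors echo database latency two intervals later.
db_latency = [10, 12, 11, 30, 28, 12, 11, 10, 25, 27, 11, 10]
web_errors = [1.0, 1.0, 1.0, 1.2, 1.1, 3.0, 2.8, 1.2, 1.1, 1.0, 2.5, 2.7]

lags = range(0, 5)
best_lag = max(lags, key=lambda k: lagged_correlation(db_latency, web_errors, k))
forward = lagged_correlation(db_latency, web_errors, best_lag)
reverse = lagged_correlation(web_errors, db_latency, best_lag)

print(f"db_latency leads web_errors by {best_lag} intervals (r = {forward:.2f})")
print(f"same lag in the reverse direction: r = {reverse:.2f}")
```

A strong correlation at a positive lag in one direction only is a hint about which KPI to investigate first. It does not prove causation, since both KPIs could be driven by a third factor.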

How can I spot unusual patterns of alerts?

The final topic I will be covering in this series is how to spot unusual activity in your environment.

Alerts and episodes are great for identifying known patterns of behaviour, such as poor network latency or a hard drive filling up, but they often struggle to flag truly unusual patterns of alerts that are generated across the environment. In the final blog post (Smarter Noise Reduction in ITSI) we will be walking through how you can identify unusual event storms through anomaly detection and text analysis.
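To give a flavour of the anomaly detection side, here is a minimal Python sketch that flags event storms from alert counts per time interval. It uses a modified z-score based on the median, which is one simple choice and not necessarily what the app does. The counts and the 3.5 threshold are assumptions for illustration.

```python
from statistics import median

def find_alert_storms(counts, threshold=3.5):
    """Return indices of intervals whose alert count is unusually high.

    Scores each count with the modified z-score, built on the median and
    the median absolute deviation (MAD), so that one large storm does not
    drag the baseline upwards the way a mean would.
    """
    centre = median(counts)
    mad = median(abs(c - centre) for c in counts)
    if mad == 0:
        return []  # no spread in the data, so nothing can be scored
    return [
        i for i, c in enumerate(counts)
        if 0.6745 * (c - centre) / mad > threshold
    ]

# Hypothetical alert counts per five-minute interval.
alert_counts = [12, 15, 11, 14, 13, 90, 12, 16]
storm_indices = find_alert_storms(alert_counts)
print("storm intervals:", storm_indices)
```

Only unusually high counts are flagged here, because a quiet interval is rarely a storm. The text analysis part, comparing the content of the alerts themselves, is left to the blog post.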

Summary

Hopefully you will be able to gain some additional insight from your ITSI deployment using the Smart ITSI Insights app for Splunk and some of the content in this blog series. Keep an eye out for future blogs detailing how you can use SMLE to further improve some of the techniques we’ve outlined here.

For now it’s over to you to keep your IT systems ticking over smoothly with machine learning!

Happy Splunking!
