What’s New in the Splunk Machine Learning Toolkit 4.2

Another release of the Splunk Machine Learning Toolkit (MLTK) is hot off the press and ready for you to download (check out our "What's New in Splunk Machine Learning Toolkit Version 4.2" video on YouTube). This is our second release on the path to .conf19, and we have a lot of new customer-focused features and content specifically designed to make machine learning-driven outcomes accessible, usable, and valuable to you. This release is all about the two outcomes you have asked for most often: easy-to-use numeric outlier detection and smart forecasts. We've also shipped a new version of Python for Scientific Computing (PSC) as we head towards a Python 3.x PSC at .conf19 later this year.

New Content Highlights

Easy-to-use numeric outlier detection with the DensityFunction algorithm.
Do you want brilliant alerts with automatically machine learned thresholds? Maybe with the option to update and learn new behaviors as new data appears? Following the popular blog post I wrote a little over a year ago, "Cyclical Statistical Forecasts and Anomalies Part 1," the DensityFunction algorithm works out-of-the-box but with plenty of customization options if you want to get into the math. Here’s a quick preview of the DIY demo from later in this blog.

New smart forecasting option with the StateSpaceForecast algorithm.
Do you want to make brilliant machine learning forecasts without specifying the underlying math? StateSpaceForecast supports special days (calendar holidays, work-specific holidays, and other advanced features similar to "Cyclical Statistical Forecasts and Anomalies Part 2") and real-time apply, so you can scale your forecasts easily with the MLTK’s |fit and |apply context. Here’s a quick preview of the DIY demo from later in this blog.

3D Scatter Plot visualization now ships with the Splunk MLTK.
Already a popular visualization on Splunkbase and used in many custom clustering workflows (for example, Splunk Security Essentials for Fraud Detection), the MLTK version comes with enhancements based on common customer asks. Users can now explore their cluster formations in a 3D plot that supports zooming, rotation, screenshots, and pinpointing the XYZ values of any point in the plot. Happy dashboarding!

MLTK’s Experiment Management Framework (EMF) alert options have been updated.
When creating an alert in the EMF workflow, you can still select from standard Trigger Conditions as before, but now new Machine Learning Conditions are available to customize your machine learning alerts for any use case.

Other New MLTK content for 4.2 includes:

  1. Introduction of the ICA algorithm for preprocessing
  2. The ML-SPL Performance App for Machine Learning Toolkit has been updated so you can estimate the impact of machine learning on your Splunk infrastructure with new algorithms we shipped in MLTK 4.2
  3. Splunk Cloud customers can now use GitHub to add more algorithms via the Splunk MLTK Algorithms on GitHub app. Splunk Cloud customers need to create a support ticket to have this app installed.
  4. Version 1.4 of the Python for Scientific Computing add-on is now available on Splunkbase and is required to run certain new features, including the DensityFunction algorithm for anomaly detection.
  5. View a complete list of what's new on Splunk Docs.

A Deeper Dive Into the New Features of Splunk MLTK 4.2

A quick note on installing any of the example Simple XML dashboards: set your MLTK app permissions to global, then go to Search & Reporting or another app in your Splunk instance (not the MLTK) and create a new dashboard. On the new dashboard, click Edit and open the source view by clicking the Source button. Copy and paste the contents of the files linked in this blog into your source, and click Save. You should see a populated version of the demos we are about to go through! Depending on your individual Splunk settings, you may have to run each panel manually to ensure the models are built correctly.

A new blog version of "Cyclical Statistical Forecasts and Anomalies - Part 1" with DensityFunction and StateSpaceForecast is underway, so expect to see it soon!

Easy-to-use numeric outlier detection with the DensityFunction algorithm

(Get the Simple XML file here.)

I want to quickly get machine-learned outliers for the numeric values I care about so that I can create alerts on what really matters and take meaningful action. Let’s walk through the demo of using DensityFunction to learn what's normal and what's an outlier across the many potential densities in our data.

In the first section of our demo XML file, you can see two different densities from the same set of data for SMART (Self-Monitoring, Analysis and Reporting Technology; often written as S.M.A.R.T.) hard drive metrics. Panel 1 is for hard drive model ST4000DM000, and Panel 3 shows a different density for hard drive model ST3000DM001. These hard drives behave in different ways, as shown by their different densities, but machine learning can pick out the differences and automatically suggest outliers and thresholds for the different data (Panels 2 and 4).

Panels 5 through 7 show the details of running DensityFunction over larger sets of data: in this case, every SMART metric event for many different hard drive models. Each hard drive model’s behavior is learned from the |fit call in Panel 5 (note the “by” clause, available with |fit for the first time!). In Panel 6 we inspect the learned model and see that each hard drive’s behavior has been categorized by machine learning using kernel density estimation. You can of course force DensityFunction to assume all normal curves and change settings like the threshold percentage, but the default auto setting lets Splunk’s machine learning learn each density separately and quickly work out what is normal and what is not.

We finish with Panel 7, where we apply the model file and change the threshold setting to a new percentage without retraining the model. You can alert directly off the new field “isOutlier” and, if you like, add the strength of the anomaly as an additional field with the show_density=true option. Because this is an |apply step, you can use this search in real time. Now I can save real-time alerts based on numeric outlier detection in my SMART metrics for each hard drive model and take meaningful action on the hard drives that could be starting to fail.
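
To make the idea of fitting once per group and then applying with a different threshold more concrete, here is a minimal Python sketch. It is not the MLTK implementation: it assumes a normal curve for every group (the simpler option mentioned above, rather than kernel density estimation), and the function names fit_by_group and apply_model, the sample field names, the made-up readings, and the two-sided tail test are all illustrative assumptions.

```python
from collections import defaultdict
from statistics import NormalDist, mean, stdev


def fit_by_group(rows, value_field, by_field):
    """Learn one normal density per group (hypothetical stand-in for |fit ... by)."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[by_field]].append(row[value_field])
    model = {}
    for group, values in grouped.items():
        # A usable spread estimate needs at least two values that are not all equal.
        if len(values) >= 2 and stdev(values) > 0:
            model[group] = NormalDist(mean(values), stdev(values))
    return model


def apply_model(model, rows, value_field, by_field, threshold=0.01):
    """Score rows against an already fitted model; nothing is retrained here."""
    scored = []
    for row in rows:
        dist = model.get(row[by_field])
        if dist is None:
            # Unknown group: nothing was learned for it, so leave the row unflagged.
            scored.append({**row, "isOutlier": False, "density": None})
            continue
        cdf = dist.cdf(row[value_field])
        tail = 2 * min(cdf, 1 - cdf)  # two-sided tail probability of the value
        scored.append({
            **row,
            "isOutlier": tail < threshold,
            "density": dist.pdf(row[value_field]),
        })
    return scored


# Made-up training data: one metric for two hypothetical drive models.
training = (
    [{"model": "drive_a", "temp": t} for t in range(30, 40)]
    + [{"model": "drive_b", "temp": t} for t in range(50, 70, 2)]
)
learned = fit_by_group(training, "temp", "model")

# The same reading arrives for both drive models; only the threshold changes.
new_events = [{"model": "drive_a", "temp": 41}, {"model": "drive_b", "temp": 41}]
for threshold in (0.01, 0.05):
    scored = apply_model(learned, new_events, "temp", "model", threshold)
    print(threshold, [event["isOutlier"] for event in scored])
```

In this toy data, a reading of 41 is flagged for drive_b at both thresholds, but for drive_a only once the threshold is loosened from 1% to 5%. The model dictionary is reused unchanged across both passes, which is the point of separating the fit step from the apply step.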

There are plenty of DensityFunction updates coming in later versions of the Splunk MLTK. Thinking about the awesome ways you can view different densities? Don’t worry: we have you covered in a future MLTK release with new viz options. Can’t wait!

New smart forecasting option with the StateSpaceForecast algorithm

(Get the Simple XML file here.)

For this demo, I want to forecast a typical 7-day business cycle into the future so I can take preventative action based on predictive analytics.

In Panel 1, I show a typical time series in Splunk over five weeks or so of past data; each time you load the demo, a new data set will be generated. Don’t let the scary long SPL in the search chase you off—it’s just generating a random time series each time the dashboard runs so that you can easily interact with data ending in “now.” The only lines that matter are the |fit and |apply.

In Panel 2 I use the |predict command, a core part of SPL that ships with every copy of Splunk and is used for quick forecasting. We can see that |predict projects a forecast into the future, but not a very detailed one.

In Panel 3 I use the new StateSpaceForecast algorithm from the Splunk MLTK to detect a time series pattern in our historical data and project a more detailed forecast into the future. I can inspect the model to see which pattern (period) was the strongest, and I can easily override that period in Panel 5 to force the 7-day cycle of my original business concern (at 5 minutes per time span, 7 days is 7 × 24 × 12 = 2016 spans, so the period is 2016). If I want to, I can even include a special days field to tell StateSpaceForecast about contextual periods of time that should be treated differently (like Black Friday sales or IP traffic on July 4th).
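
The period arithmetic above can be checked with a short Python sketch. The helper name period_in_spans is hypothetical and only illustrates the calculation; it is not an MLTK function.

```python
from datetime import timedelta


def period_in_spans(cycle: timedelta, span: timedelta) -> int:
    """Return how many time spans make up one full cycle."""
    spans, remainder = divmod(cycle, span)
    if remainder:
        # A cycle that is not a whole number of spans has no clean integer period.
        raise ValueError("cycle is not a whole number of spans")
    return spans


# A 7-day business cycle sampled in 5-minute spans.
print(period_in_spans(timedelta(days=7), timedelta(minutes=5)))  # 2016
```

The same helper gives 168 for hourly spans, which is a quick way to work out the period to force if your data uses a different span.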

The options for StateSpaceForecast continue. I can forecast multiple time series into the future together as a unified system (where the interactions between the periodicities of the individual time series are taken into account) as seen in Panel 6, or I can even forecast into the future from a single event with |apply as seen in Panel 7, opening up real-time forecasting as an option in Splunk!

Learn how Splunk customers, including Hyatt, the University of Nevada, Las Vegas (UNLV), and TransUnion, are using the Machine Learning Toolkit to generate benefits for their organizations.

Interested in trying out the Machine Learning Toolkit at your organization? Splunk offers FREE data science resources to help you get it up and running. Learn more about the Machine Learning Customer Advisory Program.

Posted by Andrew Stein