We’ve just released Splunk Machine Learning Toolkit (MLTK) 3.4, which features updates to the core machine learning libraries, new functionality in the Experiment Management Framework, more visualization options and, for the first time, a neural network algorithm out of the box! This all builds on the capabilities we’ve delivered in the toolkit since the Splunk Machine Learning Toolkit 3.3 release in June 2018.
Below are key features in MLTK 3.4, which we will review in further detail in this blog post. You can also check out a video of them in action here.
Python for Scientific Computing (PSC) 1.3
Experiment Management Framework (EMF): Publishing
New Visualization Option: Box Plot
Neural Network Algorithm: MLP Classifier
Supported Python for Scientific Computing Add-on Libraries Upgraded
What exactly is upgraded?
With the Python for Scientific Computing (PSC) 1.3 update, we have upgraded several libraries, including numpy, pandas, scikit-learn, statsmodels, and scipy.
What does Python for Scientific Computing (PSC) 1.3 support in MLTK 3.4+?
These library updates enhance the Machine Learning Toolkit’s existing capabilities, for example by enabling the new Multi-layer Perceptron Classifier, and support more up-to-date content through the ML-SPL API. More functionality built on these updates will arrive in future MLTK releases at .conf18, the 9th Annual Splunk Users' Conference, and beyond.
Please Note: You must have PSC 1.3 for MLTK 3.4+.
Experiment Management Framework Continues to Evolve in Splunk MLTK
What is it?
With this new publishing feature, customers can now publish machine learning models from the Experiment Management Framework (EMF) into the app context of Splunk IT Service Intelligence, Splunk Essentials, or any other SPL-based solution built on Splunk Enterprise.
What's the use case?
Today, a machine learning model creator (citizen data scientist) can create an EMF workflow in MLTK to easily and automatically monitor their models' lifecycles. Publishing gives the model owner the new capability to ship those models to another Splunk user's workspace, so that user can quickly get value in their normal workflow (for example, ML-powered custom alerts in any SPL-based workflow in Splunk).
More Visualization Options
We have added a new visualization option called Box Plot.
What is it and what’s the use case?
Box Plot is a classic visualization for quickly inspecting and comparing distributions to understand a data set's basic statistical profile. Any time you are using statistical analysis in Splunk, you can use the Box Plot to visualize the different distributions. In the example below, we plot the distribution of calls per hour for each day of the week over 5 weeks. We can see that the number of calls has a different distribution on each weekday, but some days are similar: Friday, Tuesday, Wednesday, and Saturday have an overall similar profile, with Thursday close to them but slightly different. Monday and Sunday are very different from other days of the week, and are different from each other too!
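To make concrete what a box plot draws for each group, here is a minimal Python sketch that computes the usual summary numbers: quartiles, whiskers, and outliers. It assumes the common Tukey convention (whiskers at 1.5 times the interquartile range); the MLTK visualization may compute its whiskers differently. The function name and the sample call counts are hypothetical.

```python
from statistics import quantiles


def box_plot_stats(values):
    """Return the summary numbers a box plot typically displays.

    Assumes the Tukey convention: whiskers reach the most extreme
    points within 1.5 * IQR of the quartiles; the rest are outliers.
    """
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inliers = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "lower_whisker": min(inliers),
        "q1": q1,
        "median": median,
        "q3": q3,
        "upper_whisker": max(inliers),
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


# Hypothetical calls-per-hour samples, for illustration only.
calls_by_day = {
    "Monday": [12, 15, 14, 40, 13, 16, 15],
    "Friday": [22, 25, 24, 23, 26, 27, 25],
}

summary = {day: box_plot_stats(calls) for day, calls in calls_by_day.items()}
for day, stats in summary.items():
    print(day, stats)
```

Comparing the median, box height (q3 minus q1), and outliers across days is what lets you say that two days have a similar or a different profile.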
Neural Network Algorithm
MLP Classifier: Multi-layer Perceptron Classifier (MLPClassifier) is a supervised learning algorithm based on a feedforward neural network.
What does it do?
Classification problems are hard, and the MLPClassifier gives you a leg up with its ability to distinguish non-linear relationships in the data. Plenty of powerful parameters are available to adjust and customize the learning process!
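As a rough illustration of why a hidden layer helps with non-linear relationships, the sketch below runs a forward pass through a tiny network with hand-picked weights that reproduces XOR, a pattern no single linear boundary can separate. This is not how MLPClassifier is implemented: a real model learns its weights from data during training, and the weights, function names, and 0.5 threshold here are assumptions chosen for the example.

```python
def relu(x):
    """Rectified linear unit: pass positives through, clamp negatives to 0."""
    return max(0.0, x)


def forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass through one hidden layer with ReLU activation."""
    hidden = [
        relu(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Hand-picked (not trained) weights that happen to encode XOR.
HIDDEN_WEIGHTS = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_BIASES = [0.0, -1.0]
OUTPUT_WEIGHTS = [1.0, -2.0]
OUTPUT_BIAS = 0.0


def predict(x):
    """Classify as 1 when the network output exceeds 0.5, else 0."""
    score = forward(x, HIDDEN_WEIGHTS, HIDDEN_BIASES, OUTPUT_WEIGHTS, OUTPUT_BIAS)
    return 1 if score > 0.5 else 0


inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
predictions = [predict(x) for x in inputs]
print(predictions)  # [0, 1, 1, 0]
```

The second hidden unit only switches on when both inputs are 1, which lets the output layer cancel the first unit's contribution in that case. That kind of interaction between features is what a purely linear model cannot express.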
Note: MLPClassifier is sensitive to feature scaling, so don't forget to standardize your data before you start using it for your model training!
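Standardizing usually means rescaling each feature to zero mean and unit variance, using statistics computed on the training data only. The sketch below shows that arithmetic in plain Python; scikit-learn's StandardScaler performs the same transformation. The helper names and the two sample features (drive temperature and a read-error count) are hypothetical.

```python
from statistics import mean, pstdev


def fit_standardizer(rows):
    """Compute the per-column mean and population standard deviation."""
    columns = list(zip(*rows))
    means = [mean(column) for column in columns]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stds = [pstdev(column) or 1.0 for column in columns]
    return means, stds


def standardize(rows, means, stds):
    """Rescale every value to (value - mean) / std for its column."""
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Hypothetical training rows: [temperature, read_error_count].
train = [[30.0, 1000.0], [35.0, 3000.0], [40.0, 5000.0]]
means, stds = fit_standardizer(train)
train_scaled = standardize(train, means, stds)

# New data must reuse the training statistics, not its own.
new_scaled = standardize([[35.0, 3000.0]], means, stds)
print(train_scaled)
print(new_scaled)
```

Without this step, the feature with the larger numeric range (the error count here) would dominate the weight updates simply because of its units.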
If we take an example from the showcase, “Predicting Hard Drive Failure,” we can compare the outcomes of the LogisticRegression and MLPClassifier algorithms via confusion matrices like the ones below.
MLPClassifier was able to discover non-linear relationships in the features predicting hard drive failure and build a more accurate model.
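For readers new to confusion matrices, here is a minimal Python sketch of how one is tallied and how two models can be compared with it. The labels and predictions are made-up toy values for illustration only; they are not the results of the hard drive showcase, and the function names are hypothetical.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(matrix):
    """Share of predictions on the diagonal (actual == predicted)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


labels = ["ok", "fail"]

# Toy data, invented for this example.
actual = ["ok"] * 6 + ["fail"] * 4
predicted_a = ["ok", "ok", "ok", "ok", "ok", "fail", "ok", "ok", "fail", "fail"]
predicted_b = ["ok", "ok", "ok", "ok", "ok", "ok", "ok", "fail", "fail", "fail"]

matrix_a = confusion_matrix(actual, predicted_a, labels)
matrix_b = confusion_matrix(actual, predicted_b, labels)

print(matrix_a, accuracy(matrix_a))  # [[5, 1], [2, 2]] 0.7
print(matrix_b, accuracy(matrix_b))  # [[6, 0], [1, 3]] 0.9
```

The off-diagonal cells are the ones to watch: in a failure-prediction setting, the "actual fail, predicted ok" cell counts the failures a model missed.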
For an in-depth look at how to use the MLTK, check out these webinars:
Interested in trying out the Machine Learning Toolkit at your organization? Splunk offers FREE data science resources to help you get it up and running. Learn more about the Machine Learning Customer Advisory Program.