We’ve just rolled out version 3.2 of the Splunk Machine Learning Toolkit (MLTK), which builds on the capabilities we’ve delivered in the toolkit since the 3.0 release back in October 2017. Versions 3.1 and 3.2 provide enhanced pre-processing and customization options, as well as improved model building and management.
Here’s a summary of what you can look forward to.
Increase Accuracy of ML Models with Improved Pre-processing Options
We’ve added a field selector as a new pre-processing option to help you choose the best fields for predicting your target metric. This is particularly useful when you have no prior knowledge of how the metrics in your data relate to one another. In effect, you apply a pre-processing machine learning algorithm to determine which fields are best suited for your prediction, rather than doing that analysis manually.
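To make the idea of automated field selection concrete, here is a minimal sketch in plain Python. It is not the MLTK implementation: it swaps in a simple stand-in technique (ranking fields by absolute Pearson correlation with the target), and the field names, sample values, and the `select_fields` helper are all hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_fields(fields, target, k):
    """Return the k field names whose values correlate most strongly with the target."""
    scores = {name: abs(pearson(values, target)) for name, values in fields.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical sample data: three candidate fields and one target metric.
fields = {
    "cpu": [2, 4, 6, 8, 10],
    "noise": [5, 1, 4, 2, 3],
    "mem": [10, 8, 7, 4, 1],
}
response_time = [1, 2, 3, 4, 5]

top_fields = select_fields(fields, response_time, k=2)
print(top_fields)
```

The point of the sketch is only that a univariate score per field can replace manual inspection. A real selector would typically use a statistical test suited to the data type rather than raw correlation.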
Make Sense of Unlabeled Data
We’ve added X-means, a new clustering algorithm that extends K-means by trying to determine the number of clusters automatically, based on Bayesian Information Criterion (BIC) scores.
The X-means algorithm is useful when you have unlabeled data and no prior knowledge of the total number of labels into which the data may be divided.
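To illustrate how BIC scores can pick a cluster count, here is a simplified sketch in plain Python. It is not the full X-means procedure, which grows the model by splitting individual clusters. Instead, it assumes one-dimensional data, runs ordinary K-means for each candidate k, and keeps the k with the highest BIC under a shared-variance Gaussian model. The function names and sample data are hypothetical.

```python
import math


def kmeans_1d(points, k, max_iter=50):
    """Plain 1-D K-means with deterministic, quantile-based starting centers."""
    ordered = sorted(points)
    n = len(ordered)
    centers = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: abs(p - centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = sum(members) / len(members)
    return centers, labels


def bic_score(points, centers, labels):
    """BIC (higher is better) for 1-D clusters that share a single variance."""
    n, k = len(points), len(centers)
    sse = sum((p - centers[lab]) ** 2 for p, lab in zip(points, labels))
    variance = max(sse / (n - k), 1e-12)  # floor avoids log(0) on perfect fits
    sizes = [labels.count(j) for j in range(k)]
    log_lik = sum(s * math.log(s / n) for s in sizes if s > 0)
    log_lik -= 0.5 * n * math.log(2 * math.pi * variance)
    log_lik -= 0.5 * (n - k)
    n_params = 2 * k  # (k - 1) mixing weights + k centers + 1 shared variance
    return log_lik - 0.5 * n_params * math.log(n)


def choose_k(points, max_k):
    """Try k = 1..max_k and return the k with the best BIC."""
    best_k, best_bic = 1, -math.inf
    for k in range(1, min(max_k, len(points) - 1) + 1):
        centers, labels = kmeans_1d(points, k)
        score = bic_score(points, centers, labels)
        if score > best_bic:
            best_k, best_bic = k, score
    return best_k


# Hypothetical unlabeled data with three well-separated groups.
data = [1.0, 1.1, 0.9, 1.2, 0.8,
        5.0, 5.1, 4.9, 5.2, 4.8,
        9.0, 9.1, 8.9, 9.2, 8.8]
best_k = choose_k(data, max_k=5)
print(best_k)
```

Adding clusters always reduces the within-cluster error, so the BIC penalty on parameter count is what stops the search from simply choosing the largest k.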
Previously in the MLTK, if you loaded a categorical field with more than 100 distinct values, the field would be ignored due to system and performance constraints. You can now configure the categorical encoding limit, which lets you override the default cap (100) on unique categorical values for classification. So when you have a classification problem with more than 100 categorical values, you’ll be able to use all of them rather than being constrained by the default limit of 100.
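The effect of an encoding limit can be sketched in a few lines of plain Python. This is an illustration only, assuming the field is skipped entirely when it exceeds the limit. The `one_hot_encode` function and its `max_distinct` parameter are hypothetical names, not MLTK settings.

```python
def one_hot_encode(values, max_distinct=100):
    """One-hot encode a categorical field, or return None if it has too many distinct values."""
    categories = sorted(set(values))
    if len(categories) > max_distinct:
        return None  # assumption: a field over the limit is skipped entirely
    return [{c: int(v == c) for c in categories} for v in values]


# Hypothetical field with three distinct values.
status = ["ok", "warn", "ok", "error"]

print(one_hot_encode(status, max_distinct=2))  # over the limit, so the field is skipped
print(one_hot_encode(status, max_distinct=3))  # within the limit, so it is encoded
```

Each distinct value becomes its own column, which is why a higher limit costs memory and time, and why the cap exists in the first place.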
Simplified & Unified Model Building and Management via Experiment Management Framework (EMF)
Those of you who are familiar with the MLTK know that the guided UIs (known as assistants) make it easier to create ML models. With the introduction of the Experiment Management Framework, it’s now easier to view, control, share and monitor the status of your machine learning experiments. You can see which scheduler or alert is assigned to which model and which models have gone through pre-processing, and you can filter models by which of these characteristics they are missing. These steps are now more intuitive and are no longer separate. We’ve brought together all the relevant pieces (alerts, models, schedulers) to form a single, unified entity called an “Experiment,” which you can easily maintain from a single page.
This unified UI allows you to set role-based access controls on experiments, browse and filter pre-built models, monitor and schedule alerts and searches, and get statistics about previously run experiments.
With the new Experiment Management Framework, the MLTK now lets you build and manage multiple machine learning experiments, and record when those experiments ran and what the results were once they were operationalized.
To see these updates in action, check out the video below.
Interested in trying out the Machine Learning Toolkit at your organization? Splunk offers FREE data science resources to help you get it up and running. Learn more about the Machine Learning Customer Advisory Program.