Our fourth release of the Splunk Machine Learning Toolkit (MLTK) before Splunk .conf19 is ready to download. As always, we built it with your needs in mind, to make machine learning-driven outcomes easier and more accessible.
New Content Highlights
Smart Forecasting Assistant
In MLTK 4.3, we released the brand-new Smart Forecasting Assistant with support for univariate time series. In 4.4, the Assistant comes packed with many more features and capabilities. You can now forecast multiple time series at once, and the algorithm models the intrinsic dependencies between them. Scroll down for a deep dive.
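To make "modeling the dependency between time series" concrete, here is a minimal Python sketch of a first-order vector autoregression (VAR(1)), in which each series' next value is a linear function of the previous values of every series. This is a swapped-in textbook technique for illustration only, not the algorithm the Smart Forecasting Assistant uses internally; the names `fit_var1` and `forecast_var1` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_var1(series: Sequence[Sequence[float]]) -> Tuple[Vector, Matrix]:
    """Fit y_t = c + A @ y_{t-1} by ordinary least squares.

    `series` is a list of time steps, each holding one value per variable
    (for example [crm, erp]). Returns the intercept c and coefficient matrix A.
    """
    k = len(series[0])
    # Each design row is [1, y_{t-1}[0], ..., y_{t-1}[k-1]].
    xs = [[1.0, *series[t - 1]] for t in range(1, len(series))]
    xtx = [[sum(row[i] * row[j] for row in xs) for j in range(k + 1)]
           for i in range(k + 1)]
    c: Vector = []
    a: Matrix = []
    for target in range(k):
        ys = [series[t][target] for t in range(1, len(series))]
        xty = [sum(row[i] * y for row, y in zip(xs, ys)) for i in range(k + 1)]
        coef = solve(xtx, xty)  # normal equations (X^T X) b = X^T y
        c.append(coef[0])
        a.append(coef[1:])
    return c, a


def forecast_var1(c: Vector, a: Matrix, last: Sequence[float], steps: int) -> List[Vector]:
    """Roll the fitted model forward `steps` time steps from `last`."""
    out: List[Vector] = []
    y = list(last)
    for _ in range(steps):
        y = [c[i] + sum(a[i][j] * y[j] for j in range(len(y))) for i in range(len(y))]
        out.append(y)
    return out
```

Fitted to a two-column history such as `[[crm, erp], ...]`, the off-diagonal entries of `A` capture how each series influences the other, which a set of independent univariate models cannot represent.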
The DensityFunction algorithm is proving to be one of the most popular features we have ever released as part of MLTK. We have listened to your feedback, and this release brings additional improvements. You now get a warning message if any group in your data has fewer than 50 data points, because no one wants to detect outliers using unreliable models. You can also generate sample data points from both the normal and abnormal regions of a trained density function by using the full_sample parameter.
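The following stdlib-only Python sketch illustrates the two behaviors described above: a warning for groups with fewer than 50 points, and sampling from a fitted density. It assumes a normal distribution per group and a hypothetical two-sided `threshold` of tail probability. `fit_groups`, `is_outlier`, and `sample_full` are illustrative names, not MLTK's API, and drawing from the entire fitted distribution is only an analogy for what `full_sample` does.

```python
import warnings
from collections import defaultdict
from statistics import NormalDist
from typing import Dict, Iterable, List, Tuple

MIN_POINTS = 50  # groups smaller than this trigger a reliability warning


def fit_groups(rows: Iterable[Tuple[str, float]]) -> Dict[str, NormalDist]:
    """Fit one normal density per group from (group, value) pairs."""
    by_group: Dict[str, List[float]] = defaultdict(list)
    for group, value in rows:
        by_group[group].append(value)
    models = {}
    for group, values in by_group.items():
        if len(values) < MIN_POINTS:
            warnings.warn(
                f"group {group!r} has only {len(values)} data points; "
                "its fitted density may be unreliable"
            )
        models[group] = NormalDist.from_samples(values)  # needs >= 2 points
    return models


def is_outlier(model: NormalDist, value: float, threshold: float = 0.01) -> bool:
    """Flag values in the two tails, which together hold `threshold` of the mass."""
    lower = model.inv_cdf(threshold / 2)
    upper = model.inv_cdf(1 - threshold / 2)
    return value < lower or value > upper


def sample_full(model: NormalDist, n: int, seed: int = 0) -> List[float]:
    """Draw from the whole fitted density, covering normal and outlier regions."""
    return model.samples(n, seed=seed)
```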
Rule of Thumb to Split Data
MLTK’s Predict Numeric Field and Predict Categorical Field Assistants previously set the train/test split ratio to 50-50. You asked us why that differs from the common machine learning practice of a 70-30 or 80-20 split, and we agreed. These Assistants now default to a 70-30 split. Rather than sticking to this rule of thumb, we strongly suggest experimenting with different split ratios to see how each affects your model’s accuracy, and then picking the best one. Of course, cross-validation is a more comprehensive way to assess your model against the data you have at hand.
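A minimal Python sketch of both approaches, a shuffled train/test split and k-fold cross-validation, assuming the rows are independent (time series data would need a time-ordered split instead). The function names and the fixed `seed` are hypothetical, and this is not how the Assistants implement their split.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(rows: Sequence[T], train_ratio: float = 0.7,
                     seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle rows reproducibly and split them train_ratio : (1 - train_ratio)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def k_fold(rows: Sequence[T], k: int = 5,
           seed: int = 42) -> List[Tuple[List[T], List[T]]]:
    """Return k (train, test) pairs; every row lands in exactly one test fold."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    return [
        ([row for j, fold in enumerate(folds) if j != i for row in fold], folds[i])
        for i in range(k)
    ]
```

To follow the advice above, you might loop over ratios such as 0.5, 0.7, and 0.8, train on each split, and compare test accuracy; `k_fold` then lets you average accuracy across folds instead of relying on a single split.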
MLTK Joins the Dark Side
Starting in Splunk Enterprise 7.2, Splunk supports a dark theme for Dashboards. Now, MLTK custom visualizations also support the dark theme. Rest your eyes, join the dark side!
The Machine Learning Toolkit provides custom machine learning solutions that include machine learning-specific Search Processing Language (SPL) commands, macros, visualizations, and guided modeling dashboards. Alongside many blogs, videos, and an active MLTK user community, MLTK and the ML-SPL Extensibility API come with detailed user documentation. New and improved features are documented individually and listed, with links, in the continually updated What’s New section. For version 4.4, the documentation has been revamped with clearer chapter and topic names and reorganized into a more user-friendly hierarchy. Lost your favorite document bookmark or struggling to find a topic? Reach out to an MLTK support resource, or use the “Send Feedback” or “Post a Comment” options at the bottom of any machine learning document to share your ideas with us directly. Your feedback helps keep our documents on point and valuable, and we welcome your thoughts.
Smart Forecasting Assistant – Multivariate Support
Smart Forecasting Assistant now supports both univariate and multivariate forecasting. The multivariate workflow is similar to the univariate one in many ways; the key differences are highlighted below.
- Multiple fields can be selected for forecasting for the same time interval
- Visualization tab offers combined and split view of forecasted fields
- Set alert conditions based on the fields selected for forecasting
Let’s run through one of the vertical examples preloaded in the MLTK showcase. Here, I forecast App expenses using the CRM and ERP fields with three months of data.
We will move through the stages of Define, Learn, Review, and Operationalize to pull in data, build a model, and put that model into production.
Use the Define stage to select and preview the data you want to use for the forecast. In the multivariate workflow, select both CRM and ERP fields in the search bar.
The Visualization tab shows a combined view of the fields to forecast that you entered in the search.
Use the Learn stage to perform any preprocessing on your data and to create your forecasting model. In this multivariate workflow, the Field to Forecast menu is a multi-select that accepts up to five fields, and its list of fields is populated from your data.
Select CRM and ERP in the Field to Forecast menu, and set Holdback to 12 days and Future Timespan to 30 days.
Click Forecast, review the results, and click Next.
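Holdback and Future Timespan are easier to reason about in code. The Python sketch below holds back the last 12 points to score a model and then projects 30 points ahead. It uses a straight-line trend as a hypothetical stand-in for the Assistant's forecasting algorithm and works on a single field for brevity; refitting on all the data before projecting is a design choice of this sketch, not a documented behavior of the Assistant.

```python
from typing import List, Sequence, Tuple


def fit_line(ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares intercept and slope of ys against time index 0..n-1."""
    n = len(ys)
    mean_t = (n - 1) / 2
    mean_y = sum(ys) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(ys))
    var = sum((t - mean_t) ** 2 for t in range(n))
    slope = cov / var
    return mean_y - slope * mean_t, slope


def holdback_forecast(ys: Sequence[float], holdback: int,
                      future: int) -> Tuple[float, List[float]]:
    """Score on the last `holdback` points, then forecast `future` points.

    Returns (mean absolute error on the held-back points, future forecasts).
    """
    if not 0 < holdback <= len(ys) - 2:
        raise ValueError("holdback must be positive and leave two training points")
    train, held = ys[:-holdback], ys[-holdback:]
    b0, b1 = fit_line(train)
    n = len(train)
    mae = sum(abs(b0 + b1 * (n + i) - y) for i, y in enumerate(held)) / holdback
    # Refit on all data before projecting past the end of the series.
    b0, b1 = fit_line(ys)
    n = len(ys)
    return mae, [b0 + b1 * (n + i) for i in range(future)]
```

With 90 daily points, `holdback_forecast(series, holdback=12, future=30)` returns the error over the last 12 days plus 30 forecast values. Running it on CRM and ERP separately would ignore their dependency, which is what the multivariate model adds.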
Use the Review stage to assess the forecast based on your selections at the Learn stage. The Review panels give you the opportunity to assess your forecasting results prior to putting the model into production.
The panels show the total number of fields chosen for forecasting, with each field listed by name in its own drop-down. You can review the forecast charts in a combined or split view, and toggle the confidence interval on or off per chart.
Set the Earliest Threshold Violations for all the fields to forecast on one screen. Your settings are reflected immediately in the chart results.
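As a rough illustration of what an earliest threshold violation means, this Python sketch returns, for each field, the index of the first forecast point that crosses that field's threshold. The function names, the index-based result, and the `above` flag are assumptions for illustration; the Assistant computes and displays this for you.

```python
from typing import Dict, Optional, Sequence


def earliest_violation(forecast: Sequence[float], threshold: float,
                       above: bool = True) -> Optional[int]:
    """Index of the first forecast point past `threshold`, or None if none."""
    for i, value in enumerate(forecast):
        crossed = value > threshold if above else value < threshold
        if crossed:
            return i
    return None


def earliest_violations(forecasts: Dict[str, Sequence[float]],
                        thresholds: Dict[str, float]) -> Dict[str, Optional[int]]:
    """Apply earliest_violation to every field that has a threshold set."""
    return {field: earliest_violation(forecasts[field], limit)
            for field, limit in thresholds.items()}
```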
In addition to the combined and split chart views, you can choose which fields to review from the View Fields drop-down menu.
The Operationalize stage provides publishing, alerting, and scheduled training in one place. In a multivariate workflow you can select Trigger Conditions based on one or more of the chosen fields to forecast.
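Trigger Conditions spanning several fields can be thought of as combining per-field checks. This hypothetical Python sketch combines per-field upper-limit breaches with either "any" or "all" logic; the Assistant's actual condition options may differ, and `should_alert` is an illustrative name only.

```python
from typing import Dict, Iterable, Sequence


def breaches(forecast: Sequence[float], upper: float) -> bool:
    """True if any forecast point exceeds the field's upper limit."""
    return any(value > upper for value in forecast)


def should_alert(forecasts: Dict[str, Sequence[float]], limits: Dict[str, float],
                 fields: Iterable[str], mode: str = "any") -> bool:
    """Combine per-field breaches for the selected fields with 'any' or 'all'."""
    results = [breaches(forecasts[f], limits[f]) for f in fields]
    if mode == "any":
        return any(results)
    if mode == "all":
        return all(results)
    raise ValueError(f"unknown mode: {mode!r}")
```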
For an in-depth look at how to use the MLTK, check out these webinars:
- Getting Started with Machine Learning
- Splunk's Machine Learning Toolkit: Technical Deep Dive and Demo Part 1
- Splunk's Machine Learning Toolkit: Technical Deep Dive and Demo Part 2
- Machine Learning in Action: Stop IT Events Before They Become Outages
Interested in trying out the Machine Learning Toolkit at your organization? Splunk offers FREE data science resources to help you get it up and running. Learn more about the Machine Learning Customer Advisory Program.