Building Machine Learning Models with DensityFunction

We recently released the Splunk Machine Learning Toolkit (MLTK) version 5.2 and earlier this month we outlined how the release of version 5.2 will make machine learning more accessible to more users. Splunk’s MLTK lets our customers apply machine learning to data they are already capturing in Splunk (in the cloud or on-premises), develop models, and operationalize algorithms to glean new insights and make more informed decisions. This version of the MLTK was purpose-built to help citizen developers and data scientists hone in on common use cases such as forecasting, anomaly detection, and clustering. However, we recognize there are many users with a deeper background in ML, so we have also incorporated some new, advanced capabilities in this release to suit their needs as well.

MLTK 5.2 provides more precise algorithms than “off the shelf” algorithms, which are unable to consume and process large volumes of data. Additionally, unlike open source algorithms, the MLTK bespoke algorithms are able to provide better results with partial datasets. This enables our customers to achieve more effective answers as they go about creating references, predicting data clusters, and detecting outliers.

Based on feedback from our customers and their use cases, MLTK 5.2 includes both new and enhanced algorithms to utilize for quick analysis of outliers. While there are many different ways to approach anomaly detection, the easiest starter algorithm for most customers using MLTK is DensityFunction.

MLTK 5.2 introduces two improvements to the DensityFunction algorithm:

  1. The ability to use the partial_fit parameter for incremental learning on large datasets
  2. Support of the data distribution of Beta which is a distribution that supports five different data shapes

These enhancements will help our users find and take action on the root cause of an issue more quickly.

We commonly hear from customers that they struggle to improve the performance of models when they have a large number of groups that need to be split. The DensityFunction algorithm is a great tool for developing outlier detection models and can help customers think critically about their data. While one could simply run DensityFunction such as:

         | fit DensityFunction logins INTO myPdfModelGlobal 
                AS IsOutlierGlobal

 … | fit DensityFunction logins by “HourOfDay,DayOfWeek” 
                INTO myPdfModelHourly AS IsOutlierHourly

      | fit DensityFunction logins by “weekday” INTO myPdfModelWeekday 
                AS IsOutlierWeekday

      | fit DensityFunction logins by “account_type” INTO myPdfModelAccountType
                AS IsOutlierAccountType

        | fit DensityFunction logins by “user_group” INTO myPdfModelUserGroup 
                AS IsOutlierUserGroup

The above example is taking a single indicator of logins and pivoting it four different ways:

With these different splits, we now have more context around why certain values could be considered outliers, potentially detecting if we have a true anomaly. Similarly, some customers may be monitoring a very large number of servers, possibly in the thousands, in which case the algorithm could be used to split by server/user.

The Splunk Machine Learning Toolkit delivers the capability to operationalize machine learning models on your data in Splunk and the DensityFunction algorithm can be a simple starting point for users to harness the power of machine learning. In fact, customers have already started successfully using this approach in production with great results.

----------------------------------------------------
Thanks!
Mohan Rajagopalan

Related Articles

Announcing the General Availability of Splunk POD: Unlock the Power of Your Data with Ease
Platform
2 Minute Read

Announcing the General Availability of Splunk POD: Unlock the Power of Your Data with Ease

Splunk POD is designed to simplify your on-premises data analytics, so you can focus on what really matters: making smarter, faster decisions that drive your business forward.
Introducing the New Workload Dashboard: Enhanced Visibility, Faster Troubleshooting, and Deeper Insights
Platform
3 Minute Read

Introducing the New Workload Dashboard: Enhanced Visibility, Faster Troubleshooting, and Deeper Insights

Announcing the general availability of the new workload dashboard – a modern and intuitive dashboard experience in the Cloud Monitoring Console app.
Leading the Agentic AI Era: The Splunk Platform at Cisco Live APJ
Platform
5 Minute Read

Leading the Agentic AI Era: The Splunk Platform at Cisco Live APJ

The heart of our momentum at Cisco Live APJ is our deeper integration with Cisco, culminating in the Splunk POD and new integrations, delivering unified, next-generation data operations for every organization.
Dashboard Studio: Token Eval and Conditional Panel Visibility
Platform
4 Minute Read

Dashboard Studio: Token Eval and Conditional Panel Visibility

Dashboard Studio in Splunk Cloud Platform can address more complex use cases with conditional panel visibility, token eval, and custom visualizations support.
Introducing Resource Metrics: Elevate Your Insights with the New Workload Dashboard
Platform
4 Minute Read

Introducing Resource Metrics: Elevate Your Insights with the New Workload Dashboard

Introducing Resource Metrics in Workload Dashboard (WLD) – a modern and intuitive monitoring experience in the Cloud Monitoring Console (CMC) app.
Powering AI Innovation with Splunk: Meet the Cisco Data Fabric
Platform
3 Minute Read

Powering AI Innovation with Splunk: Meet the Cisco Data Fabric

The Cisco Data Fabric brings AI-centric advancements to the Splunk Platform, seamlessly connecting knowledge, business, and machine data.
Remote Upgrader for Windows Is Here: Simplifying Fleet-Wide Forwarder Upgrades
Platform
3 Minute Read

Remote Upgrader for Windows Is Here: Simplifying Fleet-Wide Forwarder Upgrades

Simplify fleet-wide upgrades of Windows Universal Forwarders with Splunk Remote Upgrader—centralized, signed, secure updates with rollback, config preservation, and audit logs.
Dashboard Studio: Spec-TAB-ular Updates
Platform
3 Minute Read

Dashboard Studio: Spec-TAB-ular Updates

Splunk Cloud Platform 10.0.2503 includes a number of enhancements related to tabbed dashboards, trellis for more charts, and more!
Introducing Edge Processor for Splunk Enterprise: Data Management on Your Premises
Platform
2 Minute Read

Introducing Edge Processor for Splunk Enterprise: Data Management on Your Premises

Announcing the introduction of Edge Processor for Splunk Enterprise 10.0, designed to help customers achieve greater efficiencies in data transformation and improved visibility into data in motion.