Online Learning: a Novel Approach to Applying Machine Learning in Splunk

Most classical, batch-oriented machine learning systems follow the paradigm of “fit and apply”. In an earlier blog post, I discussed a few patterns for better organizing data pipelines and machine learning workflows in Splunk. In this blog post, we’ll review how you can organize your machine learning model in a new way: online learning.

Batch Learning Vs. Online Learning

The difference between batch learning and online learning systems is that in the former, you attempt to learn from a whole dataset at once, and in the latter, you take incremental steps and constantly update your model “online”. In practical applications, each has pros and cons, which can make it hard to decide which approach is more suitable.

The main advantage of an online learning system is its typically lower compute and memory footprint, because you don’t have to process a large dataset at once as in traditional batch learning. Where batch data processing can be costly and training the model can take time, an online learner lets you continuously feed it smaller batches of data and get faster responses. The system learns from the batches and memorizes the important characteristics in its model representation while continuing to apply them to make inferences about the data presented. Additionally, as soon as new data points arrive, the model can adapt to new situations and therefore keep learning.
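To make the footprint argument concrete, here is a minimal sketch in plain Python. It is not taken from DSDL or River: the RunningStats class and the sample values are made up for illustration. It uses Welford's algorithm to update a mean and variance one value at a time, and compares the result with a batch computation over the full dataset.

```python
import statistics


class RunningStats:
    """Mean and variance updated one value at a time (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Population variance; 0.0 until at least one value has been seen.
        return self._m2 / self.n if self.n else 0.0


data = [12, 15, 11, 14, 13, 40, 12, 16]  # made-up access counts

# Online: feed the data in small batches, keeping only three numbers of state.
online = RunningStats()
for start in range(0, len(data), 3):
    for value in data[start:start + 3]:
        online.update(value)

# Batch: needs the whole dataset in memory at once.
batch_mean = statistics.fmean(data)
batch_variance = statistics.pvariance(data)

print(online.mean, batch_mean)
print(online.variance, batch_variance)
```

Both approaches arrive at the same statistics, but the online version never holds more than the current value and its small internal state.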

Despite those advantages, there are also challenges to consider. The model should be able to handle concept drift, which can occur when the data changes significantly. Additionally, if you only have the online model but no longer the historical data, it is difficult to meaningfully retrain the model if something goes wrong in your data or in the online algorithm of your choice. In production-grade systems, you ideally have strategies in place to deal with such situations, especially if you rely on an online learning system for business-critical applications. Nevertheless, online learning remains a viable tool in your belt to consider for your use case.
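As a rough illustration of concept drift, the sketch below compares two ways of tracking the level of a stream whose behavior shifts. Everything in it is an assumption for illustration: the function names, the forgetting factor alpha=0.1, and the synthetic stream. It is one simple way to handle drift, not how any particular River algorithm does it.

```python
def cumulative_mean(values):
    """Running mean that weights every past value equally."""
    mean, n = 0.0, 0
    for x in values:
        n += 1
        mean += (x - mean) / n
    return mean


def fading_mean(values, alpha=0.1):
    """Exponentially weighted mean: recent values count more, old ones fade."""
    mean = None
    for x in values:
        mean = x if mean is None else (1 - alpha) * mean + alpha * x
    return mean


# Synthetic stream whose level shifts from 10 to 50 (concept drift).
stream = [10.0] * 200 + [50.0] * 50

slow = cumulative_mean(stream)  # still dominated by the old regime
fast = fading_mean(stream)      # has mostly moved to the new level

print(slow, fast)
```

A learner that never forgets stays anchored to the old regime, while one with a forgetting mechanism follows the new level. The price is that a forgetting learner also adapts to bad data, which is one reason to keep a fallback strategy in production.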

Example of an Online Learning Anomaly Detector

Since version 3.8, the Splunk App for Data Science and Deep Learning (DSDL), formerly known as the Deep Learning Toolkit (DLTK), allows you to tap into online learning algorithms powered by the River Python library. It comes with a dedicated container image and an example of an online learning anomaly detector based on the HalfSpaceTrees algorithm, an online variant of isolation forests. Half-space trees work well when anomalies are spread out.

Online Learning Anomaly Detection

In the screenshot above, you can see a simple time series of the access count to a Recruiting Service, represented by the blue bars. In the line chart overlay, the green line indicates the anomaly score calculated by the online learning model. On the left side of the chart, you’ll notice that the score only appears after a defined warm-up phase, which is quite typical for online learners. If you follow the green line more closely, you can also see how, after a while, the learner adjusts from an average value of 0.40 to a lower value that stabilizes around 0.25 at the right end of the chart. Finally, the orange line indicates the anomalies flagged against a threshold that can easily be adjusted to the desired sensitivity of the detector. That’s how the 11 anomalies are spotted automatically, and they could now easily be used for alerting purposes or more sophisticated correlation searches.
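The warm-up phase and the adjustable threshold can be sketched with a toy detector in plain Python. Note the assumptions: this is a simple running z-score, a stand-in and not the HalfSpaceTrees algorithm, and the class name, the warmup and threshold values, and the sample counts are all made up. The method names score_one and learn_one mirror the convention River uses.

```python
import math


class OnlineZScoreDetector:
    """Toy online anomaly scorer; a stand-in, not HalfSpaceTrees."""

    def __init__(self, warmup=20, threshold=3.0):
        self.warmup = warmup
        self.threshold = threshold
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def score_one(self, x):
        # No score during warm-up: the model has not seen enough data yet.
        if self.n < self.warmup:
            return None
        std = math.sqrt(self._m2 / self.n)
        return abs(x - self.mean) / std if std > 0 else 0.0

    def learn_one(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    def is_anomaly(self, score):
        return score is not None and score > self.threshold


# Made-up access counts with one obvious spike near the end.
counts = [10, 12, 11, 13, 9, 10, 12, 11] * 5 + [60, 11, 10]

detector = OnlineZScoreDetector(warmup=20, threshold=3.0)
flags = []
for value in counts:
    score = detector.score_one(value)  # score first ...
    flags.append(detector.is_anomaly(score))
    detector.learn_one(value)          # ... then learn

print([i for i, flagged in enumerate(flags) if flagged])
```

Lowering the threshold makes the detector more sensitive and flags more points; raising it flags fewer.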

Online Learning Workflow with Splunk and DSDL

To conclude this online learning example, let’s look at what a practical workflow in Splunk would look like. Typically, you would take the following steps to get your online learning system up and running in DSDL:

  1. Identify the appropriate algorithm in River and implement it as a DSDL Jupyter Notebook, e.g. like the existing river_halfspacetree.ipynb example.
  2. Create an initial model of your online learner with a search that combines your base search with … | fit MLTKContainer algo=river_halfspacetree window_size=100 n_trees=10 height=3 Recruiting into app:online_anomaly_detector … and that has access to some data that works well with your algorithm of choice.
  3. Now that your model named “online_anomaly_detector” exists, you can launch a dedicated container that serves this model exclusively for your use case.
  4. Define a search that contains … | apply online_anomaly_detector … and run it on the desired schedule, e.g. every 5 minutes on the last 5 minutes of new data. The existing online learner runs inference on the new data, then learns from its characteristics and updates itself.
  5. Depending on how you want to make your results actionable, you can decide e.g. to alert on the anomalies directly or write them into a summary index for further consumption on a dashboard or subsequent correlation searches.
  6. Optionally, add logging or scoring to improve your machine learning operations and keep track of your model health and performance.
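The fit, apply and logging steps above can be sketched as a small simulation in plain Python. This is not how DSDL implements them: the fit_model and apply_model functions, the pickled state standing in for the stored model, the running z-score used as the scorer, and all values are hypothetical and only illustrate the pattern of scoring each scheduled batch first and learning from it afterwards.

```python
import math
import pickle


def learn(model, x):
    """Update the running mean and squared deviations with one value."""
    model["n"] += 1
    delta = x - model["mean"]
    model["mean"] += delta / model["n"]
    model["m2"] += delta * (x - model["mean"])


def score(model, x):
    """Distance from the running mean in standard deviations."""
    std = math.sqrt(model["m2"] / model["n"]) if model["n"] else 0.0
    return abs(x - model["mean"]) / std if std > 0 else 0.0


def fit_model(batch):
    """Step 2: create the initial model from some suitable data."""
    model = {"n": 0, "mean": 0.0, "m2": 0.0}
    for x in batch:
        learn(model, x)
    return pickle.dumps(model)  # stand-in for the stored model


def apply_model(stored_model, batch, threshold=3.0):
    """Step 4: score the new data first, then learn from it."""
    model = pickle.loads(stored_model)
    results = []
    for x in batch:
        s = score(model, x)
        results.append({"value": x, "score": s, "anomaly": s > threshold})
        learn(model, x)
    # Step 6: a few numbers to log for tracking model health.
    health = {"events_seen": model["n"], "mean": model["mean"]}
    return results, pickle.dumps(model), health


stored = fit_model([10, 12, 11, 13, 9, 10, 12, 11] * 3)

flagged = []
for batch in ([11, 10, 12], [9, 55, 11], [10, 12, 13]):  # three scheduled runs
    results, stored, health = apply_model(stored, batch)
    flagged += [r["value"] for r in results if r["anomaly"]]  # step 5: alert
    print(health)

print(flagged)
```

Because the updated state is stored after every run, each scheduled search only needs the newest slice of data, which is what keeps the footprint small.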

I hope this blog post provides you with a novel approach to some of your machine learning challenges. Please note that not all algorithms are equally suited for online learning, so you should carefully evaluate your use cases and compare possible online learning approaches with traditional batch learning approaches to make an informed decision on which is the better fit.

If you are looking to learn more about the Splunk App for Data Science and Deep Learning, you can watch this .conf session to explore how BMW Group is using DSDL for a predictive testing strategy in automotive manufacturing. In case you are interested in how to use DSDL to scale out forecasting with Prophet, stay tuned for another blog post coming soon.

Happy online learning,

Philipp

Many thanks to Judith Silverberg-Rajna, Katia Arteaga and Mina Wu for your support in editing and publishing this blog post.
