Threat Hunting With ML: Another Reason to SMLE

Security is an essential part of any modern IT foundation, whether in smaller shops or at enterprise scale. It used to be sufficient to implement rules-based software to defend against malicious actors, but those actors are not standing still. Just as every aspect of IT has become more sophisticated, attackers have continued to innovate as well. Building more and more rules-based software to detect security events means you are always one step behind in an unsustainable fight. To stay on top of the latest security threats, and ahead of them, we have to change the game.

At Splunk, our Threat Researchers are leveraging and implementing machine learning (ML) techniques across our security detections to stay ahead of bad actors and better protect our customers. While rules-based detection software remains an important part of any defense strategy, ML and behavioral-based detections allow Splunk Threat Researchers to anticipate patterns and defend against a broader range of increasingly sophisticated attacks.

This post is the first in a mini-series that explores various aspects of our security team’s mindset and learnings. Here, we introduce how Splunk's own Security and Threat Research team develops the latest security detections with ML using the Splunk Machine Learning Environment (SMLE). Since our announcements at .conf20, there has been tremendous excitement about SMLE and our Streaming ML capabilities. Splunk’s Threat Research team shared that excitement, as we saw a significant opportunity to use both SMLE and Streaming ML to advance our next-generation, behavioral-based detections. Let’s walk through an example.

Improving Security Detections with ML

A large portion of today’s enterprise-class security offerings are powered by what we call detections. Detections are the individual components that identify security threats or anomalies, and in the Splunk world, these detections have traditionally consisted of SPL code. SMLE makes it easy to extend traditional, signature-based security detections to find behavioral patterns by exposing our Streaming ML capabilities as operators, right inline with SPL code. In fact, we use SMLE to author the detections that go into our very own security products.

In this example, we are building a signature detection in an SMLE Studio notebook to find credential dumping via the Security Account Manager (SAM), described as technique T1003.002 by MITRE ATT&CK. OS Credential Dumping is a technique that threat actors typically use to obtain credentials from a compromised system so that they can move laterally. SMLE Studio is our native Jupyter notebook environment, where you can train custom ML models, experiment with built-in Streaming ML capabilities, or build sophisticated SPL pipelines right in the Splunk ecosystem. It contains all of the power of Jupyter notebooks, plus the ability to author Python and R code right next to SPL code. This is hugely valuable for developing security detections, because we can easily run experiments to improve our existing detection library with rapid iteration cycles.

Signature-based detection in SMLE Studio

This detection begins by reading a dataset and casting it into a Sysmon object. We then extract the process name and check whether it matches keywords such as cmd.exe or reg.exe. Next, we extract the command line arguments from the logs and check whether they contain the keyword “save”. Finally, we match suspicious registry keys, as shown in the regex commands. Specifically, we are looking for a selection of registry keys that an attacker can use to obtain credentials from SAM.
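The matching logic described above can be approximated outside of SPL. The following Python sketch is not the detection shown in the notebook. It only illustrates the three checks (process name, the “save” keyword, and a sensitive registry hive) under stated assumptions: the field names `process_path` and `cmd_line`, the helper names, and the exact list of hives in the regex are hypothetical.

```python
import re

# Hypothetical process list and field names; the original SPL detection
# and real Sysmon events may use different names and values.
SUSPICIOUS_PROCESSES = {"cmd.exe", "reg.exe"}

# Illustrative selection of registry hives related to SAM credential dumping.
SAM_KEY_PATTERN = re.compile(r"hklm\\(sam|system|security)\b", re.IGNORECASE)
SAVE_PATTERN = re.compile(r"\bsave\b", re.IGNORECASE)


def process_name(image_path: str) -> str:
    """Return the lowercase executable name from a Windows image path."""
    return image_path.replace("/", "\\").rsplit("\\", 1)[-1].lower()


def is_sam_dump(event: dict) -> bool:
    """Signature check: known process, 'save' argument, sensitive hive."""
    cmdline = event.get("cmd_line", "")
    return (
        process_name(event.get("process_path", "")) in SUSPICIOUS_PROCESSES
        and SAVE_PATTERN.search(cmdline) is not None
        and SAM_KEY_PATTERN.search(cmdline) is not None
    )


# Made-up sample events, used only to exercise the function.
events = [
    {"process_path": r"C:\Windows\System32\reg.exe",
     "cmd_line": r"reg save HKLM\SAM C:\Temp\sam.save"},
    {"process_path": r"C:\Windows\System32\reg.exe",
     "cmd_line": r"reg query HKLM\Software\Microsoft"},
    {"process_path": r"C:\Windows\System32\notepad.exe",
     "cmd_line": r"notepad.exe notes.txt"},
]

hits = [e for e in events if is_sam_dump(e)]
for hit in hits:
    print(hit["cmd_line"])
```

Because every condition is a fixed pattern, this style of check only finds the tools and arguments it was written for, which is the limitation the behavioral version is meant to address.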

The next step is to test the detection against sample data to confirm that it works. In this example, we read from a dataset (from Splunk Attack Data) that we know contains the attack. The results show an attempted credential dump via the registry key HKLM\SAM.

Signature-based detection results in SMLE Studio

Let’s see if we can make this even better by turning this signature-based detection into a behavioral detection leveraging SMLE’s built-in Streaming ML capabilities. The new detection is very similar to the first detection, but we have tweaked it to look for any new command line arguments passed to cmd.exe by using the “first time” Streaming ML algorithm. This is unique in the industry because we are simplifying the complexities of keeping state inside an algorithm within one simple command! 
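To make the idea of "keeping state" concrete, here is a minimal Python sketch of a first-time-seen check. It is not SMLE's Streaming ML operator, whose implementation is not described in this post. The class name, method name, and sample command lines are all hypothetical, and the sketch assumes that state is simply a set of previously seen values per key.

```python
from collections import defaultdict


class FirstTimeSeen:
    """Flag a value the first time it appears for a given key.

    Hypothetical stand-in for a streaming "first time" operator: the
    state is a set of previously seen values per key, kept across events.
    """

    def __init__(self) -> None:
        self._seen = defaultdict(set)

    def update(self, key: str, value: str) -> bool:
        """Return True if `value` is new for `key`, then remember it."""
        is_new = value not in self._seen[key]
        self._seen[key].add(value)
        return is_new


detector = FirstTimeSeen()

# Made-up stream of (process name, command line) pairs.
stream = [
    ("cmd.exe", "cmd.exe /c dir"),
    ("cmd.exe", "cmd.exe /c dir"),
    ("cmd.exe", "cmd.exe /c pypykatz <args>"),  # placeholder arguments
]

flags = [detector.update(proc, cmdline) for proc, cmdline in stream]
print(flags)  # [True, False, True]
```

In this sketch every value is flagged on its first appearance, so a detector built this way would need a warm-up period on known-good traffic before its flags are useful.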

Behavioral-based detection in SMLE Studio

Comparing the results, we see other attacks in the dataset that the traditional signature detection missed. For example, the behavioral detection surfaced the execution of pypykatz, another tool used to obtain credentials from SAM. 

Behavioral-based detection results in SMLE Studio

Now that we have validated our ML-powered detection, it is ready for use. We can also go back and fine-tune the results to exclude any additional noise. With a few simple changes to our existing rules-based detection, SMLE Studio with Streaming ML enabled us to build a more complete behavioral detection that scales beyond any set of pre-determined rules.


To some, it may seem strange, or even counter-intuitive, that Splunk is so transparent about our process for developing advanced security detections. In fact, we believe in developing these defenses in an open ecosystem to share with our customers and the community. When a new vulnerability, threat, or piece of malicious software is identified, our team quickly mobilizes to release a new detection that benefits not only our customers, but the entire security community. You can find all of the detections developed by the Splunk Threat Research team (along with the attack datasets used to test these detections) on the Splunk Security Content GitHub repository.

Developing security use cases can be a real challenge. By combining the power of SPL with the capabilities of Streaming ML, SMLE unlocks a new set of opportunities for building robust security detections, and it has proved to be a useful tool for our own Threat Research team.

Next Steps

Want to keep up with the latest from the Splunk Security and Threat Research team? Check out the Splunk Security Content repo on GitHub. With 46 authors pushing 322 commits in the last month alone, it is a thriving and active community.

Stay tuned for the next post in this mini-series of blogs about our security and threat research at Splunk!


Posted by John Reed

John Reed is a Principal Product Manager at Splunk. His responsibility includes the strategy and execution of initiatives across Machine Learning and Core Search.

Previously, John was a Product Manager at AWS where he worked across the AI/ML service portfolio. Prior to AWS, John was a Product Manager at vXchnge and NetApp, where he focused on cloud initiatives, interconnectivity, file services, and more.

John received a Bachelor of Science degree in Mechanical Engineering from the University of California, Berkeley.