Threat Hunting With ML: Another Reason to SMLE

Security is an essential part of any modern IT foundation, whether in smaller shops or at enterprise scale. It used to be sufficient to implement rules-based software to defend against malicious actors, but those malicious actors are not standing still. Just as every aspect of IT has become more sophisticated, attackers have continued to innovate as well. Building more and more rules-based software to detect security events means you are always one step behind in an unsustainable fight. To stay on top of the latest security threats, and ahead of them, we have to change the game.

At Splunk, our Threat Researchers are leveraging and implementing machine learning (ML) techniques across our security detections to stay ahead of bad actors and better protect our customers. While rules-based detection software remains an important part of any defense strategy, ML and behavioral-based detections allow Splunk Threat Researchers to anticipate patterns and defend against a broader range of increasingly sophisticated attacks.

This post is the first in a mini-series in which we explore and share various aspects of our security team’s mindset and learnings. Here, we introduce how Splunk’s own Security and Threat Research team develops the latest security detections with ML using the Splunk Machine Learning Environment (SMLE). Since our announcements at .conf20, there has been tremendous excitement about SMLE and our Streaming ML capabilities. Splunk’s Threat Research team shared this excitement, as we saw a significant opportunity to leverage both SMLE and Streaming ML while advancing our next-generation, behavior-based detections. Let’s walk through an example.

Improving Security Detections with ML

A large portion of today’s enterprise-class security offerings are powered by what we call detections. Detections are the individual components that identify security threats or anomalies, and in the Splunk world, these detections have traditionally consisted of SPL code. SMLE makes it easy to extend traditional, signature-based security detections to find behavioral patterns by using our Streaming ML capabilities as operators, right inline with SPL code. In fact, we use SMLE to author the detections that go into our very own security products.

In this example, we are building a signature detection in our SMLE Studio notebook to find credential dumping via the Security Account Manager (SAM), described as technique T1003.002 by MITRE ATT&CK. OS Credential Dumping is a technique typically used by threat actors to move laterally by obtaining credentials from a compromised system. SMLE Studio is our native Jupyter notebook environment where you can train custom ML models, experiment with built-in Streaming ML capabilities, or build sophisticated SPL pipelines right in the Splunk ecosystem. It contains all of the power of Jupyter notebooks, plus the ability to author Python and R code right next to SPL code. This is hugely valuable for developing security detections, because we can easily run experiments to improve our existing detection library with rapid iteration cycles.

Figure: Signature-based detection in SMLE Studio

This detection begins by reading a dataset and casting it into an object for Sysmon. We then extract the process name and match it against keywords such as cmd.exe or reg.exe. Next, we extract the command line arguments from the logs and check whether they contain the keyword “save”. Finally, we match suspicious registry keys, as shown in the regex commands. Specifically, we are looking for a selection of registry keys that an attacker can use to obtain credentials from SAM.
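To make the matching steps above concrete, here is a minimal Python sketch of the same logic. It is not the SPL from the notebook: the field names (`process_path`, `cmd_line`), the list of registry hives, and the sample events are all assumptions made for illustration.

```python
import re

# Hypothetical field names and patterns; the original detection's exact
# regexes are not reproduced here.
PROCESS_RE = re.compile(r"(?:^|\\)(?:cmd|reg)\.exe$", re.IGNORECASE)
SAVE_RE = re.compile(r"\bsave\b", re.IGNORECASE)
# Illustrative hive list (assumption): hives commonly saved when dumping SAM.
HIVE_RE = re.compile(
    r"\b(?:HKLM|HKEY_LOCAL_MACHINE)\\(?:sam|system|security)\b",
    re.IGNORECASE,
)


def is_sam_dump(event: dict) -> bool:
    """Return True when all three signature conditions match."""
    image = event.get("process_path", "")
    cmdline = event.get("cmd_line", "")
    return bool(
        PROCESS_RE.search(image)       # process is cmd.exe or reg.exe
        and SAVE_RE.search(cmdline)    # command line contains "save"
        and HIVE_RE.search(cmdline)    # and references a sensitive hive
    )


# Made-up events for illustration only.
events = [
    {"process_path": r"C:\Windows\System32\reg.exe",
     "cmd_line": r"reg save HKLM\SAM C:\Temp\sam.save"},
    {"process_path": r"C:\Windows\System32\reg.exe",
     "cmd_line": r"reg query HKLM\Software"},
    {"process_path": r"C:\Windows\System32\notepad.exe",
     "cmd_line": r"notepad.exe notes.txt"},
]

hits = [e for e in events if is_sam_dump(e)]
for hit in hits:
    print(hit["cmd_line"])
```

Only the first event satisfies all three conditions. Because every condition is a fixed pattern, a tool that reaches the same hives through a different command line would pass unnoticed, which is the limitation the behavioral version is meant to address.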

The next step is to test the detection against sample data to confirm that it works. In this example, we read from a dataset (from Splunk Attack Data) that we know contains the attack. The results show that a credential dump was attempted in this dataset via the registry key HKLM\SAM.

Figure: Signature-based detection results in SMLE Studio

Let’s see if we can make this even better by turning this signature-based detection into a behavioral detection that leverages SMLE’s built-in Streaming ML capabilities. The new detection is very similar to the first, but we have tweaked it to look for any new command line arguments passed to cmd.exe by using the “first time” Streaming ML algorithm. This is unique in the industry because the complexity of keeping state inside an algorithm is reduced to one simple command!

Figure: Behavioral-based detection in SMLE Studio
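The idea behind a “first time” check can be sketched in a few lines of Python. This is only an illustration of the concept, not SMLE’s implementation: the class name `FirstTimeSeen`, the field names, and the sample stream are hypothetical, and a production operator may bound its memory or use time windows rather than an ever-growing set.

```python
from typing import Hashable, Iterable, Iterator


class FirstTimeSeen:
    """Flags a value the first time it appears in a stream (hypothetical helper)."""

    def __init__(self) -> None:
        self._seen: set = set()  # state carried across events

    def update(self, value: Hashable) -> bool:
        """Record the value and return True if it was not seen before."""
        is_new = value not in self._seen
        self._seen.add(value)
        return is_new


def new_cmd_arguments(events: Iterable[dict], detector: FirstTimeSeen) -> Iterator[dict]:
    """Yield cmd.exe events whose command line has not been seen before."""
    for event in events:
        if event.get("process_name", "").lower() != "cmd.exe":
            continue  # only cmd.exe invocations are of interest here
        if detector.update(event.get("cmd_line", "").lower()):
            yield event


# Made-up stream for illustration only.
stream = [
    {"process_name": "cmd.exe", "cmd_line": "cmd.exe /c dir"},
    {"process_name": "cmd.exe", "cmd_line": "cmd.exe /c dir"},
    {"process_name": "cmd.exe", "cmd_line": "cmd.exe /c pypykatz"},
    {"process_name": "reg.exe", "cmd_line": "reg query HKLM\\Software"},
]

flagged = list(new_cmd_arguments(stream, FirstTimeSeen()))
for event in flagged:
    print(event["cmd_line"])
```

In this sketch the very first `dir` command is also flagged, because the detector starts with no history. In practice one would expect to warm the state up on known-good activity before alerting, so that only genuinely unfamiliar command lines stand out.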

Comparing the results, we see other attacks in the dataset that the traditional signature detection missed. For example, the behavioral detection surfaced the execution of pypykatz, another tool used to obtain credentials from SAM.

Figure: Behavioral-based detection results in SMLE Studio

Now that we have validated our ML-powered detection, it is ready for use. We can also go back and fine-tune the results to exclude any additional noise. With a few simple changes to our existing rules-based detection, SMLE Studio with Streaming ML enabled us to build a more complete behavioral detection that scales beyond any set of pre-determined rules.

Summary

To some, it may seem strange — or even counter-intuitive — that Splunk is so transparent with our process for developing advanced security detections. In fact, we believe in developing these defenses in an open ecosystem to share with our customers and the community. When a new vulnerability, threat, or piece of malicious software is identified, our team quickly mobilizes to release a new detection that benefits not only our customers, but the entire security community. You can find all of the detections developed by the Splunk Threat Research team (along with the attack datasets used to test these detections) on the Splunk Security Content GitHub repository.

Developing security use cases can be a real challenge. By combining the power of SPL with the capabilities of Streaming ML, SMLE unlocks a new set of opportunities for building robust security detections, and it has proved to be a useful tool for our own Threat Research team.

Next Steps

Want to keep up with the latest from the Splunk Security and Threat Research team? Check out the Splunk Security Content repo on GitHub. With 46 authors pushing 322 commits in the last month alone, it is a thriving and active community.

Stay tuned for the next post in this mini-series of blogs about our security and threat research at Splunk!

Resources

Security content repository on GitHub
Interested in SMLE? Sign up for our Customer Advisory Board and interest list
Blog: Get to Know Splunk Machine Learning Environment (SMLE)
Blog: Detecting Credit Card Fraud Using SMLE
Blog: Machine Learning Guide: Choosing the Right Workflow
SMLE Product Brief
