Hunting for Detections in Attack Data with Machine Learning
A recent exercise using machine learning (ML) to hunt for threats in Windows audit logs containing traces of post-exploitation kits illustrates that even small amounts of attack data can create new analytic opportunities. I wanted to explore which dual-use utilities (i.e., living-off-the-land tools) were leveraged by the exploit kits Meterpreter and Koadic to perform discovery, lateral movement, and other actions. Rod Soto, a prolific threat engineer and Splunker, curated a dataset of Windows logs that captured artifacts (e.g., process creations, logons) of these tools.
As a data scientist, it is tempting to create a model with minimal guidance and see what anomalies appear, since some types of models (such as deep neural networks) perform a degree of automatic feature engineering that may reduce the need for expert intuition in honing the model. This, however, is a recipe for fruitless investigations, especially for end users: there is no shortage of unusual sequences of events in the normal course of machine operation. We therefore need to provide some guidance to the model without being so prescriptive that we miss detection opportunities.
I leveraged Rod’s guidance on which logs and actions to analyze (logons and process creations) to focus the model on the subset of data with the highest return on investment. I would call this a known-unknown search: we have an idea of when the attack occurred and which logs likely contain artifacts of it, but not necessarily how the attack will appear (if at all). I trained a deep learning anomaly detector to find unusual collections of process creations from the Windows system folder (e.g. C:\Windows\). The specific model I used is an autoencoder, which learns to compress its input, process creation counts in this case, into a low-dimensional space and then reconstruct the original input as closely as possible; inputs that reconstruct poorly are the anomalies. Unusual collections of process counts in these periods, such as excessive icacls.exe invocations, may be best explained by the operation of an attack rather than normal OS activity. My primary tools for this task were TensorFlow, Jupyter, and Python. An example notebook is available in our security content repository. Below is a screenshot of the notebook that contained my investigation.
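The approach can be sketched in a few lines of TensorFlow. This is a minimal illustration rather than the notebook's actual code: the synthetic count matrix, the layer sizes, and the training settings are all assumptions, with each row standing in for one time window of per-executable process creation counts.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in for the real features: rows are time windows,
# columns are counts of process creations per executable (hypothetical data).
rng = np.random.default_rng(0)
X = np.log1p(rng.poisson(lam=3.0, size=(500, 32)).astype("float32"))  # tame heavy-tailed counts

# A small autoencoder: compress 32 count features into 8 dimensions, then reconstruct.
inputs = tf.keras.Input(shape=(32,))
encoded = tf.keras.layers.Dense(8, activation="relu")(inputs)
decoded = tf.keras.layers.Dense(32, activation="linear")(encoded)
autoencoder = tf.keras.Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X, X, epochs=20, batch_size=32, verbose=0)

# Anomaly score = per-window reconstruction error; windows the model
# cannot reconstruct well are the candidates worth investigating.
recon = autoencoder.predict(X, verbose=0)
scores = np.mean((X - recon) ** 2, axis=1)
top = np.argsort(scores)[::-1][:5]  # indices of the most anomalous windows
```

Ranking by reconstruction error is what surfaces windows dominated by unusual mixes of executables, such as the icacls.exe-heavy periods mentioned above.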
Snapshot of a Jupyter notebook. The cell above finds the most anomalous windows of activity. In this hour, we observe many processes launched with executables from C:\Windows. Note that many of these executables (msiexec, net, icacls, rundll, etc.) are commonly leveraged by attackers.
The model quickly identified two very unusual process-creation behaviors of these tools. First, the tools created and executed an excessive number of processes from Windows Temp. Second, since Meterpreter and Koadic reside in memory, many of their actions require launching taskhost. We confirmed that both the number of processes created from Windows Temp and the number of taskhost and taskhostex invocations by the exploit kits were significantly higher than what we observed in our research of normal Windows activity. Detections based on this investigation are now part of our security content.
Examples of processes launched from C:\Windows\Temp. In this twenty-minute block, we see 55 distinct process paths. Unusual for sure.
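The distinct-path counting behind a view like this can be sketched with pandas. The event frame and field names here are hypothetical stand-ins for parsed process-creation logs (e.g., Windows Event ID 4688), not the actual dataset:

```python
import pandas as pd

# Hypothetical process-creation events; in practice these would be parsed
# from Windows audit logs.
events = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2021-05-01 10:01", "2021-05-01 10:05", "2021-05-01 10:12",
        "2021-05-01 10:31", "2021-05-01 10:33",
    ]),
    "process_path": [
        r"C:\Windows\Temp\a.exe", r"C:\Windows\Temp\b.exe",
        r"C:\Windows\Temp\a.exe", r"C:\Windows\System32\net.exe",
        r"C:\Windows\Temp\c.exe",
    ],
})

# Count distinct process paths per twenty-minute block.
distinct = (
    events.set_index("timestamp")
          .resample("20min")["process_path"]
          .nunique()
)
print(distinct)
```

Blocks with an outsized distinct-path count, like the 55-path window above, are exactly the ones the anomaly scores push to the top of the queue.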
I took away some key learnings from this exercise. Unsupervised machine learning tasks like anomaly detection for security can be both powerful and efficient if the dataset is focused on the attack window and you have a general, but not exact, idea of what to look for. Data scientists constantly struggle with noise; applying these techniques with little supervision across large numbers of machines without a known attack will surely produce a flood of false positives, because anomalies happen. It may be more effective to take attack data as a starting point and then generalize to find novel threats. That way, we are not sending SOC analysts on wild goose chases, but rather focusing them on investigating real threats. We will continue to leverage our attack datasets for this ML-based hunting and periodically post interesting findings on our blogs. See you then!
----------------------------------------------------
Thanks!
Michael Hart