Machine Learning in Security: NLP Based Risky SPL Detection with a Pre-trained Model

Splunk Vulnerability Disclosure SVD-2022-0604 describes an attack in which dashboards in certain Splunk Cloud Platform and Splunk Enterprise versions may let an attacker inject risky search commands into a form token.

To address this security gap, we published a hunting analytic and two machine learning (ML)-based detections that help find users running highly suspicious risky SPL commands. The "Learn More" section of this blog post lists all relevant detections. Our previous blog post discusses an ML-based approach that uses outlier detection to detect malicious usage of Risky SPL. The core idea behind that approach was to detect outliers in the run time of SPL searches. We used unusually long search times (compared to a benign search) as a proxy for data being collected for exfiltration using risky SPL. This allows for behavioral modeling of entities exploiting risky commands by training on users’ historical search patterns. In this blog post, we discuss another ML approach that uses Natural Language Processing (NLP) techniques to detect Risky SPL using pre-trained models.

The Challenge

Certain commands are deemed risky because using them incorrectly may lead to a security breach or data loss. These commands could also be used by malicious entities for data exfiltration and sabotage. Hence, the Splunk platform contains Search Processing Language (SPL) safeguards to warn you when you unknowingly run a search that has commands that might be either a security or a performance risk. To protect users, the Splunk Search app displays a warning dialog box when you click a link or type a URL that loads a search containing risky commands. This safeguard gives users a useful tool for making informed decisions about how much risk they are willing to accept, but it does not cover all scenarios. In some cases malicious entities can circumvent the safeguards, leaving users exposed to attacks such as the one outlined in SVD-2022-0604. Further details about the SPL safeguards for risky commands can be found here.

Approaching Risky SPL Detection as a Machine Learning Problem

Detecting Risky SPL can be approached as a pure text classification problem because it is contingent on the specific commands used. We developed an ML-based approach to detect potentially Risky SPL that may have been executed. The core idea is to use NLP techniques to train a classifier that can distinguish between Risky and Non-Risky SPL based on command text. Casting the problem as an NLP problem allows us to use the rich toolset developed for such tasks.

The Case for Using Machine Learning

While hunting detections are of prime importance in establishing a baseline and the key features that define a problem, they are hard to tune to the fine nuances of edge cases, and more so if the adversary is adaptive. This may result in a high false positive rate, which an ML-based approach can address because it adapts to the changing attack landscape through its training data. An ML model can also be repurposed to solve similar problems by tuning the features and hyperparameters on new training data, which amortizes the overall development effort. These benefits bring direct value to a Splunk customer by increasing the detection count while reducing the number of searches. An ML-based approach also presents a more granular view: metrics such as a risk score can be used to streamline the triaging of suspicious events. When paired with hunting detections, ML-based detections create a robust metric that is more resistant to adversarial manipulation and narrows the attack surface available to an adversary. All these reasons present a strong case for harnessing the power of ML.

Modeling Risky SPL Detection

A command is deemed risky based on the presence of certain trigger keywords, along with the context and the role of the user. We develop a model that uses custom NLP features to predict whether an SPL command is risky. The model takes the command text, user, and search type as input and outputs a risk score in the range [0, 1]. A higher score indicates a higher likelihood that the command is being used for malicious purposes.

We leverage the power of Splunk Machine Learning Toolkit (MLTK) to train the model independently and subsequently distribute it through Splunk Enterprise Security Content Update (ESCU). We adopt the approach of pre-training the model to simplify many tasks such as feature engineering. Such tasks are significantly more complex and not always achievable if done using pure SPL. We break down the model development process and discuss the advantages of pre-training the model in the sections below.

Data Collection and Cleaning

We use the highly sophisticated and flexible Splunk Attack Range to generate the data for training our ML models. The Splunk Attack Range is a detection development platform which allows us to quickly set up attack environments and simulate attacks. This facilitates collection of high quality attack data. We further enhance the quality of data by carefully sifting through it and fixing labels. This allows us to train the model with high quality data and minimizes prediction errors.

Feature Engineering

Having data at hand gives us the opportunity to use a large array of text processing tools, which result in rich and descriptive features. We begin by looking at the definition of risky commands. The following search commands are considered risky due to the potential security and unintended data loss risk posed by their incorrect usage: collect, dump, delete, fit, outputcsv, outputlookup, run, runshellscript, script, sendalert, sendemail, tscollect.

Thus, tokenizing the SPL text and deriving a one-hot feature vector based on the presence of these keywords should serve as a powerful discriminator between Risky and Non-Risky commands. We further enhance the features by considering how frequently each keyword occurs. This results in a robust feature vector that can discriminate between Risky and Non-Risky commands with high accuracy. We test our intuition by plotting the data distribution over the computed features to check whether they are representative. As seen in the t-SNE plot below, the chosen features represent the data accurately and the two classes are well separated, which helps us create models with high accuracy.


We found that token distribution is a highly accurate metric for capturing the risk perception of an SPL query. The token based features can be further enriched by adding n-gram based text features, which leaves room for experimentation and adaptation in the future.
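
As a rough illustration of the token-based features described above, the following Python sketch tokenizes an SPL string and derives both a presence vector and a frequency vector over the risky command list. It is not the code used to build the published model: the tokenizer regex and the function names tokenize and keyword_features are assumptions made for this example.

```python
import re
from collections import Counter

# Risky commands as listed in the text above.
RISKY_COMMANDS = [
    "collect", "dump", "delete", "fit", "outputcsv", "outputlookup",
    "run", "runshellscript", "script", "sendalert", "sendemail", "tscollect",
]


def tokenize(spl_text):
    """Lowercase the SPL string and split it into word-like tokens.

    The regex is an assumption for this sketch, not Splunk's own parser.
    """
    return re.findall(r"[a-z_][a-z0-9_]*", spl_text.lower())


def keyword_features(spl_text):
    """Return (presence, freq), two vectors aligned with RISKY_COMMANDS."""
    counts = Counter(tokenize(spl_text))
    freq = [counts[cmd] for cmd in RISKY_COMMANDS]
    presence = [1 if n > 0 else 0 for n in freq]
    return presence, freq


presence, freq = keyword_features(
    "index=main | stats count by host | outputlookup hosts.csv"
)
```

Because matching is done on whole tokens, runshellscript is not also counted as run. A plain token match like this would still count a risky keyword that appears as a field name or value, which is one reason the text treats these features as input to a classifier rather than as a verdict on their own.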

Model Selection and Parameter Tuning

After cleaning the data and extracting the features, we tried out various training methodologies and tuned them for optimal performance. Since the data is well separated, most classifiers give good accuracy. We chose Logistic Regression for our task as it is fast and provides a likelihood score of a sample belonging to a particular class based on the proximity to the decision boundary. This score is extremely useful in tuning the final SPL detection to the level of risk permissible by a particular customer.
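
To make the role of the likelihood score concrete, here is a minimal from-scratch logistic regression in plain Python. This is a sketch under stated assumptions, not the MLTK implementation or our training pipeline: the two-feature toy data set, the learning rate, the epoch count, and the names train_logistic_regression and risk_score are all hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias with batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def risk_score(w, b, x):
    """Probability-like score in [0, 1] that a sample is risky."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical toy data: [count of "delete", count of "outputlookup"].
X = [[0, 0], [0, 0], [1, 0], [0, 1], [2, 1], [0, 0], [1, 1], [0, 0]]
y = [0, 0, 1, 1, 1, 0, 1, 0]

w, b = train_logistic_regression(X, y)
THRESHOLD = 0.5  # raise it to alert less often, lower it to alert more often
flagged = [risk_score(w, b, x) > THRESHOLD for x in X]
```

The score grows as a sample moves further onto the risky side of the decision boundary, so a single threshold is enough to trade detection coverage against alert volume.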

We use the MLTK app which provides a rich suite of ML tools for all our ML needs. Additionally, pre-training the model while still using MLTK helps us deploy the model as a macro which can be paired with SPL to create detections. This gives us the flexibility to use feature engineering and training on custom data while still being conveniently deployable.

Results and Analysis

In this section, we discuss some key model quality metrics. We start with the Receiver Operating Characteristic (ROC) curve, which shows a very low false positive rate at a high true positive rate. The Area Under the Curve (AUC) is 0.9948 (1 corresponds to perfect classification), which indicates very good classification accuracy.
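
For readers less familiar with AUC, one way to read it is as the probability that a randomly chosen risky sample receives a higher score than a randomly chosen non-risky one. The sketch below computes it that way by comparing every pair. The function name roc_auc and the sample labels and scores are hypothetical and are unrelated to the data behind the figure reported above.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a win. This pairwise form is O(n^2) and is meant
    for illustration on small inputs only.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = risky) and classifier scores.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.6, 0.55, 0.1]
auc = roc_auc(labels, scores)  # 8 of 9 pairs are ranked correctly
```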

The results are confirmed by the confusion matrix. The lower left quadrant is of prime importance for any security-based classification problem. Almost all risky commands are caught, with a false negative rate of only 0.56%. The false positive rate is also very low at only 1.83%, which is an important metric from a usability perspective. In security-critical scenarios, incidents of interest are rare, and even a modest false positive rate can render a system useless by fatiguing the user with a large number of false alarms. Such systems must therefore be designed to keep the false positive rate low.
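
The effect of rare incidents on alert volume can be worked through with a little arithmetic. The sketch below defines the rates in terms of confusion-matrix counts and then estimates alert counts using the false positive and false negative rates quoted above. The search volume (100,000) and the share of risky searches (0.1%) are hypothetical values chosen only for illustration, as are the function names.

```python
def rates(tp, fp, tn, fn):
    """False positive rate, false negative rate, and precision from counts."""
    fpr = fp / (fp + tn)
    fnr = fn / (fn + tp)
    precision = tp / (tp + fp)
    return fpr, fnr, precision


def expected_alerts(n_searches, prevalence, fpr, fnr):
    """Expected (true_alerts, false_alerts) for a given share of risky searches."""
    risky = n_searches * prevalence
    benign = n_searches - risky
    true_alerts = risky * (1.0 - fnr)
    false_alerts = benign * fpr
    return true_alerts, false_alerts


# FPR and FNR are the rates reported above; volume and prevalence are hypothetical.
true_alerts, false_alerts = expected_alerts(
    n_searches=100_000, prevalence=0.001, fpr=0.0183, fnr=0.0056
)
```

Under these assumed numbers, about 99 true alerts would arrive alongside roughly 1,800 false ones, which illustrates why a low false positive rate matters so much when risky searches are rare.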

In the following sections, we discuss how the classifier prediction score is coupled with other signals to create a significantly more robust Risky SPL detection. Combining signals in this way is far less vulnerable to the inaccuracies of a single system and yields a more robust prediction.

Deploying the Pre-trained Model in MLTK

After training is complete, we can export the model and package it so that it is available as a macro to be used in SPL. This macro was used to create an SPL detection.

Constructing the Detection

Once the model was trained and made available as a macro via ESCU, we wrote a detection to find potentially Risky SPL. The detection is based on the search activities in the Splunk app audit data model. The related data fields used in this detection are search (the search string), search_type (the type of the search) and user (the name of the user who ran the search). The detection and the result can be found below. It is worth noting that we pair the classification score of the model with the search type and user role, as these parameters are key in deciding the context in which a particular command was executed. We also give the customer the choice to tune the risk score threshold. Such tasks lend themselves very naturally to ML-based detections, so incorporating ML leads to a detection with nuanced and graduated predictions.

| tstats `security_content_summariesonly` count min(_time) as firstTime max(_time) as lastTime from datamodel=Splunk_Audit.Search_Activity where Search_Activity.search_type=adhoc Search_Activity.user!=splunk-system-user by Search_Activity.search Search_Activity.user Search_Activity.search_type
| eval spl_text = 'Search_Activity.search'. " " .'Search_Activity.user'. " " .'Search_Activity.search_type'
| dedup spl_text
| apply risky_spl_pre_trained_model
| where risk_score > 0.5
| `drop_dm_object_name(Search_Activity)`
| table search, user, search_type, risk_score

Further details can be found in the detection documentation.

Concluding Thoughts

In this blog, we demonstrated that Splunk can deploy powerful and adaptive ML-based detections. MLTK is a very powerful tool when employed correctly. We also showed why certain tasks call for an ML-based approach: whether a command is risky can depend on context. The approach is also extensible and can be adapted to similar problems.

Learn More

If you would like to adopt this detection, you can get the corresponding baseline and detection YAML files from the Splunk Security Content GitHub repository.



Detect Risky SPL using Pretrained ML Model

This YML uses a pre-trained machine learning text classifier to detect potentially risky commands.

More Related Detections


Splunk Command and Scripting Interpreter Risky SPL MLTK Baseline

This YML builds baseline models for detecting risky command exploits from each user’s search activity over the past 7 days, using total search run time as the user behavior indicator.

Splunk Command and Scripting Interpreter Risky SPL MLTK

This YML uses the baseline models to infer whether a search in the last hour is possibly an exploit of risky commands.

Splunk Command and Scripting Interpreter Risky Commands

This YML hunts for ad-hoc searches containing risky commands from non-administrative users.

Splunk Command and Scripting Interpreter Delete Usage

This YML identifies use of the risky command ‘DELETE’, which may be used in Splunk to delete some or all of the data being queried.


Any feedback or requests? Feel free to put in an issue on GitHub and we’ll follow up. Alternatively, join us on the Slack channel #security-research. Follow these instructions if you need an invitation to our Splunk user groups on Slack.


We would like to thank the following for their contribution to this post and corresponding detections:

  • Abhinav Mishra
  • Bhavin Patel
  • Glory Avina
  • Jose Hernandez
  • Karim Mahrous
  • Kumar Sharad
  • Michael Haag
  • Namratha Sreekanta
  • Rod Soto
  • Xiao Lin


The Splunk Threat Research Team is an active part of a customer’s overall defense strategy by enhancing Splunk security offerings with verified research and security content such as use cases, detection searches, and playbooks. We help security teams around the globe strengthen operations by providing tactical guidance and insights to detect, investigate and respond to the latest threats. The Splunk Threat Research Team focuses on understanding how threats, actors, and vulnerabilities work, and the team replicates attacks which are stored as datasets in the Attack Data repository.

Our goal is to provide security teams with research they can leverage in their day to day operations and to become the industry standard for SIEM detections. We are a team of industry-recognized experts who are encouraged to improve the security industry by sharing our work with the community via conference talks, open-sourcing projects, and writing white papers or blogs. You will also find us presenting our research at conferences such as Defcon, Blackhat, RSA, and many more.
