Splunk for Security Investigation: Ransomware

Welcome to the Splunk for Security Investigation Experience. In this fourth video, learn to harness analytics to easily detect hidden anomalies in the endpoint.

 


Video Transcript

Ransomware is an advanced form of malware that is difficult to detect. The damage of ransomware can quickly move beyond just ransom to the threat of system lockdowns that can impact customer experience, compromise customer or employee account information, or bring down critical services. However, by analyzing endpoint data, it is possible to find infections before they cause damage or impact business operations. As an attack unfolds, infections can create a complex chain of activities, making it difficult to fully scope the potential impact.

By applying statistical approaches to our data, we can calculate the baseline of all activities and isolate the outliers caused by ransomware. In this exercise, we will show you how you can harness the power of analytics to easily detect hidden anomalies in the endpoint so that you can prevent and mitigate advanced malware, such as ransomware.

But what should we look for from these types of events? Since we don't know the pattern of malicious activity, we can't explicitly define the event patterns we want to search for that will indicate the compromise. The solution is to analyze the anomalies and the granular activities that are occurring on the endpoint.

For this ransomware endpoint analysis, we will use Sysmon events that contain detailed endpoint activities. Specifically, we want to find any additional command or wscript that's triggered by a user opening up an attachment from their email. As we said, advanced malware tends to use long instructions in the infection stage to initiate a dropper that communicates with a malicious server. We will look for command and wscript activities with a command line length greater than the host's average plus four times the standard deviation. These activities are definite outliers that fall under the Malware Likely category.

Step one-- to analyze endpoint activities, we will pull in Microsoft's Sysmon data. Splunk provides an easy way to collect Microsoft Sysinternals data from Windows endpoints in real time at scale. From the Sysmon data, we will select events with an event code equal to 1, which represent process starts. We want to analyze all the process start events and identify those with long command line arguments.
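As a minimal sketch of this selection step, the Python below filters a small, made-up list of parsed events down to process starts. The sample events and the field names host, EventCode, and CommandLine are assumptions for illustration; they are not taken from the video.

```python
# Hypothetical, already-parsed endpoint events (field names are assumptions).
events = [
    {"host": "ws01", "EventCode": 1, "CommandLine": "notepad.exe report.txt"},
    {"host": "ws01", "EventCode": 3, "CommandLine": None},
    {"host": "srv01", "EventCode": 1, "CommandLine": "svchost.exe -k netsvcs"},
]


def process_starts(all_events):
    """Keep only events whose event code is 1 (process start)."""
    return [e for e in all_events if e["EventCode"] == 1]


starts = process_starts(events)
for e in starts:
    print(e["host"], e["CommandLine"])
```

Everything that follows in the analysis operates only on this reduced set of process start events.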

This requires calculating every process's command line argument length, the average argument length per host, and the standard deviation of length per host. We then compare the current process's argument length with a threshold based on the average and standard deviation. Any process with an argument length greater than the threshold is an outlier, and thus a suspicious process that we will investigate further.

Step two-- to calculate the length of a value, we can apply the eval command to the command line field. Eval creates new fields based on either a calculation or a transformation. This allows you to define formulas without having to develop code outside of Splunk to address the need for special data transformation. Here, the eval command creates a new field called cmdlen (C-M-D-L-E-N), which holds the length of the command line field for each process event. We can treat it as we would any other field, applying conditions to filter or applying additional calculations or aggregation functions. You can check docs.splunk.com for a comprehensive list of the functions you can use.
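The idea of deriving a new field from an existing one can be sketched outside Splunk as well. The Python below is only an illustration, assuming each event is a dictionary with a CommandLine key; the helper name add_cmdlen is hypothetical.

```python
def add_cmdlen(event):
    """Return a copy of the event with a derived cmdlen field."""
    enriched = dict(event)  # copy so the original event is left untouched
    enriched["cmdlen"] = len(event["CommandLine"])
    return enriched


# Hypothetical sample event.
sample = {"host": "ws01", "CommandLine": "cmd.exe /c dir"}
enriched = add_cmdlen(sample)
print(enriched["cmdlen"])
```

The derived field sits alongside the original fields, so later steps can filter or aggregate on it like any other value.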

Step three-- to find our baseline for comparison, we can use eventstats, applying the average function to the command line length to calculate the average command line length per host, and the standard deviation function to calculate the standard deviation of the command line length. We use these calculated numbers later to identify the outliers.

The result of eventstats puts additional calculated fields in the Fields panel. By using by host in the eventstats syntax, we calculate the average and standard deviation separately for each host. In other words, each host will have its own average and standard deviation baseline.
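A rough Python analogue of this per-host baseline is sketched below. It computes the mean and sample standard deviation of cmdlen for each host and attaches both values to every event from that host. The sample data, the output field names avg and stdev, and the choice to use 0.0 when a host has only one event are all assumptions made for this sketch.

```python
from collections import defaultdict
from statistics import mean, stdev


def add_host_baseline(events):
    """Attach each host's average and standard deviation of cmdlen to its events."""
    lengths = defaultdict(list)
    for e in events:
        lengths[e["host"]].append(e["cmdlen"])

    baseline = {}
    for host, values in lengths.items():
        # Sample standard deviation needs at least two values;
        # falling back to 0.0 is an assumption of this sketch.
        spread = stdev(values) if len(values) > 1 else 0.0
        baseline[host] = (mean(values), spread)

    return [
        {**e, "avg": baseline[e["host"]][0], "stdev": baseline[e["host"]][1]}
        for e in events
    ]


# Hypothetical events with precomputed command line lengths.
events = [
    {"host": "ws01", "cmdlen": 10},
    {"host": "ws01", "cmdlen": 20},
    {"host": "ws01", "cmdlen": 30},
    {"host": "srv01", "cmdlen": 100},
    {"host": "srv01", "cmdlen": 300},
]
with_baseline = add_host_baseline(events)
for e in with_baseline:
    print(e["host"], e["cmdlen"], e["avg"], round(e["stdev"], 2))
```

Note that no events are dropped: every event keeps its own length and gains its host's baseline, which is what makes the later per-event comparison possible.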

This is important, because depending on what type of host it is, there should be different thresholds. As an example, processes that run on a user workstation would have a very different profile from those on a server that runs and serves different programs. So the average command line length will vary based on the profile.

This application of Splunk statistical functions creates dynamic calculated fields that will be used to apply logic to our analysis. This on-the-fly calculation feature eliminates the need for ETL and the need for additional programming to bring data in and out of the platform, and it allows agility in tuning and changing our analysis model.

Step four-- now with these calculated numbers, we can apply the stats command to aggregate the summary of necessary fields, summarizing them by host and command line. The stats command will convert the view into a statistics view, displaying our results in tabular format, showing the analysis of the command line argument lengths of our various processes.

The result of the stats command shows each host and command line with its command line length, the host's average command line length, and the standard deviation of the host's command line length. This stats command allows dynamic manipulation of results that is critical for agile security analysis.
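To illustrate the shape of this summary, the Python sketch below collapses events into one row per host and command line pair, keeping the maximum length and the host's baseline values. The sample data and the field names maxlen, avg, and stdev are assumptions for illustration only.

```python
def summarize(events):
    """Collapse events into one row per (host, CommandLine) pair."""
    rows = {}
    for e in events:
        key = (e["host"], e["CommandLine"])
        if key not in rows:
            rows[key] = {
                "host": e["host"],
                "CommandLine": e["CommandLine"],
                "maxlen": e["cmdlen"],
                "avg": e["avg"],
                "stdev": e["stdev"],
            }
        else:
            rows[key]["maxlen"] = max(rows[key]["maxlen"], e["cmdlen"])
    return list(rows.values())


# Hypothetical events that already carry cmdlen and the host baseline.
events = [
    {"host": "ws01", "CommandLine": "cmd.exe /c dir", "cmdlen": 14, "avg": 12.0, "stdev": 2.0},
    {"host": "ws01", "CommandLine": "cmd.exe /c dir", "cmdlen": 14, "avg": 12.0, "stdev": 2.0},
    {"host": "ws01", "CommandLine": "whoami", "cmdlen": 6, "avg": 12.0, "stdev": 2.0},
]
table = summarize(events)
for row in table:
    print(row)
```

Repeated executions of the same command on the same host become a single row, which keeps the results table compact.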

Step five-- next, we will define a threshold: we are looking for any activities with a command line length greater than the sum of the host's average command line length and four times the host's standard deviation of command line length. We will use the eval command to define a calculated field, which we'll call Threshold. This adds Threshold as an additional column in our analysis table.
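The threshold formula is simple enough to sketch directly. In the Python below, the multiplier of four comes from the transcript, while the row layout, the helper name add_threshold, and the sample numbers are assumptions.

```python
def add_threshold(row, multiplier=4):
    """Return a copy of the row with threshold = avg + multiplier * stdev."""
    return {**row, "threshold": row["avg"] + multiplier * row["stdev"]}


# Hypothetical summary row for one host.
row = {"host": "ws01", "avg": 50, "stdev": 20}
with_threshold = add_threshold(row)
print(with_threshold["threshold"])
```

Keeping the multiplier as a parameter makes it easy to tune the sensitivity of the analysis without changing the rest of the logic.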

Step six-- finally, we will compare the current event's command line length (max length) with the host's command line length threshold that we calculated in the previous step. Using the where command, we keep only the events where the current process's command line length is larger than the threshold.
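This final comparison amounts to a simple filter. The Python sketch below keeps only rows whose length exceeds their threshold; the rows and the field names maxlen and threshold are made up for illustration.

```python
def find_outliers(rows):
    """Keep only rows whose command line length exceeds the host threshold."""
    return [r for r in rows if r["maxlen"] > r["threshold"]]


# Hypothetical summary rows with thresholds already attached.
rows = [
    {"host": "ws01", "maxlen": 40, "threshold": 130},
    {"host": "ws01", "maxlen": 900, "threshold": 130},
    {"host": "srv01", "maxlen": 900, "threshold": 2500},
]
suspicious = find_outliers(rows)
for r in suspicious:
    print(r["host"], r["maxlen"])
```

In this made-up data the same length of 900 is flagged on one host but not on the other, because each host is judged against its own baseline.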

The result shows that the WE8105 Desk host has executed a command.exe with a command line argument 4,490 characters long. The average command line length on this host is 109, and the standard deviation is 325. This command line argument length far exceeds the threshold of the average plus four times the standard deviation, which is 109 plus 4 times 325, or 1,409.
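The arithmetic behind this result can be checked with a few lines of Python. The three input numbers are the ones quoted in the transcript; the variable names and the extra calculation of how many standard deviations the event sits above the average are additions for illustration.

```python
avg_len = 109     # average command line length on the host (from the transcript)
stdev_len = 325   # standard deviation on the host (from the transcript)
observed = 4490   # length of the flagged command line (from the transcript)

threshold = avg_len + 4 * stdev_len
deviations_above_avg = (observed - avg_len) / stdev_len

print(f"threshold = {threshold}")
print(f"exceeds threshold: {observed > threshold}")
print(f"standard deviations above average: {deviations_above_avg:.2f}")
```

The flagged command line sits more than 13 standard deviations above the host's average, well beyond the four used for the threshold.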

This process is an extreme outlier, in contrast to the kinds of processes that this WE8105 Desk host normally executes. We should investigate. Reviewing the detected process events shows that this isn't a normal system or internal user activity typically found running on our internal endpoints. This is the start of a ransomware infection.

Through this exercise, we have shown you how to detect ransomware by applying key analytics techniques. Using analytics, we were able to find the unknown threats by calculating a baseline for individual hosts and their typical behavior. Please check out the other use cases in the Splunk For Security Investigation Online Experience series to learn more.