Splunk for Security Investigation: Command and Control Analysis

Network data, such as firewall, web proxy, or NetFlow, contains detailed records of all activities between users and hosts. Learn more in this video.


Video Transcript

Network data such as firewall, web proxy, or NetFlow contains detailed records of all activities between users and hosts, since the network is the medium for all device communications. For example, web proxy data contains records of all web communications between internal hosts and external web servers. Analyzing web proxy traffic events can help determine whether malicious activities, for example command and control activities, are happening in the network. Analysts can search for important information, such as which internal and external entities are involved in malicious activities and what types of activities occur during the time window of command and control connections, either from the web proxy logs directly or from other subsystems. It's difficult to detect and validate malicious activity and to react quickly, assuming the attacker has already established command and control.

When you look at all the data, the malicious activity won't jump out at you. It requires complex calculations to determine the actions that are anomalous and conclude what is malicious from all the noise.

Applying calculations and statistics to massive amounts of data enables you to detect unusual patterns. Through this online experience exercise, we will show you how you can quickly detect command and control beaconing to malicious web domains via proxy. We will be looking for an internal client host communicating with an external web domain or host in a persistent pattern. Persistence is what distinguishes a real user visiting a legitimate website from an infected host communicating with a command and control host.

Now that you know what we're looking for, let's calculate the average and the variance of the session intervals from each host to each web domain. We'll be looking for a repetitive pattern of communications over time. We'll also calculate the total number of sessions for each host and web domain pair. We'll be looking for shorter session interval averages per host and destination.

First, we'll select the Blue Coat proxy events as the network traffic data we'll initially search to analyze command and control traffic in the network. Next, for each destination website, we'll calculate the client session intervals between connections in order to measure persistence, looking for beaconing activity. Using the streamstats command, we'll copy the time of the next event to the same destination into a field on the current event. The streamstats command allows us to calculate statistics over a certain time window. It then makes the result available dynamically as another, calculated field in the fields panel. So statistical conditions or values associated with a time window can be used in analytical conditions.
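To make the idea of carrying the next event's time onto the current event concrete, here is a minimal Python sketch. It is not Splunk's implementation of streamstats; it only mimics the effect described above. The field names `time`, `client`, and `dest` are hypothetical, and grouping by the host and destination pair is an assumption of this sketch.

```python
from collections import defaultdict


def attach_next_time(events):
    """Return copies of the events with a 'next_time' field added.

    'next_time' is the timestamp of the following event for the same
    (client, dest) pair, or None for the last event of that pair.
    Field names are hypothetical, not taken from any real proxy log format.
    """
    by_pair = defaultdict(list)
    for ev in events:
        by_pair[(ev["client"], ev["dest"])].append(ev)

    out = []
    for pair_events in by_pair.values():
        # Order each pair's events by time so "next" means "later in time".
        ordered = sorted(pair_events, key=lambda ev: ev["time"])
        for cur, nxt in zip(ordered, ordered[1:] + [None]):
            out.append({**cur, "next_time": nxt["time"] if nxt else None})
    return out
```

Each returned event keeps its original fields, so later steps can work on the enriched records without going back to the raw data.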

Next, with the timestamps of the current and the next event, we can calculate the time gap between the connections that each host makes to an external website or domain. We will use the eval command to calculate the gap between successive connections from one host to a web domain. The eval command adds a calculated field called gap to the field sidebar. The gap field now holds the connection interval in milliseconds.
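The gap calculation itself is a simple subtraction per event. The Python sketch below illustrates it under the assumption that each event already carries hypothetical `time` and `next_time` fields; the resulting gap is in whatever unit those timestamps use.

```python
def add_gap(events):
    """Return copies of the events with a 'gap' field added.

    gap = next_time - time, in the same unit as the timestamps.
    Events without a next_time (the last one in a series) get gap None.
    The field names are hypothetical and chosen only for illustration.
    """
    result = []
    for ev in events:
        nxt = ev.get("next_time")
        gap = None if nxt is None else nxt - ev["time"]
        result.append({**ev, "gap": gap})
    return result
```

Keeping the last event with a gap of None, rather than dropping it, means it still counts toward the total number of connections later on.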

This is an example of how you can apply statistics and formulas to define complex security criteria typically needed in effective security correlation rules. Now, with the calculated time gap interval, we can calculate the average connection interval between an internal host and an external web domain or server. By using the stats command, we apply statistics to aggregate the total number of connections, the average of the time intervals between connections, and the variance of those time intervals.

With the stats command, you can see your search results in table format in the statistics view. The results of the stats command show the total count of connections from each internal host to different outside web domains, followed by statistics on the connection interval: the average gap, which indicates how often the internal host is connecting, and the variance of the interval, which indicates how consistent or regular the connections are. So beaconing activities such as command and control sessions would have a high number of total connections, a shorter time gap between one connection and the next, and a lower variance of connection intervals, which represents a repeating connection.
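The aggregation described here (count, average gap, and variance of the gap per host and destination pair) can be sketched in Python as follows. This is an illustration, not the Splunk search itself. The field names are hypothetical, and the sketch uses the sample variance from Python's `statistics` module as an assumption about which variance is meant.

```python
from collections import defaultdict
from statistics import mean, variance


def summarize(events):
    """Aggregate events into one row per (client, dest) pair.

    Each row holds the total connection count, the average gap, and the
    sample variance of the gap. Events whose gap is None still count as
    connections but are left out of the gap statistics.
    """
    gaps = defaultdict(list)
    counts = defaultdict(int)
    for ev in events:
        pair = (ev["client"], ev["dest"])
        counts[pair] += 1
        if ev.get("gap") is not None:
            gaps[pair].append(ev["gap"])

    rows = []
    for (client, dest), n in counts.items():
        g = gaps[(client, dest)]
        rows.append({
            "client": client,
            "dest": dest,
            "count": n,
            "avg_gap": mean(g) if g else None,
            # Sample variance needs at least two data points.
            "var_gap": variance(g) if len(g) > 1 else None,
        })
    return rows
```

A pair that beacons on a fixed timer would show a variance close to zero, while a person browsing would typically produce a much wider spread of gaps.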

We will use the search command to filter for the host and destination pairs with the shorter time interval averages, where the average gap is smaller than 50 milliseconds and the total number of connections exceeds a certain number, in this example, 500 connections. Finally, we sort the traffic stats from the lower average interval to the higher. Using the sort command, we put the hosts with the shorter average session intervals at the top. Without leaving the Splunk search interface, we were able to apply mass calculations and filter logic so we could detect trouble in the network before it became our nightmare.
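The filter and sort step can be sketched in Python like this. The thresholds of 50 and 500 come from the example above; the function name, the parameter names, and the row fields are hypothetical, and the average gap threshold is assumed to be in the same unit as the gap values.

```python
def flag_beacons(rows, max_avg_gap=50, min_count=500):
    """Keep pairs with a short average gap and many connections.

    rows: dicts with 'avg_gap' and 'count' keys (hypothetical names).
    max_avg_gap is compared in the same unit as the gap values.
    The result is sorted so the shortest average gap comes first.
    """
    hits = [
        r for r in rows
        if r["avg_gap"] is not None
        and r["avg_gap"] < max_avg_gap
        and r["count"] > min_count
    ]
    return sorted(hits, key=lambda r: r["avg_gap"])
```

Both thresholds are parameters because suitable values depend on the environment; the numbers in the transcript are one example, not a general rule.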

Here are the top three internal hosts, with average gap times between sessions of approximately 14 to 15 seconds and a low variance value of approximately 200, which indicates a persistent pattern. Interestingly, all three hosts are talking to an unknown domain overseas, which we can further verify with threat intel.

Through this exercise, we have defined traffic anomaly criteria based on calculating important session traffic characteristics. Stay tuned, and please check out our other use case examples in the Splunk for Security Investigation online experience series.