This is NOT the Data You Are Looking For (OR is it)

      

This is part seven of the "Hunting with Splunk: The Basics" series.

So far in this series, we’ve shared some key techniques that are required for threat hunting using Splunk—we’ve discussed how to enrich data with lookup commands and workflow actions, how to examine network traffic with Splunk Stream, how to discover all the different types of data available in your Splunk instance, how to make the most of your Windows event logs, and how to start using the Splunk stats command.

This post continues by introducing a set of foundational Splunk threat-hunting techniques that will help you filter data.

Why is filtering data important? Well, Splunk allows you to store gigabytes, terabytes, or even petabytes of full-fidelity security data, yet the evidence you are seeking during a hunt or investigation is often contained in just a few events. You need to eliminate the noise and expose the signal.

To do this, we will focus on three specific techniques for filtering data that you can start using right away.

1. It’s About Time

The most obvious (but often overlooked) technique for reducing the number of events returned by your Splunk search—and getting you closer to actionable results—is to specify an appropriate time range. If you can put a left and right boundary on the timeline of your hunt, you enable Splunk to ignore events from time periods that have nothing to do with your hypothesis, potentially saving you valuable time and system resources along the way. For most Splunk users, the easiest way to specify the time range is to use the time range picker as shown in Figure 1 (below).

In this example, I’m looking at some DNS events from our Boss of the SOC v1.0 data set. Specifically, I’ve asked Splunk to search all DNS activity on August 24, 2016. This search completed in about 6.6 seconds and returned about 55,000 results. The same search run over the entire month of August 2016 (not shown) returned about 1.37 million events and took approximately 184 seconds to complete. In this case, selecting an appropriate time range gave us a roughly 96% reduction in both the number of events and the time to run the search!

Your data and hunting hypotheses will vary, but remember—when hunting in Splunk, it pays to pay attention to time.
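If you prefer to keep the time boundaries in the search itself, the same window can be sketched with the inline earliest and latest time modifiers, which take precedence over the time range picker. This is only an illustration: it assumes the DNS events carry the stream:dns sourcetype used later in this post, and it leaves out any index name because that depends on your environment.

```spl
sourcetype=stream:dns earliest="08/24/2016:00:00:00" latest="08/25/2016:00:00:00"
```

Absolute times in these modifiers use the %m/%d/%Y:%H:%M:%S format, so this search covers the whole of August 24, 2016 and stops at midnight.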

 


Figure 1: Filtering events based on timestamp

2. Fields of Dreams

Splunk is often referred to as a search engine for your data, and it’s easy to see why when you enter a simple phrase into the search app. Events containing this phrase begin to appear, usually within just a few seconds. We sometimes refer to searching in this way as "super-grepping", and while it can be effective, Splunk has a lot more power under the hood.

One excellent way to up-level your Splunk search skills (and to become a more effective threat hunter in the process) is to begin harnessing the power of field-value expressions to narrow your search.

As Splunk is returning results, it’s also extracting fields from each event. You can take advantage of these fields using the Splunk Search Processing Language (SPL). Events generated by different systems in your environment will have different fields; however, all events in Splunk have a few common fields, including host, source, and sourcetype. These fields are special in that they are extracted and stored immediately when the events are indexed, which in turn makes searches that use these fields very fast.
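One way to see the benefit of these index-time fields is the tstats command, which can summarize them without reading the raw events. The following is a rough sketch rather than a recommended hunting search; the index=* wildcard is an assumption that should be narrowed to the indexes you actually use.

```spl
| tstats count where index=* by sourcetype, host
```

The result is a quick inventory of which sourcetypes each host is sending, which can help you decide what to filter on next.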

One of the best ways to begin filtering events in Splunk is to search for a specific sourcetype by simply including a field-value pair like the following in your Splunk search as early (meaning as far to the left) as possible. This example shows a simple search that filters results to include only Microsoft Sysmon events.

sourcetype="XmlWinEventLog:Microsoft-Windows-Sysmon/Operational"

Next, let’s look at how you can combine multiple fields to narrow your search even further. Note that when you search multiple fields, Splunk combines the search terms together using a logical "AND" operator. Figure 2 (below) includes an example of a multi-field search that returns all the Microsoft Sysmon events that came from the system named "we4781srv".

sourcetype="XmlWinEventLog:Microsoft-Windows-Sysmon/Operational" host="we4781srv"

Note that these sample searches are subject to the time window you chose for the search (see technique 1 above), and they will only retrieve results from the Splunk indexes to which you’ve been granted access. In production environments, it’s a good practice—and sometimes required—to add "index=<myindexname>" to the beginning of your search.
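As a sketch, the multi-field search from above with an index term placed first might look like the following. The index name is left as the same placeholder; substitute the index that holds your Sysmon data.

```spl
index=<myindexname> sourcetype="XmlWinEventLog:Microsoft-Windows-Sysmon/Operational" host="we4781srv"
```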

Now, let’s have a look at the additional fields available in these Microsoft Sysmon events.

 

Figure 2: Examining fields in a Microsoft Sysmon event

A few helpful tips as you start to tailor your Splunk threat hunting searches using fields:

  1. Field names are case sensitive, e.g. "EventCode" and "eventcode" are entirely different fields and each could have a different value.
  2. Field values are case insensitive, so "system" and "System" are equivalent. If you need to match on case-sensitive field values, look into the SPL where command.
  3. Wildcards in field-value pairs are often very useful, e.g. sourcetype="stream:*", src_ip="192.168.250.*", or sourcetype="*sysmon*".
  4. The Boolean operators, "AND", "OR", and "NOT" and parentheses for grouping are supported. Be sure to capitalize the Boolean operators, or you might end up super-grepping for the word "and" which is almost certainly not what you intended.
  5. Other comparison operators such as "<", ">", "<=", ">=", and "!=" are also supported.
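To make a few of these tips concrete, here is a small sketch that combines them in one search. It assumes your Sysmon events expose an EventCode field (the field name can differ depending on the add-on you use) and picks event codes 1 and 3 (process creation and network connection) purely for illustration.

```spl
sourcetype="*sysmon*" host="WE4781*" (EventCode=1 OR EventCode=3)
| where host=="we4781srv"
```

The first line uses wildcards and a capitalized OR inside parentheses, and the uppercase host pattern still matches "we4781srv" because field values are case insensitive in the search command. The where command on the second line then keeps only events whose host matches the exact lowercase spelling.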

More details on the Splunk search command can be found in Splunk Docs.

3. This is NOT the Data You Are Looking For

Finally, let's look at a quick and effective filtering technique we have available when threat hunting with Splunk—namely the "NOT" Boolean operator. As we've seen throughout this post, the primary goal while hunting in Splunk is to remove events from the result set that don't help to prove or disprove our hypotheses, and the "NOT" operator is a great tool for this purpose.
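One subtlety is worth knowing before we begin: "NOT field=value" and "field!=value" are not quite the same filter. The sketch below shows two separate searches, one per line, that reuse the stream:dns sourcetype and query field from the examples in this section.

```spl
sourcetype=stream:dns NOT query=wpad*

sourcetype=stream:dns query!=wpad*
```

The first search also returns events that have no query field at all, while the second returns only events where query exists and does not match the pattern. For hunting, NOT is usually the safer choice because it won't silently drop events that are missing the field.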

Let's start by examining some DNS queries captured by Splunk Stream. In this case we might be investigating a system that has been behaving suspiciously, or we may be looking for "threads to pull" to help us formulate a hunting hypothesis. Here we will use Splunk to summarize the requests, then eliminate data we can explain and dig deeper on data that we can't. The search in Figure 3 (below) yields 234 unique DNS queries, and nothing obviously suspicious or malicious.

 


Figure 3: A simple search to review DNS activity

sourcetype=stream:dns src=192.168.250.100 query_type{}=A
| stats count by query
| sort -count

Now let's start filtering using "NOT." First up is to get rid of some DNS lookups that are used for browser configuration and IPv6 tunneling. To accomplish this, we add a couple of "NOT" field-value pairs. Note the use of wildcards to catch instances from different domains.

 


Figure 4: Starting to filter with NOT

sourcetype=stream:dns src=192.168.250.100 query_type{}=A
 NOT query=wpad*
 NOT query=isatap*
| stats count by query
| sort -count
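As a matter of style, the two exclusions can also be grouped under a single NOT with parentheses. This sketch should return the same results as the search above, since "NOT a NOT b" is logically equivalent to "NOT (a OR b)".

```spl
sourcetype=stream:dns src=192.168.250.100 query_type{}=A
 NOT (query=wpad* OR query=isatap*)
| stats count by query
| sort -count
```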

Here the benefits of our filtering begin to emerge. In this case, we reduced our result set size from 234 to 176. Now let's continue by filtering out results from authorized vendors/products (note that this list varies in every environment), local domains, content delivery networks, etc.

 


Figure 5: More filtering

sourcetype=stream:dns src=192.168.250.100 query_type{}=A
   NOT query=wpad*
   NOT query=isatap*
   NOT query=*.windows.com
   NOT query=*live.com
   NOT query=*nsatc.net
   NOT query=*windowsupdate.com
   NOT query=*msedge.net
   NOT query=*trafficmanager.net
   NOT query=*office.com
   NOT query=*bing.com
   NOT query=*virtualearth.net
   NOT query=*msn.com
   NOT query=*.microsoft.com
   NOT query=*msftncsi.com
   NOT query=*microsoftonline.com
   NOT query=demo-01
   NOT query=*waynecorpinc.local
   NOT query=*public-trust.com
   NOT query=*ocsp*.com
   NOT query=*akamaiedge.net
   NOT query=*akadns.net
   NOT query=*akamaized.net
   NOT query=sway-cdn.com
   NOT query=*symc*.com
| stats count by query

We've now significantly reduced the result set size all the way down to 15! At this point, it becomes much easier to identify unauthorized software (Acronis in this case), social media usage (Twitter), and even a couple of domains that are just downright suspicious looking.

 


Figure 6: Visually inspecting the filtered data set

As searches like this grow in size, it often makes sense to consolidate them into a Splunk lookup table as described in detail in "Lookup Before You Go-Go...Hunting" from earlier in this series. 
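As a rough sketch of what that consolidation could look like (not necessarily the approach taken in that post), suppose the exclusion patterns live in a hypothetical lookup file named dns_allowlist.csv with a single column called query, one pattern per row (wpad*, isatap*, *.windows.com, and so on). A subsearch can then feed the whole list to NOT:

```spl
sourcetype=stream:dns src=192.168.250.100 query_type{}=A
   NOT [| inputlookup dns_allowlist.csv | fields query]
| stats count by query
```

The subsearch expands into a series of query="..." terms joined by OR, so new exclusions can be added by editing the lookup file rather than the search. Both the file name and the column layout are assumptions made for this example.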

Wrapping Up

Over the last year, we've had the fantastic opportunity to observe over 1,700 participants challenge themselves by competing in Splunk Boss of the SOC activities around the world. Time and again, we see that competitors who are at the top of the BOTS leaderboard are those who can quickly distill large amounts of raw search results down to a few key events. Mastering the three simple filtering techniques discussed in this blog post will enable you to become a much more efficient analyst and threat hunter, and might even improve your standing at the next Splunk BOTS event!

Happy hunting!

Posted by Dave Herrald

Dave is a technical security professional with 20+ years of experience. Dave currently works as Principal Security Strategist for Splunk where he focuses on the Boss of the SOC (BOTS), engaging the security community, training technical teams around the globe, and working directly with Splunk customers. Dave has worked in various roles including strategic security consultant, penetration tester, security engineer, and chief information security officer. Dave holds many security certifications including GIAC Security Expert (GSE) #79.
