
Hunting Your DNS Dragons


This blog post is part fifteen of the "Hunting with Splunk: The Basics" series. Derek King, our security brother from England, has chosen to write on a subject near and dear to my heart—DNS. I've been using Splunk and DNS data to find badness in networks since 2011 and I continually find new methods and approaches. Derek deals up some oldies but goodies, shows some awesome visualizations, and then brings some new slaying techniques to the adversary battle. Enjoy! – Ryan Kovar


Oh no! You’ve been hacked, and you have experts onsite to identify the terrible things done to your organization. It doesn’t take long before the beardy dude or cyber lady says, “Yeah...they used DNS to control compromised hosts and then exfiltrated your data.” As you reflect on this event, you think, “Did I even have a chance against that kind of attack?” Yes, you did, because Splunk can be used to detect and respond to DNS exfiltration. Since you've been an avid reader of the "Hunting with Splunk: The Basics" series, you know that good hunting starts with a hypothesis or two. So, let’s create a hypothesis!

You could hypothesize that the adversary might use DNS to move sensitive files out of your organisation or use it as a side channel for communications with malicious infrastructure. With the right visualizations and search techniques, you may be able to spot clients behaving abnormally when compared either to themselves or their peers!

Where Do We Find the Data?

If you're already sucking DNS data into Splunk, that's awesome! However, if you’re not and you haven't seen Ryan Kovar and Steve Brant's .conf2015 presentation, "Hunting the Known Unknowns (with DNS)," then read it—it's a treasure trove of information. If the work of my esteemed colleagues just isn’t your bag, then I’m sure they won’t take it personally...much.

If that's the case, let me tell you that Windows DNS debug logs, Bro DNS logs, and Splunk Stream can all be excellent sources of data. If you want to follow along at home and are in need of some sample data, then consider looking at the “Splunk Security Dataset Project.” All of the searches below were tested on the BOTSv1 data.

What Should We Be Looking For?

There are many questions you can use to support your hypotheses. For example, if your hosts are compromised, they may show changes in DNS behaviour such as:

  • Increase in volume of requests by the client (indicating C&C or data movement)
  • Change in the type of resource records we see (e.g., TXT records from hosts that don’t typically send them)
  • Variance in the length of the request (indicating DGA or encoded/obfuscated data stream)
  • Variability in the frequency of requests (Beaconing activity to C&C)
  • Randomness in domain names (DGA)
  • Substitution of domains to very slightly altered domains (typo-squatting)

These are adversary techniques we can craft searches for in Splunk, using commands like stats, timechart, table, and streamstats, together with functions like avg and stdev.

Let’s Go Hunting!

In the section below, I will show you some ways to detect weirdness with DNS based on the techniques highlighted above.

Top 10 Clients by Volume of Requests

Capturing spikes or changes in client volumes may show early signs of data exfiltration.

tag=dns message_type="QUERY"
| timechart span=1h limit=10 usenull=f useother=f count AS Requests by src


We begin with a simple search that helps us detect changes over time. The first line returns the result set we are interested in, followed by the timechart command to visualise requests over time in one-hour time slices.


Clients with an unusually high number of requests compared with the rest of the organisation may help to identify data transfers using DNS.
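
If eyeballing the timechart doesn't scale, you can let the maths flag the outliers for you. Here's a minimal sketch, assuming the same tagged data as above; the two-standard-deviation threshold is an arbitrary starting point, so tune it to your environment:

tag=dns message_type="QUERY"
| bucket _time span=1h
| stats count AS Requests BY _time, src
| eventstats avg(Requests) AS avgRequests stdev(Requests) AS stdevRequests
| where Requests > avgRequests + (2 * stdevRequests)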

Requests by Resource Record Over Time

Changes in resource-record behaviour for a client may point toward potential C&C or exfiltration activity. Both A records and TXT records should be watched carefully, as these are common vehicles for such techniques. However, don’t limit your attention to just these two record types!

tag=dns message_type="QUERY"
| timechart span=1h count BY record_type


Keeping things simple for now, we again begin with the same dataset and use the timechart command to visualise the record_type field over time in one-hour slices. This search can be used in conjunction with the previous one by including a client IP of interest (see the sketch below) to help follow our hypothesis.

Spotting changes in behaviour early is a great way to reduce the impact of a compromised host. Using Splunk to search historical data helps to identify when a host was initially compromised and where it has been communicating since.
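
As a minimal sketch of that idea, the following pivots from a client of interest to a per-domain timeline. The IP address is a placeholder, and the field names assume the BOTSv1-style data used above:

tag=dns message_type="QUERY" src="192.168.1.50"
| stats earliest(_time) AS firstSeen latest(_time) AS lastSeen count BY query
| sort firstSeen
| convert ctime(firstSeen) ctime(lastSeen)

The earliest values give you a candidate timeline for when the host first started talking to each domain.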

Packet Size & Volume Distribution

Events with unusually large requests in high volumes may be signs of exfiltration activity.

tag=dns message_type="QUERY"
| mvexpand query
| eval queryLength=len(query)
| stats count by queryLength, src
| sort -queryLength, count
| table src queryLength count
| head 1000


Whoa, we’re throwing in a couple more commands here. Let’s take a closer look—it’s fantastic, I promise.

We start with the same basic search as before, but this time we use mvexpand to split our multi-valued query field (needed if you’re following along with the BOTSv1 dataset), and then use the eval command with the len function to calculate the length of the query field. The stats command provides a count based on grouping our results by the length of the request (which we calculated with the eval statement above) and the src field. Sort is applied to see the largest requests first, the results are output to a table, and head keeps only the first 1,000 records. We can then use a scatter chart to visualise them.

In the above example, the scatter chart is used to identify distributions that do not match the norm. A high number of requests and/or large requests will be of interest. For example, I usually visit ‘www.bbc.co.uk’ and ‘www.facebook.com’ (thirteen and sixteen characters respectively). If, however, malicious software opens a sensitive document that’s 5 MB in size, chops it into 255-byte chunks, and sends them via DNS requests, then I’m likely to see many 255-byte requests.
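
To zero in on those oversized requests without the visualisation, a minimal sketch along these lines can help. The 100-character cut-off is an arbitrary assumption; pick a threshold that suits your environment:

tag=dns message_type="QUERY"
| mvexpand query
| eval queryLength=len(query)
| where queryLength > 100
| stats count BY src, query
| sort -count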

Beaconing Activity

Let’s take it up a notch now and look for clients that show signs of beaconing out to C&C infrastructure. Beaconing activity may occur when a compromised host ‘checks in’ with the command infrastructure, possibly waiting for new instructions or updates to the malicious software itself.

tag=dns message_type="QUERY"
| fields _time, query
| streamstats current=f last(_time) as last_time by query
| eval gap=last_time - _time
| stats count avg(gap) AS AverageBeaconTime var(gap) AS VarianceBeaconTime BY query
| eval AverageBeaconTime=round(AverageBeaconTime,3), VarianceBeaconTime=round(VarianceBeaconTime,3)
| sort -count
| where VarianceBeaconTime < 60 AND count > 2 AND AverageBeaconTime>1.000
| table query VarianceBeaconTime count AverageBeaconTime


In this example, we use the same principles but introduce a couple of new commands.

The fields command is a great way to speed Splunk up. Keeping only the fields you need for following commands is like pressing the turbo button for Splunk. Give it a go and you’ll be feeling like an SPL ninja in the next five minutes—honest, guv!

The streamstats and following eval commands allow us to calculate the difference in seconds between consecutive requests for the same query. This use of streamstats is an elegant trick! If you want to know more, review the blog post "I Need to Do Some Hunting. Stat!" to learn the dirty details!

Once past streamstats we are sailing to beaconing heaven using our much-loved stats command to calculate some averages and variance of queries, with a bit of rounding and sorting.

The last new command we used is the where command that helps us filter out some noise.

In this example, spotting clients that show a low variance in request timing may indicate hosts contacting command and control infrastructure at a predetermined interval, say every thirty seconds or every five minutes.
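
If you suspect a particular check-in interval, you can tighten the where clause accordingly. Here's a minimal sketch for beacons arriving roughly every thirty seconds; the bounds are arbitrary assumptions to tune:

tag=dns message_type="QUERY"
| fields _time, query
| streamstats current=f last(_time) as last_time by query
| eval gap=last_time - _time
| stats count avg(gap) AS AverageBeaconTime var(gap) AS VarianceBeaconTime BY query
| where AverageBeaconTime > 25 AND AverageBeaconTime < 35 AND VarianceBeaconTime < 5 AND count > 2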

Number of Hosts Talking to Beaconing Domains

Identifying the number of hosts talking to a specific domain may help to identify potential bot activity or to scope how many hosts are currently compromised.

tag=dns message_type="QUERY"
| fields _time, src, query
| streamstats current=f last(_time) as last_time by query
| eval gap=last_time - _time
| stats count dc(src) AS NumHosts avg(gap) AS AverageBeaconTime var(gap) AS VarianceBeaconTime BY query
| eval AverageBeaconTime=round(AverageBeaconTime,3), VarianceBeaconTime=round(VarianceBeaconTime,3)
| sort -count
| where VarianceBeaconTime < 60 AND AverageBeaconTime > 0


Nothing much new in this search. We look for beaconing activity and count the distinct hosts communicating with each domain, which may help us scope how many hosts have gone bad!

The search only introduces one new function of our stats command. Distinct count (dc) is used to show us how many hosts show the same activity.

This example is very similar to the previous beaconing search (i.e., looking for request timings that are consistent), but this time we are aggregating clients that show the same behaviour.

Domains with Lots of Sub-Domains

Encoded information could be transmitted via the sub-domain. Looking at the number of different sub-domains per domain may help identify command and control activity or exfiltration of data.

tag=dns message_type="QUERY"
| eval list="mozilla"
| `ut_parse_extended(query, list)`
| stats dc(ut_subdomain) AS SubdomainsPerDomain BY ut_domain
| sort -SubdomainsPerDomain


Here, we are looking to see how many subdomains are requested per domain. This behaviour may help us identify signs of exfiltration or DGA domains. The URL Toolbox allows us to parse domain names easily. Check out our "UT_parsing Domains Like House Slytherin" blog post if you want to know more.

As always, we begin with our DNS dataset of interest and create a field with a value of ‘mozilla’. If you have read the post linked above, you’ll understand perfectly. If not, just know it’s needed by the URL Toolbox. ;-) After ‘ut_parse_extended’, we continue with commands we have used previously. Our stats command counts the distinct number of sub-domains per domain, and the results are sorted to give us the highest value first.

In this example, we are looking for high numbers of sub-domains per domain. It's likely we will need to do some filtering for common, assumed-good sites.

Dashboards

Here at Splunk, we have a saying: “Get stuff done!” The good news is that everything above is available to download right away from my GitHub repo to help you get started hunting.

NOTE: As always, we write our searches to be Common Information Model (CIM) compliant. You may need to adjust the sourcetypes/tags/eventtypes to suit your environment!

Real-World Improvements You Can Make

Here are some additional ideas to enhance the quality of your results:

  1. Use lookups, lookups, and more lookups to remove noise! Using the sub-domain example above, knowing that many chickenkiller.com sub-domains are being queried is much more interesting than the number of microsoft.com sub-domains. Eliminate that noise by following the excellent advice from Ryan in his blog post, "Lookup Before You Go-Go...Hunting" (see the sketch after this list).
  2. Exclude non-working days from your baselines.
  3. Filter excess noise with the where command.
  4. Consider post-process searching. It will save you a TON of time!
  5. Make your dashboards dynamic, with filters for specific clients.
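
As a minimal sketch of the lookup idea, the following excludes an allow list from the sub-domain search. The lookup file known_good_domains.csv and its ut_domain column are hypothetical; substitute your own list of assumed-good domains:

tag=dns message_type="QUERY"
| eval list="mozilla"
| `ut_parse_extended(query, list)`
| search NOT [| inputlookup known_good_domains.csv | fields ut_domain]
| stats dc(ut_subdomain) AS SubdomainsPerDomain BY ut_domain
| sort -SubdomainsPerDomain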

Happy hunting!

P.S. - A Note About Randomness

We intentionally avoided talking about randomness in this blog post, but if you are interested in detecting typ0 5quatting or in measuring randomness with math, then take a look at the blog posts "You Can’t 'Hyde' from Dr. Levenshtein When You Use URL Toolbox" or "Random Words on Entropy and DNS."

Posted by Derek King

I've had a long and meandering journey to Splunk, with (ahem) 20 years in technical roles spanning application development, OS engineering, and networking, and over the last 10 years I've fallen in love with all things cybersecurity. At Splunk, I help customers out in any way I can, from understanding the basics to doing cool cyber stuff!