This is part five of the "Hunting with Splunk: The Basics" series.
Many people have a love/hate relationship with Windows. My first job out of college was at a defense contractor as a system administrator. Since I was the new guy and had not yet grown my “Unix” beard, I was given the responsibility of maintaining a small Windows NT 4.0 domain. In that year, I learned a lot about the platform and gained a lot of respect for what Microsoft products could do.
As anyone who has Splunked a Windows machine knows, they are a bit…chatty. The good news is that not only can the universal forwarder bring in event logs, but by using Splunk Technology Add-ons, it can also collect Sysmon data, registry information and performance monitors. This flexibility gives an analyst looking to hunt an array of options.
This blog post will highlight some of the most valuable places to start hunting in your Windows logs. While not an exhaustive list, these tips will help you build hypotheses and provide a good starting point for hunting on your endpoints.
The first Windows Event Code I want to tell you about is Event Code 4688. It may very well be the most important event code that exists*. Windows defines Event Code 4688 as “A new process has been created,” but it’s so much more: any process (or program) that is started by a user (or even spawned from another process) is logged with this event ID. For instance, if a Windows PC is infected with malware, searching for Event Code 4688 will show any processes that the malware created. From a hunting perspective, I could hypothesize that rare processes may contain malicious activity and, as such, I want to focus my hunt on them. To do that, I can search Windows data in Splunk with something like:
sourcetype="wineventlog:security" EventCode=4688 | stats count, values(Creator_Process_Name) as Creator_Process_Name by New_Process_Name | table New_Process_Name, count, Creator_Process_Name | sort count
The search above returns newly created processes along with the name of the process that created each one (the creator, or parent, process), sorted so the rarest appear first. Why is this information important? Every 4688 event records which process spawned the new one, so you can trace a suspicious process back to its parent and forward to its children. This helps find malicious processes that were created and provides the information you need to clean up the infection. If you take this search a step further, you could focus on processes that start from unusual locations (not C:\windows\system32 or C:\Program Files) or isolate specific hosts. By identifying rare processes on your machines, you will have insight that you might not have otherwise.
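For readers who want to see the logic of that search spelled out, here is a minimal Python sketch of the same idea: count events per new process, collect the creator names, sort rarest first, and flag anything outside the expected directories. The sample events, the `rare_processes` and `in_unusual_location` helpers, and the list of expected prefixes are all made up for illustration; only the two field names come from the search above.

```python
from collections import defaultdict

# Hypothetical 4688 events; the field names mirror the search above.
events = (
    [{"New_Process_Name": "C:\\Windows\\System32\\svchost.exe",
      "Creator_Process_Name": "C:\\Windows\\System32\\services.exe"}] * 3
    + [{"New_Process_Name": "C:\\Windows\\System32\\cmd.exe",
        "Creator_Process_Name": "C:\\Windows\\explorer.exe"}] * 2
    + [{"New_Process_Name": "C:\\Users\\Public\\update.exe",
        "Creator_Process_Name": "C:\\Windows\\System32\\cmd.exe"}]
)

# Assumed "normal" locations, lowercased; adjust for your environment.
EXPECTED_PREFIXES = ("c:\\windows\\system32\\", "c:\\program files")


def rare_processes(events):
    """Return (process, count, creators) rows, rarest process first."""
    counts = defaultdict(int)
    creators = defaultdict(set)
    for event in events:
        name = event["New_Process_Name"]
        counts[name] += 1
        creators[name].add(event["Creator_Process_Name"])
    rows = [(name, counts[name], sorted(creators[name])) for name in counts]
    return sorted(rows, key=lambda row: row[1])


def in_unusual_location(path):
    """True when the path does not start with one of the expected prefixes."""
    return not path.lower().startswith(EXPECTED_PREFIXES)


for name, count, parents in rare_processes(events):
    marker = "unusual location" if in_unusual_location(name) else ""
    print(count, name, parents, marker)
```

In this sample, the one-off binary under C:\Users\Public sorts to the top and is also flagged for its location, which is the kind of lead the hypothesis is meant to surface.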
Now onto 4738; it's one of my personal favorites—“A user account was changed.” This event is logged whenever a user account is altered, which is especially important when an account is granted Administrator privileges in a domain or on a standalone Windows machine. I love hunting for this event and looking at anything that occurs within 2 minutes on either side of it.
Take a look at this example:
index=main [search index=main sourcetype=WinEventLog:Security EventCode=4738 | eval earliest=_time-120 | eval latest=_time+120 | fields host, earliest, latest] | table host, sourcetype, EventCode, Message
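Adversaries, whether outside attackers or malicious insiders, often attempt to “elevate” permissions on a user account. Looking at everything the same host logged around the change can show what else they were doing at the time.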
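The subsearch is the part that tends to confuse people, so here is a rough Python sketch of what it is doing: for each account-change event, keep every event from the same host that falls within 120 seconds on either side. The event dictionaries and the `events_near_changes` helper are hypothetical, and treating the upper bound as exclusive is my assumption about how the time range is applied.

```python
WINDOW_SECONDS = 120  # two minutes on either side, as in the search above


def events_near_changes(all_events, change_events, window=WINDOW_SECONDS):
    """Return events on the same host within +/- window seconds of a change."""
    hits = []
    for event in all_events:
        for change in change_events:
            same_host = event["host"] == change["host"]
            # Lower bound inclusive, upper bound exclusive (an assumption).
            in_window = (change["_time"] - window
                         <= event["_time"]
                         < change["_time"] + window)
            if same_host and in_window:
                hits.append(event)
                break  # one matching change is enough to keep the event
    return hits


# Hypothetical data: one 4738 on host "A" at epoch time 1000.
changes = [{"host": "A", "_time": 1000, "EventCode": 4738}]
everything = [
    {"host": "A", "_time": 500, "EventCode": 4624},
    {"host": "A", "_time": 880, "EventCode": 4688},
    {"host": "A", "_time": 1000, "EventCode": 4738},
    {"host": "A", "_time": 1119, "EventCode": 4688},
    {"host": "A", "_time": 1120, "EventCode": 4688},
    {"host": "B", "_time": 1000, "EventCode": 4688},
]

for event in events_near_changes(everything, changes):
    print(event["host"], event["_time"], event["EventCode"])
```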
Event Code 4624 is created when an account successfully logs on to a Windows environment. This information can be used to create a user baseline of logon times and locations. The baseline lets Splunk users spot logons that fall outside the norm, which may indicate an intrusion or a compromised account. Event Code 4624 also records the different types of logons, for instance, network or local. Using this information, you can find outliers within your network by filtering on time or even logon type.
Try a search like:
index=main sourcetype="wineventlog:security" EventCode=4624 | eventstats avg("_time") as avg stdev("_time") as stdev | eval lowerBound=(avg-stdev*exact(2)), upperBound=(avg+stdev*exact(2)) | eval isOutlier=if('_time' < lowerBound OR '_time' > upperBound, 1, 0) | table _time, body, isOutlier
It should produce a list of events and flag each one as a statistical outlier or not.
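If the eval statements look opaque, the arithmetic is just "mean plus or minus two standard deviations." Here is a small Python sketch of the same test on made-up epoch timestamps; the `flag_outliers` helper and the sample values are hypothetical, and I am assuming a sample standard deviation here.

```python
from statistics import mean, stdev


def flag_outliers(times, k=2):
    """Pair each timestamp with 1 if it lies more than k standard deviations from the mean, else 0."""
    avg = mean(times)
    sd = stdev(times)  # sample standard deviation
    lower_bound = avg - k * sd
    upper_bound = avg + k * sd
    return [(t, 1 if t < lower_bound or t > upper_bound else 0) for t in times]


# Hypothetical logon times: nine close together and one far away.
logon_times = [100, 101, 102, 103, 104, 105, 106, 107, 108, 1000]

for timestamp, is_outlier in flag_outliers(logon_times):
    print(timestamp, is_outlier)
```

Because the statistics are computed over the raw timestamps, this flags events that sit far from the average time of the result set. To baseline on hour of day or per user, you would compute the same bounds over that derived value instead.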
In my 20 years of being in IT and security, I can only remember one time that I cleared the event logs on a Windows machine to troubleshoot a service. Event Code 1102 occurs when an administrator or administrative account clears the audit log on Windows. It’s not something that should happen often, and when it does, it might be to cover something up. I’d recommend having this as a “Critical” event in your SIEM, but it's also worth hunting for. Luckily, since you’re Splunking your important Windows servers, this “event clearing” will have no effect since all your logs are in Splunk.
These are by no means all of the event codes to monitor. If I listed every event that should be monitored, this blog post would no longer be a blog! If you’d like to learn more about important event codes, I recommend checking out the .conf2015 session by Michael Gough that discusses the “Sexy Six” Event Codes to monitor and alert on.
As previously mentioned, this is by no means an exhaustive list of all the data that Windows can create in Splunk (check out Ryan's post on the NSA "Spotting the Adversary" white paper if you would like to get some more interesting Event Codes). I always tell my customers that “Windows creates a ton of data” inside of Splunk and knowing where to start and what to look for is important. I hope these tips will provide ideas for hypothesizing as you conduct your hunting.
Keep a lookout for both a Sysmon and a Windows Registry-related hunting blog post in the near future!
*To get full use of Event Code 4688 you do have to enable some extra auditing options. Ryan Kovar goes into more detail on how to get this perfect Event Code in "101 things the mainstream media doesn’t want you to know about PowerShell logging."