This blog post is part twenty-five of the "Hunting with Splunk: The Basics" series. Something we have always discussed in BOTS and on our hunting blogs is how important file executions are. Thankfully, Shannon Davis is gonna dig into the plumbing of hunting processes! – Ryan Kovar
Quite often you are in the middle of a security incident or just combing through your data looking for signs of malicious activity, and you will want to trace the activity or relationships of a particular process. This can be a very time-consuming and frustrating task if you try to brute force things (copying/pasting parent and child process IDs over and over again). And in the heat of battle, you may miss one item that could have led you to something interesting. Splunk does a great job of ingesting process data, allowing you to search and correlate, but it can be challenging to visualize parent/child relationships for this data, especially spanning multiple generations.
To hunt for these relationships, we must first begin by ingesting the right data. I will stick to Windows for this particular blog but will endeavor to revisit it for other operating systems in the future.
The two Splunk add-ons I’m using on top of the Windows Universal Forwarder to capture this data are:
Capturing Process Events
Once I’ve got the appropriate add-ons installed, I need to configure the Windows endpoints to capture the process-related events. Two data sources work very well for capturing new process creation events:
- Sysmon with Event Code 1 enabled (SwiftOnSecurity or Olaf Hartong’s Sysmon configs are both good places to start)
- Windows Security Event Logs with Event ID 4688, configured to include the command line in process creation events
Both of these data sources work, but I’m going to concentrate on Sysmon EventCode=1 data for this blog, as it lends itself quite well to a utility I show you further down.
Now that we’ve gotten the prerequisites out of the way, let’s start hunting. In this example, let’s assume we have a known malicious spreadsheet in our environment, and we want to understand if it’s been opened. If it has, we’d also like to know what has transpired.
To start, we can look to see if the spreadsheet filename, salaries.xls, has been observed in any EventCode 1 events in Sysmon. And as much as Brick Tamland Loves Lamp, I Love Table even more. I’m going to use the SPL command table to “table” the time these events occurred, the host they occurred on, the user they are associated with, their associated process ID, and the full command line that triggered the process.
This search uses the main index, a source (not sourcetype!) for Sysmon data, and EventCode=1 to return process creation events. You could define your CommandLine filter in the first search block instead; I’ve shown it as a separate search command for clarity and to get you ready for using it in this fashion later on. I then add the fields I mentioned above to my table. Feel free to include as many others as you see fit.
index=main source="xmlwineventlog:microsoft-windows-sysmon/operational" EventCode=1 | search CommandLine="*Salaries.xls*" | table _time host user ProcessId CommandLine
A single process creation event referencing Salaries.xls was returned from our search. We can see that wallylambic was the user, the process ID is 6416, and the command line shows Excel.exe opening the spreadsheet from what looks to be Outlook, but we need to explore further to confirm this. Also, we are not certain if anything happened after the spreadsheet was opened.
Using the previous search, we can use the table command again to display more fields such as the parent process name, the parent process ID, parent process path, process path, and more. These fields could help us explore a bit further and confirm that processes are running from expected locations.
index=main source="xmlwineventlog:microsoft-windows-sysmon/operational" EventCode=1 | search CommandLine="*Salaries.xls*" | table parent_process_name parent_process_id parent_process_path process_path
This gives us some good information to track. We can see that the parent process was indeed Outlook.exe, that the parent process ID is 11120, and where both the Outlook and Excel executables were launched from. Process path information is very valuable when looking for processes launching from places they shouldn’t be (temp directories, startup folders, etc.). We could take Excel’s process ID (6416) and begin hunting down the entire process tree, using each resulting process ID as the parent process ID in the next search. But as mentioned earlier, this approach is time-consuming and error-prone, and it can send you down a path that results in lots of gray hair.
Can we make things easier and create a table of all the things that happened after that spreadsheet was opened? Yes, we can!
A cool app has been built by Donald Murchison called PSTree, which will make your life a lot easier here. After installing that app, along with the Splunk Python SDK, you can pass in parent and child fields and then create a table (did I tell you how much I love tables?) of the resulting process family structure.
Most of the following search is taken from Donald’s examples shown under details in the Splunkbase link above. For the young players out there, this may look confusing at first, but it isn’t. I’ll explain things a bit here.
What he’s doing with the rex commands is just creating some new fields (ParentName and ProcessName) from data within the existing ParentImage and Image fields. If creating regex is getting you down, go here to get some help with your Rex Kwon Do training regime. Heck, we’ve even written about it in our hunting series, “Rex Groks Gibberish.”
With these new fields in hand, he uses the eval command to combine both the extracted ParentName and ProcessName fields with pre-existing ParentProcessId and ProcessId field information. Another field called detail is also created, which combines the _time field information with the CommandLine field information. These steps create quite a nice-looking, informative table at the end.
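If the rex and eval steps feel abstract, here is a rough Python sketch of the same string handling. It is only an illustration of the idea, not how Splunk or the PSTree app does it internally. The event dictionary, its paths, timestamp, and command line are made up for the example; only the process names and IDs come from this post. Note that Python writes named groups as (?P<name>...) where SPL's rex uses (?<name>...).

```python
import re
from datetime import datetime, timezone

# \x5c is the hex escape for a backslash, so this pattern captures
# everything after the last backslash in a Windows path.
LAST_PATH_PART = re.compile(r"\x5c(?P<name>[^\x5c]+)$")


def file_name(path: str) -> str:
    """Return the final path component.

    Sketch-only choice: if there is no backslash, return the input as-is
    (rex would simply leave the new field unset in that case).
    """
    match = LAST_PATH_PART.search(path)
    return match.group("name") if match else path


def build_fields(event: dict) -> dict:
    """Mimic the three eval statements that create parent, child, and detail."""
    parent = f"{file_name(event['ParentImage'])} ({event['ParentProcessId']})"
    child = f"{file_name(event['Image'])} ({event['ProcessId']})"
    # Assumption: render the timestamp in UTC. Splunk renders _time in the
    # time zone configured for the user running the search.
    stamp = datetime.fromtimestamp(event["_time"], tz=timezone.utc)
    detail = stamp.strftime("%Y-%m-%d %H:%M:%S") + " " + event["CommandLine"]
    return {"parent": parent, "child": child, "detail": detail}


# Hypothetical event: paths, timestamp, and command line are placeholders.
sample = {
    "_time": 1700000000,
    "ParentImage": r"C:\Program Files\Microsoft Office\Office16\Outlook.exe",
    "ParentProcessId": 11120,
    "Image": r"C:\Program Files\Microsoft Office\Office16\Excel.exe",
    "ProcessId": 6416,
    "CommandLine": "Excel.exe Salaries.xls",
}

for key, value in build_fields(sample).items():
    print(f"{key}: {value}")
```

Combining the name with the process ID in one label is what lets a parent label from one event line up with a child label from another.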
All three new fields (parent, child, and detail) can now be used in the pstree custom command added by the PSTree app. The spaces=50 definition just helps us format the resulting table so that the first column doesn’t contain lots of wasted space.
As we’re trying to trace processes based on our original salaries.xls spreadsheet, we must pass this in as a search parameter after the pstree command has completed its operations. Super important.
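Why does the order matter so much? Here is a small Python sketch of the reasoning, under the assumption that a tree can only link processes whose events are still in the result set. The three events are trimmed-down placeholders that follow the labels in this post, and the descendants helper is hypothetical, written only for this illustration.

```python
# Hypothetical events: (parent label, child label, command line).
# The command lines are placeholders, not real data.
EVENTS = [
    ("Outlook.exe (11120)", "Excel.exe (6416)", "Excel.exe Salaries.xls"),
    ("Excel.exe (6416)", "mshta.exe (12404)", "mshta.exe <placeholder>"),
    ("mshta.exe (12404)", "powershell.exe (16796)", "powershell.exe <placeholder>"),
]


def descendants(events, root):
    """Return every child label reachable from root, in discovery order."""
    children = {}
    for parent, child, _command_line in events:
        children.setdefault(parent, []).append(child)

    found, queue, seen = [], [root], {root}
    while queue:
        current = queue.pop(0)
        for child in children.get(current, []):
            if child not in seen:  # guard against accidental loops
                seen.add(child)
                found.append(child)
                queue.append(child)
    return found


# Filtering on the filename first leaves a single event, so nothing that
# happened after Excel started can be linked to it.
filtered_first = [e for e in EVENTS if "Salaries.xls" in e[2]]
print(descendants(filtered_first, "Outlook.exe (11120)"))

# Linking all events first keeps the whole chain available for filtering later.
print(descendants(EVENTS, "Outlook.exe (11120)"))
```

The first call only finds Excel, while the second also finds mshta and powershell, which is exactly the activity we are hunting for.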
Last but not least, we create a table (I Love Table) from the tree data.
index=main source="xmlwineventlog:microsoft-windows-sysmon/operational" EventCode=1 user=wallylambic | rex field=ParentImage "\x5c(?<ParentName>[^\x5c]+)$" | rex field=Image "\x5c(?<ProcessName>[^\x5c]+)$" | eval parent = ParentName." (".ParentProcessId.")" | eval child = ProcessName." (".ProcessId.")" | eval detail=strftime(_time,"%Y-%m-%d %H:%M:%S")." ".CommandLine | pstree child=child parent=parent detail=detail spaces=50 | search tree=*Salaries.xls* | table tree
This may be tough to read depending on your screen size, but the resulting table goes as far as seven layers deep for our process trace. It starts with Outlook.exe (11120), which spawns Excel.exe (6416), which spawns mshta.exe (12404), then powershell.exe (16796), which opens another powershell.exe (5912), which opens cmd.exe (1832), which in turn opens further processes (powershell, cacls, bitsadmin, etc.). CommandLine data is also shown, but I’m trying to keep my word count as close to 1200 as possible, so you’ll need to squint.
Here is a small section zoomed in at the start of the trace.
You can see how tough this would be to do manually, not to mention how many browser tabs you’d inevitably be opening to create each new search. Having this data in a single table with timestamps and command-line data included is very powerful! Hats off to Donald for creating this great app!
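For the curious, the general idea of turning parent/child pairs into an indented tree can be sketched in a few lines of Python. This is a conceptual illustration only and makes no claim about how the PSTree app is actually implemented. The render_tree function and its indent parameter are hypothetical; the labels are the first six processes from the trace described above.

```python
def render_tree(pairs, root, indent=2):
    """Return indented lines for root and its descendants, depth-first."""
    children = {}
    for parent, child in pairs:
        children.setdefault(parent, []).append(child)

    lines, seen = [], set()
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        if node in seen:  # guard against accidental loops in the data
            continue
        seen.add(node)
        lines.append(" " * (indent * depth) + node)
        # Push children in reverse so they are popped in their original order.
        for child in reversed(children.get(node, [])):
            stack.append((child, depth + 1))
    return lines


# Parent/child labels taken from the trace in this post.
PAIRS = [
    ("Outlook.exe (11120)", "Excel.exe (6416)"),
    ("Excel.exe (6416)", "mshta.exe (12404)"),
    ("mshta.exe (12404)", "powershell.exe (16796)"),
    ("powershell.exe (16796)", "powershell.exe (5912)"),
    ("powershell.exe (5912)", "cmd.exe (1832)"),
]

for line in render_tree(PAIRS, "Outlook.exe (11120)"):
    print(line)
```

Each level of indentation is one generation, which is the same visual cue the tree column gives you in the Splunk table.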
Hopefully, this blog has helped you on your process tracking journey.
In the meantime, happy hunting!