
OVERVIEW
Detecting advanced malware or a threat compromise in a Windows environment with a signature-based anti-virus or anti-malware product is difficult. Most signature-based anti-malware solutions rely on a list of known signatures, which has two weaknesses:
- Endpoint protection products do not have a complete list of signatures for every threat that exists or is known
- Signatures do not apply to new types of threats that run as new executables on the endpoint, because there is no known signature to compare against
This traditional approach leaves organizations constantly dealing with security breaches, ranging from data exfiltration to service interruptions and ransomware, all of which stem from the inability to protect endpoints and detect the activity on them.
Fundamentally, the problem is that many organizations cannot make use of the very granular system activity events that can be collected from Windows infrastructure, nor apply analytics to that data to determine what is normal versus abnormal by reviewing all the processes and sessions created on Windows endpoints.
Collecting sysinternal data from every endpoint requires coordinated effort and the right technology: a lightweight agent on each Windows endpoint that can collect granular sysinternal events in real time from many Windows systems. Once the Windows activity details are collected from the endpoint in event log format, they need to be stored in a data platform that can handle the message volume, which can range from tens to hundreds of events per second from a single machine, and that can search and apply analytics across every system activity event to find anomalies.
SOLUTION
Splunk forwarders can collect Sysmon data from Windows infrastructure, which provides the critical function of gathering sysinternal data from the endpoint in real time. Splunk then transports the events that are relevant to analyzing anomalies across all process and session creations on the endpoint.
Splunk provides two key functions to solve the challenges of making the best use of sysinternal events for detecting early signs of known advanced malware infections.
- Collection of Windows activities: use the Splunk Windows OS-based forwarder to easily collect all sysinternal data through the event log
  - Provides a simple agent to collect all Windows data (event log, sysinternal, perf mon, files)
  - Allows secure, high-confidence transport for centralizing data in the analytics platform
  - Sysmon-specific formatting and processing, so analysis can be applied immediately
- Analytics base for searching and analyzing anomalies: use simple searches with statistical summation and calculation to highlight rare values in process creation details
  - Ability to pivot on different endpoint criteria to derive results dynamically
  - Ability to apply machine learning
By applying an analytical approach to the data, without any additional tools, Splunk makes it possible to distinguish abnormal endpoint activity by eliminating the normal pattern that emerges from statistical calculation. This technique can be used in most organizations that have 1) any Windows-based server infrastructure or 2) sysinternal data collected from all Windows clients. The use case applies to the majority of security operations. Regardless of whether the organization already has an endpoint security solution, the wealth of detailed information provides significant value in assessing the security of an endpoint. The sysinternal data may also have other uses, adding context to IT operations or service analysis.
DATA SOURCES
The data source required to detect potential malware activity on Windows endpoints is sysinternal data collected through the Windows event log using Sysmon. An organization can obtain this detailed information by installing Sysmon, provided by Microsoft, and then installing the Splunk forwarder to define what is collected and filtered. This sysinternal data is where the search for indications of odd activity begins. To trace how the infection happened and what was infected, additional correlation is needed: ingesting proxy, IDS/IPS, and DNS/stream data is recommended to find the root cause and route of a potential infection, determine its scope, and mitigate the incident. Analyzing the sysinternals through Splunk provides definitive indications of compromise for detecting potential malware, whether it is known or unknown.
- Windows sysinternals using sysmon through event log (required)
- Proxy, IDS/IPS, DNS, stream (recommended for further investigation beyond detection)
With Sysmon installed, the event log provides the following details for collection into Splunk:
- Process creation, including the full command line with paths for both the current and parent processes
- Hash of the process image using either MD5, SHA1 or SHA256
- Process GUID, which provides a static ID for better correlation, as opposed to PIDs that are reused by the OS
- Network connection records from one host to another, including source process, IP addresses, port numbers, hostnames and port names for TCP/UDP
- File creation time changes
- Boot process events that may include kernel-mode malware
< Example of Windows event log through sysmon>
COLLECTION OF WINDOWS ACTIVITIES EVENTS
Collecting various pieces of information from the Windows infrastructure is easy with the Splunk forwarder.
Here are a few simple steps to collect Sysmon data and integrate it into Splunk:
- Install Sysmon on your Windows-based endpoint. It can be downloaded from the following link: http://technet.microsoft.com/en-us/sysinternals/dn798348
- Install the Splunk forwarder on the endpoint; it forwards sysinternal messages in real time to the Splunk instance
- Install the Splunk Add-on for Microsoft Sysmon so that Splunk can extract the fields and map them to CIM. Download it here: https://splunkbase.splunk.com/app/1914/
Once Sysmon is installed, you can use Splunk’s “Data Inputs” to decide what to collect. Just select the types of event logs to transport to the Splunk indexer.
Now that you have events in Splunk, there is a wealth of information available to you. The basic search to retrieve the sysinternal events from the Splunk index is:
sourcetype="XmlWinEventLog:Microsoft-Windows-Sysmon/Operational"
The following is an example of data collected by Splunk. The Windows event log format is converted into XML, with all the different fields contained in a single-line event.
<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'><System><Provider Name='Microsoft-Windows-Sysmon' Guid='{5770385F-C22A-43E0-BF4C-06F5698FFBD9}'/><EventID>1</EventID><Version>5</Version><Level>4</Level><Task>1</Task><Opcode>0</Opcode><Keywords>0x8000000000000000</Keywords><TimeCreated SystemTime='2016-02-04T01:58:00.125000000Z'/><EventRecordID>73675</EventRecordID><Correlation/><Execution ProcessID='1664' ThreadID='1856'/><Channel>Microsoft-Windows-Sysmon/Operational</Channel><Computer>FSAMUELS</Computer><Security UserID='S-1-5-18'/></System><EventData><Data Name='UtcTime'>2016-02-04 01:58:00.125</Data><Data Name='ProcessGuid'>{6B166207-B028-56B2-0000-001082512900}</Data><Data Name='ProcessId'>4544</Data><Data Name='Image'>C:\Program Files\apps\Update\Update.exe</Data><Data Name='CommandLine'>"C:\Program Files\apps\Update\Update.exe" /ua /installsource scheduler</Data><Data Name='CurrentDirectory'>C:\Windows\system32\</Data><Data Name='User'>NT AUTHORITY\SYSTEM</Data><Data Name='LogonGuid'>{6B166207-A731-56B2-0000-0020E7030000}</Data><Data Name='LogonId'>0x3e7</Data><Data Name='TerminalSessionId'>0</Data><Data Name='IntegrityLevel'>System</Data><Data Name='Hashes'>SHA1=9D04597F8CFC8841DFA876487DE965C0F05708CC</Data><Data Name='ParentProcessGuid'>{6B166207-B028-56B2-0000-0010AC4F2900}</Data><Data Name='ParentProcessId'>2576</Data><Data Name='ParentImage'>C:\Windows\System32\taskeng.exe</Data><Data Name='ParentCommandLine'>taskeng.exe {A26A5EC9-73E5-4AE9-A492-04500B20692F} S-1-5-18:NT AUTHORITY\System:Service:</Data></EventData></Event>
The sysinternal events collected in XML format are parsed into Splunk fields with the help of the “Splunk Add-on for Sysmon.” Browsing complex sysinternal events then becomes easy: just point and click on the parsed fields.
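To make the idea of "parsed fields" concrete, here is a minimal Python sketch of how the XML event shown above flattens into name/value pairs. It is only an illustration using the Python standard library, not how the Splunk add-on is implemented. The sample is an abbreviated copy of the event above, and the function name parse_sysmon_event is made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespace used by Windows event XML (taken from the sample event above).
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Abbreviated copy of the sample event; only a few <Data> fields are kept.
SAMPLE_EVENT = (
    "<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>"
    "<System><EventID>1</EventID><Computer>FSAMUELS</Computer></System>"
    "<EventData>"
    "<Data Name='ProcessId'>4544</Data>"
    r"<Data Name='Image'>C:\Program Files\apps\Update\Update.exe</Data>"
    "<Data Name='Hashes'>SHA1=9D04597F8CFC8841DFA876487DE965C0F05708CC</Data>"
    r"<Data Name='ParentImage'>C:\Windows\System32\taskeng.exe</Data>"
    "</EventData></Event>"
)


def parse_sysmon_event(xml_text):
    """Flatten one Sysmon event into a dict of field name -> value."""
    root = ET.fromstring(xml_text)
    fields = {
        "EventID": root.findtext("e:System/e:EventID", namespaces=NS),
        "Computer": root.findtext("e:System/e:Computer", namespaces=NS),
    }
    # Each <Data Name='X'>value</Data> element becomes fields['X'] = value.
    for data in root.findall("e:EventData/e:Data", NS):
        fields[data.get("Name")] = data.text or ""
    return fields


event = parse_sysmon_event(SAMPLE_EVENT)
print(event["Computer"], event["Image"], event["Hashes"])
```

Fields such as Image, Hashes and Computer are the ones the searches in the following sections group and count.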
SEARCHING FOR PROCESS CREATION ANOMALIES
The challenge is how to protect against the unknown. Unknown here means that there is no list to verify against, and that things are not simply defined as right or wrong; what is right or wrong derives from the data itself. It is based on calculated results, with an understanding of what is the majority versus the minority and of the other analytical details associated with them.
Objective of the Analytics Approach
Detecting stealthy changes in activity means looking for anomalies by comparing what happened and existed before with what is happening now.
The questions that help determine whether something is an anomaly are:
- What is pre-existing and what is new?
- What are the statistics on pre-existing versus new entities, to validate which is old (and so normal) and which is new (and so needs to be validated)?
- What is the time relationship between existing and new entities?
- How strongly is an existing entity associated with other entities, such as the number of assets associated with it?
With answers to these questions, we can eliminate the “normal” and filter down to the anomalies that most deserve evaluation and analysis.
These kinds of distinctions are possible when the statistics of entities are compared in relation to each other.
Windows sysinternal data provides extensive detail for understanding the status of endpoints in terms of security and vulnerability. One of its notable strengths is visibility into what processes and files are installed and executed. Events related to process execution indicate activity on the system, and they are a critical source of information to help security analysts understand:
- What processes have been executed
- What directory the executable originated from
- What parent process launched the executable
- What the fingerprint of the executed process is
All of these insights gained from the sysinternals are a critical part of the collected system activity information when applying analytics to find anomalous processes and actions on an endpoint. With the data collected from the different Sysmon sources, this is an easy task. Using the hash that Sysmon attaches to each process creation event (MD5, SHA1 or SHA256), an analyst can identify the different versions of a given system executable.
For example, why do we care about the full path of a process named “cmd.exe”? Even though “cmd.exe” is a legitimate-looking executable on Windows, an odd path for the binary can mark it as a potential “black sheep.” And what about a “cmd.exe” binary whose MD5 hash is different from every other “cmd.exe” in the network? This is a clear indication of file manipulation, potentially malicious code hiding as a legitimate executable.
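The path check described above can be sketched in a few lines of Python. This is a simplified illustration, not a complete detection: the EXPECTED_DIRS allowlist and the example paths are assumptions made up for this sketch, and a real deployment would need its own list of legitimate locations.

```python
import ntpath

# Hypothetical allowlist: directories where each system binary is expected.
EXPECTED_DIRS = {
    "cmd.exe": {r"c:\windows\system32", r"c:\windows\syswow64"},
}


def is_unexpected_path(image):
    """Return True when a tracked binary name runs from an unlisted directory."""
    # ntpath handles Windows-style paths on any operating system.
    directory, name = ntpath.split(image)
    expected = EXPECTED_DIRS.get(name.lower())
    if expected is None:
        return False  # not a binary this sketch tracks
    return directory.lower() not in expected


# Made-up Image values as they would appear in Sysmon process creation events.
print(is_unexpected_path(r"C:\Windows\System32\cmd.exe"))
print(is_unexpected_path(r"C:\Users\Public\Downloads\cmd.exe"))
```

A path check alone misses a binary that was replaced in place, which is why the hash comparison in the next section matters.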
MALWARE PROCESS HIDING AS EXISTING OS OR APPLICATION PROCESS
Most PC users have looked at the Windows process monitor and found no particular problem: the OS seems to be running only normal processes. Yet regardless of how things appear to the user, we know the PC is infected with malware when we witness, for example, the browser being hijacked to an odd site. A malware process that runs so that it appears to be a normal process is an example of “black sheep” malware disguising itself as a normal OS process. How can this kind of “black sheep” be detected?
What about advanced malware, for example a type of malware that has never been known to or detected by an anti-malware product? This type of malware can execute on an endpoint without most anti-malware software raising a red flag, because the signature of the new executable is not known. Can this kind of problem be tackled using analytics, specifically analytics that compare a set of criteria across the different executable fingerprints that have been observed?
To find this, the hashes in the Sysmon event play a key role. The hash attached to the Sysmon process creation event represents a unique fingerprint of an executable. If we use analytics to establish the existing fingerprints of a trusted executable and compare them with a new fingerprint for the same executable that started recently, we can find the processes that are anomalous. These detailed Sysmon events about created processes and their associated hashes can be analyzed with a simple Splunk SPL summation by executable name.
This lists the unique executables and their counts regardless of how the executables disguise themselves. A distinct hash means an indisputably distinct file or executable was run. On top of that, the count for each unique hash indicates what needs a closer look.
Search Syntax Below:
sourcetype="XmlWinEventLog:Microsoft-Windows-Sysmon/Operational" Image=*svchost.exe | dedup Computer
| eval TIME=strftime(_time,"%Y-%m-%d %H:%M") | stats earliest(TIME) count by Image, Hashes
This search finds executables that share the same name but have different hashes.
Based on the result of the search, the same executable, svchost.exe, was found with exactly the same path but with two different hashes. This reflects two variants of the Windows OS: this infrastructure runs a good balance of Windows 7 and Windows 8 hosts. That looks normal because, in a network of more than 200 hosts, the hashes of a critical system process such as “svchost.exe” are distributed in proportion to the number of hosts on each Windows version. Given the count of instances, the knowledge that the infrastructure runs two OS versions, and a healthy count for both results, we can conclude that things look normal.
In the following example, imagine the same search returning a different result. The first two hashes show a distribution similar to the previous example, but a third, new SHA1 hash appears on far fewer hosts. The same executable name with a different hash and a significantly lower number of process creations suggests a new executable running under the same name as a system binary. A count of “1” indicates a rare frequency, which is unlikely for a system executable unless another OS version with different system executables is running on the network. If that is not the case, this is a suspicious hash that should be looked up, for example with a Google search.
Also, “earliest(TIME)” shows when the anomalous executable was first created, and it indicates that this is a new process compared with the normal svchost.exe executables created before. The earliest-time function provides insight into what already existed versus what is new, and correlating it with the count helps determine what is abnormal. The third executable, with a newer timestamp and very few occurrences, is most likely malware that an anti-virus program did not detect.
Make sure to verify which hosts are associated with the two normal svchost.exe hashes, as well as which hosts are involved in potential malware activity. This can be done by listing the unique values of the “Computer” field from the Sysmon data with the “values(Computer)” function.
sourcetype="XmlWinEventLog:Microsoft-Windows-Sysmon/Operational" Image=*svchost.exe | dedup Computer | eval TIME=strftime(_time,"%Y-%m-%d %H:%M")
| stats earliest(TIME), count, values(Computer) by Image, Hashes
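For readers less familiar with SPL, the grouping performed by the search above can be sketched in plain Python. This is only an approximation for illustration: the events, host names and shortened hash values are made up, and the sketch counts distinct hosts per hash, which is close to but not identical to what “dedup Computer” followed by “count” does in Splunk.

```python
from collections import defaultdict

IMAGE = r"C:\Windows\System32\svchost.exe"

# Made-up process creation events: (time, Computer, Image, Hashes).
EVENTS = [
    ("2016-02-01 09:00", "HOST-A", IMAGE, "SHA1=AAAA"),
    ("2016-02-01 09:05", "HOST-B", IMAGE, "SHA1=AAAA"),
    ("2016-02-01 09:10", "HOST-C", IMAGE, "SHA1=BBBB"),
    ("2016-02-01 09:20", "HOST-D", IMAGE, "SHA1=BBBB"),
    ("2016-02-03 17:42", "HOST-E", IMAGE, "SHA1=CCCC"),
]


def summarize(events):
    """Group by (Image, Hashes): earliest time, host count, host list."""
    groups = defaultdict(lambda: {"earliest": None, "hosts": set()})
    for time, computer, image, hashes in events:
        group = groups[(image, hashes)]
        # Timestamps in this format sort correctly as plain strings.
        if group["earliest"] is None or time < group["earliest"]:
            group["earliest"] = time
        group["hosts"].add(computer)
    return {
        key: {
            "earliest": g["earliest"],
            "count": len(g["hosts"]),
            "hosts": sorted(g["hosts"]),
        }
        for key, g in groups.items()
    }


summary = summarize(EVENTS)
for (image, hashes), row in sorted(summary.items()):
    print(hashes, row["earliest"], row["count"], row["hosts"])
```

In this made-up data, the third hash stands out on both dimensions discussed above: it appears on a single host and its earliest timestamp is later than the others.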
From this analysis of a process with a new hash, we can derive two conditions that define potential malware sneaking in as a system process:
- The process may look normal from the path and name of the executable, but the hash of the new executable is different from the existing historical hashes
- The frequency of process creation is significantly different from that of the existing executable hashes
Understanding the nature of this manipulation and tactic, we can define a query that filters automatically, using a couple of calculated steps that weigh the process creation count of the new executable hash against the existing ones. The following step uses “eventstats” to calculate the sum of all occurrences and then the percentage of occurrences for each executable. This makes it easy to define a relative threshold that picks out the “odd executables,” even when those executables are masking themselves as sheep.
Search Syntax Below:
sourcetype="XmlWinEventLog:Microsoft-Windows-Sysmon/Operational" Image=*svchost.exe | dedup Computer | eval TIME=strftime(_time,"%Y-%m-%d %H:%M")
| stats earliest(TIME) count by Image, Hashes | eventstats sum(count) as total_host | eval majority_percent=round((count/total_host)*100,2)
Now, how do we define a search (rule) to have Splunk look for these kinds of odd executables?
Extending the previous relative quantity calculation with a filter on “majority_percent<5” eliminates the normal groups and exposes the anomalous executable group based on a relative threshold.
sourcetype="XmlWinEventLog:Microsoft-Windows-Sysmon/Operational" Image=*svchost.exe | dedup Computer
| eval TIME=strftime(_time,"%Y-%m-%d %H:%M") | stats earliest(TIME) count by Image, Hashes | eventstats sum(count) as total_host | eval majority_percent=round((count/total_host)*100,2) | where majority_percent<5
This kind of recipe can be applied as a saved search in Splunk Enterprise or as a correlation search in Enterprise Security, so that the analysis runs automatically and alerts the analyst when an anomalous process starts up on any Windows workstation on the network.
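The arithmetic behind the “eventstats” and “where” steps can be worked through with a small Python sketch. The host counts below are invented for illustration (loosely modeled on a network of about 200 hosts), the hash values are shortened placeholders, and the function name rare_hashes is made up; the 5 percent threshold is the one used in the search above.

```python
def rare_hashes(host_counts, threshold_percent=5.0):
    """Return hashes whose share of all hosts is below the threshold."""
    # Equivalent of: eventstats sum(count) as total_host
    total_host = sum(host_counts.values())
    if total_host == 0:
        return {}
    rare = {}
    for hashes, count in host_counts.items():
        # Equivalent of: eval majority_percent=round((count/total_host)*100,2)
        majority_percent = round((count / total_host) * 100, 2)
        # Equivalent of: where majority_percent<5
        if majority_percent < threshold_percent:
            rare[hashes] = majority_percent
    return rare


# Made-up host counts per svchost.exe hash.
HOST_COUNTS = {"SHA1=AAAA": 110, "SHA1=BBBB": 95, "SHA1=CCCC": 1}
print(rare_hashes(HOST_COUNTS))
```

With these numbers the two common hashes account for roughly 53 and 46 percent of hosts and are filtered out, while the single-host hash falls well under 5 percent and is the only one returned. Because the threshold is relative, the same rule keeps working as the number of hosts grows.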
SUMMARY
By using Splunk Enterprise and Microsoft Sysmon, security analysts gain a detailed understanding of endpoint activity as well as the ability to detect advanced and unknown malware. Statistical analysis of detailed endpoint data expresses risk in quantitative values, so analysts can easily profile the behavior of hosts compromised by adversaries and define rules that use those values as thresholds. Analysts can apply similar techniques to many similar problems and use cases that can only be addressed by an analytical approach. Analytical approaches that contextually distinguish differences and anomalies enable security operations to detect advanced threats faster and ultimately minimize business impact.
----------------------------------------------------
Thanks!
Young Cho