The Splunk App for Active Directory and How I Tamed the Security Log

It’s time to fully answer another question from .conf 2012.  The question was fairly simple: “The Windows Security Log contains a lot of data.  Most of it isn’t relevant to the Splunk App for Active Directory.  How do I prevent indexing of data I don’t need?”

There are actually two pieces to this.  The first is removing the event codes within WinEventLog:Security that you don’t need.  The second is removing unnecessary data from within the events you do keep.  We’ll go through each one in turn.

The Windows Security Log is a dumping ground for many Microsoft systems that need to produce audit or security information.  The Splunk App for Active Directory uses only a fraction of these events.  The others, while useful, chew up your index.  This isn’t merely a cost issue; it’s also a search-time performance and disk space issue.  If you run the following search, you will get a list of all the event codes, sorted by frequency:

index=winevents sourcetype=wineventlog:security | stats count by EventCode, name | sort -count

This gives you a target list of the event codes that will have the most impact if you ignore them.  Each event code has a unique description.  You can read all about each description on the web (I use as my resource).  You will see plenty that you want to keep, but you might see others that you don’t.  For instance, if you have Windows Firewall with Advanced Security turned on, you will get event code 5156 (“The Windows Filtering Platform has permitted a connection”) on a domain controller every time a connection is permitted, including every time something asks the domain controller to authenticate.  That can be thousands of events a second and can burn up about a third of the license cost for the Windows Security Log.  Similarly, event code 5157 indicates that a connection was blocked.  These are good events to collect if you are monitoring Windows Firewall, but most people don’t need them.

The Splunk App for Active Directory does need a bunch of event codes.  To see if you need to keep a particular event code, look for it in Splunk_for_ActiveDirectory/default/eventtypes.conf.  If it’s there, you need it.
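If you have a long list of candidate codes, checking each one against eventtypes.conf by hand gets tedious.  The following is a minimal Python sketch, assuming the event codes appear in the eventtypes’ search strings in the form EventCode=NNNN (forms such as EventCode IN (…) would need another pattern).  The function names and the sample stanza are hypothetical; point it at the real text of Splunk_for_ActiveDirectory/default/eventtypes.conf instead.

```python
import re

# Matches EventCode=4624, EventCode="4624" or EventCode = 4624 in a search string.
# Assumption: the app's eventtypes write event codes in this form.
EVENTCODE_RE = re.compile(r'\bEventCode\s*=\s*"?(\d+)')


def codes_in_eventtypes(conf_text):
    """Return the set of event codes referenced anywhere in eventtypes.conf text."""
    return set(EVENTCODE_RE.findall(conf_text))


def safe_to_drop(candidates, conf_text):
    """Return, sorted, the candidate codes that no eventtype references."""
    needed = codes_in_eventtypes(conf_text)
    return sorted(code for code in candidates if code not in needed)


if __name__ == "__main__":
    # Hypothetical excerpt; in practice read the app's eventtypes.conf with open().
    sample = """
[hypothetical-logon]
search = sourcetype=WinEventLog:Security (EventCode=4624 OR EventCode=4625)
"""
    print(safe_to_drop(["4624", "5156", "5157"], sample))  # prints ['5156', '5157']
```

Anything this reports as safe to drop is only a starting point; the final decision still belongs to you and your test environment.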

First off, a couple of disclaimers:

1) You will be removing data from your events.  Data destruction is inherently dangerous – you can always delete data later, but re-creating it is impossible.  ALWAYS test in a non-production environment before you push to production.  Splunk is not liable for your configuration errors.

2) Setting up too many index-time transformations on a high-volume sourcetype (such as WinEventLog:Security) may create an indexing bottleneck and add latency to event acquisition.

Event codes that you don’t need can easily be removed with entries in the props.conf and transforms.conf files.

In the props.conf file, we need to enable a transform to handle the routing:

[WinEventLog:Security]
TRANSFORMS-firewall = dumpFirewallEvents

Then in the transforms.conf file, we need to write a transform to route the events to the bit bucket, which is called the nullQueue within Splunk.

[dumpFirewallEvents]
REGEX = (?ms)EventCode=(5156|5157)
DEST_KEY = queue
FORMAT = nullQueue
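To see what this routing does to individual events, here is a small Python simulation.  It is only a sketch: Splunk evaluates the REGEX with PCRE at parse time, while this uses Python’s re module, which behaves the same way for this simple expression.  The sample events are hypothetical and heavily abbreviated, and route() is an illustrative name, not a Splunk function.  Events that match go to the nullQueue and are discarded; everything else continues to the indexQueue.

```python
import re

# Same pattern as the REGEX in the dumpFirewallEvents stanza.
DROP_RE = re.compile(r"(?ms)EventCode=(5156|5157)")


def route(raw):
    """Mimic DEST_KEY = queue / FORMAT = nullQueue for one raw event."""
    return "nullQueue" if DROP_RE.search(raw) else "indexQueue"


if __name__ == "__main__":
    events = [
        "LogName=Security\nEventCode=5156\nMessage=The Windows Filtering Platform has permitted a connection.",
        "LogName=Security\nEventCode=4624\nMessage=An account was successfully logged on.",
    ]
    for event in events:
        print(route(event))  # prints nullQueue, then indexQueue
```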

The next question is where to put the stanzas we are adding to props.conf and transforms.conf.  This depends largely on how you have configured your environment.  If you have a heavy forwarder on your domain controllers (which isn’t recommended), you can add the props.conf and transforms.conf entries to the local directory of Splunk_TA_windows.  This will prevent the data from being sent over the wire to the indexer.  If, however, you are following best practice and installing a universal forwarder on the domain controllers, you will need to do it at the indexer, because the universal forwarder does not parse the events and so never applies these transforms.  My suggestion is to use the same Splunk_TA_windows on all systems (domain controllers, indexers and search heads) and place the new stanzas in the local directory of this app.  That way, they are everywhere they need to be.

There was a second part to this question: how do I remove text that I don’t need from within an event?  If you have Windows Server 2008 or later, take a look at EventCode=4624 with the following search:

index=winevents sourcetype=wineventlog:security EventCode=4624

There is a huge amount of static text at the end of the event that explains what this event is all about.  This is useful for events that are generated infrequently, but 4624 is generated several times a second at a busy site.  Adding roughly 1 KB of explanation to each of those events bloats the index volume for no benefit.  There are two ways of fixing this: the first is a props.conf entry using the SEDCMD directive, and the second is a transform.  First, the SEDCMD.  This is by far the simpler of the two; add an entry to your props.conf as follows:

SEDCMD-shorten4624 = s/(?mis)(.*EventCode=4624.*)This event is generated when a logon session.*$/\1/g
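The substitution can be tried out before it goes anywhere near an indexer.  This Python sketch applies the same pattern and \1 replacement with re.sub; it assumes Python’s re treats this expression the same way Splunk’s PCRE does, which holds for the constructs used here.  The sample event text is hypothetical and abbreviated, and shorten() is an illustrative name.

```python
import re

# Pattern and replacement from s/(?mis)(...)This event is generated when a logon session.*$/\1/g
PATTERN = re.compile(r"(?mis)(.*EventCode=4624.*)This event is generated when a logon session.*$")
REPLACEMENT = r"\1"  # back-reference: keep only the first capture group


def shorten(raw):
    """Apply the SEDCMD-style substitution; non-matching events come back unchanged."""
    return PATTERN.sub(REPLACEMENT, raw)


if __name__ == "__main__":
    sample = (
        "EventCode=4624\nMessage=An account was successfully logged on.\n\n"
        "This event is generated when a logon session is created."
    )
    print(repr(shorten(sample)))
```

Because the pattern requires EventCode=4624, other events (for example a 4625, whose explanation reads differently) pass through untouched.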

In this case, the regular expression is a Perl-compatible regular expression, and the sed-style replacement has slightly different rules from other parts of Splunk.  Most notably, the backslash introduces a back-reference, so \1 means “the first capture group”, or in this case “everything up to the string ‘This event is generated…’”.  If you don’t like to mix and match your regular expression idiosyncrasies, then maybe a transform is more your style.  You can use it instead of the SEDCMD by adding it to the props and transforms you already created for removing events.

[WinEventLog:Security]
TRANSFORMS-firewall = dumpFirewallEvents
TRANSFORMS-shorten = shorten4624

The transform is a little different as we are editing rather than re-routing.

[shorten4624]
REGEX = (?ms)(.*EventCode=4624.*)This event is generated when a logon session
DEST_KEY = _raw
FORMAT = $1
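The effect of this transform can be mimicked in a few lines of Python, assuming the stanza sets FORMAT = $1 so that the capture group is what gets written to _raw.  This is a sketch under that assumption, using Python’s re in place of Splunk’s PCRE; apply_transform() and the sample event are hypothetical.  When the regex matches, _raw is replaced by group 1 alone, so the matched phrase and everything after it disappear; when it doesn’t match, the event is left as it was.

```python
import re

# REGEX from the shorten4624 stanza.
REGEX = re.compile(r"(?ms)(.*EventCode=4624.*)This event is generated when a logon session")


def apply_transform(raw):
    """Mimic DEST_KEY = _raw with FORMAT = $1 for one raw event."""
    match = REGEX.search(raw)
    # On a match, the new _raw is capture group 1 only; otherwise nothing changes.
    return match.group(1) if match else raw


if __name__ == "__main__":
    sample = (
        "EventCode=4624\nMessage=An account was successfully logged on.\n\n"
        "This event is generated when a logon session is created."
    )
    print(repr(apply_transform(sample)))
```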

Reading the regular expression, we create a capture group containing the part of the event we want to keep and then write that back to the _raw key before the event is indexed.  Anything outside the capture group, including the explanatory text, is discarded.  You can place these elements in the same place as the previous settings, for example the local directory of Splunk_TA_windows.

Finally, let me repeat those disclaimers.  Each of these techniques is an index-time activity.  You can delete or ignore the data later on, but you cannot recreate it afterwards.  It requires you to make a conscious decision to ignore data, and that’s a good thing.  Ignore too much and you lose valuable information that may be needed later on.   Needless to say, always test such a destructive move before deploying on your production Splunk instance.  Also, remember that these techniques have performance implications, especially on a high volume sourcetype like the Windows Security event log.  Take that into account as you decide what to cut out.
