Needless to say, we delivered a feature-packed release in Splunk 6 a few weeks ago. With all the buzz around Data Model and Pivot, you might have missed a few of the other cool things we’ve been working on back in the bit factory.
Historically, if you were going to Splunk anything with a file header, like a CSV file or an IIS log, we would read the field names from the header and create props.conf and transforms.conf entries for you in the learned app using DELIMS. While this worked OK for CSV files ingested locally on a Splunk server, CHECK_FOR_HEADER would get confused by multi-line headers like those found in IIS logs. For example:
#Software: Microsoft Internet Information Services 7.5
#Version: 1.0
#Date: 2013-10-18 18:35:33
#Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs-version cs(User-Agent) cs(Cookie) cs-host sc-status sc-substatus sc-win32-status sc-bytes cs-bytes time-taken
Moreover, if you were monitoring a file with a header using a Universal Forwarder, the props/transforms/learned magic happened locally on the Forwarder and was never sent to your indexing or search tier, which made this quite a manual process.
Good news! In Splunk 6 we’ve added several props.conf settings to better handle the diversity of header formats out there and to make it easier to map the field names found in headers to the values in your events. So for our IIS example, I would put the following in inputs.conf out on my Universal Forwarder:
Now, IIS is what we call a pre-trained sourcetype, so if you look in $SPLUNK_HOME/etc/system/default/props.conf you will see something like this:
[iis]
...
INDEXED_EXTRACTIONS = w3c
In previous versions of Splunk, this is where you would have seen the (not-so-well-working) CHECK_FOR_HEADER = True. In Splunk 6, we’ve replaced it with the setting INDEXED_EXTRACTIONS = w3c. This is the “easy button” for IIS logs, which by default use the format shown above: four comment lines, with the actual field names on the fourth (#Fields:) line. Several other Microsoft products, as well as products from other vendors, use this header style. Under the hood, this setting breaks down into a more detailed use of the props.conf controls I mentioned earlier, like this:
FIELD_DELIMITER = whitespace
FIELD_HEADER_REGEX = ^#Fields:\s*(.*)
MISSING_VALUE_REGEX = -
TIME_FORMAT = %Y-%m-%d %H:%M:%S
TZ = GMT
TIMESTAMP_FIELDS = date,time
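To make that breakdown concrete, here is a minimal Python sketch of what these settings describe: recognize the header line with FIELD_HEADER_REGEX, split it on whitespace to get the field names, and build the event timestamp from the date and time fields using TIME_FORMAT with TZ = GMT. It is an illustration under those assumptions, not Splunk’s actual implementation, and the function names header_fields and event_time are hypothetical.

```python
import re
from datetime import datetime, timezone

# Mirrors FIELD_HEADER_REGEX = ^#Fields:\s*(.*)
FIELD_HEADER_REGEX = re.compile(r"^#Fields:\s*(.*)")
# Mirrors TIME_FORMAT = %Y-%m-%d %H:%M:%S
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"
# Mirrors TIMESTAMP_FIELDS = date,time
TIMESTAMP_FIELDS = ("date", "time")


def header_fields(line):
    """Return the field names if `line` is the #Fields: header, else None.

    Hypothetical helper for illustration only.
    """
    match = FIELD_HEADER_REGEX.match(line)
    if match is None:
        return None
    # FIELD_DELIMITER = whitespace: split on any run of spaces or tabs.
    return match.group(1).split()


def event_time(event):
    """Join the TIMESTAMP_FIELDS values and parse them as GMT (TZ = GMT)."""
    raw = " ".join(event[name] for name in TIMESTAMP_FIELDS)
    return datetime.strptime(raw, TIME_FORMAT).replace(tzinfo=timezone.utc)


if __name__ == "__main__":
    print(header_fields("#Fields: date time s-ip cs-method"))
    # ['date', 'time', 's-ip', 'cs-method']
    print(header_fields("#Software: Microsoft Internet Information Services 7.5"))
    # None
    print(event_time({"date": "2013-10-18", "time": "18:35:33"}).isoformat())
    # 2013-10-18T18:35:33+00:00
```

Only the #Fields: line matches the regex, which is why the other three comment lines in the IIS header never get mistaken for field names.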
If you want to see all of the controls for handling file headers, check out the relevant page in the Splunk documentation. We’ve given you the “easy button” for CSV, IIS (W3C), Tab Separated Values, and Pipe Separated Values. The rest of the controls are for custom logging formats, or for cases where vendors throw in nuggets like tab spacing in the header definition.
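The delimited “easy buttons” all follow the same pattern: the first line names the fields, and every following line supplies values in the same order. Below is a minimal Python sketch of that idea using the standard csv module. The mapping from format name to delimiter is an assumption based on the format names, and read_delimited is a hypothetical helper, not a Splunk API.

```python
import csv
import io

# Assumed delimiter for each delimited format named above.
DELIMITERS = {"csv": ",", "tsv": "\t", "psv": "|"}


def read_delimited(text, fmt):
    """Pair the header row's field names with each following row's values."""
    reader = csv.DictReader(io.StringIO(text), delimiter=DELIMITERS[fmt])
    # DictReader consumes the first row as field names, so the header
    # itself is never returned as a record.
    return [dict(row) for row in reader]


if __name__ == "__main__":
    psv = "host|status|bytes\nweb01|200|5120\nweb02|404|0\n"
    for row in read_delimited(psv, "psv"):
        print(row)
    # {'host': 'web01', 'status': '200', 'bytes': '5120'}
    # {'host': 'web02', 'status': '404', 'bytes': '0'}
```

Only the delimiter changes between the three formats; the header-to-value pairing is identical.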
Now on to the real magic. With INDEXED_EXTRACTIONS = w3c, we take the field names from the header and associate them with the values in each line of the file at index time. So, over in Splunk, if my file contained:
#Software: Microsoft Internet Information Services 7.5
#Version: 1.0
#Date: 2013-10-18 18:35:33
#Fields: date time s-ip cs-method ...
2013-10-18 18:35:33 ::1 GET ...
It would be indexed like this (in so many words):
date = 2013-10-18
time = 18:35:33
s-ip = ::1
cs-method = GET
...
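That association can be sketched in a few lines of Python. This is a minimal model, not Splunk’s code, and it assumes a W3C-style file in which every line starting with # is a directive: the first #Fields: line supplies the names, directive lines are dropped rather than returned as events, and a value matching MISSING_VALUE_REGEX (-) is omitted from the event. The function name w3c_events and the sample lines are illustrative only.

```python
import re

FIELD_HEADER_REGEX = re.compile(r"^#Fields:\s*(.*)")
# Mirrors MISSING_VALUE_REGEX = -. Assumption: the regex must match the
# whole value, so a date such as 2013-10-18 is not treated as missing.
MISSING_VALUE_REGEX = re.compile(r"-")


def w3c_events(lines):
    """Yield one dict per data line, keyed by the names from #Fields:."""
    fields = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            match = FIELD_HEADER_REGEX.match(line)
            if match and fields is None:
                # Keep only the first header seen at the top of the file.
                fields = match.group(1).split()
            continue  # Header lines are never emitted as events.
        if fields is None:
            continue  # No header yet, so there is nothing to associate with.
        values = line.split()
        yield {
            name: value
            for name, value in zip(fields, values)
            if not MISSING_VALUE_REGEX.fullmatch(value)
        }


if __name__ == "__main__":
    sample = [
        "#Software: Microsoft Internet Information Services 7.5",
        "#Version: 1.0",
        "#Date: 2013-10-18 18:35:33",
        "#Fields: date time s-ip cs-method cs-username",
        "2013-10-18 18:35:33 ::1 GET -",
    ]
    for event in w3c_events(sample):
        print(event)
    # {'date': '2013-10-18', 'time': '18:35:33', 's-ip': '::1', 'cs-method': 'GET'}
```

Because the header lines are consumed rather than yielded, the model also reflects the point below that the header never shows up as an event of its own.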
That said, we don’t alter the file format at all, so each event looks exactly as it did the moment it left your IIS host. In Splunk you see:
There’s no field extraction to mess with, and, as implied, this association happens at index time, so reporting on the sourcetype is wicked fast. Another happy by-product is that the header does not get indexed as an event. Historically, you would have had to craft your search to exclude the hash-prefixed header lines, or use an eventtype to report on the data sans header. Now we just take what we need from the header and throw it away.
One note on deployment. We had to make some tweaks to the way Universal Forwarders talk to Splunk Indexers, so to use this feature, both your Universal Forwarders and your Indexers need to be running Splunk 6 or later.
Now, as many IIS admins know, when you change the fields that IIS logs, such as adding or removing a field, IIS writes a new header with the new field(s) in the middle of the file. The first iteration of this feature doesn’t solve this problem, so it behooves you to be careful when adding or removing fields. We will ignore the mid-file header and not treat it as an event, but your events will now have one or more fields more or fewer than the original header at the top of the file, which is the one we are still using. In this case, Splunk will capture the actual data but give it a default field name, which could get tricky if you want to report on that field. There are a couple of strategies you can use while we improve the feature; for example, set IIS to create a new log file nightly or weekly, and make your field changes coincide with that rollover as closely as possible so your Splunk reporting doesn’t get horked.
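To see why a mid-file header change causes trouble, here is a minimal Python model of the behavior described above: the first #Fields: line wins, later header lines are skipped, and any value beyond the original field list gets a placeholder name. The placeholder scheme (field_N) and the function name parse_with_first_header are purely hypothetical; they are not the names Splunk actually uses. The sketch also records any later #Fields: line that differs from the first, which is a cheap way to spot a field change before it affects your reports.

```python
import re

FIELD_HEADER_REGEX = re.compile(r"^#Fields:\s*(.*)")


def parse_with_first_header(lines):
    """Return (events, changed_headers) using only the first #Fields: line."""
    fields = None
    events = []
    changed_headers = []
    for line in lines:
        match = FIELD_HEADER_REGEX.match(line)
        if match:
            names = match.group(1).split()
            if fields is None:
                fields = names
            elif names != fields:
                # IIS wrote a new header mid-file; record it, keep the original.
                changed_headers.append(names)
            continue
        if line.startswith("#") or fields is None or not line.strip():
            continue
        event = {}
        for index, value in enumerate(line.split()):
            # Values past the original header get a hypothetical placeholder name.
            name = fields[index] if index < len(fields) else f"field_{index + 1}"
            event[name] = value
        events.append(event)
    return events, changed_headers


if __name__ == "__main__":
    log = [
        "#Fields: date time cs-method",
        "2013-10-18 18:35:33 GET",
        "#Fields: date time cs-method sc-status",
        "2013-10-18 18:40:00 GET 200",
    ]
    events, changed = parse_with_first_header(log)
    print(events[1])
    # {'date': '2013-10-18', 'time': '18:40:00', 'cs-method': 'GET', 'field_4': '200'}
    print(changed)
    # [['date', 'time', 'cs-method', 'sc-status']]
```

In this model, removing a field is worse than adding one: the remaining values shift left and are silently paired with the wrong names, which is why lining field changes up with a log rollover helps.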
All in all, a huge improvement in how Splunk deals with headers, whether locally or in your distributed Splunk environment. Nice work, Amrit et al.