It’s That Time Again!

The other day I was asked how Splunk can be configured to index a file where the events have different timestamps.  If you index this type of log file, your events end up being merged together because the timestamps are in multiple formats and may end up looking something like this:


Here is an example snippet of a catalina.out log file with multiple timestamps.  Feel free to import this into your own Splunk instance for learning purposes.

Mar 15, 2014 8:18:33 AM org.apache.catalina.startup.Catalina load
INFO: Initialization processed in 576 ms
2014-03-17 23:58:21,246 [pool-2-thread-3068] ERROR org.apache.thrift.server.TThreadPoolServer - Thrift error occurred during processing of message.
org.apache.thrift.protocol.TProtocolException: Missing version in readMessageBegin, old client?
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(

You certainly can’t ask the developers to change their logging methods just so that it works with Splunk, but what you can do is teach Splunk to recognize the multiple timestamp formats using the DATETIME_CONFIG option in the props.conf configuration file.  In this case we have a log file called catalina.out that is being monitored by Splunk and we set the sourcetype to catalina_manytimestamps.  Normally Splunk defaults to using $SPLUNK_HOME/etc/datetime.xml for timestamp recognition, but you have the ability to override the default timestamp recognition behavior.  In this example I configured DATETIME_CONFIG in the props.conf file to use catalina.xml.


DATETIME_CONFIG = /etc/apps/catalina/local/catalina.xml
LINE_BREAKER = ([\r\n]+)(?:(?:\w+\s*\d{1,2},\s*\d{4}\s*\d{1,2}:\d{2}:\d{2}\s*\w+)|(?:\d{4}-\d{2}-\d{2}\s\d{1,2}:\d{2}:\d{2},\d{3}))

I also configured the LINE_BREAKER option to properly break the events on the different timestamps.  Don’t let regular expressions scare you off!  Test out this LINE_BREAKER regular expression on the sample log at my favorite online REGEX tester,

Below is an example of a custom DATETIME_CONFIG file.  Each of the _maydatetimeformats is configured to match the custom timestamps in the events.  You can define as many formats as you like and update the example to suit your needs.

<define name="_mydatetimeformat2" extract="year, month, day, hour, minute, second, subsecond">
<define name="_mydatetimeformat1" extract="litmonth, day, year, hour, minute, second, ampm">
     <use name="_mydatetimeformat1"/>
     <use name="_mydatetimeformat2"/>
     <use name="_mydatetimeformat1"/>
     <use name="_mydatetimeformat2"/>

Splunk can now index the data, recognize the timestamps and display the event line breaks correctly.  Take a look:

Splunk timestamp event line breaks

I hope that this example can help you overcome any of your future timestamp recognition challenges.

David Maislin

Posted by