Splunk recognizes when a file that it is tailing (such as /var/log/messages) has been rolled (/var/log/messages1) and will not read the rolled file a second time.
Note: Splunk does not recognize tar or gzip files produced by logrotate. You can explicitly set blacklist rules for .tar or .gz to prevent Splunk from reading these files as new logfiles, or you can configure logrotate to move these files into a directory you have not told Splunk to read.
How log rotation works
The tailing processor picks up new files and reads the first and last 256 bytes of each file. This data is hashed into a begin and an end cyclic redundancy check (CRC). Splunk checks the new CRCs against a database that contains the CRCs of all files Splunk has seen before. The location where Splunk last stopped reading in the file (the seekPtr) is also stored.
There are three possible outcomes of a CRC check:
1. There is no begin and end CRC matching this file in the database. This is a new file and will be picked up and consumed from the start. Splunk updates the database with the new CRCs and seekPtrs as the file is being consumed.
2. The begin CRC and the end CRC are both present, but the size of the file is larger than the seekPtr Splunk stored. This means that, while Splunk has seen the file before, information has been added to it since it was last read. Splunk opens the file, seeks to the previous end of the file, and starts reading from there (so Splunk grabs only the new data and not anything it has read before).
3. The begin CRC is present but the end CRC does not match. This means the file has been changed since Splunk last read it, and some of the portions it has already read are different. Because the previously read data has changed, Splunk has no choice but to read the whole file again.
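The three outcomes above can be sketched roughly as follows. This is an illustrative model only, not Splunk's actual implementation: the names classify, crc_at, and WINDOW, the use of zlib.crc32, and the dictionary layout of the database are all assumptions made for the example. It also assumes that the end CRC covers the 256 bytes ending at the stored seekPtr, so that appended data does not change it, and it adds an "unchanged" result for a file that matches both CRCs and has not grown (the rolled-file case).

```python
import zlib

WINDOW = 256  # bytes hashed at each end, per the description above


def crc_at(data: bytes, end: int) -> int:
    """CRC of up to WINDOW bytes ending at offset `end`."""
    start = max(0, end - WINDOW)
    return zlib.crc32(data[start:end])


def classify(data: bytes, db: dict) -> str:
    """Return 'new', 'append', 'unchanged' or 'reread', then update db.

    db maps begin CRC -> (end CRC, seekPtr). This layout is hypothetical.
    """
    begin = zlib.crc32(data[:WINDOW])
    record = db.get(begin)
    if record is None:
        outcome = "new"        # outcome 1: never seen, read from the start
    else:
        end_crc, seek_ptr = record
        if len(data) >= seek_ptr and crc_at(data, seek_ptr) == end_crc:
            # outcome 2 when the file grew; otherwise nothing new to read
            outcome = "append" if len(data) > seek_ptr else "unchanged"
        else:
            outcome = "reread"  # outcome 3: previously read data changed
    # Pretend the file was consumed to its end and record the new state.
    db[begin] = (crc_at(data, len(data)), len(data))
    return outcome
```

The sketch works on in-memory bytes to stay short; a real tailing process would read only the two 256-byte windows from disk rather than the whole file. Note that under this model a file shorter than 256 bytes gets a different begin CRC whenever it grows, so it would be treated as new.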
Comments
It strikes me that it would be nice if the Getting Started page gave you an option when tailing /var/log to add
_blacklist = .*\.bz2
to inputs.conf so you could skip ancient data and avoid compressed file issues. Or, am I wrong about there being a problem with .bz2 files too?
Posted by gpullis on Mar 18 2008, 1:04pm
How (if at all) does Splunk deal with logfiles which are compressed upon rotation?
Posted by bsr on Sep 05 2007, 11:36am