Documentation: 3.2.1
This page last updated: 04/22/08 02:04pm

Log file rotation

Splunk recognizes when a file that it is tailing (such as /var/log/messages) has been rolled (for example, to /var/log/messages1) and will not read the rolled file a second time.

Note: Splunk does not recognize tar or gzip files produced by logrotate. You can explicitly set blacklist rules for .tar or .gz to prevent Splunk from reading these files as new logfiles, or you can configure logrotate to move these files into a directory you have not told Splunk to read.
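To see which files a blacklist rule of this kind would skip, the pattern can be tried out against a list of candidate paths. The following is a minimal sketch only: the pattern, the function name is_blacklisted, and the sample paths are all hypothetical, and it assumes the rule is an ordinary regular expression matched against the whole path. It does not reproduce how Splunk itself evaluates blacklist rules.

```python
import re

# Hypothetical pattern covering the .tar and .gz suffixes mentioned above.
ARCHIVE_PATTERN = re.compile(r".*\.(tar|gz)$")


def is_blacklisted(path):
    """Return True if the path ends in .tar or .gz."""
    return ARCHIVE_PATTERN.match(path) is not None


# Illustrative file names; none of these come from a real system.
candidates = [
    "/var/log/messages",
    "/var/log/messages1",
    "/var/log/messages.1.gz",
    "/var/log/backup.tar",
]

to_read = [p for p in candidates if not is_blacklisted(p)]
print(to_read)
```

Only the uncompressed files remain in to_read; the .gz and .tar entries are filtered out.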

How log rotation works

The tailing processor picks up new files and reads the first and last 256 bytes of each file. This data is hashed into a begin and an end cyclic redundancy check (CRC). Splunk checks the new CRCs against a database that contains the CRCs of all files Splunk has seen before. The database also stores the position Splunk last read up to in each file (the seekPtr).
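The begin and end CRC idea can be illustrated with a short sketch. It assumes the checksum is a standard CRC-32 as provided by Python's zlib module and that the two 256-byte windows overlap when the file is shorter than 512 bytes. The source specifies neither detail, and the function name begin_end_crc is made up for this example.

```python
import os
import tempfile
import zlib

WINDOW = 256  # bytes hashed at each end of the file, per the description above


def begin_end_crc(path, window=WINDOW):
    """Return (begin_crc, end_crc, size) for the file at `path`."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(window)
        # Seek to the last `window` bytes; for short files this is offset 0.
        f.seek(max(size - window, 0))
        tail = f.read(window)
    return zlib.crc32(head), zlib.crc32(tail), size


# Usage: hash a throwaway file standing in for a log.
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "messages")
    with open(log_path, "wb") as f:
        f.write(b"".join(b"line %d\n" % i for i in range(200)))
    begin_crc, end_crc, size = begin_end_crc(log_path)
    print(begin_crc, end_crc, size)
```

Because only the two ends of the file are hashed, the cost of the check does not grow with file size.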

There are three possible outcomes of a CRC check:

1. There is no begin and end CRC matching this file in the database. This is a new file and will be picked up and consumed from the start. Splunk updates the database with the new CRCs and seekPtr as the file is being consumed.

2. The begin CRC and the end CRC are both present, but the size of the file is larger than the seekPtr Splunk stored. This means that, while Splunk has seen the file before, information has been added to it since it was last read. Splunk opens the file, seeks to the previous end of the file, and starts reading from there (so Splunk only picks up the new data and not anything it has read before).

3. The begin CRC is present but the end CRC does not match. This means the file has been changed since Splunk last read it, and some of the portions it has already read are now different. Because the previously read data has changed, Splunk has no choice but to read the whole file again.
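The three outcomes above can be written as a small decision function. This is a sketch under stated assumptions, not Splunk's implementation: the database is modeled as a dictionary keyed by begin CRC, the names Record and classify are hypothetical, and the CRCs are taken as given inputs rather than computed from a file. The final "unchanged" branch (both CRCs match and the file has not grown) is not one of the outcomes listed above and is included only so that the function returns a value in every case.

```python
from dataclasses import dataclass


@dataclass
class Record:
    end_crc: int   # end CRC stored when the file was last read
    seek_ptr: int  # position Splunk had read up to


def classify(db, begin_crc, end_crc, size):
    """Return (outcome, offset to start reading from)."""
    record = db.get(begin_crc)
    if record is None:
        # Outcome 1: no matching CRCs, so treat as a new file.
        return "new", 0
    if record.end_crc != end_crc:
        # Outcome 3: begin CRC known but end CRC differs, so re-read everything.
        return "reread", 0
    if size > record.seek_ptr:
        # Outcome 2: both CRCs known and the file has grown, so resume.
        return "resume", record.seek_ptr
    # Assumption: not covered by the text above.
    return "unchanged", record.seek_ptr


# Usage with made-up CRC values.
db = {111: Record(end_crc=222, seek_ptr=500)}
print(classify(db, 999, 222, 100))  # unknown begin CRC
print(classify(db, 111, 222, 800))  # known file that has grown
print(classify(db, 111, 333, 800))  # known begin CRC, different end CRC
```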

Comments

  1. It strikes me that it would be nice if the Getting Started page gave you an option when tailing /var/log to add
    _blacklist = .*\.bz2
    to inputs.conf so you could skip ancient data and avoid compressed file issues. Or, am I wrong about there being a problem with .bz2 files too?

  2. How (if at all) does Splunk deal with logfiles which are compressed upon rotation?
