TIPS & TRICKS

Monitoring input files with a white list

There are many ways to feed data into Splunk. One method is to monitor the files within a directory. In the default ‘monitor’ configuration, Splunk will try to index all files within the specified directory. In some cases, that directory contains many files, including some that you do not want to index. With a white list, Splunk can be configured to index only specific file types and subdirectories. Here is a real-world working example of how to use a white list…

Let us assume we want to index certain compressed files (*.gz) whose file names start with “200906”. One of the file names is “20090631.gz”. These files exist in a specific directory: “/storage/datacenter/host1/webserver”. To make things more interesting, that directory also holds other *.log files, and datacenter contains other subdirectories (such as host2, router1, router2). We want to index only the “host” (host1 and host2) files and exclude any router files. Additionally, there are appserver and system directories under each host directory. Conceptually, we want to do the following:

* Tell Splunk to monitor the /storage/datacenter directory
* Set a whitelist for this input
* Edit the REGEX to match all files that contain “host” in the underlying path
* Edit the REGEX to match all files that contain “webserver” in the underlying path
* Edit the REGEX to match all files that start with “200906”
* Edit the REGEX to match all files that end with “.gz”

Your final stanza in the $SPLUNK_HOME/etc/system/local/inputs.conf file would resemble the following:

[monitor:///storage/datacenter/]
sourcetype=gzfiles
_whitelist=host[^/]*/webserver/[^/]*200906[^/]*\.gz$

The above stanza would index the following files:

/storage/datacenter/host1/webserver/20090601.gz
/storage/datacenter/host1/webserver/20090602.gz
/storage/datacenter/host2/webserver/20090601.gz
/storage/datacenter/host2/webserver/20090602.gz

The above stanza would NOT index the following files or directories:

/storage/datacenter/logfile.txt
/storage/datacenter/router1/logfile.log
/storage/datacenter/host1/appserver/20090601.gz
/storage/datacenter/host2/webserver/20090601.txt
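
To sanity-check a white list before putting it into inputs.conf, the regex can be run against sample paths. The following is only a rough sketch in Python: it assumes Python's re.search behaves like Splunk's unanchored match against the full file path for this particular pattern, and the helper name is_whitelisted is made up for illustration.

```python
import re

# Regex copied from the stanza above. Assumption: re.search (unanchored)
# is a reasonable stand-in for how Splunk applies the white list to the
# full path of each file under the monitored directory.
WHITELIST = re.compile(r"host[^/]*/webserver/[^/]*200906[^/]*\.gz$")

def is_whitelisted(path):
    """Hypothetical helper: True if the path matches the white list regex."""
    return WHITELIST.search(path) is not None

# Paths taken from the two listings above.
INDEXED = [
    "/storage/datacenter/host1/webserver/20090601.gz",
    "/storage/datacenter/host1/webserver/20090602.gz",
    "/storage/datacenter/host2/webserver/20090601.gz",
    "/storage/datacenter/host2/webserver/20090602.gz",
]
SKIPPED = [
    "/storage/datacenter/logfile.txt",
    "/storage/datacenter/router1/logfile.log",
    "/storage/datacenter/host1/appserver/20090601.gz",
    "/storage/datacenter/host2/webserver/20090601.txt",
]

for path in INDEXED + SKIPPED:
    verdict = "index" if is_whitelisted(path) else "skip"
    print(verdict, path)
```

One thing the check makes visible: because of the [^/]* in front of 200906, the regex accepts “200906” anywhere in the file name, not only at the start. If only names that begin with “200906” should match, removing that first [^/]* would tighten the pattern; that variant is a suggestion and not part of the original stanza.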

The following doc was referenced and can be viewed for more details: http://www.splunk.com/base/Documentation/latest/Admin/WhitelistAndBlacklistRules

Simeon Yep