inputcsv to restrict a search by a list of field values

A customer asked about a complicated search that could be vastly simplified by using inputcsv, a search command added in 3.3.x, to read a list of values from a file. It’s documented as an internal search command here:

Although the documentation lists it as unsupported, it does work, and we are talking about promoting it to a public command. Here’s how:

I’ve got events from my webserver for my new domain, and I want to see which hits are real traffic rather than my own. They look like this:

- - [23/Oct/2008:01:42:21 -0700] "GET /category/admin/ HTTP/1.1" 200 5158 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +"
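The line above follows the usual Apache access-log layout: client IP first, then ident and user, the timestamp in brackets, the quoted request, the status code and the byte count. To show roughly where a “clientip” value comes from, here’s a minimal Python sketch of that extraction. It is not Splunk’s own field extraction, and because the address in the sample line was lost, 192.0.2.10 below is a hypothetical placeholder.

```python
import re

# Hypothetical sample line: 192.0.2.10 is a documentation-range placeholder,
# not the address from the original log.
SAMPLE = ('192.0.2.10 - - [23/Oct/2008:01:42:21 -0700] '
          '"GET /category/admin/ HTTP/1.1" 200 5158 "-" '
          '"Mozilla/5.0 (compatible; Googlebot/2.1)"')

# Access-log prefix: client IP, ident, user, [timestamp], "request", status, bytes.
ACCESS_RE = re.compile(
    r'^(?P<clientip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+)'
)


def extract_fields(line):
    """Return a dict of access-log fields, or None if the line does not match."""
    match = ACCESS_RE.match(line)
    return match.groupdict() if match else None


fields = extract_fields(SAMPLE)
print(fields["clientip"])  # 192.0.2.10
print(fields["status"])    # 200
```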

And I’ve gotten some traffic already:

$ ./splunk dispatch 'source=/var/log/apache2/mynewdomain_access_log | stats count'

It’s a standard format that Splunk automatically recognized as sourcetype access_common, so the extracted field “clientip” is already there. I create a CSV file containing the values I want to exclude, like this:


inputcsv looks for this file relative to $SPLUNK_HOME/var/run/splunk, so I put it there to avoid specifying a path in my search. Note that I could also have used* if I wanted to; wildcards are OK.
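Here’s a minimal Python sketch of preparing that file. It assumes, as described above, that a bare filename given to inputcsv is resolved under $SPLUNK_HOME/var/run/splunk, and that the first row of the CSV supplies the field name, so a “clientip” header lines the values up with the extracted field. The helper names are made up for this example, and it is not how Splunk itself writes the file.

```python
import csv
import io
import os


def build_exclusion_csv(values, field="clientip"):
    """Return CSV text: a header row naming the field, then one value per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow([field])  # assumed: header must match the search field name
    for value in values:
        writer.writerow([value])
    return buffer.getvalue()


def lookup_path(filename, splunk_home):
    """Assumed location: bare inputcsv filenames resolve under var/run/splunk."""
    return os.path.join(splunk_home, "var", "run", "splunk", filename)


def write_exclusion_csv(values, filename="mycsvfile.csv", splunk_home=None):
    """Write the file where [inputcsv mycsvfile.csv] is assumed to look for it."""
    home = splunk_home or os.environ["SPLUNK_HOME"]
    path = lookup_path(filename, home)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", newline="") as handle:
        handle.write(build_exclusion_csv(values))
    return path
```

For example, write_exclusion_csv(["192.0.2.10", "198.51.100.*"]) would write a header row plus one exact address and one wildcard pattern (both hypothetical placeholders, not the contents of my actual file) to $SPLUNK_HOME/var/run/splunk/mycsvfile.csv.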

Now I can do this search:

$ ./splunk dispatch 'source=/var/log/apache2/mynewdomain_access_log NOT [inputcsv mycsvfile.csv]'

$ ./splunk dispatch 'source=/var/log/apache2/mynewdomain_access_log NOT [inputcsv mycsvfile.csv] | stats count'

and get only the events that aren’t from my network. The same search also works in the UI:

source="/var/log/apache2/mynewdomain_access_log" NOT [inputcsv mycsvfile.csv]
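Conceptually, Splunk runs the bracketed subsearch first and substitutes its results into the outer search as a search expression, roughly an OR of clientip="..." terms, which NOT then negates. The Python sketch below approximates both the expansion and the resulting filter on a few hypothetical events. The exact string Splunk generates may differ, and treating * as “match anything” is my approximation of the wildcard behavior, not a description of Splunk internals.

```python
import csv
import io
import re


def read_rows(csv_text):
    """Parse CSV text into dicts keyed by the header row, like inputcsv's output."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def subsearch_to_expression(csv_text):
    """Approximate how subsearch rows become an OR of field="value" terms."""
    terms = []
    for row in read_rows(csv_text):
        parts = [f'{field}="{value}"' for field, value in row.items()]
        terms.append("( " + " AND ".join(parts) + " )")
    return "( " + " OR ".join(terms) + " )"


def wildcard_match(value, pattern):
    """Treat only '*' as a wildcard; everything else matches literally."""
    regex = re.escape(pattern).replace(r"\*", ".*")
    return re.fullmatch(regex, value) is not None


def exclude_matches(events, csv_text):
    """Keep events that match none of the rows (the NOT [...] part)."""
    rows = read_rows(csv_text)

    def matches(event, row):
        # Every column in the row must match the event's field of the same name.
        return all(wildcard_match(event.get(field, ""), pattern)
                   for field, pattern in row.items())

    return [event for event in events
            if not any(matches(event, row) for row in rows)]


# Hypothetical file contents and events (documentation-range addresses).
CSV_TEXT = "clientip\n192.0.2.10\n198.51.100.*\n"
EVENTS = [{"clientip": "192.0.2.10"},
          {"clientip": "198.51.100.7"},
          {"clientip": "203.0.113.5"}]

print(subsearch_to_expression(CSV_TEXT))
print(exclude_matches(EVENTS, CSV_TEXT))  # [{'clientip': '203.0.113.5'}]
```

Matching per row rather than per value means a CSV with several columns would require all of them to match, which mirrors the AND inside each parenthesized term of the expansion.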
