
A customer asked about a complicated search that could be vastly simplified by using inputcsv, a feature added in 3.3.x that reads a list of values from a file. It’s documented as an internal search command here:
http://www.splunk.com/doc/latest/user/UnsupportedCommands#inputcsv
We are talking about promoting it to a public command, so although the docs list it as unsupported, it does work. Here’s how:
I’ve got events from my webserver for my new domain, and I want to see the real hits it’s getting, not my own. They look like this:
66.249.70.86 - - [23/Oct/2008:01:42:21 -0700] "GET /category/admin/ HTTP/1.1" 200 5158 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
And I’ve gotten some traffic already:
$ ./splunk dispatch 'source=/var/log/apache2/mynewdomain_access_log | stats count'
count
-----
11424
It’s a standard format that was automatically recognized as sourcetype access_common, so the extracted field “clientip” is already there. I create a csv file containing the values I want to exclude like this:
clientip
xxx.xxx.xxx.xxx
yyy.yyy.yyy.yyy
zzz.zzz.zzz.zzz
The file is looked up relative to $SPLUNK_HOME/var/run/splunk, so to avoid specifying a path in my search I’ll just put it there. Note that I could also have used xxx.xxx.xxx.* if I wanted to; wildcards are OK.
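If you'd rather script that step, here is a minimal sketch of creating the file from the shell. The addresses are placeholders from the documentation ranges, and the /tmp fallback for SPLUNK_HOME is my own assumption so the snippet runs on a machine without Splunk; on a real install, use your actual SPLUNK_HOME.

```shell
# Assumption: SPLUNK_HOME is normally set by your install; the /tmp fallback
# is only here so the sketch runs on a machine without Splunk.
SPLUNK_HOME="${SPLUNK_HOME:-/tmp/splunk-demo}"
csv_dir="$SPLUNK_HOME/var/run/splunk"
mkdir -p "$csv_dir"

# The header row names the field to match; each following row is one value.
# These addresses are placeholders, not real clients.
cat > "$csv_dir/mycsvfile.csv" <<'EOF'
clientip
192.0.2.10
192.0.2.11
198.51.100.*
EOF

# Show what was written.
cat "$csv_dir/mycsvfile.csv"
```

The header has to match the extracted field name (clientip here), since that is the field the values are compared against.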
Now I can do this search:
$ ./splunk dispatch 'source=/var/log/apache2/mynewdomain_access_log NOT [inputcsv mycsvfile.csv]'
$ ./splunk dispatch 'source=/var/log/apache2/mynewdomain_access_log NOT [inputcsv mycsvfile.csv] | stats count'
count
-----
121
and I only get the hits that aren’t from my network. This search also works from the UI as:
source="/var/log/apache2/mynewdomain_access_log" NOT [inputcsv mycsvfile.csv]
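To see what the subsearch saves you, here is a rough sketch of the long-hand search you would otherwise have to type. It assumes the subsearch behaves like an OR over the rows of the CSV; the exact expansion Splunk generates internally may differ, and the addresses and temp file are placeholders of mine.

```shell
# Placeholder CSV in the same shape as mycsvfile.csv: header, then values.
csv="$(mktemp)"
printf '%s\n' clientip 192.0.2.10 192.0.2.11 > "$csv"

# Skip the header row, then join the values as clientip=... OR clientip=...
clause="$(awk 'NR > 1 { printf "%sclientip=%s", sep, $0; sep = " OR " }' "$csv")"

# Roughly the search you would otherwise write by hand.
search="source=\"/var/log/apache2/mynewdomain_access_log\" NOT ($clause)"
echo "$search"

rm -f "$csv"
```

With three addresses that is already unwieldy, and with a few hundred it is unmaintainable, which is the case for keeping the list in a file.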