TIPS & TRICKS

inputcsv to restrict a search by a list of field values

A customer asked about a complicated search that could be vastly simplified with inputcsv, a search command added in 3.3.x that reads a list of values from a file. It’s documented as an internal search command here:

http://www.splunk.com/doc/latest/user/UnsupportedCommands#inputcsv

We are talking about promoting it to a public command, so although the documentation calls it unsupported, it does work. Here’s how:

I’ve got events from my webserver for my new domain, and I want to see what real hits it’s getting, not counting my own. They look like this:

66.249.70.86 - - [23/Oct/2008:01:42:21 -0700] "GET /category/admin/ HTTP/1.1" 200 5158 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

And I’ve gotten some traffic already:

$ ./splunk dispatch 'source=/var/log/apache2/mynewdomain_access_log | stats count'
count
-----
11424

It’s a standard format that was automatically recognized as sourcetype access_common, so the extracted field “clientip” is already there. I create a CSV file with the field name as the first line, followed by the values I want to exclude, like this:

clientip
xxx.xxx.xxx.xxx
yyy.yyy.yyy.yyy
zzz.zzz.zzz.zzz

The file path is resolved relative to $SPLUNK_HOME/var/run/splunk, so to avoid specifying a path in my search I’ll just put the file there. Note that I could also have used xxx.xxx.xxx.* as a value; wildcards are OK.
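For what it’s worth, here is a minimal sketch of creating that file from the shell. The addresses are placeholders from the reserved documentation ranges, not my real ones, and the scratch-directory fallback for SPLUNK_HOME is only an assumption so the snippet can be tried without a Splunk install.

```shell
# Assumption: SPLUNK_HOME points at the Splunk install. The fallback is a
# scratch directory so this sketch can be tried without one.
SPLUNK_HOME="${SPLUNK_HOME:-$(mktemp -d)}"
CSV_DIR="$SPLUNK_HOME/var/run/splunk"
CSV_FILE="$CSV_DIR/mycsvfile.csv"

# On a real install this directory already exists; -p makes this a no-op then.
mkdir -p "$CSV_DIR"

# First line is the field name; each following line is one value to match.
# Placeholder addresses only; the last one shows a wildcard value.
printf '%s\n' clientip 192.0.2.10 198.51.100.7 '203.0.113.*' > "$CSV_FILE"

# Show what was written.
cat "$CSV_FILE"
```

Because the file sits directly in var/run/splunk, the search can refer to it by bare file name.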

Now I can do this search:

$ ./splunk dispatch 'source=/var/log/apache2/mynewdomain_access_log NOT [inputcsv mycsvfile.csv]'

$ ./splunk dispatch 'source=/var/log/apache2/mynewdomain_access_log NOT [inputcsv mycsvfile.csv] | stats count'
count
-----
121

and get only the events that aren’t from my network. The same search also works from the UI as:

source="/var/log/apache2/mynewdomain_access_log" NOT [inputcsv mycsvfile.csv]
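One way to picture what the bracketed subsearch contributes: each row of the CSV becomes a field=value term, and the terms are joined with OR. The sketch below builds that expression by hand in the shell. It only illustrates the idea; the exact text Splunk generates internally may differ, and the addresses are placeholders.

```shell
# Scratch copy of the CSV with placeholder addresses (not real ones).
csv="$(mktemp)"
printf '%s\n' clientip 192.0.2.10 198.51.100.7 '203.0.113.*' > "$csv"

# Line 1 is the field name; every later line becomes field="value",
# with " OR " between consecutive terms.
terms="$(awk -F, 'NR==1 {f=$1; next}
                  {printf "%s%s=\"%s\"", (NR>2 ? " OR " : ""), f, $1}' "$csv")"

# Hand-written equivalent of the search with the subsearch spelled out.
expanded="source=\"/var/log/apache2/mynewdomain_access_log\" NOT ($terms)"
echo "$expanded"
```

Seen this way, NOT [inputcsv mycsvfile.csv] just saves typing out (and maintaining) a long chain of clientip terms in the search itself.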

Posted by Splunk
