I’ve been asked a few times how best to search for events in which a field may take any of many different discrete values. This is essentially an OR (a disjunctive search) in the search language. For example, you can do this:
sourcetype=my_sourcetype (planet=mars OR planet=earth OR planet=saturn)
This works fine when you only have a handful of planets, but what happens if the field’s search criteria change daily and may include hundreds of possible values? Certainly, using OR terms with over a hundred entries sounds impractical. A solution is to keep all the values you want to use in the disjunctive search in an external file, and have the search language read that file as input to the search criteria. With Splunk 4.0, one way to do this out of the box is with the new lookup command. For an introduction to this command, please consult Bob Fox’s blog entry discussing example usage. For now, I will assume you have basic knowledge of its usage, and I will list a possible solution for using OR with many possible values for a field.
First, use field extraction to extract the field in question. For our example I’ll use an IP address field. Next, create a CSV file in your SPLUNK_HOME/etc/apps/<app_name>/lookups/ directory. I created iptable.csv with sample content to be used for input: a header row naming the two columns (ip and myip), followed by one row per IP address.
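As an illustration of what such a file could look like, here is a minimal Python sketch that writes a two-column table with the same value in both columns. The column names ip and myip come from the lookup used later in this post; the function name write_lookup_csv and the sample addresses are hypothetical, so substitute your own values.

```python
import csv


def write_lookup_csv(path, ips):
    """Write a two-column lookup table where both columns hold the same IP."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")
        writer.writerow(["ip", "myip"])  # header names match the lookup fields
        for ip in ips:
            writer.writerow([ip, ip])    # same value in both columns


if __name__ == "__main__":
    # Hypothetical sample addresses, for illustration only.
    write_lookup_csv("iptable.csv", ["10.1.1.1", "10.1.1.2", "192.168.0.15"])
```

Run as a script, this produces an iptable.csv whose first line is the header ip,myip and whose remaining lines each repeat one address twice.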
Since I’m not interested in creating a real mapping from one field (ip) to another (myip), I used the same value in both columns to satisfy the syntax of the lookup command.
Now, in your SPLUNK_HOME/etc/apps/<app_name>/local directory you’ll need to create or modify two files. First, edit transforms.conf and add a stanza, named after the lookup (search_ip here), that points at the CSV file:
[search_ip]
filename = iptable.csv
Second, edit props.conf and use your sourcetype to start the stanza. I am using mail as my sourcetype.
[mail]
lookup_ip = search_ip ip OUTPUT myip
Now, from your browser, log into Splunk and reload props.conf and transforms.conf to pick up your new additions:
sourcetype=mail | extract reload=true
You are now ready to use your file as input to search for all events that contain ip addresses that were in your CSV file. One possible search is:
sourcetype=mail | lookup search_ip ip OUTPUT myip | search myip=*
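To make the behavior of this pipeline concrete, here is a rough Python model of it. This is only a sketch of the semantics as I understand them, not how Splunk implements the lookup: events are plain dictionaries, and the function names and sample data are hypothetical.

```python
import csv
import io


def load_lookup(csv_text):
    """Map each ip in the table to its myip value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["ip"]: row["myip"] for row in reader}


def lookup_then_filter(events, table):
    """Roughly mimic: lookup search_ip ip OUTPUT myip | search myip=*"""
    matched = []
    for event in events:
        enriched = dict(event)
        if event.get("ip") in table:
            enriched["myip"] = table[event["ip"]]  # lookup step adds myip
        if "myip" in enriched:                     # myip=* keeps events that have the field
            matched.append(enriched)
    return matched


# Hypothetical table and events, for illustration only.
table = load_lookup("ip,myip\n10.1.1.1,10.1.1.1\n10.1.1.2,10.1.1.2\n")
events = [
    {"ip": "10.1.1.1", "msg": "delivered"},
    {"ip": "10.9.9.9", "msg": "bounced"},
    {"ip": "10.1.1.2", "msg": "deferred"},
]
for event in lookup_then_filter(events, table):
    print(event)
```

Only the events whose ip appears in the table come out the other end, which is the same result a long chain of OR terms would give.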
The last command, search myip=*, keeps only the events to which the lookup added a myip value, that is, events whose ip appears in the file. In essence, this last step does your disjunctive search for you without having to type in a long sequence of OR terms.
Finally, if you want to search on the top N values of a field each day (N being an integer), Splunk can help you create the CSV input file. Assuming you want the top 100 values for ip in our example, simply run the following search:
sourcetype=mail | top limit=100 ip | fields + ip
You can then copy and paste the values into your CSV file.
In short, today’s blog entry gave you one possible way to use the contents of a file as input for your disjunctive search. There may be other approaches, and you are welcome to discuss them in the comments.
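If copying and pasting the top-N values by hand becomes tedious, that step could be scripted. The following is a minimal Python sketch, assuming you have saved the results of the top search as a CSV file with an ip column; the file names top_ips.csv and iptable.csv and the function name are hypothetical.

```python
import csv


def top_export_to_lookup(export_path, lookup_path, field="ip", out_field="myip"):
    """Copy one column of an exported results CSV into a two-column lookup table."""
    with open(export_path, newline="") as src:
        # Keep only rows that actually carry a value for the field.
        values = [row[field] for row in csv.DictReader(src) if row.get(field)]
    with open(lookup_path, "w", newline="") as dst:
        writer = csv.writer(dst, lineterminator="\n")
        writer.writerow([field, out_field])
        for value in values:
            writer.writerow([value, value])  # same value in both columns
    return len(values)


if __name__ == "__main__":
    # Hypothetical paths; point these at your export and your app's lookups directory.
    count = top_export_to_lookup("top_ips.csv", "iptable.csv")
    print(f"wrote {count} values")
```

Any extra columns in the export are simply ignored, so the exported file does not need to be trimmed first.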