
I’ve been working on some apps for 4.0 and can finally talk details. Over the next couple of posts I’ll walk through creating a simple app using the new UI tools and a little XML. It’s all based on the Apache logs from my server, so first a little background on how I’ve configured my 4.0 instance.
I have a typical small server whose primary purpose is to host a dozen or so low-traffic websites. One site gets half my hits, three more get most of the rest, and the stragglers round out the lot, attracting bots. Each virtual host has separate access_log and error_log files, but all use the same format: access_common.
To take advantage of the new multi-index search in Splunk 4, I’ve set up my instance to use different indexes for various sources. In my case, it’s by person, as I have several groups of sites, each managed by a particular admin. The indexes are named www_something so that, as the overall administrator, I can search across all of them with “index=www_*” and still not have to touch the other system events I’ve got going into the main index. I have also set up roles so each admin sees only the relevant data (and isn’t confused by the rest). All the config is explained in the docs, so I won’t go over it right now.
There are several reasons to do this. With each broad class of data in a separate index, I can apply different retention policies to each. This can be a big deal for high-traffic webservers where you might want to keep the OS logs around longer than the web logs.
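As a rough sketch of how per-index retention might look in indexes.conf (the index names www_alice and www_bob, the paths, and the retention periods are made up for illustration, not the actual config from this server):

```ini
# indexes.conf (illustrative sketch; index names and values are hypothetical)

[www_alice]
homePath   = $SPLUNK_DB/www_alice/db
coldPath   = $SPLUNK_DB/www_alice/colddb
thawedPath = $SPLUNK_DB/www_alice/thaweddb
# Freeze (by default, delete) events older than roughly 90 days
frozenTimePeriodInSecs = 7776000

[www_bob]
homePath   = $SPLUNK_DB/www_bob/db
coldPath   = $SPLUNK_DB/www_bob/colddb
thawedPath = $SPLUNK_DB/www_bob/thaweddb
# Keep this group's web logs for roughly a year
frozenTimePeriodInSecs = 31536000
```

Each index ages out its data on its own schedule, so a busy site's access logs can be trimmed aggressively without shortening the life of anything else.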
Next, if you can divide your data into discrete categories, it’s easier to assign roles that can access only certain parts of it. “All your stuff is in your index” is a much simpler policy to enforce than “You get this, and that, and this other thing…” and so on. You can do the latter, with excruciating granularity, via search filters, but under the hood a filter just tacks extra terms onto your search. This can lead to some pretty hairy searches, as splunkd has to decide which of the results it has looked through should actually be returned.
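To make the contrast concrete, here is a minimal sketch of the two approaches in authorize.conf. The role names, index name, and filter expression are hypothetical, and capability settings are left out, so check the authorize.conf spec for your version before relying on any of it.

```ini
# authorize.conf (illustrative sketch; role names, index name, and filter are hypothetical)

# Index-based: this role can search only its own index.
[role_alice_admin]
srchIndexesAllowed = www_alice
srchIndexesDefault = www_alice

# Filter-based: this role's searches get extra terms appended instead.
[role_bob_admin]
srchFilter = source="/var/log/httpd/bob_*"
```

The first stanza is the “all your stuff is in your index” policy; the second gets a similar result by narrowing every search the role runs, which is where the extra work comes from.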
The most important reason is search performance: data can be pulled off disk only so fast. If there is less of it to slog through at once, the files that do get looked at are more likely to be relevant, and your search will complete faster.