
Forums: SplunkAdministration: Best practice for pre-filtering a bunch of msgs?


How can I nullQueue a bunch of different messages so they don't ever get indexed?

I know how to use transforms.conf and props.conf to route a specific message to nullQueue, but we're in a situation where we have about 20 different syslog messages (noise messages from disk arrays, poorly written applications, and the like). These add no value and just take up space and licensed indexing quota.

Is there a best practice for doing something like this? It doesn't seem like adding a new transforms/props entry for each message is very scalable, and they're different enough that writing a regex to match multiple messages is nontrivial.

My fallback plan is to use something like syslog-ng + SEC as a pre-filter to Splunk, but that adds a bunch of complexity that I'm not really wild about.

Thanks!

There's no way other than regular expressions to filter messages out of Splunk at index time. There are more options at search time, but on the other hand, it's not clear to me that this is any simpler than a regex filter (or a series of regex filters).
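For what it's worth, a "series of regex filters" does not need a separate props.conf entry per message: a single TRANSFORMS line can list several transforms.conf stanzas. Here is a minimal sketch, assuming the noise arrives under the syslog sourcetype. The stanza names and the regexes are made up for illustration; substitute patterns that match your own messages.

```ini
# props.conf
# One TRANSFORMS class can reference several stanzas, comma-separated.
# "syslog" is an assumed sourcetype; the stanza names are hypothetical.
[syslog]
TRANSFORMS-null = drop_array_noise, drop_app_noise

# transforms.conf
# Each stanza sends events matching its REGEX to the nullQueue,
# so they are discarded before indexing.
# Both regexes below are placeholders, not real message formats.
[drop_array_noise]
REGEX = diskarray\d+: heartbeat OK
DEST_KEY = queue
FORMAT = nullQueue

[drop_app_noise]
REGEX = myapp\[\d+\]: (cache refreshed|polling complete)
DEST_KEY = queue
FORMAT = nullQueue
```

Closely related messages can share one stanza by using alternation, as in the second regex, which keeps the stanza count well below one per message.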

I have a similar issue. I found that using syslog-ng to filter out messages is quite effective for syslog-based stuff, obviously. For non-syslog stuff, the nullQueue transform is probably the best option. Doing that with SEC seems painful, and probably not any more efficient. (Although there are still things I can do with SEC that I can't do with Splunk, so we are still using both for the time being.)

I found that redirecting my syslog-ng output to a "nosplunk.log" file (which Splunk doesn't index) was helpful in the process of verifying that I was weeding out the right messages. Your approach also has to depend on whether these messages are completely expendable (in other words, you don't need them to even make it to the log files), or whether you just don't want to waste index space in Splunk.

This is probably obvious, but make sure that you are only applying the transforms to the relevant sources or sourcetypes. Unless everything you are trying to filter is coming from a single file, you can use per-source transforms to keep the pre-processing down to a minimum, so that you aren't trying to filter out messages for app "X" in the log file of app "Y". It's certainly easier to do per-sourcetype setups as much as possible, but sometimes per-source settings can be better for the kind of filtering that you are looking to do.
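To make the per-source idea concrete, here is a rough sketch of a props.conf that scopes filters to individual files instead of a whole sourcetype. The second path and both transform names are hypothetical, and each name would need a matching nullQueue stanza in transforms.conf.

```ini
# props.conf
# A source:: stanza applies its transforms only to events from that file,
# so the mail filters never run against another application's log.
[source::/var/log/mail.log]
TRANSFORMS-null = drop_mail_noise

# Hypothetical second source with its own, separate filter list.
[source::/var/log/appY.log]
TRANSFORMS-null = drop_appY_noise
```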

If you haven't gone down this kind of per-source setup path before, you should know that it can be tricky to get the config right. (The docs have been updated since I first tried this, which helps a ton! See the "Pattern collisions" section on this page: http://www.splunk.com/base/Documentation/4.0.10/admin/Propsconf) I've found the following command to be very helpful in confirming the setup on a per-log-file basis: splunk test sourcetype /var/log/mail.log

Well. That was longer than I wanted it to be. Hope you find some of it to be helpful.