Forums: SplunkAdministration: Filter log content before forwarding

I've been searching high and low for someone else who has attempted this or confirmed a solution, but I'm stuck.

I'm forwarding Apache access logs from a Linux server with a Splunk lightweight forwarder. The logs are rolled every day with cronolog. They forward to a single Splunk index/search server.

The lightweight forwarder is set up to monitor access_log_* in the logs directory (our access log names take the form access_log_[date].txt). This part works great; I'm providing it for context.

Our web servers are behind a layer 5 switch that requests a simple keepalive.html file to determine whether each web server is available, both for failover and for round-robin load balancing. This means that our access logs are full of GETs for keepalive.html. We need those entries in the logs to verify content switch functionality during routine maintenance, but they clog up Splunk.

I know I can set up a search with | delete to purge those entries from the Splunk index, or just add a NOT to the search to exclude them, but they take up space, consume forwarding bandwidth, and throw off our total event counts, so I'd rather not have them forwarded at all.

Is this possible? Is there a way to filter the contents of a log file before forwarding to the splunk indexing/searching server?

A light forwarder cannot filter events (see http://www.splunk.com/wiki/Where_do_I_configure_my_Splunk_settings%3F ), but you can filter on the indexer before the data gets indexed (and the filtered events will not count against your license). See http://www.splunk.com/base/Documentation/latest/Admin/Routeeventstospecificqueues to set up a regex-based transform that routes the selected events to the "nullQueue".

Thanks, I will look into the filtering-before-indexing topic. Is there another way to do this on the web server so I don't consume the bandwidth, short of configuring Apache not to log requests from the content switch IP?

I think I answered my own question with the link gkanapathy sent:

"Important: When you choose to filter your data depends on your distributed setup. However, the filtering needs to occur on the Splunk instance that parses the data; this may be either the indexer or the forwarder instance. With the 'SplunkLightForwarder' app enabled, these settings go on the indexer side. With the regular 'SplunkForwarder' app, they go on the forwarder side."

So if I disable the lightweight forwarder app and use the regular forwarder instead, the data is parsed at the source (in my case, the Apache server), and the filtered events are never sent to the indexer.

It worked. For future reference, here is the transforms.conf I used to match the specific log entries from the content switch (10.10.10.10 is the IP of the content switch, and keepalive.html is the file the content switch requests to determine availability):

[setnull]
REGEX = ^10.10.10.10.*keepalive.html.*$
DEST_KEY = queue
FORMAT = nullQueue
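
The stanza above only takes effect once props.conf on the same Splunk instance (the one doing the parsing) references it. A minimal sketch, assuming the events are matched by source path; the source pattern and the "null" class suffix below are placeholders to adapt:

```
# props.conf, on the instance that parses the data
# The source pattern is an assumed placeholder; "..." stands for any directory path.
[source::.../access_log_*]
# Apply the [setnull] stanza from transforms.conf to events from this source.
TRANSFORMS-null = setnull
```

Changes to props.conf and transforms.conf generally need a Splunk restart before they apply.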

Just a quick recommendation on your regex. You might want to add a space between the end of your IP and your .*. I'm guessing you don't want to match 10.10.10.105, for example. Also, using the start-of-line anchor (^) is the right thing to do here, but you don't really need the .*$ at the end.

I'd suggest something like this:
REGEX = ^10\.10\.10\.10 .*/keepalive\.html

If you want to be really clever, you can also filter out only 200 (OK) status messages -- that way if the file is ever missing (404) or somehow returns any other non-200 status, you have a way of seeing it within Splunk. (I don't know if this is possible based on your setup or not, but I've found things like this to be helpful in the past, and Splunk gives you a lot of options.) For example, you could use something like so:
REGEX = ^10\.10\.10\.10 .* "GET /keepalive\.html [^"]+" 200
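
In context, the full stanza with the status-restricted pattern might look like the sketch below. The sample log lines in the comments are made up for illustration and assume Apache's common log format:

```
# transforms.conf
# Dropped (routed to nullQueue), hypothetical sample line:
#   10.10.10.10 - - [01/Jan/2010:00:00:01 -0500] "GET /keepalive.html HTTP/1.0" 200 12
# Kept (still indexed), because the status is not 200:
#   10.10.10.10 - - [01/Jan/2010:00:00:06 -0500] "GET /keepalive.html HTTP/1.0" 404 208
# Kept, because the client IP is 10.10.10.105 rather than 10.10.10.10:
#   10.10.10.105 - - [01/Jan/2010:00:00:07 -0500] "GET /keepalive.html HTTP/1.0" 200 12
[setnull]
REGEX = ^10\.10\.10\.10 .* "GET /keepalive\.html [^"]+" 200
DEST_KEY = queue
FORMAT = nullQueue
```

If your access log uses a custom LogFormat, check where the status code falls relative to the quoted request and adjust the pattern to match.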

Hope that helps.