Forums: Splunk Administration: Preprocessing a log file

Can you preprocess a log file with Splunk, and if so, how?

We have an application that spits out log files that are difficult to parse with regexes, XML parsers or the like. You need a (Python) script that will read the data and spit out a friendlier log file format.

Now, we can run a cron job that converts the log files before exposing them to Splunk, but it would be much more convenient to have Splunk pick up the unfriendly files and process them internally before trying to parse and index them the normal way.

Is this possible?

I've tried playing with the "preprocessing_script" parameter in a custom props.conf, but it doesn't seem to be picked up, and splunkd.log (running in debug mode) doesn't hint at anything wrong. The config section is read, just not the preprocessing_script part, it seems.

Running Splunk 3.3.2, community edition

Anton

It sounds like you are looking for scripted inputs:

http://www.splunk.com/doc/latest/admin/Scripted%20Inputs

Thanks for the suggestion, araitz.

Scripted inputs indeed look powerful.

With a scripted input, it looks like the script is responsible for picking up the event source and handing it over to Splunk. So I would need to implement the find-log-files, test-for-changes dance myself when using a scripted input?
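To make that "dance" concrete, here is a minimal sketch of the change-detection part, assuming the script is polled repeatedly and remembers how far it has read in each file. The function name `read_new_lines` and the `offsets` dictionary are hypothetical and not part of Splunk; a real scripted input would also need to persist the offsets between runs.

```python
import os

def read_new_lines(path, offsets):
    """Return complete lines appended to `path` since the last call.

    `offsets` maps path -> byte offset already consumed. This is
    hypothetical bookkeeping, not a Splunk API.
    """
    start = offsets.get(path, 0)
    size = os.path.getsize(path)
    if size < start:
        # The file shrank: assume it was rotated or truncated and start over.
        start = 0
    with open(path, "rb") as handle:
        handle.seek(start)
        data = handle.read()
    # Consume only complete lines; a partial trailing line waits for the next poll.
    end = data.rfind(b"\n") + 1
    offsets[path] = start + end
    return data[:end].decode("utf-8", errors="replace").splitlines()
```

Each poll would call `read_new_lines` for every file of interest, convert the returned lines, and print them to stdout for Splunk to index.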

However, I was hoping to have the cake and eat it too:

That is, benefit from all of Splunk's input logic for locating log files and detecting when they change, and at the same time sneak a conversion script in.

Maybe I could disguise the script as an unarchive_cmd?

I have tried unarchive_cmd and it does absolutely nothing.

How can I debug what the problem is?

One test was:

dd count=8

This reads 8 blocks (4 KB at dd's default 512-byte block size) from stdin and writes them to stdout.
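A test command like that gives no sign of whether it was ever run. As a rough sketch, a pass-through filter that leaves a trace each time it is invoked would at least show whether Splunk calls the command at all. The function name `passthrough` and the log path in the usage note are hypothetical.

```python
import time

def passthrough(in_stream, out_stream, log_path):
    """Copy bytes unchanged and append one line to log_path recording the call."""
    total = 0
    while True:
        chunk = in_stream.read(4096)
        if not chunk:
            break
        out_stream.write(chunk)
        total += len(chunk)
    out_stream.flush()
    # Leave evidence on disk that the command really ran.
    with open(log_path, "a") as log:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        log.write("%s copied %d bytes\n" % (stamp, total))
    return total
```

Used as a command, the script would end with something like `passthrough(sys.stdin.buffer, sys.stdout.buffer, "/tmp/unarchive_probe.log")` after an `import sys`. If the probe log never appears, the command is not being invoked.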

Can anyone help?

Which unarchive_cmd are you referring to?

From props.conf:

invalid_cause = <string>
* Can only be set for a [<sourcetype>] stanza.
* Splunk does not index any data with invalid_cause set.
* Set <string> to "archive" to send the file to the archive processor (specified in unarchive_cmd).
* Set to any other string to throw an error in the splunkd.log if running Splunklogger in debug mode.
* Defaults to empty.

unarchive_cmd = <string>
* Only called if invalid_cause is set to "archive".
* <string> specifies the shell command to run to extract an archived source.
* Must be a shell command that takes input on stdin and produces output on stdout.
* DOES NOT WORK ON BATCH PROCESSED FILES. Use preprocessing_script.
* Defaults to empty.

Basically, I want to run the log file through another process before it is read in, just like gzip and the like.

Shouldn't it work if the command takes input from stdin and produces output to stdout?
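For reference, the kind of command I have in mind is a plain stdin-to-stdout filter, roughly like the sketch below. The names `convert_line` and `run_filter` are hypothetical, and the conversion itself is only a placeholder (a made-up "a|b|c" record becomes "a b c"), since the real format is application specific. Whether Splunk 3.3.2 will actually invoke such a script through unarchive_cmd is exactly the open question.

```python
def convert_line(line):
    # Placeholder conversion for a hypothetical pipe-separated record;
    # the real application format is not described here.
    return " ".join(part.strip() for part in line.rstrip("\n").split("|"))

def run_filter(in_stream, out_stream):
    """Read lines from in_stream and write converted lines to out_stream."""
    for line in in_stream:
        out_stream.write(convert_line(line) + "\n")
    out_stream.flush()

# To use as a shell filter: import sys, then run_filter(sys.stdin, sys.stdout)
```

Run by hand with a file redirected into it and the output piped to `head`, it behaves like any other Unix filter.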
