Forums: SplunkAdministration: Fixed fields extraction

I'm having trouble extracting fields from a log file with fixed-width (fixed-column) fields...

The log file has the following format (first three fields shown, rest omitted):

699574 Backup Active ...
699575 Backup Active ...

My transforms.conf looks like this:

[bkptest]
REGEX = ^(.{9})
FORMAT = $0 bkp_jobid::$1
DEST_KEY = _meta

My props.conf is OK.

I would expect the regex to capture the first nine characters (three leading spaces, then six digits), but nothing is found.

The same extraction with REGEX = ^\s{3}(\d+) works fine.
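A quick way to check the two regexes independently of Splunk is to run them against a sample line. This is a minimal sketch, assuming a line with the three leading spaces described above; Python's `re` module stands in for Splunk's regex engine, which is fine for patterns this simple.

```python
import re

# Hypothetical sample line: three leading spaces, the six-digit job ID,
# then the next two fields (the rest of the event is omitted).
line = "   699574 Backup Active"

fixed = re.match(r"^(.{9})", line)         # first nine characters
anchored = re.match(r"^\s{3}(\d+)", line)  # three spaces, then the digits

print(repr(fixed.group(1)))     # '   699574' (leading spaces included)
print(repr(anchored.group(1)))  # '699574'
```

Both patterns match this line. The difference is only in what ends up in $1: the first capture carries the leading spaces, the second does not.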

What's up?

Is there a recommended approach for fixed-column log files? The events are very long, and different fields are missing in some events. I would rather just count characters than craft complex regexes...
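If counting characters is the goal, the column widths can be turned into a pattern mechanically, or the line can simply be sliced. A minimal Python sketch, assuming hypothetical widths of 9, 7 and 7 characters for the first three columns (the real layout will differ); the helper names `fixed_width_regex` and `parse_fixed` are made up for illustration:

```python
import re

def fixed_width_regex(widths):
    """Build a pattern with one capture group per fixed-width column."""
    return "^" + "".join(f"(.{{{w}}})" for w in widths)

def parse_fixed(line, names, widths):
    """Cut a line into columns by character count and strip the padding."""
    fields = {}
    pos = 0
    for name, width in zip(names, widths):
        # A missing (blank) column comes back as an empty string.
        fields[name] = line[pos:pos + width].strip()
        pos += width
    return fields

# Hypothetical column layout for the first three fields.
widths = [9, 7, 7]
names = ["bkp_jobid", "bkp_type", "bkp_state"]
line = "   699574 Backup Active"

print(fixed_width_regex(widths))                           # ^(.{9})(.{7})(.{7})
print(re.match(fixed_width_regex(widths), line).groups())  # ('   699574', ' Backup', ' Active')
print(parse_fixed(line, names, widths))
```

The generated pattern is the same kind as ^(.{9}), so each capture still includes its padding spaces; the slicing version strips them, and a blank column simply yields an empty string.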

Hi!

Your regexes seem to work for me (assuming, as you said, that there are three whitespace characters before the six-digit number).

I think that there might be something else wrong with your extraction. Please feel free to email support@splunk.com with a sample of your log and a tar.gz of your bundles directory, and I will be happy to assist.

-Alex

I have opened a case (no answer yet).

I've played around a bit with the regex and the log file.

If I remove the three leading space characters and change my regex to ^.{6}, everything works like a charm.

Any ideas?

I checked the case and we didn't see any attachments, such as your bundle and sample logs. If we can get those, we can try to reproduce.

I have opened a new case: CASE [10275]

Hi Burana,

I was able to take a look at your sample log, thanks for sending it in!

As a best practice, I would highly recommend using something like this to grab the first field in the log:

REGEX = ^\s+(\d+)

I usually avoid fixed-count quantifiers such as {3} whenever possible, preferring greedy or lazy ones (much like the rest of my dealings!).

That said, I found the problem with your quantifier. It turns out there are four whitespace characters preceding the six-digit sequence, so this matches:

REGEX = ^\s{4}(\d{6})
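The difference between the exact and the greedy quantifier is easy to see on a line with four leading spaces. A small check in Python (standing in for Splunk's regex engine; the sample line is hypothetical apart from the job ID):

```python
import re

# Hypothetical line with four leading spaces, as found in the sample log.
line = "    699574 Backup Active"

patterns = {
    "exact three": r"^\s{3}(\d{6})",
    "exact four": r"^\s{4}(\d{6})",
    "greedy": r"^\s+(\d+)",
}

results = {}
for label, pattern in patterns.items():
    m = re.match(pattern, line)
    results[label] = m.group(1) if m else None

print(results)
# {'exact three': None, 'exact four': '699574', 'greedy': '699574'}
```

The greedy version keeps working if the amount of padding changes from line to line, which is the point of preferring it over an exact count.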

Please let me know if you continue to have trouble.

Happy Splunking!

Hi Alex,

Yes, the sample I sent you had four spaces, but the problem is not solved yet... though I'm getting there.

I've rewritten the REGEX to use greedy matching. Now I have another problem.

Let's say I have a field with three possible values (one contains a space):

Backup
Catalog Backup
Restore

The following regex matches correctly:

REGEX = (Backup|Catalog Backup|Restore)

But in SplunkWeb only "Catalog" is shown as the field value; everything after "Catalog" (the space and "Backup") is dropped!

This seems to be the same problem I have with my previous regex, where several whitespace characters precede the integer. The pattern matches, but everything from the first whitespace onward is dropped (so in that case nothing at all is stored in the field).

My guess is that grouping (with $1, $2, etc.) does not work correctly and drops everything after the first whitespace character.
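One way to see that the regex itself is not at fault: in Python, both patterns capture the complete value. The second half is a toy model, not Splunk's code: it assumes (unconfirmed in this thread) that the FORMAT result is split into key::value tokens on whitespace. Under that assumption it reproduces both symptoms described above. The sample lines and the helper `naive_meta_parse` are hypothetical.

```python
import re

# Hypothetical lines; only the leading columns matter here.
catalog_line = "    699574 Catalog Backup Active"
padded_line = "   699574 Backup Active"

# Step 1: the regexes capture the full values, spaces included.
bkp_type = re.search(r"(Backup|Catalog Backup|Restore)", catalog_line).group(1)
bkp_jobid = re.match(r"^(.{9})", padded_line).group(1)

def naive_meta_parse(meta):
    """ASSUMPTION: split a FORMAT-style string on whitespace into key::value pairs."""
    fields = {}
    for token in meta.split():
        if "::" in token:
            key, value = token.split("::", 1)
            fields[key] = value
    return fields

# Step 2: the captures are substituted into FORMAT, then tokenized.
print(naive_meta_parse(f"bkp_type::{bkp_type}"))    # {'bkp_type': 'Catalog'}
print(naive_meta_parse(f"bkp_jobid::{bkp_jobid}"))  # {'bkp_jobid': ''}
```

If that assumption holds, any capture containing a space would be affected, whether the pattern was written with a literal space or with \s.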

I would always recommend using the whitespace class (\s) instead of typing a literal space because, as you were getting at, the regex will break on the whitespace. This regex matches on the sample file you sent me:

REGEX = ^\s+\d+\s+(Backup|Catalog\sBackup|Duplication)
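A quick sanity check of this pattern against one hypothetical line per event type (the job IDs and trailing fields are made up; Python's `re` is used in place of Splunk):

```python
import re

TYPE_RE = re.compile(r"^\s+\d+\s+(Backup|Catalog\sBackup|Duplication)")

# Hypothetical lines, one per event type; only the leading columns matter.
samples = [
    "    699574 Backup Active",
    "    699575 Catalog Backup Active",
    "    699576 Duplication Active",
]

types = [TYPE_RE.match(s).group(1) for s in samples]
print(types)  # ['Backup', 'Catalog Backup', 'Duplication']
```

Note that the captured value for the catalog type still contains a space: \s changes how the pattern is written, not what it captures.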

Looking at the structure of your log files, the entries are not normalized by type: the format differs for Backup, Duplication, Catalog Backup, etc. When there is no common format across types like this, it is often much easier to write a separate transform for each type.

So for the sample log you sent me, I made the following changes (leaving other settings such as MAX_TIMESTAMP_LOOKAHEAD in place):

--> props.conf

[source::....(bkp)]
sourcetype = nb
TRANSFORMS-nb = backup,duplication,catbackup

--> transforms.conf

[backup]
REGEX = ^\s+(\d+)\s+(Backup)\s+(\w+)\s+(\d)\s+(\w+)\s+(\w+)\s+([^\s]+)\s+([^\s]+)\s+([^\s]+\s[^\s]+)\s([^\s]+)\s+([^\s]+\s[^\s]+)\s(\w+)\s+(\d)\s+(\d+)\s+(\d+)\s+[^\s]+\s+(\d+)
FORMAT = $0 bkp_jobid::$1 bkp_type::$2 bkp_state::$3 bkp_status::$4 bkp_policy::$5 bkp_schedule::$6 bkp_client::$7 bkp_server::$8 bkp_start::$9 bkp_elapsed::$10 bkp_end::$11 bkp_dst::$12 bkp_unit::$13 kb::$14 completion::$15 kbps::$16
DEST_KEY = _meta

[duplication]
REGEX = ^\s+(\d+)\s+(Duplication)\s+(\w+)\s+([^\s]+)\s+([^\s]+)\s+([^\s]+\s[^\s]+)\s([^\s]+)\s+(\w+)\s+(\d)\s+(\w+)\s+(\d)
FORMAT = $0 bkp_jobid::$1 bkp_type::$2 bkp_state::$3 bkp_client::$4 bkp_server::$5 bkp_start::$6 bkp_elapsed::$7 bkp_dst::$8 bkp_unit::$9 operation::$10 completion::$11
DEST_KEY = _meta

[catbackup]
REGEX = ^\s+(\d+)\s+(Catalog\sBackup)\s+(\w+)\s+([^\s]+)\s+([^\s]+)\s+([^\s]+\s[^\s]+)\s([^\s]+)\s+([^\s]+\s[^\s]+)\s+(\d)\s+(\d+)
FORMAT = $0 bkp_jobid::$1 bkp_type::$2 bkp_state::$3 bkp_status::$4 bkp_server::$5 bkp_start::$6 bkp_elapsed::$7 bkp_end::$8 bkp_unit::$9 completion::$10
DEST_KEY = _meta
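The one-transform-per-type idea can be mirrored in a few lines of Python: every pattern is tried, and because each is anchored on its own type keyword, at most one matches a given line. This is a sketch with heavily shortened, hypothetical patterns (job ID and type only) and made-up sample lines; it only illustrates why the separate stanzas do not interfere with each other.

```python
import re

# Shortened, hypothetical versions of the three stanzas: job ID and type only.
TRANSFORMS = {
    "backup": re.compile(r"^\s+(\d+)\s+(Backup)\s"),
    "duplication": re.compile(r"^\s+(\d+)\s+(Duplication)\s"),
    "catbackup": re.compile(r"^\s+(\d+)\s+(Catalog\sBackup)\s"),
}

def extract(line):
    """Try every transform; each pattern only matches its own event type."""
    fields = {}
    for name, pattern in TRANSFORMS.items():
        m = pattern.match(line)
        if m:
            fields.update(bkp_jobid=m.group(1), bkp_type=m.group(2), transform=name)
    return fields

print(extract("    699574 Backup Active"))
print(extract("    699575 Catalog Backup Active"))
print(extract("    699577 Restore Active"))  # no matching transform -> {}
```

Supporting another event type then means adding one more entry rather than complicating a single shared regex.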

Thanks, Alex, for your efforts.

With some effort, I was able to write a single regex.

I'm still convinced that Splunk does not index field values completely when they contain spaces. Even after I changed the pattern to "Catalog\sBackup" with the \s character, only "Catalog" is indexed.

I don't know whether this is by design or a bug... Can you check on that?

BTW: I just installed 3.1. Nice progress...

I will check on that; it could be by design, or it could be a bug.