
Splunk Administration: Dynamic Meta Data Assignment




Hi All

From my post on http://bit.ly/97CEi3

Splunk is infinitely configurable when it comes to consuming data produced by applications or devices. However, one concept I've been trying to get my head around, configuration-wise, is the dynamic assignment of meta data, and I still don't have a good working model.

Why? Well, it's very useful to add context to events. For example, if I'm indexing syslog data, it could be very helpful to add key-value pairs such as admin=bob and location=central. At search time I can then use these keys without employing any search magic:

sourcetype=syslog failure | dedup admin

The reason you want to add the keys as meta data, rather than just appending them to the raw event, is that appending them messes with the integrity of the event, not to mention its readability.

Prior to 4.x you could get clever with the ***SPLUNK*** header, but it has now been relegated to the sinkhole.
Since I'm in a position to mangle events before they get indexed, I tried appending a line containing the keys to each event and using a transform to extract the keys as meta data:

[sysadmin]
REGEX = sysadmin=(\w+)
FORMAT = sysadmin::$1
WRITE_META = true
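A rough way to see what the [sysadmin] stanza captures is to run the same regex with Python's re module. This is only a sketch: it assumes Python's regex syntax behaves like Splunk's PCRE for this simple pattern, and both the sample event and the dictionary standing in for the indexed fields are hypothetical. In Splunk itself, WRITE_META = true is what writes the FORMAT result (sysadmin::bob) into the event's indexed metadata.

```python
import re

# Same pattern as REGEX in the [sysadmin] stanza.
SYSADMIN_RE = re.compile(r"sysadmin=(\w+)")


def extract_meta(raw: str) -> dict:
    """Mimic FORMAT = sysadmin::$1 by returning the would-be indexed field.

    The returned dict is a hypothetical stand-in for Splunk's indexed metadata.
    """
    match = SYSADMIN_RE.search(raw)
    if match is None:
        return {}
    return {"sysadmin": match.group(1)}


# Hypothetical syslog event with the appended key line.
event = "Feb 20 10:00:00 host1 sshd: authentication failure\nsysadmin=bob"
print(extract_meta(event))  # {'sysadmin': 'bob'}
```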

This worked; however, trying to remove the key(s) after extracting them failed dismally:

[clean]
REGEX =(?m)(.*)sysadmin=\w+$
FORMAT = $1
DEST_KEY = _raw

Ironically, both transforms work, but not at the same time. They appear to be mutually exclusive, and I suspect my cleaning transform should be sending its output to another queue.

I also tried using the new SEDCMD setting to clean up after the transformation, but that didn't work either, since it appears transforms are processed after SED commands.

The Splunk header may be useful here, since the monitor removes it once it has evaluated it. You could put your meta keys in the header, where Splunk would otherwise ignore them, and collect and index them with a transform; the monitor would then remove the line without any further config. This is obviously not an ideal option, but it would do the job.

In my discussions with Splunk-a-nistas I've noticed that they struggle to see why it would be useful to do this at index time, and I've been advised to try all kinds of search-time voodoo to achieve the outcome I'm looking for. I suspect the reason is that people are used to working with other people's data; in this case, you sit between the producer and the consumer and have the opportunity to enrich events with additional context.

Am I missing something here?

Can you post your props.conf and a full sample event too?

Wait, so you just want a field to appear as meta data, but you *don't* want it in the raw event text? Is that basically the idea?

If you need transforms to run sequentially, just list them in order:

TRANSFORMS-seq = sysadmin,clean
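Why the order matters can be sketched as a tiny pipeline in Python. The sketch assumes each transform in the list sees _raw as left by the previous one, and that DEST_KEY = _raw with FORMAT = $1 replaces the raw text with group 1. The cleaning pattern, the event, and the function names are hypothetical, and stripping the trailing newline is a choice made only for this sketch. Running clean before sysadmin leaves nothing for the extraction to find.

```python
import re

# Hypothetical stand-ins for the two stanzas in TRANSFORMS-seq = sysadmin,clean.
EXTRACT_RE = re.compile(r"sysadmin=(\w+)")
# Hypothetical cleaning pattern: capture every line before the key line.
CLEAN_RE = re.compile(r"(?m)^((?:.*\n)+)sysadmin=\w+$")


def sysadmin(raw: str, meta: dict) -> str:
    """Record the key as metadata (like WRITE_META) and leave _raw alone."""
    match = EXTRACT_RE.search(raw)
    if match:
        meta["sysadmin"] = match.group(1)
    return raw


def clean(raw: str, meta: dict) -> str:
    """Replace _raw with group 1 (like DEST_KEY = _raw, FORMAT = $1)."""
    match = CLEAN_RE.search(raw)
    # Dropping the trailing newline is a choice made for this sketch only.
    return match.group(1).rstrip("\n") if match else raw


def run(raw: str, transforms: list) -> tuple:
    meta = {}
    for transform in transforms:
        raw = transform(raw, meta)  # each step sees the previous step's _raw
    return raw, meta


event = "Feb 20 10:00:00 host1 sshd: authentication failure\nsysadmin=bob"
print(run(event, [sysadmin, clean]))  # key extracted, then removed
print(run(event, [clean, sysadmin]))  # key removed first, nothing extracted
```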

But if it were me, I'd just leave the extra field in the separate line and, yes, extract it at search time as necessary. There's no performance advantage to having it in an indexed field (vs extracted from raw), though I suppose it does mean that to search for the value you are required to provide the field name, which might be either a plus or a minus.

Hi Guys

"Wait, so you just want a field to appear as meta data, but you *don't* want it in the raw event text? Is that basically the idea?"

That is 100% spot on; however, getting Splunk to do this has been a challenge.

props.conf ----------------------------------

[meta]
TIME_FORMAT = %d/%m/%Y
TIME_PREFIX = ^
MAX_TIMESTAMP_LOOKAHEAD = 10
TRANSFORMS-seq=getkey1,getkey2,getkey3,clean

transforms.conf ----------------------------------
[getkey1]
REGEX =(?m)key1="(\w+)"
FORMAT = k1::$1
WRITE_META = true

[getkey2]
REGEX =(?m)key2="(\w+)"
FORMAT = k2::$1
WRITE_META = true

[getkey3]
REGEX =(?m)source=([\W\w]+)
DEST_KEY = MetaData:Source
FORMAT = source::$1

[clean]
REGEX =(?m)^(.*)key1=.*$
FORMAT = $1
DEST_KEY =_raw

example.log ----------------------------------

20/02/2010 this is a sample event nr 1
key1="123" key2="1234" source=sap
20/02/2010 this is a sample event nr 2
key1="123" key2="1234" source=sap
20/02/2010 this is a sample event nr 3
some more lines 1
some more lines 2
key1="123" key2="1234" source=sap
20/02/2010 this is a sample event nr 4
key1="123" key2="1234" source=sap

The extraction works; the cleaning transform doesn't.
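A quick way to see why [clean] misbehaves is to run its REGEX over events 1 and 3 with Python's re module. This is a sketch under two assumptions: that Python's re treats `(?m)`, `^`, `$` and `.` the same way Splunk's PCRE does, and that Splunk merged each dated line of example.log with the lines that follow it. The key point is that `(?m)` only changes what `^` and `$` match; `.` still stops at a newline, so `(.*)` can never reach back across the line break in front of key1=.

```python
import re

# [clean] REGEX from transforms.conf, unchanged.
CLEAN_RE = re.compile(r'(?m)^(.*)key1=.*$')

# Events 1 and 3 from example.log, assuming each dated line starts an event.
EVENT_1 = '20/02/2010 this is a sample event nr 1\nkey1="123" key2="1234" source=sap'
EVENT_3 = ('20/02/2010 this is a sample event nr 3\n'
           'some more lines 1\n'
           'some more lines 2\n'
           'key1="123" key2="1234" source=sap')

for event in (EVENT_1, EVENT_3):
    match = CLEAN_RE.search(event)
    # ^ matches at the start of the key line, so group 1 is empty;
    # FORMAT = $1 with DEST_KEY = _raw would write this back as the event.
    print(repr(match.group(1)))
```

Under these assumptions group 1 is empty for both events, which would leave an empty _raw and is consistent with the extraction working while the cleaning does not.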

I suspect there are problems with newlines terminating _raw. Can you try:

(?m)^(.*)[\r\n]+key1=.*$

or (slightly better):

(?m)^(.*)\v+key1=\V*$
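The first of these suggestions can be checked the same way in Python. Only the `[\r\n]+` version is tested, because `\v` and `\V` are PCRE vertical-whitespace classes that Python's re module does not support in the same way. As before, the sketch assumes Splunk merged each dated line of example.log with the lines that follow it. Since `re.search` (like a PCRE search) may start at any line where `^` matches, the only line it can keep is the one directly above key1=.

```python
import re

# Gerald's first suggestion.
CLEAN_RE = re.compile(r'(?m)^(.*)[\r\n]+key1=.*$')

# Events 1 and 3 from example.log, assuming each dated line starts an event.
EVENTS = {
    1: '20/02/2010 this is a sample event nr 1\nkey1="123" key2="1234" source=sap',
    3: ('20/02/2010 this is a sample event nr 3\n'
        'some more lines 1\n'
        'some more lines 2\n'
        'key1="123" key2="1234" source=sap'),
}

for number, event in EVENTS.items():
    # group(1) is what FORMAT = $1 would write back to _raw.
    print(number, repr(CLEAN_RE.search(event).group(1)))
```

Event 1 keeps its first line, while event 3 is cut down to "some more lines 2", which lines up with the collateral damage reported in the next post.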

Hi Gerald

We're a bit closer, but event 3 has some collateral damage:

splunk search "*"

20/02/2010 this is a sample event nr 4
some more lines 2
20/02/2010 this is a sample event nr 2
20/02/2010 this is a sample event nr 1

Marinus

(?m)^((.*)|(.*[\r\n].*))[\r\n]key1=.*$
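Alex's pattern allows at most two lines in front of key1=, which a Python check makes visible. Same assumptions as before: Python's re is taken to behave like PCRE for these constructs, and each dated line of example.log is assumed to start an event. For a four-line event the earliest line start that can satisfy either alternative is the second line.

```python
import re

# Alex's suggestion: one line, or two lines, directly before key1=.
CLEAN_RE = re.compile(r'(?m)^((.*)|(.*[\r\n].*))[\r\n]key1=.*$')

# Events 1 and 3 from example.log, assuming each dated line starts an event.
EVENT_1 = '20/02/2010 this is a sample event nr 1\nkey1="123" key2="1234" source=sap'
EVENT_3 = ('20/02/2010 this is a sample event nr 3\n'
           'some more lines 1\n'
           'some more lines 2\n'
           'key1="123" key2="1234" source=sap')

for event in (EVENT_1, EVENT_3):
    # group(1) is what FORMAT = $1 would write back to _raw.
    print(repr(CLEAN_RE.search(event).group(1)))
```

Event 3 comes back as "some more lines 1" plus "some more lines 2", with its dated first line lost, which matches the result posted below.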

Hi Alex

Almost: event 3 is still not complete, but building a general-purpose regex for this case seems quite painful.

splunk search "*"

20/02/2010 this is a sample event nr 4
some more lines 1
some more lines 2
20/02/2010 this is a sample event nr 2
20/02/2010 this is a sample event nr 1

M

Thanks Alex and Gerald!

I think this proves that it's possible to implement my scheme.
Here's a general-purpose regex for multiple lines:

REGEX=(?m)^((.*[\r\n]+)+)key1.*
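To check the final pattern against all four sample events, here is a Python sketch. It assumes Python's re matches like Splunk's PCRE here and that each dated line of example.log starts an event. The repeated group `(.*[\r\n]+)+` swallows every complete line before key1, so group 1 keeps the whole multi-line body. Note that in Python group 1 also keeps the newline just before the key line; how Splunk treats that trailing newline in _raw is not covered here.

```python
import re

# Marinus's general-purpose cleaning regex.
CLEAN_RE = re.compile(r'(?m)^((.*[\r\n]+)+)key1.*')

KEY_LINE = 'key1="123" key2="1234" source=sap'

# The four events from example.log, assuming each one starts at a dated line.
EVENTS = [
    f'20/02/2010 this is a sample event nr 1\n{KEY_LINE}',
    f'20/02/2010 this is a sample event nr 2\n{KEY_LINE}',
    f'20/02/2010 this is a sample event nr 3\nsome more lines 1\nsome more lines 2\n{KEY_LINE}',
    f'20/02/2010 this is a sample event nr 4\n{KEY_LINE}',
]

for event in EVENTS:
    # group(1) is what FORMAT = $1 would write back to _raw.
    print(repr(CLEAN_RE.search(event).group(1)))
```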

Marinus