Field Definitions and Splunk’s extract Command

The 3.0 version of Splunk has introduced some wonderful new features such as advanced reporting, granular access control and a slew of additional functions to help you search through your IT data. One of these newly released functions is the extract command. This works very nicely with Splunk’s revamped facility to add, view, and access field names. Here is a quick primer on creating field definitions and using the extract command to have those definitions reloaded automatically.

Splunk has always done a great job of letting you search on any text from any data source. Splunk even goes one step beyond this and automatically defines named fields for data that shows up as a Keyword = Value (KV) pair. If my data contains text in that form, for example a username key paired with a value,


then Splunk will key in on those values, allowing me to search and report more precisely on those values. For instance I could say

* | where username <> "sparky"

to get back all of the records where sparky did not show up as a username.
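To make the idea concrete, here is a rough Python sketch of what automatic KV extraction followed by a field filter might look like. This is not how Splunk implements it; the pattern, the extract_kv helper, and the sample events are all made up for illustration.

```python
import re

# Hypothetical pattern (an assumption, not Splunk's own): a key made of word
# characters, "=" with optional spaces, then a double-quoted or bare value.
KV_PATTERN = re.compile(r'(\w+)\s*=\s*(?:"([^"]*)"|(\S+))')


def extract_kv(event):
    """Return a dict of field name -> value for each key=value pair found."""
    fields = {}
    for key, quoted, bare in KV_PATTERN.findall(event):
        # findall yields '' for the alternative that did not match.
        fields[key] = quoted if quoted else bare
    return fields


# Made-up sample events, not taken from real data.
events = [
    "action=login username=sparky host=kinja",
    "action=login username=badguy host=sparky",
]

# Rough equivalent of: * | where username <> "sparky"
kept = [e for e in events if extract_kv(e).get("username") != "sparky"]

for e in kept:
    print(e)
```

Note that the second event survives the filter even though the text sparky appears in it, because there it is the host value and not the username value.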

But what if my data is not so friendly? Consider an event that looks like this:

Invalid login attempt by sparky on host kinja

While the data is all there and searchable, there is no easy way to home in on the fact that sparky is the username. Of course, I could simply exclude all events that contain the term sparky with the search:

* NOT sparky

but let's say I wanted to be more specific. I don’t want to exclude events like:

Invalid login attempt by badguy on host sparky

Fortunately Splunk allows me to define fields so I can specify exactly what data is exposed.

There is a full write-up on extracting additional fields here, but in short, I need to give Splunk some hints on how to find that username and what to call it once found. I will probably want to do all of this within a Splunk bundle to keep things portable and maintainable, but that’s another blog entry.

The first step will be to define a regular expression that will isolate the username in the event. We could set up this definition in our bundle’s transforms.conf file:

[get-username]
REGEX = by\s(\w+)\son
FORMAT = username::$1
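Before dropping a pattern into transforms.conf, it can help to try it outside Splunk first. The Python sketch below assumes the REGEX line is meant to read by\s(\w+)\son (whitespace, a captured word, whitespace) and runs it against the two sample events from above. The username_of helper is a hypothetical name used only for this illustration.

```python
import re

# Assumed intent of the REGEX line: "by", whitespace, captured word, whitespace, "on".
USERNAME_RE = re.compile(r"by\s(\w+)\son")

# The two sample events quoted earlier in this post.
events = [
    "Invalid login attempt by sparky on host kinja",
    "Invalid login attempt by badguy on host sparky",
]


def username_of(event):
    """Return the captured username, or None if the pattern does not match."""
    match = USERNAME_RE.search(event)
    return match.group(1) if match else None


# Term-based exclusion (like "* NOT sparky") drops both events.
term_filtered = [e for e in events if "sparky" not in e]

# Field-based exclusion only drops the event where sparky is the username.
field_filtered = [e for e in events if username_of(e) != "sparky"]

print(term_filtered)
print(field_filtered)
```

The difference between the two filtered lists is the whole reason for defining the field: the event where sparky is only the host name is kept.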

Second, we need Splunk to apply this regular expression to the events of a particular sourcetype. We’ll do this at search time so that the definition of these extracted fields stays dynamic. This is accomplished by adding a line to the props.conf file that defines the sourcetype of our events:

REPORT-secure = get-username

Last, but not least, we need to define which of our inputs will be using this sourcetype. For simplicity, let’s look at an example of a tailed file with a hardcoded sourcetype. This definition will exist in our inputs.conf file.

sourcetype = securitylog
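The three files form a chain: the input names a sourcetype, the sourcetype's props entry names a transform, and the transform holds the regular expression and the output format. Here is a small Python sketch of that lookup, just to show how the pieces connect. It is not Splunk's code: the dictionaries stand in for the conf files, the tailed path /var/log/secure is a made-up example, and the regex is assumed to be by\s(\w+)\son.

```python
import re

# Hypothetical in-memory stand-ins for the three conf files. The tailed path
# is an assumption; the key/value settings mirror the lines shown in this post.
inputs = {"/var/log/secure": {"sourcetype": "securitylog"}}
props = {"securitylog": {"REPORT-secure": "get-username"}}
transforms = {
    "get-username": {"REGEX": r"by\s(\w+)\son", "FORMAT": "username::$1"},
}


def search_time_fields(source, event):
    """Follow input -> sourcetype -> REPORT-* -> transform and extract fields."""
    sourcetype = inputs[source]["sourcetype"]
    fields = {}
    for key, transform_name in props.get(sourcetype, {}).items():
        if not key.startswith("REPORT-"):
            continue
        transform = transforms[transform_name]
        match = re.search(transform["REGEX"], event)
        if match is None:
            continue
        name, template = transform["FORMAT"].split("::", 1)
        # Replace $1, $2, ... in the template with the matching capture groups.
        fields[name] = re.sub(
            r"\$(\d+)", lambda ref: match.group(int(ref.group(1))), template
        )
    return fields


print(search_time_fields(
    "/var/log/secure", "Invalid login attempt by sparky on host kinja"
))
```

If any link in the chain is misspelled (the sourcetype, the transform name, or the capture group number), no field comes out, which is a good first thing to check when a custom field fails to appear.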

Now that all the heavy lifting is done, we need to apply these properties to the running Splunk instance. This (finally) is where extract comes in.

Extract allows us to test the regular expression that we have defined within transforms.conf. More importantly, it lets us reload the props and transforms definitions without restarting the server. We accomplish this by including the extract command inside a Splunk search. For example:

sourcetype::securitylog | extract reload=T

Now I should see username listed under the "Fields" tab of my Splunk screen. Make sure that the core only option is unchecked to see the custom defined fields.

There you have it -- a quick intro to field definitions and the extract command. Check out the release notes to view all of the new Splunk features.

Bob Fox
