Enriching Data with Lookups (Part 1)

Many customers tell me that they see a lot of value when Splunk is used to enrich IT data with information from another source. An example of such an enrichment could be a cross reference between a customer’s username found in an application log and that same customer’s information extracted from a contact management system. How amazing would it be to have a customer service representative make a phone call to Mr. Smith to ask if he needed help logging onto their system after a number of failed logins?

Splunk has always been able to do data enrichment, but the newly released Splunk 4 really simplifies the process. In this post, I’ll give a quick example of using a CSV file to enrich an application log. In future posts, I’ll show how to use an external database as the data source.

Let’s start with some mock application data. To keep things simple, we’ll use this as our application log:
Jul 27 08:35:09 appname=app4 error=123
Jul 27 08:35:19 appname=app3 error=123
Jul 27 08:35:29 appname=app1 error=163
Jul 27 08:35:39 appname=app1 error=123
Jul 27 08:35:49 appname=app1 error=133
Jul 27 08:35:59 appname=app1 error=123
Jul 27 08:36:09 appname=app1 error=123

The goal here is to enrich this data with the actual error message, rather than just the error number. To do this, we will use an error lookup table in the form of a comma-separated values (CSV) file with a descriptive header:
error, error_message
113, Error In WOPR Core
123, General Application Fault
133, Memory Allocation Error
163, Error Exists Behind Keyboard
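
To make the goal concrete, here is a rough Python sketch of what the enrichment amounts to conceptually. It is not how Splunk implements lookups, and the function names (load_error_table, extract_fields, enrich) are my own, purely for illustration. It reads the CSV above, pulls the key=value pairs out of a few of the sample log lines, and adds error_message wherever the error number has a match.

```python
import csv
import io
import re

# A few of the sample log lines from above.
LOG_LINES = """\
Jul 27 08:35:09 appname=app4 error=123
Jul 27 08:35:29 appname=app1 error=163
Jul 27 08:35:49 appname=app1 error=133
""".splitlines()

# The lookup table from above, spaces after the commas included.
ERROR_TABLE_CSV = """\
error, error_message
113, Error In WOPR Core
123, General Application Fault
133, Memory Allocation Error
163, Error Exists Behind Keyboard
"""


def load_error_table(text):
    """Map each error number to its message, keyed on the CSV header names."""
    # skipinitialspace drops the blank that follows each comma.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return {row["error"]: row["error_message"] for row in reader}


def extract_fields(line):
    """Pull key=value pairs (appname, error) out of one log line."""
    return dict(re.findall(r"(\w+)=(\S+)", line))


def enrich(line, table):
    """Return the event's fields, plus error_message when the lookup matches."""
    fields = extract_fields(line)
    message = table.get(fields.get("error"))
    if message is not None:
        fields["error_message"] = message
    return fields


table = load_error_table(ERROR_TABLE_CSV)
enriched = [enrich(line, table) for line in LOG_LINES]
for event in enriched:
    print(event)
```

An event whose error number is not in the table simply comes back without an error_message field in this sketch.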

First, we need to do a little preparation. I will make an assumption here that we have gone ahead and defined a Splunk App that tells Splunk where to find (and what to do with) the application log. We are going to need to create a directory within the app definition to support the lookups. This directory will typically be $SPLUNK_HOME/etc/apps/APPNAME/lookups where $SPLUNK_HOME is the top level directory where Splunk is installed, and APPNAME is the name of the Splunk App. Inside this lookups directory we’ll put the CSV file above. You can call this whatever you want, and for this example we will call it errortable.csv. It’s important to note here that the CSV file will need to be located in (or linked into) $SPLUNK_HOME/etc/apps/APPNAME/lookups or $SPLUNK_HOME/etc/system/lookups.
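
If it helps to see those two locations spelled out, here is a small Python sketch that builds both paths. The install directory /opt/splunk and the app name myapp are assumptions for the sake of the example, and lookup_candidates is a hypothetical helper, not anything Splunk ships.

```python
from pathlib import PurePosixPath


def lookup_candidates(splunk_home, app_name, csv_name):
    """Return the two lookup locations named in this post for a CSV file."""
    home = PurePosixPath(splunk_home)
    return [
        home / "etc" / "apps" / app_name / "lookups" / csv_name,
        home / "etc" / "system" / "lookups" / csv_name,
    ]


# Assumed values: substitute your own $SPLUNK_HOME and app name.
candidates = lookup_candidates("/opt/splunk", "myapp", "errortable.csv")
for path in candidates:
    print(path)
```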

Next we’ll make a couple of quick file changes. Typically, all of these files will be in $SPLUNK_HOME/etc/apps/APPNAME/local. I’ll assume that there is already an inputs.conf file in that directory. Depending on the configuration, there may or may not be a transforms.conf file. If not, we will create it and add a definition of where to find the CSV file. Inside transforms.conf add:
[ErrorLookup]
filename = errortable.csv

I have chosen to call the transforms.conf entry ErrorLookup. You can call this whatever you like, as long as it matches the entry in props.conf, below. We’ll give ErrorLookup a single entry to map to the CSV filename.

Now we need to make another change within the same directory. This time, we will modify props.conf to alter the sourcetype definition for the data we will be enriching. It is possible that the sourcetype for your application logs is defined elsewhere, in which case the configuration becomes a bit trickier, but certainly not impossible. We are going to assume simplicity here. Edit (or create) props.conf, and find (or create) a stanza that matches the sourcetype of the application log. In my example, the sourcetype is myappdata. I always want my application events to show both the error number and the error message, so my sourcetype definition in props.conf should look like this:
[myappdata]
lookup_table = ErrorLookup error OUTPUT error_message

There may be other information already in the myappdata definition. The lookup_table line added here breaks down like this:
  • lookup_ specifies that we are doing the lookup function here at search time.
  • table is a class. For the most part, this is an arbitrary value and can be anything you want.
  • ErrorLookup refers to the entry we made in transforms.conf.
  • error refers to both the Splunk field and the CSV header. We named them the same. If they were different, one could always use an AS clause here (i.e., csv_field AS splunk_field).
  • OUTPUT defines what is going to end up as a field back in our event. If you don’t use OUTPUT, all columns in the CSV file will be brought in as Splunk fields.
  • error_message here defines both the CSV column (as defined by the CSV header) and the Splunk field that will be created. Again, use an AS clause if you want to rename the field on the fly.
Once these configs are in place, give Splunk a restart and do a search on the sourcetype. Assuming all of the definition names match up and the CSV file can be found, you should see the additional field(s) in the ‘Other interesting fields’ section of the Field Picker.
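
Since most problems at this stage come down to names that do not match up, here is a hedged Python sketch of a consistency check you could adapt. It is my own illustration, not a Splunk tool: it treats the two files as simple INI text via configparser (real Splunk conf files may contain constructs this does not handle), and the helper names parse_conf, csv_side_fields and check_lookups are hypothetical. The stanza names ErrorLookup and myappdata are the ones used in this post.

```python
import configparser

# Sample contents mirroring the two files from this post.
TRANSFORMS_CONF = """\
[ErrorLookup]
filename = errortable.csv
"""

PROPS_CONF = """\
[myappdata]
lookup_table = ErrorLookup error OUTPUT error_message
"""

CSV_HEADER = ["error", "error_message"]


def parse_conf(text):
    """Parse simple INI-style conf text, keeping key case as written."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str
    parser.read_string(text)
    return parser


def csv_side_fields(tokens):
    """Return CSV column names, honoring 'csv_field AS splunk_field'."""
    fields = []
    skip_next = False
    for token in tokens:
        if skip_next:            # the Splunk-side name that follows AS
            skip_next = False
        elif token == "AS":
            skip_next = True
        elif token != "OUTPUT":
            fields.append(token)
    return fields


def check_lookups(props_text, transforms_text, csv_header):
    """Return a list of problems; an empty list means the names line up."""
    props = parse_conf(props_text)
    transforms = parse_conf(transforms_text)
    problems = []
    for sourcetype in props.sections():
        for key, value in props[sourcetype].items():
            tokens = value.split()
            if not key.startswith("lookup_") or not tokens:
                continue
            name, field_tokens = tokens[0], tokens[1:]
            if not transforms.has_section(name):
                problems.append(f"[{sourcetype}] {key}: no [{name}] stanza in transforms.conf")
                continue
            if not transforms.has_option(name, "filename"):
                problems.append(f"[{name}]: missing filename")
            for field in csv_side_fields(field_tokens):
                if field not in csv_header:
                    problems.append(f"[{sourcetype}] {key}: '{field}' is not a CSV column")
    return problems


problems = check_lookups(PROPS_CONF, TRANSFORMS_CONF, CSV_HEADER)
print(problems or "lookup definitions are consistent")
```

In practice you would read the real props.conf, transforms.conf and CSV header from disk instead of the inline strings; I kept them inline so the sketch runs on its own.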

Up next: Data enrichment using a script to an external source…


Posted by Bob Fox