An easy way to generate sample data

Have you ever had a Splunk project that required a data feed, but for whatever reason it wasn’t practical to tap into the source itself? Some examples:

You are working on a PoC, need to fiddle with your indexing or timestamps, and simply don’t want to keep re-indexing your original content. Perhaps you have a great use case for Splunk and need a working application to justify a larger license volume, but the data source has such volume and velocity that indexing it could exceed your current license. Perhaps (as I’ve encountered) you need to work with a production dataset, but can’t get an active input from the production environment until your Splunk App is ready to go into production (catch-22, anyone?). Maybe you’re building automation or a workflow around a specific event or series of events that rarely occur, and you would like to test them today instead of waiting for a blue moon.

Enter the Splunk SA-Eventgen. I find this tool incredibly useful, and I intend to provide a walkthrough here and a few follow-up posts on my experiences with it. With that said, I want to give a very big ‘thank you’ to the two very talented Splunkers who developed the app, David Hazekamp and Clint Sharp.

OBLIGATORY NOTICE: This is also my opportunity to say that this tool is 100% UNSUPPORTED. If you run into issues with it, please, please, please DO NOT contact Splunk Support or these individuals.

If you would like to get started using it, follow this link. Clint has been kind enough to record a very thorough walkthrough of how to get the event generator up and running in just a few moments, but we’ll supply more details and an overall outline of the application in subsequent posts.

For my first example, I will be using a simple data set (see below). The event generator works in one of two ways: it can either ‘replay’ the events within a file or series of files, or it can randomly extract entries from the file and generate them at semi-random intervals, with particular fields or values changed per your specification. In this instance, I will generate a copy of the three events in a sample file every minute, with their timestamps set to the current time. We will also write these events out to the /tmp directory.

The example data is:

07/31/2013 15:38:18, field1=v1 field2=v2 field3=v3
07/31/2013 15:38:48, field1=v4 field2=v5 field3=v6
07/31/2013 15:39:18, field1=v7 field2=v8 field3=v9
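To make the two modes a little more concrete, here is a rough Python illustration using the three events above. It is not the event generator’s own code; it simply assumes that replaying keeps the original gaps between events, and that the setup used in this post stamps a fresh copy of every event with the current time once per interval (one minute here). The random selection the sample mode can also do is left out, and the function names are my own, purely hypothetical ones.

```python
from datetime import datetime, timedelta

FMT = "%m/%d/%Y %H:%M:%S"  # timestamp layout used by the example events

SAMPLE = [
    "07/31/2013 15:38:18, field1=v1 field2=v2 field3=v3",
    "07/31/2013 15:38:48, field1=v4 field2=v5 field3=v6",
    "07/31/2013 15:39:18, field1=v7 field2=v8 field3=v9",
]


def parse_ts(line: str) -> datetime:
    # "MM/DD/YYYY HH:MM:SS" is the first 19 characters of each event.
    return datetime.strptime(line[:19], FMT)


def replay_times(lines, start: datetime):
    """Replay-style (assumed): shift events so the first lands at `start`, keeping the original gaps."""
    first = parse_ts(lines[0])
    return [start + (parse_ts(line) - first) for line in lines]


def sample_times(lines, start: datetime, interval=timedelta(minutes=1), copies=2):
    """This post's setup (assumed): at each interval tick, emit one copy of every event stamped with that tick."""
    return [start + i * interval for i in range(copies) for _ in lines]


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, 0)
    print([t.strftime("%H:%M:%S") for t in replay_times(SAMPLE, now)])
    # → ['12:00:00', '12:00:30', '12:01:00']
    print([t.strftime("%H:%M:%S") for t in sample_times(SAMPLE, now)])
    # → ['12:00:00', '12:00:00', '12:00:00', '12:01:00', '12:01:00', '12:01:00']
```

In other words, replaying preserves the 30-second spacing of the original file, while the per-minute setup produces bursts of all three events, each burst carrying the time it was generated.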

So to start, we need to get the code.

Unzip and move the payload into your $SPLUNK_HOME/etc/apps directory. I would also recommend that you name the folder ‘SA-EventGen’. This app requires a Splunk restart, but hold off on that for now.

Next, create a new app for testing purposes if you do not already have one. When you set this app’s permissions, make it accessible to all of the other apps.

You will also want to create a $YOURAPP/samples directory.  Place a sample of the data you want the event generator to work with in this directory.
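If you prefer to script this step, a small Python helper like the one below can create the samples directory and write the sample file. It is only a convenience sketch: the app folder name and sample file name you pass in are up to you, and it overwrites an existing sample of the same name.

```python
from __future__ import annotations

from pathlib import Path


def write_sample(app_dir: Path, sample_name: str, lines: list[str]) -> Path:
    """Create <app_dir>/samples if needed and write one sample event per line."""
    samples_dir = app_dir / "samples"
    samples_dir.mkdir(parents=True, exist_ok=True)
    sample_path = samples_dir / sample_name
    # Overwrites any existing sample with the same name.
    sample_path.write_text("\n".join(lines) + "\n")
    return sample_path
```

For example, with a hypothetical test app called eventgen_test you might call `write_sample(Path(os.environ["SPLUNK_HOME"]) / "etc" / "apps" / "eventgen_test", "mysample", events)`, where `events` is a list of lines like the example data above and "mysample" is likewise just an illustrative file name.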

Finally, go into the $YOURAPP/local directory. Here we either create an ‘eventgen.conf’ file or modify a copy of the example from the SA-Eventgen app (SA-EventGen/README/eventgen.conf.example).

Your file should look similar to this:



Within this file, the first stanza refers to the file you originally placed in your /samples directory.


Beneath the stanza, there are many parameters to choose from. For now, we’re only concerned with the following:

mode = sample
outputMode = file
fileName = /tmp/mysample.log

For more options, see the file ‘SA-EventGen/README/eventgen.conf.spec’.
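Before restarting Splunk, it can be handy to sanity-check the file. The sketch below uses Python’s standard configparser, which handles a simple stanza-plus-key/value file like this one (it is not a full Splunk .conf parser). Two details matter: interpolation must be disabled, or the strftime values such as %m/%d/%Y trigger interpolation errors, and key case must be preserved so outputMode is not lowercased. The list of required keys is just the three settings used in this post.

```python
import configparser

# The settings this post relies on; not an official list of required keys.
REQUIRED_KEYS = ("mode", "outputMode", "fileName")


def load_eventgen_conf(text: str) -> dict:
    """Parse eventgen.conf text into {stanza: {key: value}}."""
    # interpolation=None: with the default BasicInterpolation, strftime values
    # such as "%m/%d/%Y" raise InterpolationSyntaxError when they are read.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case: outputMode, not outputmode
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


def missing_keys(stanza: dict) -> list:
    """Return the settings from this post that a stanza does not define."""
    return [key for key in REQUIRED_KEYS if key not in stanza]
```

To check your own file, read $YOURAPP/local/eventgen.conf into a string, pass it to `load_eventgen_conf`, and call `missing_keys` on the stanza named after your sample file; an empty list means the three settings above are present.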

Next, we need to tell the event generator how to replace the timestamp in each sample event. This is simply a matter of a regex pattern (notice that there are no capture groups):

## Example TimeStamp 04/19/2013 12:46:10
token.0.token = \d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2},
token.0.replacementType = timestamp
token.0.replacement = %m/%d/%Y %H:%M:%S,

Here, token is the regular expression to match within each event, replacementType defines how the replacement works, and replacement defines the output (in this instance, a strftime format string).
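To see what these three settings amount to, here is a minimal Python sketch of the substitution, assuming a timestamp replacement simply renders the current time with the strftime pattern and swaps it in for whatever the regex matched. It illustrates the idea only; it is not the event generator’s implementation.

```python
import re
from datetime import datetime

# token.0.token and token.0.replacement, copied from the eventgen.conf lines above
TOKEN = r"\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2},"
REPLACEMENT = "%m/%d/%Y %H:%M:%S,"


def replace_timestamp(event: str, now: datetime) -> str:
    """Replace every TOKEN match in `event` with `now` rendered via the strftime pattern."""
    stamp = now.strftime(REPLACEMENT)
    # A function as the replacement stops re.sub from treating backslashes in `stamp` as escapes.
    return re.sub(TOKEN, lambda _match: stamp, event)


if __name__ == "__main__":
    event = "07/31/2013 15:38:18, field1=v1 field2=v2 field3=v3"
    print(replace_timestamp(event, datetime(2024, 1, 2, 3, 4, 5)))
    # → 01/02/2024 03:04:05, field1=v1 field2=v2 field3=v3
```

Because the regex includes the trailing comma, the replacement pattern has to include it too; otherwise the comma after the timestamp would disappear from the generated events.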

So, in a nutshell, we have:

  1. Installed the SA-Eventgen App
  2. Created a samples directory within the app that requires the new data
  3. Placed a data sample within this directory
  4. Created the eventgen.conf that will reference this file

That’s it! Restart Splunk, and you should see the new file in your /tmp directory. Note that you will still need to add the input and manage the sourcetype, just as you would for any other Splunk input.
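If you want a quick way to confirm that events are still arriving, a small Python check like this one reports how old the newest event in the output file is. It assumes every line starts with the same MM/DD/YYYY HH:MM:SS timestamp configured above, and it reads the whole file, which is fine for a small test log.

```python
from datetime import datetime, timedelta
from pathlib import Path

FMT = "%m/%d/%Y %H:%M:%S"  # same layout as token.0.replacement, minus the comma


def newest_event_age(log_path: Path, now: datetime) -> timedelta:
    """How long ago the last event in log_path was stamped (assumes each line starts with FMT)."""
    lines = [line for line in log_path.read_text().splitlines() if line.strip()]
    if not lines:
        raise ValueError(f"{log_path} has no events yet")
    newest = datetime.strptime(lines[-1][:19], FMT)
    return now - newest
```

With the one-minute setup in this post, calling `newest_event_age(Path("/tmp/mysample.log"), datetime.now())` should, if everything is working, return something on the order of the generation interval; a steadily growing value suggests the generator has stopped writing.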

Next Post: Creating random field values and random input with the event generator

Dennis Bourg
