An easy way to generate sample data – Part 2

In my previous post I discussed generating data by replaying a sample data set. We covered altering the timestamp of events (to match the run time of the eventgen), but not much more.

Now let's take a look at adding some randomness to our data. We'll look at two kinds of random data: a random number within a specific range, and a pre-created value drawn from a sample set of values. Both changes are made in the eventgen.conf file that was created in the previous post.

Just as with the previous change to the timestamp, we need to identify the value we wish to change by isolating it with a regular expression (in this case ‘XXX’). Assuming that you are changing the timestamp as token.0, we will need to add token.1:
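
As a rough sketch of what such a token.1 entry might look like: the stanza name `[mySample.sample]` below is a hypothetical placeholder for the sample configured in the previous post, and the example assumes ‘XXX’ appears literally in the sample events. Treat it as an illustration rather than the exact listing.

```ini
# eventgen.conf (sketch; the stanza name is a hypothetical placeholder)
[mySample.sample]
# token.0, the timestamp replacement from the previous post, stays unchanged.

# Regex that isolates the value to replace (assumed to be the literal XXX)
token.1.token = XXX
# Generate a random value instead of a timestamp
token.1.replacementType = random
# Lower and upper limits: an integer between 1 and 100
token.1.replacement = integer[1:100]
```

Each time an event is generated, whatever the regex matched is swapped for a freshly generated integer within those limits.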





Aside from the regex identifying the value to change, we see two new lines here. As in the first exercise, we have declared a ‘replacementType’, only instead of ‘timestamp’ we use ‘random’. Below that line, we declare the lower and upper limits of the replacement value; in this case, a random number between 1 and 100 will be generated.

This is great when you need random numbers, but what about random values that are not integers, such as strings? For this, we need a ‘seed’ or sample file to draw from. Create a directory called ‘samples’ within your app to host it (e.g. $SPLUNK_HOME/etc/apps/MyApp/samples). Within the file, each line should contain one value that would be a suitable replacement. For the purpose of this post, we’ll use a set of sample ‘hosts’ in a file called mySampleHosts.sample.






Now, to randomly draw from that file, we’ll follow the same steps as before and isolate the value for replacement with a regular expression. Below that, we set the ‘replacementType’, this time to ‘file’. Finally, we provide the path to the file that contains the values.
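
A minimal sketch of such a file-based token follows. The placeholder ‘YYY’, the stanza name, and the host names in the comments are all hypothetical; only the file name and directory come from the text above.

```ini
# eventgen.conf (sketch; stanza name and the YYY placeholder are hypothetical)
[mySample.sample]

# Regex that isolates the value to replace (assumed to be the literal YYY)
token.2.token = YYY
# Draw the replacement from a file rather than generating it
token.2.replacementType = file
# Path to the sample file holding one candidate value per line
token.2.replacement = $SPLUNK_HOME/etc/apps/MyApp/samples/mySampleHosts.sample

# mySampleHosts.sample would then contain lines such as (made-up host names):
#   webserver01
#   webserver02
#   dbserver01
```

The file can be extended with as many lines as you like; a line is picked at random for each replacement.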




We now have an event generator that pulls from an existing sample of data (again, more on this in the first post), adds the current timestamp at run time, and then alters two values: one becomes a random integer, and the other a string randomly selected from a sample set. You could keep adding tokens in this way to create a brand new sample data set that is completely different from its source sample. This can be very useful for creating sample apps, testing props and transforms without a live feed, testing rules prior to release into production, testing TAs, and much more. Stay tuned for more!

Dennis Bourg
