An easy way to generate sample data – Part 3

In my last two posts (Part 1, Part 2), we discussed using the Splunk eventgen to create a replay of a data sample. In the first post, we configured a data sample to replay its events into a log file, and in the second, we started to introduce random data within the events. In this post, we’re going to look at sampling events randomly from the data set. This is easily accomplished by changing the mode argument:

mode = sample

That is all that is required! But we can do a bit better than that. What if we wanted to alter not only the variability of the events, but their velocity as well? This can be quite helpful for shaping event trends to reflect natural usage patterns or for isolating particular time frames. One easy example is events from logs supporting online commerce: these naturally fluctuate in velocity as people access your site during normal waking hours. So how can we set this up with the eventgen? First, we will need to set ‘mode’ to ‘sample’ as mentioned above (changing the rates of events will not work in ‘replay’ mode). A stanza along these lines then controls the rates:

[mySampleSourceType.log]
mode = sample
outputMode = file
fileName = /tmp/new_seed.log
disabled = false
interval = 3
earliest = -3s
latest = now
count = 3
spoolDir = $SPLUNK_HOME/etc/apps/oidemo/spool
hourOfDayRate = { "0": 0.30, "1": 0.20, "2": 0.20, "3": 0.20, "4": 0.20, "5": 0.25, "6": 0.35, "7": 0.50, "8": 0.60, "9": 0.65, "10": 0.70, "11": 0.75, "12": 0.77, "13": 0.80, "14": 0.82, "15": 0.85, "16": 0.87, "17": 0.90, "18": 0.95, "19": 1.0, "20": 0.85, "21": 0.70, "22": 0.60, "23": 0.45 }
dayOfWeekRate = { "0": 0.97, "1": 0.95, "2": 0.90, "3": 0.97, "4": 1.0, "5": 0.99, "6": 0.55 }
minuteOfHourRate = { "0": 1, "1": 1, "2": 1, "3": 1, "4": 1, "5": 1, "6": 1, "7": 1, "8": 1, "9": 1, "10": 1, "11": 1, "12": 1, "13": 1, "14": 1, "15": 1, "16": 1, "17": 1, "18": 1, "19": 1, "20": 1, "21": 1, "22": 1, "23": 1, "24": 1, "25": 1, "26": 1, "27": 1, "28": 1, "29": 1, "30": 1, "31": 1, "32": 1, "33": 1, "34": 1, "35": 1, "36": 0.1, "37": 0.1, "38": 1, "39": 1, "40": 1, "41": 1, "42": 1, "43": 1, "44": 1, "45": 1, "46": 1, "47": 1, "48": 1, "49": 1, "50": 1, "51": 1, "52": 1, "53": 1, "54": 1, "55": 1, "56": 1, "57": 1, "58": 1, "59": 1 }
randomizeCount = 0.33
randomizeEvents = true
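
Typing sixty minuteOfHourRate entries by hand is tedious and easy to get wrong, so it can help to generate the line with a short script. The following is only a sketch: the helper name minute_rates and its arguments are made up for this post, and the output is simply a JSON hash in the same shape as the stanza above, with minutes 36 and 37 throttled to 0.1.

```python
import json


def minute_rates(default=1, overrides=None):
    """Build a 60-entry rate map keyed "0".."59" (hypothetical helper)."""
    rates = {str(minute): default for minute in range(60)}
    # Apply any per-minute overrides, e.g. {36: 0.1, 37: 0.1}.
    for minute, rate in (overrides or {}).items():
        rates[str(minute)] = rate
    return rates


# Produce a line that can be pasted into the eventgen.conf stanza.
line = "minuteOfHourRate = " + json.dumps(minute_rates(overrides={36: 0.1, 37: 0.1}))
print(line)
```

The same idea works for hourOfDayRate and dayOfWeekRate by changing the range of keys.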

The settings that are doing all of the work:

randomizeCount = <float>

  • Will randomize the number of events generated by the percentage passed
  • Example values: 0.2, 0.5
  • Recommend passing 0.2 to give 20% randomization either way (plus or minus)
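
To get a feel for what "20% either way" means, here is a rough Python sketch. It is not the eventgen's own code: the function name randomized_count is invented for illustration, and drawing the variation from a uniform distribution is my assumption, since the spec text above only states the plus-or-minus range.

```python
import random


def randomized_count(count, randomize_count, rng=random):
    """Vary count by up to +/- randomize_count (a fraction such as 0.2).

    Assumption: the variation is uniform across the range.
    """
    factor = 1.0 + rng.uniform(-randomize_count, randomize_count)
    # Never return a negative number of events.
    return max(0, round(count * factor))


# With count = 100 and randomizeCount = 0.2, results fall between 80 and 120.
rng = random.Random(1)
print([randomized_count(100, 0.2, rng) for _ in range(5)])
```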

randomizeEvents = <boolean>

  • Will randomize the events found in the sample file before choosing the events.
  • NOT SUPPORTED WITH sampletype csv
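
Conceptually, randomizeEvents shuffles the sample before the events are picked. A minimal sketch of that idea follows; choose_events is a hypothetical name, and the sketch ignores what happens when count is larger than the sample, which the text above does not describe.

```python
import random


def choose_events(sample_lines, count, randomize_events, rng=random):
    """Pick count events, shuffling a copy of the sample first if requested."""
    lines = list(sample_lines)  # copy so the caller's list is left untouched
    if randomize_events:
        rng.shuffle(lines)
    return lines[:count]


sample = ["event A", "event B", "event C", "event D", "event E"]
print(choose_events(sample, 3, randomize_events=False))  # always the first three
print(choose_events(sample, 3, randomize_events=True))   # three events in random order
```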

interval = <integer>

  • Only valid in mode = sample
  • How often to generate sample (in seconds).
  • 0 means disabled.
  • Defaults to 60 seconds.

hourOfDayRate = <json>

  • Takes a JSON hash of 24 hours with float values to rate limit how many events we should see in a given hour.
  • Sample JSON:
    { "0": 0.05, "1": 0.05, "2": 0.07… }
  • If no match is found for the current hour, will default to count events
  • Also multiplied by dayOfWeekRate and minuteOfHourRate

minuteOfHourRate = <json>
dayOfWeekRate = <json>*

*For full examples of time modifiers, please read the eventgen.conf.spec.
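
Since the three rates are multiplied together, it can be useful to work out the expected volume for a given moment before turning the eventgen loose. The sketch below is only an approximation of the arithmetic described above: effective_count is a hypothetical helper, a missing key is treated as a rate of 1.0 (matching "will default to count events"), and no rounding is applied because I am not asserting how the eventgen itself rounds.

```python
def effective_count(count, hour, day, minute, hour_rates, day_rates, minute_rates):
    """Scale the base count by the hour, day, and minute rate maps.

    Keys are strings, as in the JSON hashes; a missing key leaves count unchanged.
    """
    factor = (
        hour_rates.get(str(hour), 1.0)
        * day_rates.get(str(day), 1.0)
        * minute_rates.get(str(minute), 1.0)
    )
    return count * factor


# A few values borrowed from the stanza earlier in this post.
hour_rates = {"3": 0.20, "19": 1.0}
day_rates = {"4": 1.0, "6": 0.55}
minute_rates = {"36": 0.1, "37": 0.1}

# Peak: hour 19 on day 4, minute 0 (no minute entry here, so its rate is 1.0).
print(effective_count(100, 19, 4, 0, hour_rates, day_rates, minute_rates))
# Trough: hour 3 on day 6 at minute 36, roughly 100 * 0.20 * 0.55 * 0.1.
print(effective_count(100, 3, 6, 36, hour_rates, day_rates, minute_rates))
```

Which weekday each dayOfWeekRate key refers to is defined in eventgen.conf.spec, so the day index is passed in directly here rather than derived from a date.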

That’s it! We can easily shape the output velocity of our sample data to reflect real-world behavior (e.g. security attacks at 3 a.m. or shopping/purchasing patterns as holidays approach). Good luck and Happy Splunking!

Posted by Dennis Bourg