Making reports faster by caching scheduled searches

I find this hard to explain even though it's an extremely simple concept. It would be nice to get some feedback, since I think we want to productize the idea but we are not yet clear on what makes sense.

If I have a search/report that I want to run faster, I save that search and have Splunk run it over a small timeframe (5, 15, 30, or 60 minutes), taking the results of that search/report and feeding them back into an index I create to hold cached results.

For example, suppose I like to run nightly reports showing “top users by bandwidth”. It's easy enough to run the report every night, but suppose there are times during the day when I want incrementals, or I want to look at last week, or perhaps get dailies over a month. Every time I run the search/report I need to search and recalculate “top users by bandwidth”, which over billions of events can take time 😉

Instead, I'll just save the search/report and have Splunk run it every 15 minutes, with the results being sent to a “cache” index. This way, if I ever want to do an ad hoc search on “top users”, or if I want to do “weekly reports by day”, all the data is precalculated.

Think of this as creating “logs” that are the output of a search/report and then having Splunk index those “logs”. To get fast results you can then search/report on the summarized cached data.

If it's not obvious why this is faster, suppose you are indexing 500M events a day and 100M of those have bandwidth data. To report on “top bandwidth by users” I need to run a search to get the 100M events, then run the report across all 100M.
If instead I were running that same search/report in the background over each hour interval, then saving the data back into Splunk, I would reduce the data I'm operating on from 100M down to 12,000 (24 × 500, assuming I'm keeping the top 500 each hour). Searches/reports on the latter dataset are sub-second, versus the few minutes it would take to run across the 100M.
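The arithmetic behind that claim, as a quick back-of-the-envelope sketch (the 100M / top-500 / hourly numbers are the hypothetical ones from above):

```python
# Raw events scanned per day by the live report vs. rows in the cache.
raw_bandwidth_events = 100_000_000   # bandwidth events indexed per day
top_n = 500                          # rows kept by each cached "top" run
intervals_per_day = 24               # the search/report runs once per hour

cached_rows = top_n * intervals_per_day
print(cached_rows)                            # 12000
print(raw_bandwidth_events // cached_rows)    # ~8333 raw events per cached row
```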

Make sense? It's really simple, but oddly hard to explain.

PART ONE – Setup:

  • 1. Grab the reportcache search script from ** here ** and put it in your $SPLUNK_HOME/etc/searchscripts directory. No restart needed; you can now cache any search/report data.
  • 2. Add a cache index: either add the following to your etc/bundles/local/indexes.conf, or create a new bundle and add it to that bundle's indexes.conf. You will need to restart Splunk after adding the index.

    [cache]
    homePath = $SPLUNK_DB/cache/db
    coldPath = $SPLUNK_DB/cache/colddb
    thawedPath = $SPLUNK_DB/cache/thaweddb

PART TWO – Testing by writing to a file:

I recommend that you first test reportcache by having it output to a file that you can scan to make sure things look right.

  • 1. Find a search you want to cache. A simple candidate is something like the following report against the internal index, which shows queue sizes by queue name.
    index=_internal metrics "group=queue" | timechart avg(current_size) by name
  • 2. Once you have a search you want to cache, add the command "reportcache index=cache path=/tmp file=testcache.log notimestamp" to the end. The following assumes you have made an index named “cache”. The index attribute is required, and you should not use your default index unless you know what you're doing. We are also going to output the file to /tmp/testcache.log using the file and path attributes. The notimestamp option simply suppresses adding a timestamp to the filename.
    index=_internal metrics "group=queue" | timechart avg(current_size) by name | reportcache index=cache path=/tmp file=testcache.log notimestamp
  • 3. Run the search. You should get back the normal search results and not see an error on the screen. If you do see an error, it should be self-explanatory.
  • 4. Open the file /tmp/testcache.log and make sure the results look right. They should be a bunch of lines of key=value, key=value pairs.
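If you would rather sanity-check the file with a script than eyeball it, a minimal parser might look like the following. This is my own sketch, not part of reportcache: it assumes one event per line with fields as comma-separated key=value pairs, and the sample field names are made up for illustration; your report's fields will differ.

```python
# Rough sanity check for a reportcache output file: parse one line's
# key=value pairs into a dict. Sample field names are hypothetical.
def parse_cache_line(line):
    fields = {}
    for pair in line.strip().split(", "):
        key, sep, value = pair.partition("=")
        if sep:                 # skip fragments that are not key=value
            fields[key] = value
    return fields

sample = "_time=1210000000, name=parsingQueue, avg_current_size=42"
print(parse_cache_line(sample))
# {'_time': '1210000000', 'name': 'parsingQueue', 'avg_current_size': '42'}
```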

PART THREE – Writing to an index:

  • 1. We are now going to have the command put the results into the index. Simply remove the file, path and notimestamp attributes:
    index=_internal metrics "group=queue" | timechart avg(current_size) by name | reportcache index=cache
  • 2. Run the command – you should again see normal results and no error.
  • 3. Wait 30 seconds or so…
  • 4. Run the following search to make sure the results made it into the cache index – you should see your cached data:
    index=cache
  • 5. Now click on the report link and see if you can get your report back 😉 This is the somewhat odd part. All the fields should be as they were in the original search, but many reports create keys with odd names. The best thing to do is to click around and see what reports you can make. You should be able to get back to the original search/report you had prior to caching.

PART FOUR – Enabling automatic caching:

After you have found and tested a search/report you want to cache moving forward:

  • 1. Save the search along with the reportcache command.
  • 2. Schedule the saved search on a small timeframe (every 5, 15, 30, etc. minutes).
  • 3. Test by waiting a few hours and looking at the results in the cache index.
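For the conf-file minded, a scheduled saved search would look roughly like the following savedsearches.conf fragment. This is a sketch, not taken from the post's setup: the stanza name is made up, the search is just the example from Part Three, and the attribute names (enableSched, cron_schedule) may differ depending on your Splunk version.

    [cache queue sizes]
    search = index=_internal metrics "group=queue" | timechart avg(current_size) by name | reportcache index=cache
    enableSched = 1
    cron_schedule = */15 * * * *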

There is a good chance that either the above description was vague or that there is a bug / edge case that I did not consider.
One frequent problem I have seen is trying to cache data that has no timestamp. For example,
    somesearch | top users
will produce results without timestamps. This makes a mess of the cached data. If you have this problem, try rewriting your search to something like:
    somesearch | stats count first(_time) by users | where users != "" | sort -count
The above will produce data that has both the top counts and timestamps.

A few other things that are common requests:
Often folks want to go back in time and create cached results for prior data. I have a script that can do that and will post it after more testing.
Another common topic of conversation is the over-creation of summary data. In many cases it can be beneficial to cache more than you initially need, in case you want to run reports later. I'm trying to think of good ways to do this for you automatically.

** IMPORTANT ** – drop me a line and let me know how something like this *should* work. I suspect that we will add a “checkbox” to saved searches that will automatically do the right thing.

I'll leave this post with the usage info from the top of the search script.

# usage: | reportcache
# file=[filename] – default is the current time
# path=[path] – default is $SPLUNK_HOME/var/spool/splunk
# index=[indexname] – which index to target for the results. If blank, will use whatever is bundled
# marker=[string] – a token or k=v used to mark the results for versioning or other delineation, or to defeat crc caching
# format=["csv"|"splunk"] – use the output format "splunk" for feeding back into Splunk, or "csv" if you want to save the results for other tools
# appendtime – if true, this will append the current time. Useful when you want the results stamped with a timestamp of now
# notimestamp – if this arg is supplied, it will suppress the timestamp in the filename
# debug – if debug, will just output the args to the screen

# the following example will put in var/spool/splunk a file named foo without a timestamp, marked with marker="erik=nextrun", and targeted to index cache
# index=_internal "group=pipeline" | timechart avg(executes) | reportcache file="foo" notimestamp marker="erik=nextrun" index="cache"
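To make the file/path/notimestamp semantics above concrete, here is how I read them composing into an output path. This is my own illustrative sketch in Python, not the actual reportcache code, and the default path is shown with a made-up /opt/splunk install location standing in for $SPLUNK_HOME.

```python
import os
import time

def cache_path(path="/opt/splunk/var/spool/splunk", file=None, notimestamp=False):
    """Sketch of how file/path/notimestamp could combine into an output
    path; illustrative only, not the reportcache implementation."""
    stamp = str(int(time.time()))
    if file is None:
        name = stamp                     # default filename is the current time
    elif notimestamp:
        name = file                      # use the supplied name as-is
    else:
        name = "%s_%s" % (file, stamp)   # supplied name plus a timestamp
    return os.path.join(path, name)

print(cache_path(path="/tmp", file="testcache.log", notimestamp=True))
# /tmp/testcache.log
```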
