Schedule Windows vs. Skewing

Splunk Enterprise 6.3 added Schedule Windows, a feature that lets the Search Scheduler distinguish searches that really should run at a specific time (just like cron) from those that don't have to, thereby greatly reducing lag and skipping. Splunk Enterprise 6.6 adds Schedule Skewing, which lets the Search Scheduler randomly distribute scheduled searches more evenly over their periods.

What’s the practical difference? When should you use one vs. the other? I’ll explain. But first, I’ll review each feature separately.

Scheduler Terms

There are a few terms used when discussing the Search Scheduler:

Schedule Windows

Background

As mentioned, schedule windows allow the Search Scheduler to distinguish searches that really should run at a specific time (e.g. every hour, as close to the top of the hour as possible) from those that don't have to (e.g. approximately every hour, where the exact time within the hour is not critical). Hence, giving a search a window is altruistic: it helps other searches. In savedsearches.conf, the parameter is specified as:

schedule_window = window-in-minutes | auto

where window-in-minutes, when greater than 0, specifies the window of time during which the search is altruistic: it is given a higher (worse) priority score so that other searches can run first. (However, if at any time there is sufficient capacity to run the search, it will be run.) If the search instance hasn’t run by the time the window expires, the scheduler treats it from that point on as if it never had a schedule window (until it either finally runs or is skipped).

The auto value tells the scheduler to calculate the window of time automatically based on historical run-times of the search. For example, if a search runs every five minutes and has historically taken approximately twenty seconds to run, then, in order to still finish within its five-minute period, the search can be deferred by at most four minutes and forty seconds (five minutes minus twenty seconds), so that is the auto value.
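As a sketch of how this might look in practice, the stanza below opts a five-minute search into an automatically calculated window. The stanza name is hypothetical, and the fragment omits the other settings (such as the search string itself) that a real saved search needs.

```ini
# savedsearches.conf
# "Five Minute Error Count" is a made-up stanza name for illustration.
[Five Minute Error Count]
enableSched = 1
cron_schedule = */5 * * * *
# Let the scheduler derive the window from the search's historical run-times.
schedule_window = auto
```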

To illustrate a use-case for schedule windows, suppose you have a mixture of searches: some run frequently (say, every 5 minutes or even every minute) and some run less frequently (say, once an hour or even once a day). When many of those searches' scheduled times align on a Splunk deployment with insufficient capacity to run them all concurrently, the searches with a schedule window allow other, more important searches to run first.
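One way such a mixture could be configured is sketched below. Both stanza names and the 30-minute figure are assumptions chosen for illustration: the every-minute search keeps no window so that it stays on time, while the hourly report yields to it when capacity is short.

```ini
# savedsearches.conf
# Hypothetical stanza names; other required settings are omitted.
[Critical Alert Every Minute]
cron_schedule = * * * * *
# No window: run as close to the scheduled time as possible.
schedule_window = 0

[Hourly Summary Report]
cron_schedule = 0 * * * *
# May be deferred by up to 30 minutes when the scheduler is short on capacity.
schedule_window = 30
```

The window should stay shorter than the search's period; otherwise a deferred run could still be pending when the next one comes due.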

Before & After

To illustrate the benefit of schedule windows, here are some “before” vs. “after” scheduler performance charts.

Things to notice:

Here is the “after” set of scheduler performance charts.

Things to notice:

Schedule Skewing

Background

As mentioned, schedule skewing allows the Search Scheduler to randomly “skew” a set of searches’ scheduled times more evenly over their periods. In savedsearches.conf, the parameter is specified as:

allow_skew = percentage% | duration

where:

To illustrate a use-case for skewing, suppose you have a large number of searches that each run for only a few seconds every minute. Despite the number of searches, your Splunk deployment can run them all simultaneously. However, the network bandwidth those searches use at the same moment exceeds the capacity of your switches, and just a few seconds later, when all the searches have completed, the network bandwidth drops back close to zero. Since your Splunk deployment can run all the searches simultaneously, this isn't a problem that schedule windows can solve. What you want is to spread the dispatching of the searches out over time to decrease the network saturation. This is precisely what skewing does.
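For the scenario above, a minimal sketch might look like the following. The stanza names and the chosen values are hypothetical. A percentage is taken relative to the search's period, so 50% on an every-minute search allows a skew of up to 30 seconds.

```ini
# savedsearches.conf
# Hypothetical stanza names; other required settings are omitted.
[Per Minute Health Check]
cron_schedule = * * * * *
# Skew the start by up to 50% of the one-minute period (at most 30 seconds).
allow_skew = 50%

[Hourly Inventory Rollup]
cron_schedule = 0 * * * *
# Alternatively, give an absolute duration: skew the start by up to 5 minutes.
allow_skew = 5m
```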

A few things to note about skewing are:

Before & After

To illustrate the benefit of schedule skewing, here is one “before” vs. “after” scheduler performance chart.

The thing to notice is that the majority of searches are running every minute at the top of the minute, saturating the network.

Here is the “after” scheduler performance chart.

The thing to notice is that the searches are now much more evenly spread over time, thus reducing the network load.

Schedule Windows vs. Skewing

Now that each feature has been explained, when should you use one versus the other?

Want to learn more? Check out the slides and recording of my .conf2017 session "Making the Most of the Splunk Scheduler."

----------------------------------------------------
Thanks!
Paul Lucas
