Alert Throttling

NOTE: in 4.2 (released today 3/15/2011) alert suppression/throttling is supported natively by Splunk

Most Splunk users soon realize that Splunk ships with a scheduler, which can run searches periodically and execute actions (send an email, generate an RSS feed, call a script, etc.) when the results of a search meet some condition. Soon after discovering this feature, many users start looking for a way to throttle the alerts Splunk issues. For example, a common alerting pattern is: check the health of a resource every 5 minutes and send an email alert when the resource is unhealthy, but send at most one email per hour. As of the most recent release (4.1.2), Splunk does not provide an out-of-the-box way to throttle alerts.

This post introduces AlertThrottle, an app I recently shared on Splunkbase that adds the ability to throttle alerts. To use it, you should be familiar with the “Advanced Alert Conditions” concept (there is nothing advanced about these conditions if you know how to write a Splunk search). The one thing to remember about advanced alert conditions is this: if the condition search generates at least one result, the scheduler triggers the alerts; if the condition search yields no results, the alerts are not triggered.
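The trigger rule is easy to model. The following Python sketch is purely illustrative and is not Splunk's scheduler: search results are represented as a list of dicts, a condition search as a function from results to results, and the hypothetical `should_trigger` helper fires the alert actions only when the condition returns at least one row.

```python
from typing import Any, Callable, Dict, List

Result = Dict[str, Any]
ConditionSearch = Callable[[List[Result]], List[Result]]


def should_trigger(results: List[Result], condition: ConditionSearch) -> bool:
    """Hypothetical model of the rule: trigger iff the condition yields >= 1 result."""
    return len(condition(results)) > 0


def count_above_zero(results: List[Result]) -> List[Result]:
    """Rough Python analogue of the condition `stats count | search count>0`."""
    summary = [{"count": len(results)}]                  # stats count: always exactly one row
    return [row for row in summary if row["count"] > 0]  # search count>0: keep it only if non-zero


events = [{"_raw": "ERROR something failed"}]
print(should_trigger(events, count_above_zero))  # True: one row survives the condition
print(should_trigger([], count_above_zero))      # False: the condition yields no rows
```

Note that `stats count` on its own always produces a row (with `count=0` when nothing matched), which is why the extra `search count>0` filter is needed before the condition can yield zero results.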

AlertThrottle App

The app is pretty simple: it just contains a custom search command called, you guessed it, throttle. This command takes two required arguments: (1) a throttle name and (2) a throttling period, in seconds. It lets search results pass through only once during the specified period for the given throttle name. We can therefore combine Advanced Alert Conditions with the throttle command provided by the AlertThrottle app to achieve alert throttling. Here is a concrete example:

Imagine running the following search at 10:00 am:

search error | stats count BY error_type | throttle name="my_errors" period=3600

Assuming nothing has passed through the my_errors throttle in the preceding hour, this run returns its results. The throttle command then prevents this search (or any other search that uses the same throttle name and period) from returning any results until 11:00 am.
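To make these semantics concrete, here is a minimal Python sketch of what a throttle like this could do. It is not the AlertThrottle app's source: the `Throttle` class, its JSON state file and the `apply` method are all hypothetical. The state maps each throttle name to the time results last passed through, and (an assumption) a new period only starts when there were results to pass.

```python
import json
import os
import tempfile
import time
from typing import Any, Dict, List, Optional


class Throttle:
    """Hypothetical sketch of the throttle semantics, not the app's implementation.

    State (throttle name -> epoch seconds of the last time results passed)
    lives in a JSON file so that it survives between scheduled runs.
    """

    def __init__(self, state_path: str) -> None:
        self.state_path = state_path

    def _load(self) -> Dict[str, float]:
        if not os.path.exists(self.state_path):
            return {}
        with open(self.state_path) as f:
            return json.load(f)

    def _save(self, state: Dict[str, float]) -> None:
        with open(self.state_path, "w") as f:
            json.dump(state, f)

    def apply(self, results: List[Dict[str, Any]], name: str, period: int,
              now: Optional[float] = None) -> List[Dict[str, Any]]:
        """Return `results` at most once per `period` seconds for `name`, otherwise []."""
        now = time.time() if now is None else now
        state = self._load()
        last = state.get(name)
        if last is not None and now - last < period:
            return []  # still inside the throttle window: swallow the results
        if results:
            # Only start a new window when something actually passes through.
            state[name] = now
            self._save(state)
        return results


# Replaying the 10:00 am example on an arbitrary clock measured in seconds.
path = os.path.join(tempfile.mkdtemp(), "throttle_state.json")
throttle = Throttle(path)
rows = [{"error_type": "io", "count": 3}]
ten_am = 10 * 3600
print(throttle.apply(rows, "my_errors", 3600, now=ten_am))         # rows pass at 10:00
print(throttle.apply(rows, "my_errors", 3600, now=ten_am + 1800))  # [] at 10:30
print(throttle.apply(rows, "my_errors", 3600, now=ten_am + 3600))  # rows pass again at 11:00
```

Keying the state on the throttle name alone is what lets several searches share one throttle. A real command would also have to protect the state file against scheduled searches running concurrently, which this sketch ignores.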

You now know enough about this app to download it and start working through an example. After installing the app, follow the example below.

Let’s assume you want to be alerted about splunkd errors soon after they happen (within 5 minutes), but you want to be notified at most once per hour. We proceed by:

  1. defining the scheduled search as: index=_internal error
  2. choosing the search’s execution time range: -5m
  3. choosing the execution period: every 5 minutes
  4. entering the following custom alert condition: stats count | search count>0 | throttle name=splunkd_errors period=3600

That’s it. Every five minutes the scheduler runs the search and evaluates the condition, but the throttle lets results pass through only once per hour, so the alert actions are triggered at most once per hour.
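The whole setup can be simulated offline to check the “at most once per hour” behaviour. This is a hypothetical Python model, not Splunk code: `simulate` takes the minutes at which splunkd logs an error, runs the condition every `every_min` minutes over the last `window_min` minutes, and keeps the throttle state in a local variable. The five-minute windows are modelled as half-open intervals so no error is counted twice; Splunk's actual time-range boundaries may differ.

```python
from typing import List, Optional


def simulate(error_minutes: List[int], horizon_min: int, every_min: int = 5,
             window_min: int = 5, period_s: int = 3600) -> List[int]:
    """Return the minutes at which the alert actions would fire (hypothetical model)."""
    last_pass_s: Optional[int] = None  # time (seconds) results last passed the throttle
    fired: List[int] = []
    for minute in range(0, horizon_min + 1, every_min):
        # "index=_internal error" over the last window: errors in (minute - window, minute]
        count = sum(1 for m in error_minutes if minute - window_min < m <= minute)
        if count == 0:
            continue  # stats count | search count>0 yields nothing: no trigger
        now_s = minute * 60
        if last_pass_s is not None and now_s - last_pass_s < period_s:
            continue  # throttle name=splunkd_errors period=3600 swallows the results
        last_pass_s = now_s
        fired.append(minute)
    return fired


# splunkd logs an error every minute for three hours.
print(simulate(list(range(181)), 180))  # [0, 60, 120, 180]
# Two isolated errors 47 minutes apart: the second one is suppressed.
print(simulate([3, 50], 60))            # [5]
```

A steady stream of errors produces one alert per hour, while a second error inside the same hour is absorbed by the throttle.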

Note: if you want to throttle multiple scheduled searches so that only one of them triggers an alert per throttle period, use the same throttle name in all of them.
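Sharing a name works because the throttle state is keyed by name rather than by search. The sketch below is a hypothetical in-memory model of that idea: the `SharedThrottle` class and the two search names are illustrative assumptions, and `allow` is only called for runs whose condition produced results.

```python
from typing import Dict


class SharedThrottle:
    """Hypothetical in-memory model: one timestamp per throttle name, shared by all searches."""

    def __init__(self) -> None:
        self.last_pass: Dict[str, int] = {}

    def allow(self, name: str, period_s: int, now_s: int) -> bool:
        last = self.last_pass.get(name)
        if last is not None and now_s - last < period_s:
            return False  # some search already used this throttle within the period
        self.last_pass[name] = now_s
        return True


throttle = SharedThrottle()
# (time in seconds, hypothetical search name) for runs whose condition had results.
runs = [(0, "disk_errors"), (300, "net_errors"), (3600, "net_errors"), (3900, "disk_errors")]
for now_s, search in runs:
    if throttle.allow("splunkd_errors", 3600, now_s):
        print(f"t={now_s}s: {search} triggers the alert")
```

Only the runs at t=0s and t=3600s get through, even though they come from different searches.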

Ledion Bitincka
