
Splunk Book Excerpt: Finding Metrics That Fell by 10% in an Hour

Excerpt from “Exploring Splunk: Search Processing Language (SPL) Primer and Cookbook”. The Kindle, iPad, and PDF editions are available for free, and the hardcopy is available for purchase at Amazon.

Problem

You want to know which metrics have dropped by more than 10% in the last hour. This could mean fewer customers, fewer web page views, fewer data packets, and so on.

Solution

To see a drop over the past hour, we’ll need to look at results for at least the past two hours. We’ll look at two hours of events, calculate a separate metric for each hour, and then determine how much the metric changed between those two hours. The metric here is the number of events in each of the two hours. This search compares each host’s count for the earlier hour with its count for the most recent hour and keeps the hosts whose count dropped by more than 10%:

          earliest=-2h@h latest=@h
          | stats count by date_hour,host
          | stats first(count) as previous, last(count) as current by host
          | where current/previous < 0.9

The first condition (earliest=-2h@h latest=@h) retrieves two hours’ worth of data, snapped to hour boundaries (e.g., 2-4pm, not 2:01-4:01pm). The first stats command then counts those events per hour and host, and its output is sorted by date_hour. Because there are only two date_hour values (two hours ago and one hour ago), first(count) in the second stats command returns each host’s count from two hours ago, and last(count) returns its count from one hour ago. The where clause keeps only those hosts whose current count is less than 90% of their previous count, that is, hosts whose count dropped by more than 10%.
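To make the two stats passes concrete, here is a minimal Python sketch that mimics the pipeline on in-memory data. It assumes each event has already been reduced to a (date_hour, host) pair covering two complete hours; find_dropped_hosts and the sample hosts are hypothetical and not part of Splunk.

```python
from collections import Counter

def find_dropped_hosts(events, threshold=0.9):
    """Mimic the SPL pipeline on hypothetical (date_hour, host) pairs."""
    # stats count by date_hour,host
    counts = Counter(events)
    # stats output is ordered by its by-fields: date_hour, then host
    rows = sorted(counts.items())  # [((date_hour, host), count), ...]

    # stats first(count) as previous, last(count) as current by host
    previous, current = {}, {}
    for (hour, host), count in rows:
        previous.setdefault(host, count)  # keep the first row seen per host
        current[host] = count             # the last row seen per host wins

    # where current/previous < 0.9
    return sorted(h for h in previous if current[h] / previous[h] < threshold)

# Hypothetical data: web1 falls from 100 to 85 events, web2 from 50 to 48.
events = ([(14, "web1")] * 100 + [(15, "web1")] * 85
          + [(14, "web2")] * 50 + [(15, "web2")] * 48)
print(find_dropped_hosts(events))  # ['web1']
```

In this sketch, as in the search, a host with events in only one of the two hours produces a single row, so previous and current are equal and the host is not flagged.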

As an exercise for you, think about what will go wrong with this search when the time span crosses midnight. Do you see how to correct it by adding first(_time) to the first stats command and sorting by that new value?
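If you want to check your answer, the ordering problem can be reproduced outside Splunk. This Python sketch assumes one host's hourly rows carry hypothetical fields date_hour, count, and time, where time is a datetime standing in for the value that first(_time) would add; the timestamps and counts are illustrative only.

```python
from datetime import datetime

def previous_and_current(rows, key):
    """Return (previous, current) counts after ordering rows by `key`."""
    ordered = sorted(rows, key=key)
    return ordered[0]["count"], ordered[-1]["count"]

# Two hypothetical hours that straddle midnight: 23:00-00:00 and 00:00-01:00.
rows = [
    {"date_hour": 23, "time": datetime(2024, 1, 1, 23, 30), "count": 100},
    {"date_hour": 0, "time": datetime(2024, 1, 2, 0, 30), "count": 80},
]

# Ordering by date_hour puts hour 0 first, so the two hours swap roles
# and a 20% drop looks like a 25% increase.
print(previous_and_current(rows, key=lambda r: r["date_hour"]))  # (80, 100)
# Ordering by the event time restores chronological order.
print(previous_and_current(rows, key=lambda r: r["time"]))  # (100, 80)
```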

Variations

Instead of the number of events, use a different metric, such as the average delay or minimum bytes per second, and consider different time ranges, such as day over day.
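The same comparison generalizes to any per-period aggregate. This Python sketch assumes the raw values have already been split into a previous and a current period per host (for day over day, yesterday and today); find_drops, the sample hosts, and the bytes-per-second values are hypothetical.

```python
from statistics import mean

def find_drops(samples, aggregate=mean, threshold=0.9):
    """Flag hosts whose aggregated metric fell below threshold * previous.

    `samples` maps host -> (previous_period_values, current_period_values).
    """
    flagged = []
    for host, (prev_vals, cur_vals) in sorted(samples.items()):
        previous, current = aggregate(prev_vals), aggregate(cur_vals)
        # Skip a zero baseline instead of dividing by zero.
        if previous > 0 and current / previous < threshold:
            flagged.append(host)
    return flagged

# Hypothetical bytes-per-second samples, yesterday versus today.
samples = {
    "web1": ([500, 450, 480], [300, 420, 410]),
    "web2": ([200, 210], [205, 190]),
}
print(find_drops(samples, aggregate=min))  # ['web1']
print(find_drops(samples))                 # ['web1'] (using the mean)
```

Passing the aggregate as a function keeps the threshold logic in one place, whichever metric you choose.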

----------------------------------------------------
Thanks!
David Carasso

Posted by Splunk