TIPS & TRICKS

Two time-series, One Chart – Part Two

Following up on to my last post about plotting two time-series in one chart, I would like to talk about another related, larger topic; plotting multiple time-series on a single chart using a single search. Take for example the case of measuring and comparing values of a certain metric over multiple time ranges that are not adjacent to each other (as opposed to the last post were both series were adjacent; current hour vs. last hour, today vs. yesterday etc.)

Assume that in this example the metric of interest is average(responseTime) of a particular service that you’re offering. Further, assume that we would like to measure it over the last hour and compare it to the maximum of the average(responseTime) over the same hour of the same day over the last 4 weeks.

Problem Statement: measure average(responseTime) from 3-4pm on Thursday 1/31 and compare it to the average response time during the same hour on 1/24, 1/17, 1/10 and 1/3.

This is a suitable example of measuring an application’s response time and getting alerted on negative differences (ie. worsening response times) and perhaps even comparing it against a known SLA.

On a first look, this requirement may be satisfied by using multiple searches (subsearches) which return the desired metric and then comparisons should be trivial. However, using a single search and putting it on an hourly schedule can be real tricky since you need to account for all the moving parts.

Here’s the search for the impatient (scheduled every 3rd minute of every hour):

index=my_index sourcetype=mytype responseTime=* earliest=-28d@h-1h latest=@h date_wday=`my_day_macro` date_hour=`my_date_macro`
| addinfo
| eval marker=if(_time<info_max_time-3600, "previous", "current")
| stats avg(responseTime) as metric by marker
| stats max(eval(if(marker=="previous", metric, ""))) as resp_threshold, max(eval(if(marker=="current", metric, ""))) as current_resp
| eval diff=current_resp-resp_threshold

Let’s break it down:

  • Base Search: find all events we’re interested in. Use macros to scope the search to the right hourly slots (more on macros below)
  • 1st pipe: add info about the search (all we need here is info_max_time ie. the search’s latest time which in our case it’s snapped to the last hour)
  • 2nd pipe: add a field called marker=current to all events in the last full hour, for all others marker=previous
  • 3rd pipe: calculate avg(responseTime) and split by marker
  • 4th pipe: calculate the maximum of averages when marker is “previous”
  • 5th pipe: calculate the difference
  • Alert: if diff is negative

Macros

Both macros below are eval based and they both return a string (notice the tostring() wrapper) based on a few of evals and time() functions.

#macros.conf
#-----------
[my_date_macro]
definition = tostring((tonumber(strftime(time(), "%H"))-1)%24)
iseval = 1

[my_day_macro]
definition = tostring(lower(if(tonumber(strftime(time(),"%H"))==0, strftime(time()-3600, "%A"), strftime(time(), "%A"))))
iseval = 1

Eval based macros are a pretty powerful concept but when they’re used to scope the time range of a search become even more so. Note that they are expanded before the search is run therefore some field values and some eval functions (such as now()) are not available. Let’s break them down:

my_date_macro:

  • It sets the date_hour field to the previous hour relative to the search. That is, if the wall clock (when the search fires) says 15:03:12 it sets date_hour=14.
  • I use mod24 here because an edge case exists on the first run of the day. This guarantees that the correct hour is set when the search runs on 00:03 (ie. date_hour=23).

my_day_macro:

  • It sets date_wday field to the correct day. The if() function inside accounts for the same edge case above. When the search runs on 00:03 on a Tuesday, we’re actually interested in the previous hour’s (ie. 23:00:00-23:59:59) responseTime at which point is in fact Monday.

You can use the above macros in two places: either in the base search or afterwards (to achieve the necessary filtering). The latter alternative is a horrible option as Splunk will bring in all the events in the last 28d only to filter out the vast majority of them. When I moved the macros outside the base search I measured a 20x slowdown! Therefore, keep the macros in the base search and you’ll have the indexers only scoop up the hour slots of interest and perform all the necessary work on a smaller set of data.

Feel free to modify as necessary but don’t forget to share. If you have any comments fire away.

----------------------------------------------------
Thanks!
Dritan Bitincka

Splunk
Posted by

Splunk