This blog post is part 26 of the “Hunting with Splunk: The Basics” series, which takes a single Splunk search command or hunting concept and breaks it down to its basic parts.
If you’re like me, you’ve occasionally found yourself staring at the Splunk search bar trying to decide how best to analyze a series of data, iterating against one or more fields.
If your brain gravitates towards traditional programming syntax, the first thing that pops into your mind may be the application of a for or while loop (neither of which follows Turing convention in SPL). With commands like stats, streamstats, eventstats, or foreach at your disposal, which one should a hunter use?
Well, it depends on the data and the required outcome. For example, let’s say we want to calculate the total distance travelled by a salesperson or an escaped toad. The data may contain waypoint information that requires iterative calculation, such as latitude and longitude (or, in some cases, this enrichment may be extracted from the source data, such as with the iplocation command).
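Enter autoregress. Sounds fancy. But here’s the thing: the autoregress command doesn’t do anything exotic on its own. It copies one or more previous values of a field into each event, which prepares your events for calculations such as a moving average. Here is a link to the Splunk docs description of the autoregress command. Go ahead and check it out, we’ll wait.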
Finished? Awesome. Let’s talk about practical applications.
Because the autoregress command is a centralized streaming command, it applies a transformation to each event returned by a search and only works on the search head.
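If it helps to picture what “copying a previous value into each event” means, here is a minimal Python sketch of the idea. This is not how Splunk implements the command; the function name, the list-of-dicts event format, and the sample counts are all made up for illustration. The only thing borrowed from SPL is the default naming pattern of field_p1, field_p2, and so on.

```python
def autoregress(events, field, p=1, new_field=None):
    """Copy the value of `field` from `p` events back into each event.

    Hypothetical helper that mimics the idea behind SPL's autoregress;
    events are plain dicts in the order the search returned them.
    """
    name = new_field or f"{field}_p{p}"
    out = []
    for i, event in enumerate(events):
        row = dict(event)  # leave the original event untouched
        # The first `p` events have no earlier event to copy from.
        row[name] = events[i - p].get(field) if i >= p else None
        out.append(row)
    return out


# Made-up sample data, roughly "| autoregress count AS prev_count"
events = [{"count": 10}, {"count": 12}, {"count": 9}]
lagged = autoregress(events, "count", new_field="prev_count")
for row in lagged:
    print(row)
```

Each event now carries its own value and its predecessor’s value side by side, so a later step can compare or combine them without any loop.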
You might be saying to yourself, “Self, I’ve never heard of this command before.” Well, you’re not alone. It’s not new, but not particularly well known. Kyle Smith of Aplura, LLC, included autoregress in his .conf2016 talk, “Lesser Known Search Commands”. Unlike iterative commands, such as map or foreach, the autoregress command is a statistical command (in the same family as the widely used stats and tstats commands).
Kyle expands on the definition as “a Moving Average is a succession of averages calculated from successive events (typically of constant size and overlapping) of a series of values” and notes the following:
- Allows advanced statistical calculations based on previous values
- Moving Averages of numerical fields
- Network bandwidth trending - kbps, latency, duration of connections
- Web Analytics Trending - number of visits, duration of visits, average download size
- Malicious Traffic Trending - excessive connection failures
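To make the moving average idea concrete, here is a small Python sketch that averages each value with the values that came just before it. The function name, the window size of 3, and the connection-failure counts are assumptions for illustration only; in SPL you would get the lagged values from autoregress and do the arithmetic with eval.

```python
def moving_average(values, window=3):
    """Average each value with the (window - 1) values before it.

    Positions without enough history yield None, similar to how an
    eval over missing lagged fields would produce no result.
    """
    averages = []
    for i in range(len(values)):
        if i + 1 < window:
            averages.append(None)  # not enough previous events yet
        else:
            chunk = values[i - window + 1 : i + 1]
            averages.append(sum(chunk) / window)
    return averages


# Made-up connection-failure counts per time bucket
failures = [2, 4, 6, 8, 40]
result = moving_average(failures, window=3)
print(result)
```

A sudden jump in the last bucket pulls the average up sharply, which is the kind of trend change a hunter might want to look at more closely.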
Let’s say we’re planning a road trip to visit some of the top craft breweries in the Mid-Atlantic United States, and we’ve fed the waypoint data into Splunk. We want to compute the distance between waypoints and the total distance we’re traveling (so we know how much fuel to put into our personal jetpack). We apply autoregress to both latitude and longitude so that each event carries the previous waypoint’s coordinates, then perform any further applicable calculations, such as `globedistance()` or streamstats.
Once you’ve pulled the relevant fields, your command may look something like this:
… | autoregress lat as prev_lat | autoregress lon as prev_lon | `globedistance(lat,lon,prev_lat,prev_lon,units)` | streamstats sum(distance) AS totaldistance
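For readers who want to see the arithmetic behind that pipeline, here is a rough Python sketch of the same three steps: carry the previous waypoint forward, compute the distance for each leg, and keep a running total. The definition of the globedistance macro isn’t shown in this post, so the haversine formula below is an assumption about what such a macro might do, and the coordinates are made-up points rather than real breweries.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed; a macro may use another value or unit)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def running_distance(waypoints):
    """Per-leg distance plus running total, like autoregress + streamstats sum()."""
    rows, total, prev = [], 0.0, None
    for lat, lon in waypoints:
        # The first waypoint has no previous point, so its leg is 0 here
        # (in Splunk the field would simply be missing for that event).
        leg = haversine_km(prev[0], prev[1], lat, lon) if prev else 0.0
        total += leg
        rows.append({"lat": lat, "lon": lon, "distance": leg, "totaldistance": total})
        prev = (lat, lon)
    return rows


# Hypothetical waypoints, in visiting order
waypoints = [(38.90, -77.03), (39.29, -76.61), (39.95, -75.17)]
rows = running_distance(waypoints)
for row in rows:
    print(row)
```

The order of events matters just as it does in the search: the running total is only meaningful if the waypoints arrive in the order you plan to visit them.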
Here’s an example:
As shown above, the autoregress command may help you gather the information where commands like stats, streamstats, eventstats, or foreach alone aren’t necessarily suitable. If you’re like me, you should have no regrets adding the autoregress command to your SPL utility belt.