
EXCERPT FROM “EXPLORING SPLUNK: SEARCH PROCESSING LANGUAGE (SPL) PRIMER AND COOKBOOK”. Kindle/iPad/PDF available for free, and hardcopy available for purchase at Amazon.
Problem
You want to group all events with repeated occurrences of a value in order to remove noise from reports and alerts.
Solution
Suppose you have events as follows:
2012-07-22 11:45:23 code=239 2012-07-22 11:45:25 code=773 2012-07-22 11:45:26 code=-1 2012-07-22 11:45:27 code=-1 2012-07-22 11:45:28 code=-1 2012-07-22 11:45:29 code=292 2012-07-22 11:45:30 code=292 2012-07-22 11:45:32 code=-1 2012-07-22 11:45:33 code=444 2012-07-22 11:45:35 code=-1 2012-07-22 11:45:36 code=-1
Your goal is to get 7 events, one for each of the code values in a row: 239, 773, -1, 292, -1, 444, -1. You might be tempted to use the transaction command as follows:
... | transaction code
Using transaction here is a case of applying the wrong tool for the job. As long as we don’t really care about the number of repeated runs of duplicates, the more straightforward approach is to use dedup, which removes duplicates. By default, dedup will remove all duplicate events (where an event is a duplicate if it has the same values for the specified fields). But that’s not what we want; we want to remove duplicates that appear in a cluster. To do this, dedup has a consecutive=true option that tells it to remove only duplicates that are consecutive.
... | dedup code consecutive=true
----------------------------------------------------
Thanks!
David Carasso