Removing Duplicate Consecutive Events

Excerpt from “Exploring Splunk: Search Processing Language (SPL) Primer and Cookbook”. Kindle, iPad, and PDF versions are available for free, and a hardcopy is available for purchase at Amazon.

Problem

You want to collapse each run of consecutive events that repeat the same value into a single event, in order to remove noise from reports and alerts.

Solution

Suppose you have events as follows:

          2012-07-22 11:45:23 code=239
          2012-07-22 11:45:25 code=773
          2012-07-22 11:45:26 code=-1
          2012-07-22 11:45:27 code=-1
          2012-07-22 11:45:28 code=-1
          2012-07-22 11:45:29 code=292
          2012-07-22 11:45:30 code=292
          2012-07-22 11:45:32 code=-1
          2012-07-22 11:45:33 code=444
          2012-07-22 11:45:35 code=-1
          2012-07-22 11:45:36 code=-1

Your goal is to get 7 events, one for each run of consecutive identical code values: 239, 773, -1, 292, -1, 444, -1. You might be tempted to use the transaction command as follows:

          ... | transaction code

Using transaction here is a case of applying the wrong tool for the job. As long as we don’t really care how many duplicates each run contains, the more straightforward approach is to use dedup, which removes duplicates. By default, dedup removes all duplicate events, where an event is a duplicate if it has the same values for the specified fields as an event already seen. But that’s not what we want; we want to remove only the duplicates that appear in a cluster. For that, dedup has a consecutive=true option, which tells it to remove only duplicates that are consecutive:

          ... | dedup code consecutive=true
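To make the difference between the two dedup modes concrete, here is a minimal Python sketch that mimics them on the sample events above. It only illustrates the semantics and is not how Splunk implements dedup; the function names dedup_consecutive and dedup_all are made up for this example, and events are processed in the order listed, which may differ from the order in which a real search returns them.

```python
from itertools import groupby

# Sample events from the example above, as (timestamp, code) pairs.
events = [
    ("2012-07-22 11:45:23", "239"),
    ("2012-07-22 11:45:25", "773"),
    ("2012-07-22 11:45:26", "-1"),
    ("2012-07-22 11:45:27", "-1"),
    ("2012-07-22 11:45:28", "-1"),
    ("2012-07-22 11:45:29", "292"),
    ("2012-07-22 11:45:30", "292"),
    ("2012-07-22 11:45:32", "-1"),
    ("2012-07-22 11:45:33", "444"),
    ("2012-07-22 11:45:35", "-1"),
    ("2012-07-22 11:45:36", "-1"),
]


def dedup_consecutive(events):
    """Hypothetical helper: keep the first event of each run of identical codes."""
    # groupby starts a new group every time the key changes, so each
    # group is exactly one run of consecutive events with the same code.
    return [next(run) for _, run in groupby(events, key=lambda e: e[1])]


def dedup_all(events):
    """Hypothetical helper: keep only the first event seen for each distinct code."""
    seen = set()
    kept = []
    for event in events:
        if event[1] not in seen:
            seen.add(event[1])
            kept.append(event)
    return kept


consecutive_codes = [code for _, code in dedup_consecutive(events)]
all_codes = [code for _, code in dedup_all(events)]

print(consecutive_codes)  # ['239', '773', '-1', '292', '-1', '444', '-1']
print(all_codes)          # ['239', '773', '-1', '292', '444']
```

The consecutive version yields the 7 events we are after, while the global version collapses every -1 into a single event and returns only 5.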

----------------------------------------------------
Thanks!
David Carasso
