Splunking the most abundant time-based dataset on the planet

What is the most abundant time-based dataset that *everyone* works with?

It ain’t logs – it’s email.

If you think about it, email messages are a bit “event like” – they have a timestamp, a somewhat structured header, and a payload.
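To make that concrete, here is a tiny sketch using Python’s standard email library – the message below is invented purely for illustration:

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# A made-up message showing the event-like shape: timestamp, headers, payload.
raw = """\
From: dev@example.com
To: team@example.com
Subject: checkin 12345
Date: Mon, 05 Mar 2007 10:15:00 -0800

Change 12345 by dev@example.com -- fixed string handling
"""

msg = message_from_string(raw)

timestamp = parsedate_to_datetime(msg["Date"])  # the event's timestamp
headers = dict(msg.items())                     # the semi-structured fields
payload = msg.get_payload()                     # the free-text body

print(timestamp.isoformat())  # → 2007-03-05T10:15:00-08:00
print(headers["From"])        # → dev@example.com
```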

Since Splunk was designed for time-based datasets, it’s only natural that we hook it up to email. I’m not suggesting that you use Splunk as your mail reader ( although I’m working on a few actions for forward, reply, etc. ) but that in a datacenter, email often carries critical workflow information.

In our own infrastructure we have systems generating email notifications for things like support cases, changes to source code, open bugs, etc. It’s interesting to bring the mail into the mix with my logs, config changes, etc. Once my mail is indexed I can instantly report on the frequency of customer issues (support case email), changes in source code by file/user (Perforce checkin/diff email), coded bugs by user per week (Jira bug notification emails), or just report on my own inbox – messages by size by time/sender/etc.

It’s pretty cool, if nothing else, just to look at my inbox via Splunk – not that I’d switch off of Thunderbird, but it would make for a nice tool to use alongside it ( hint – working on a Thunderbird / Splunk extension 😉 ). For example, here is the top of the page for a search for just index::mail, which amounts to all mail from the last 7 days.


If I wanted to see all email just from folks here at Splunk, I could do the following search: index::mail | regex From =


More interesting are things that involve the content. For example, we use a Perforce script to send email for each checkin to our repository. These emails include a ton of information, including the name of the developer that made the change, the list of files changed, the branch, diffs, etc. Suppose I wanted to see all the changes made to a particular file in our repository – I can easily just add the file name to the search.
index::mail p4review str.c_str
You can quickly see the file was changed 5 times, who made each change, and what the diffs are! – and it’s trivial to add any other term 😉


Perhaps more valuable, or at least more interesting, is the ad hoc reporting on these emails.
Take a trivial search such as: who are my top senders over the last 7 days?
index::mail | top From
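Outside of Splunk, that top From report is just a frequency count over the From header. A rough Python equivalent – the messages here are invented stand-ins for an indexed mailbox:

```python
from collections import Counter
from email import message_from_string

# Invented messages standing in for the indexed mail.
raw_messages = [
    "From: alice@example.com\nSubject: build\n\nok",
    "From: bob@example.com\nSubject: review\n\nok",
    "From: alice@example.com\nSubject: deploy\n\nok",
]

# Equivalent of `index::mail | top From`: senders ranked by message count.
senders = Counter(message_from_string(m)["From"] for m in raw_messages)
for sender, count in senders.most_common():
    print(sender, count)
```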


Or, even cooler, going back to the Perforce search, I can easily show the distribution of check-ins by developer:


Too cool!! And it’s just two words and two clicks!
If I wanted to, I could have added a filename, or even a class name, method name, etc.

All of the above is done with the generic free product, and I get instant repository reporting.
Since our bug tracking system ( Jira ) as well as our CRM ( Sugar ) also send email with such content, I can trivially report on them as well.

  • see distribution of customer support cases
  • see distribution of cases by feature
  • see timeline of bugs submitted by project
  • see fix rate of bugs by developer

Just to name a few…..

So come on – it’s too easy to set up and install not to try!!
It takes 5 minutes and, well, it’s free, so go for it – if you don’t already have Splunk, get it first. Given the volume of a normal user’s email, you most certainly can use the free version.

After you have downloaded Splunk –

      1. Download the imap bundle ( you may need to right click and choose save linked file )
      2. Untar the bundle into $SPLUNK_HOME/etc/bundles – $SPLUNK_HOME is the path to where you installed Splunk
      3. Edit $SPLUNK_HOME/etc/bundles/imapbundle/inputs.conf
      3.1 – change disabled=true to disabled=false. The bundle is shipped disabled just in case.
      3.2 – change your mail-specific values

      ** Required Inputs **

        --servername = name or IP of the imap server
        --username = account name

      One of the following, password or xpassword, is used for authentication with your mail server. Supply one of the following arguments:

        --password = password clear text


        --xpassword = encrypted password
          Run the genpass script in the bundle’s bin directory.
          It’s important that the key file is generated in the bin directory. If you call the genpass script from another dir it will drop the key file there, in which case you must move it to the bin dir.
          Copy the encrypted key from the output of the genpass script into the xpassword field.

      ** Optional **

        --port = port on the imap server ( default is 143 )
        --folders = comma-separated list of folders to download mail from ( default is INBOX )
          example: --folders=", INBOX.sent, INBOX.bugs" – remember to put them in quotes if you use spaces after commas
      4. Restart your server – although I’d try out the next section on TESTING first, since it’s easier to test outside of Splunk.
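Put together, a filled-in configuration might look roughly like this – the server, account, and password are placeholders, and <script> stands for the bundle’s fetch script, whose actual name is in the shipped file:

```ini
# $SPLUNK_HOME/etc/bundles/imapbundle/inputs.conf ( illustrative values only )
[script://./bin/<script> --servername=imap.example.com --username=bobsmith --password=secret --port=143 --folders=INBOX]
disabled = false
index = mail
```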


    To test that it’s configured properly, we can run the command by hand and make sure we see stuff coming back.
    1) Copy the entire script:// line in $SPLUNK_HOME/etc/bundles/imapbundle/inputs.conf ( this command line, along with the args, is the executable that we are going to test )
    2) cd imapbundle/bin
    3) Paste the script line
    4) Remove the leading [script://./bin portion so that it looks like ./ --user=bobsmith etc...
    5) You will need the Splunk environment set prior to running the script. To do this:
    source YOUR_PATH_TO_SPLUNK/bin/
    6) Try to run the command

    It should spew back stuff.
    The most common error is with the server, username, or password. If you have encrypted your password and it’s somehow wrong, it should be obvious.
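In other words, the copy-and-trim steps above just peel the stanza wrapper off the script:// line to recover the command you run by hand. A quick sketch in Python – the script name and arguments here are invented placeholders:

```python
# Hypothetical inputs.conf stanza header; the real script name and args
# are whatever your bundle's inputs.conf actually contains.
stanza = "[script://./bin/ --username=bobsmith --servername=imap.example.com]"

# Strip the leading "[script://" and the trailing "]" to get the runnable command.
cmd = stanza.removeprefix("[script://").removesuffix("]")
print(cmd)
# → ./bin/ --username=bobsmith --servername=imap.example.com
```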

    When you’re done testing the above and it seems to work, copy the cmd back into your inputs.conf file.

    Let me know if you see other errors – I have not put that much debugging code in yet.

    Splunk will write to its own logs when it calls the imapbundle.
    If you’re not getting mail indexed, then use the following search:
    index::_internal imaplaunch
    If no events come back, then either the bundle is not installed in the proper location ( $SPLUNK_HOME/etc/bundles ) or the bundle is disabled and you need to enable it in $SPLUNK_HOME/etc/bundles/imapbundle/inputs.conf

    Other errors, such as auth and server-not-found errors, should be obvious. Again, test on the command line as explained above.

    If it’s working you should see events that end in “reading to EOL”.

    See the end of the post for the entire README of options.

    The bundle as I have configured it will put everything into index::mail. If you want to put it somewhere else, change the index= entry in inputs.conf and remove indexes.conf if you’re not using it. If you do keep the configuration for the separate index, then when you restart Splunk you will get a warning that it’s creating the new “mail” index.
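For example, redirecting the events is a one-line change in the bundle’s inputs.conf – the alternate index name below is made up:

```ini
# Shipped default: events go to the dedicated "mail" index.
index = mail

# To send events elsewhere, change the line ( "corpmail" is an invented name )
# and remove the bundle's indexes.conf if you no longer need it:
# index = corpmail
```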

    As always, this is pretty raw and there are many rough edges.
    Email me or comment here with suggestions.

