popeness – Splunk’s all you can eat for $5.99

When most folks think of Splunk, they think of our log file search engine (and of course our ads starring Mark, our honest-to-god support/sysadmin guru, and yeah, the cool T-shirts, etc.).

But I don’t really use Splunk for logs that much. Don’t get me wrong, logs are useful when indexed, but I like to feed Splunk lots of other stuff.

In particular, I go after things like email messages (not the logs, the mail itself), OS resource info, raw network traffic, and configuration files, to name just a few, so that I can, as we say around the office, “Splunk the Datacenter”.

I find that logs by themselves are useful, but when combined with other information such as historical snapshots of vmstat, iostat, ps, top, etc AND when eating all the configs on the box – then I have everything I need to figure out what is going on.

Simple “hypothetical” case :
In one of my logs yesterday at 14:00 I see a JDBC timeout exception. Then, searching for ps and vmstat output at that same time, I see that the app server’s memory footprint has gone through the roof. I then search for all config changes on that box starting at 14:00 and going backwards, and voilà! I see that someone changed the number of threads on the app server 40 minutes earlier, and VM steadily climbed until “boom”: my JDBC timeout exception.

All this with just a few simple searches/clicks, all within a minute. BTW, since I also eat the Perforce changes (via email messages that I read over IMAP) and the configs are pulled from Perforce, I know who made the conf change. Busted.

So, how do you get all this “non-log” info into Splunk?
In 3.0 we support something called “scripted input”.
At its simplest, Splunk will call a script (any program) every n seconds and index whatever it writes to stdout/stderr. Think of Splunk calling vmstat every 5 seconds and indexing:

Mach Virtual Memory Statistics: (page size of 4096 bytes)
Pages free: 11516.
Pages active: 464198.
Pages inactive: 233414.
Pages wired down: 77304.
"Translation faults": 155220582.
Pages copy-on-write: 856939.
Pages zero filled: 89705321.
Pages reactivated: 6282955.
Pageins: 3095038.
Pageouts: 201406.
Object cache: 131000 hits of 1013577 lookups (12% hit rate)

With this data indexed, I can now easily search on a specific host’s Pages free: value over time.
Very cool, very useful, and brain-dead easy to set up.
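
To make the idea concrete, here is a minimal sketch of what such a script could look like. It is not the script shipped in the bundle: the names `build_event` and `main`, the timestamp header line, and the `command=` field are all assumptions made for illustration. The only thing taken from the description above is the contract: run a program, write its output to stdout, and let Splunk index it.

```python
import subprocess
import sys
import time


def build_event(command, output, now=None):
    """Prefix the command output with a timestamp line so each run is one dated event.

    The header format is a made-up convention for this sketch, not a Splunk requirement.
    """
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    header = f"{stamp} command={' '.join(command)}"
    return header + "\n" + output.rstrip("\n") + "\n"


def main(argv):
    """Run the wrapped command once and print its output for indexing."""
    # Default to vmstat; pass another command on the command line to wrap it instead.
    command = argv[1:] or ["vmstat"]
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into stdout so both end up indexed
        text=True,
        check=False,
    )
    sys.stdout.write(build_event(command, result.stdout))
    return result.returncode
```

To use it as a standalone script, call `sys.exit(main(sys.argv))` under the usual `if __name__ == "__main__":` guard. The script runs the command once and exits; the repetition every n seconds is left to Splunk, as described above.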

Now, I have vmstat, iostat, top, ps, netstat, lsof, and a few others all packaged up in a bundle. This makes it easy to deploy on any machine as a lightweight package that will monitor lots of useful information.

But that’s not all. I also have bundles for:

  • tcpdump: A bundle/script that lets me stream tcpdump data into Splunk on the fly.
  • webpage/site monitoring: A simple scripted input that will hit a webpage and index the time, size, and optionally an md5 hash or even the raw content.
    This is a dirt-simple way to monitor a site and even get historical tracking of its content.
  • IMAP mail reader: On a schedule, it will log in to an IMAP server and download new mail. It will index to, from, size, sent time, etc. It can also index the body, effectively giving you a nice email search interface. Internally we have lots of systems, such as our CRM/SFA, bug tracking, and source control, that all send email. It’s cool to hook Splunk up to these systems via email.
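
As an illustration of the webpage monitoring idea, here is a rough sketch of a script that fetches a page and emits one event line with the time taken, the size, and optionally an md5 hash. It is not the bundle's actual script: the function names `summarize` and `check_page` and the key=value field names (`elapsed_ms`, `size`, `md5`) are assumptions chosen for the example.

```python
import hashlib
import time
import urllib.request


def summarize(url, body, elapsed_seconds, include_md5=True):
    """Build one key=value event line from a fetched page body (bytes)."""
    fields = [
        f"url={url}",
        f"elapsed_ms={int(round(elapsed_seconds * 1000))}",
        f"size={len(body)}",
    ]
    if include_md5:
        # A changed hash between runs means the page content changed.
        fields.append(f"md5={hashlib.md5(body).hexdigest()}")
    return " ".join(fields)


def check_page(url, timeout=10):
    """Fetch the page once, timing the request, and return the event line."""
    started = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read()
    return summarize(url, body, time.monotonic() - started)
```

Printing `check_page(...)` for the page you want to watch gives one line per run. Indexing the hash rather than the raw content keeps events small while still showing when the page changed.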

Over the next few posts I’ll upload each of the above with instructions on how to use them.

First, here is the Monitoring Bundle.

Installation is the easy part!
- Download the bundle at -> monitoring.tar
- Untar it into your_splunk_home/etc/bundles
- Restart your server ( I know this sucks, no one wishes more that you did not have to restart when adding a bundle than me )

Here are the contents of the README file included in the bundle, which has a bit more info:

This bundle will run one or more system-level monitoring utilities and index the output.
By indexing the system information you can help correlate events in logs with OS level trajectory information.

The following utilities are supported:
- ps
- top
- vmstat
- iostat
- netstat
- lsof
- time/date

To start:
Copy this bundle into your etc/bundles directory
Modify any of the stanzas to tailor their enabled/disabled state, interval (sec), source, and sourcetype.

By default all monitoring data is placed into a new index named monitoring.
If you wish to have the monitoring data in your default index, remove all the index=monitoring entries in the inputs.conf file and rename or remove the index.conf file, as it will otherwise automatically create a new index for you.
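
If there are many stanzas, removing the index=monitoring entries by hand gets tedious. The helper below is one possible way to do it; it is a sketch, not part of the bundle. The function name `use_default_index` and the stanza shown in the sample text are hypothetical, so check them against the real inputs.conf before relying on this.

```python
def use_default_index(conf_text, index_name="monitoring"):
    """Return inputs.conf text with every 'index = <index_name>' line removed."""
    kept = []
    for line in conf_text.splitlines(keepends=True):
        key, sep, value = line.partition("=")
        if sep and key.strip() == "index" and value.strip() == index_name:
            continue  # drop the override so events fall back to the default index
        kept.append(line)
    return "".join(kept)


# Hypothetical stanza, for illustration only; the bundle's real stanza names may differ.
sample = (
    "[script://./bin/vmstat.sh]\n"
    "interval = 5\n"
    "index=monitoring\n"
    "sourcetype = vmstat\n"
)
print(use_default_index(sample), end="")
```

Lines that set a different index are left alone, so only the monitoring override is stripped.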

If you have bugs or suggestions please let me know.
Good luck!

- more regexes for values
- add more calls for other stuff
- add filtering and matching like all my other bundles
