Thoughts on Splunking Storage Data

I’m not giving you any solutions today (boo!), just some clarifications based on how I see the world of data as it pertains to Splunking storage. I’ve been thinking about this topic a lot over the past year or so, and it’s a shame I haven’t bothered to write any of it down here before. Fixing that starting now! (This is not to be confused with the storage of Splunk data, which is also a topic I think about a lot! That’s another series of blog posts.)

First, a quick definition of storage. I’m talking about the storage subsystem in a computer system. This can be local storage, i.e. DAS (Direct Attached Storage), or it could be SAN (Storage Area Network) or NAS (Network Attached Storage). There are many differences between the three categories, but for the sake of this discussion (machine data), they’re all largely the same. Of the three, DAS is the least interesting in terms of the data it exposes; however, it can be combined with data from the host OS to make it more valuable for Splunk use cases.

The following list is what I consider to be the kinds of machine data relevant to storage-related use cases in Splunk. It’s roughly ordered from most to least important, based on many discussions with customers and partners. I’m sure the order will vary a lot from customer to customer. The list:

  • Performance: health/performance metrics. High value, but high volume (so you must tune the granularity and scope of collection). Sometimes available via SNMP. The CLI can usually produce a dump of metrics to CSV or a similar format. Many storage vendors tightly control access to this data, although this trend is changing.
  • Events: faults and changes in configuration state. Typically pushed. Syslog is common for this in networking, not so much in storage. Sometimes SNMP, always proprietary. Low-volume data. If push isn’t available, CLI options for polling usually exist.
  • Inventory: point-in-time configuration state of the entire system: disks, LUNs, volumes, nodes, clusters, feature availability, and so on. Low-volume data. Very important for correlation! Usually requires a CLI command or script to gather.
  • Audit: sometimes included in events, but often not. Includes changes to file/block service SESSIONS and to the CONTENTS of the service (i.e. files). “Session created” in a storage context can mean an iSCSI target was attached or an NFS client mounted an export. File state change is traditional OS file auditing: who changed what, when, from where (and sometimes how). High to very high volume data! File-level auditing can generate so much data that it raises sizing and cost-versus-value concerns. Depending on the platform, audit data can be challenging to obtain.
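
The Performance item notes that a vendor CLI can usually dump metrics to CSV. As a minimal sketch (not a solution, and not any vendor’s or Splunk’s own tooling), here is one way such a dump could be reshaped into one timestamped key=value line per row, a form Splunk’s automatic field extraction generally handles well. The helper name csv_metrics_to_kv, the collector label, and the sample column names are all hypothetical.

```python
import csv
import io
from datetime import datetime, timezone


def csv_metrics_to_kv(csv_text, collector="storage_cli", timestamp=None):
    """Turn a CSV metrics dump into one key=value event line per row.

    Hypothetical helper: assumes the first CSV row is a header and that
    column names come from whatever your vendor's CLI actually emits.
    """
    # Stamp every row from a single poll with the same collection time.
    ts = timestamp or datetime.now(timezone.utc).isoformat()
    # restval="" fills in missing trailing fields on short rows.
    reader = csv.DictReader(io.StringIO(csv_text), restval="")
    events = []
    for row in reader:
        pairs = []
        for key, value in row.items():
            if key is None:
                # Fields beyond the header land under a None key; skip them.
                continue
            # Space-free field names and quoted values keep key=value
            # extraction unambiguous.
            name = key.strip().replace(" ", "_")
            text = str(value).strip().replace('"', "'")
            pairs.append(f'{name}="{text}"')
        # "collector" (a hypothetical label) avoids clashing with Splunk's
        # built-in "source" field.
        events.append(f"{ts} collector={collector} " + " ".join(pairs))
    return events


if __name__ == "__main__":
    # Hypothetical sample resembling a per-volume IOPS dump.
    sample = "volume,read_iops,write_iops\nvol01,1200,300\nvol02,80,15\n"
    for line in csv_metrics_to_kv(sample):
        print(line)
```

Writing these lines to a file that Splunk monitors (or printing them from a scripted input) is one option. Because performance data is high volume, the polling interval and the set of columns kept are the main levers for controlling ingest.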

Do you agree with this list? Anything missing? Out of order? Are you collecting storage data in your environment today? How and why? Which vendors and storage platforms?

In a follow-up post, I’ll talk about some storage analytics use cases and how to achieve them in Splunk.

(Edit: horrible run-on sentence fixed, can’t believe that one slipped by me! Also, teaser for next post.)

Hal Rottenberg
