To me, the biggest reason this matters is that useful work can be done without fully understanding everything that’s been gathered in the machine data, which lines up pretty well with how things work in reality. Laying an incomplete schema across fungible reality can really mess things up if you ignore the parts you don’t understand, a point made well in Seeing Like a State. Schemas are how we understand, communicate about, and manage things, people, actions, and relationships, so they have to work at least a little bit. However, schemas don’t spring fully formed from the forehead; they have to be built up and maintained in the face of change. In fact, they work okay when they’re young and simple, because our semantic models grow with us. Just as people grow into their full understanding and continue to learn throughout their lives, your Splunk installation can grow and change over time. Let’s consider a few phases of that maturity.
I think of the first phase as “The love child of Google and Excel”: all you need is a free Splunk download or a Splunk Storm account, and you can put data in and get reports out. Anyone can load up data and get operational business value from it, which is pretty great. Like a student, the Splunk instance is being loaded up with new information and new meanings. The second phase, “There’s an App for That”, is analogous to that student starting to publish: when you want to share knowledge, you write up your ideas and make them public. In Splunk, we call those write-ups apps, and we put them on Splunkbase. Like a well-crafted essay or short story, they’re solid, useful, and ultimately self-contained, with an implicit, internal schema, but nothing forced from on high at index time. After all, if the schema were forced at index time, we wouldn’t be able to understand the same data from two different perspectives, which would be deeply limiting.
But how exactly do we understand one piece of data from multiple perspectives? To me, the most valuable element of late-binding schema is that it lets multiple schemas be applied to the same data; in Splunk, a simple, application-specific schema can share territory with a larger, role-specific schema. The third maturity phase, “One App to Rule Them All”, is when an app starts to rely on a shared schema that enables a role instead of an application. This is what we do with the Splunk App for Enterprise Security, which relies on shared schemas described in Splunk’s Common Information Model and implemented via Technology Add-ons. A role such as Security Analyst or Compliance Auditor needs to be able to step up from the technology level and understand the data at a completely different level of legibility, and that is where the shared schema comes in. The beauty of Splunk’s late-binding technology is that this shared schema does not have to be determined back at phase one. The same piece of data can mean different things to different people at different times, and that requires flexibility.
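To make the idea of two schemas sharing the same territory a bit more concrete, here is a minimal sketch in plain Python. It is not Splunk's implementation or API; the sample log lines, the field names, the two schema dictionaries, and the apply_schema helper are all hypothetical, chosen only to show raw events being stored untouched and then read through two different sets of field extractions.

```python
import re

# Raw events are kept exactly as they arrived; no schema is imposed at "index time".
# These sample log lines are made up for illustration.
RAW_EVENTS = [
    "Jan 10 12:00:01 web01 sshd[411]: Failed password for alice from 10.0.0.5 port 2222 ssh2",
    "Jan 10 12:00:09 web01 sshd[411]: Accepted password for bob from 10.0.0.7 port 2301 ssh2",
]

# Hypothetical application-level schema: details an sshd-focused app might care about.
SSHD_SCHEMA = {
    "pid": re.compile(r"sshd\[(\d+)\]"),
    "port": re.compile(r"port (\d+)"),
}

# Hypothetical role-level schema: fields a security analyst might want across technologies.
SECURITY_SCHEMA = {
    "user": re.compile(r"password for (\S+)"),
    "src": re.compile(r"from (\S+)"),
    "action": re.compile(r"(Failed|Accepted) password"),
}


def apply_schema(event, schema):
    """Extract fields from one raw event at read time, using whichever schema is passed in."""
    fields = {}
    for name, pattern in schema.items():
        match = pattern.search(event)
        if match:
            fields[name] = match.group(1)
    return fields


# The same stored events, read from two perspectives.
app_view = [apply_schema(event, SSHD_SCHEMA) for event in RAW_EVENTS]
role_view = [apply_schema(event, SECURITY_SCHEMA) for event in RAW_EVENTS]

for event, app_fields, role_fields in zip(RAW_EVENTS, app_view, role_view):
    print(event)
    print("  app view: ", app_fields)
    print("  role view:", role_fields)
```

Because nothing was decided when the events were stored, the role-level schema could be written long after the application-level one, and neither has to know the other exists.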