Splunk for Agile BI

I was at the TDWI Agile BI summit in August ’11 and heard a nice quote in a presentation that summarizes the message: “In Agile environments working software is the primary measure of progress.” Having worked with enterprise BI/DW software for over a decade, I have seen organizations struggle with their BI deployments for years before reaching a steady-state ROI. For those that do have it up and running, statistics show that analytics is based on about 1% of the organization’s data. Five months ago, when I first heard about Splunk, I was amazed at how easy it was to download, install, consume any kind of data, historical or real-time, and produce reports within minutes. Almost immediately I decided to drop my career at a big BI vendor and go work for Splunk. At the Agile BI summit, industry experts and customers were echoing my frustrations with traditional BI tools, hence this blog post.

First, what is Agile BI and why were there over 500 companies attending this event?

What are some of the drivers for Agile BI?

  • Business is rapidly evolving, and what it needs to measure and monitor changes constantly. Traditional BI applications force organizations to plan months or sometimes years in advance what questions users can ask of the data, since the models are baked into the data warehouse or BI tools. And it needs to meet audit and compliance requirements.
  • Need for a more analytics-driven organization. Organizations find it easier to teach some technology to a domain expert than to train a technical person on the business domain. These domain experts need an easy tool for prototyping analytics to find where the gold exists within the data. Once confirmed, it can be rolled into a production BI dashboard.
  • Decouple data usage from data preparation. Organizations want fast and timely access to information: not a report factory, but something they can train users on to build their own dashboards.
  • Make BI results easy to consume and enhance, that is, self-service, without the intervention of IT.
  • DW solutions are time and resource intensive to deploy and hard to manage. Collection of data from disparate sources (federation) should be easy. End users need consistent and reliable access. Make it easy to access source data.
  • 80% of the first step in BI (data integration) is searching for the data you need. Users need Google-like BI search.

An Agile manifesto was conceived to capture the essence of this methodology. Here is what they came up with including additional commentary to put it into context:

  • Individuals and interactions over processes and tools – Simplicity (the art of maximizing the amount of work not done) is essential.
  • Working software over comprehensive documentation – The highest priority is to satisfy the customer through early and continuous delivery of valuable software.
  • Customer collaboration over contract negotiation – Business people and developers must work together daily throughout the project. The most efficient and effective method of conveying information to and within a development team is face-to-face conversation.
  • Responding to change over following a plan – Welcome changing requirements, even late in development. Agile processes harness change for the customer’s competitive advantage. The best architectures, requirements, and designs emerge from self-organizing teams.

Why did I come away pumped about Splunk?

After listening to speakers and customers for three days, I walked away more upbeat about Splunk as an Agile BI solution. A lot of the pains expressed about traditional BI were either already addressed by Splunk or being rolled out shortly.

  • The most significant benefit, in my opinion, is around data access. DW solutions are time, money and people intensive to deploy and hard to manage. Splunk eliminates the ETL layer, potentially saving millions of dollars.

  • Splunk provides a Google-like search bar. While Google ranks results by relevance, Splunk searches by time series. This can remove up to 80% of the effort in the first step of BI, which is finding and correlating data to report on.

  • Correlation across disjoint data. Searching across data sets that do not have a common field is a core strength of Splunk. Splunk timestamps data at the finest level of granularity, therefore it can correlate events from different systems.
  • A key formula for successful BI implementations is starting small and scaling as needed. Splunk offers a great model, architecturally and from a pricing standpoint, to support that. Most customers start off with a free download and scale up as value is realized.
  • There is a whole slew of “next-gen” functionality that Splunk provides to enable easy, rapid and self-service BI, which is covered in the next section titled ‘“Limitless” Agile BI Functionality’.
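To make the point above about correlating disjoint data a little more concrete, here is a minimal Python sketch of time-based correlation. It is only an illustration of the idea, not Splunk's implementation or search language: the function name `correlate_by_time`, the five-second window and the sample log lines are all made up for this example. The two logs share no common field, so the timestamp is the only thing tying them together.

```python
from datetime import datetime, timedelta


def correlate_by_time(events_a, events_b, window_seconds=5):
    """Pair events from two sources whose timestamps lie within the window.

    Each event is a (timestamp, message) tuple. Hypothetical helper for
    illustration only; this is not how Splunk is implemented.
    """
    window = timedelta(seconds=window_seconds)
    pairs = []
    for ts_a, msg_a in events_a:
        for ts_b, msg_b in events_b:
            # The only join key is time: no shared ID or field is needed.
            if abs(ts_a - ts_b) <= window:
                pairs.append((msg_a, msg_b))
    return pairs


# Invented sample data from two unrelated systems.
web_log = [
    (datetime(2011, 8, 1, 10, 0, 2), "web: checkout page returned 500"),
    (datetime(2011, 8, 1, 10, 30, 0), "web: home page returned 200"),
]
db_log = [
    (datetime(2011, 8, 1, 10, 0, 0), "db: connection pool exhausted"),
    (datetime(2011, 8, 1, 11, 0, 0), "db: nightly backup started"),
]

matches = correlate_by_time(web_log, db_log)
for web_msg, db_msg in matches:
    print(web_msg, "<->", db_msg)
```

With this sample data, the web error and the database error two seconds apart are the only pair reported. The nested loop keeps the sketch short; a real system working at volume would index events by time rather than compare every pair.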

“Limitless” Agile BI Functionality

Forrester analyst Boris Evelson outlined some next-gen BI functionality that customers are asking for, and I have included how Splunk addresses each item.

  • Unlimited dimensionality – Customers don’t want to limit analysis to an inflexible, preconceived framework. Splunk does not require defining dimensionality ahead of time. It lets the user normalize data at search time, giving full flexibility on how they want to navigate through the data.
  • Drill anywhere – Investigate on all the data versus some preconceived aggregate subset. The user has access to all the source data (if the role permits) for analysis with sub-second response times.
  • Information auto discovery – auto discovering data sources, business content, entity relationships, dependencies and profiles. Splunk automatically discovers key fields in the data, based on key-value pairs among other things, and relevant statistics which are valuable for starting analysis.
  • BI on BI – Analyze how BI applications are being used, when, and by whom. Splunk has analytics for activity on its search, indexes, servers, inputs and scheduled jobs out-of-the-box.

  • Data agnostic – Technology that is agnostic of data types, structured or unstructured, disk or streaming. Splunk will eat any machine-generated data, in real-time.
  • Mobile BI – All the analysis possible on smartphones and tablets. With the non-flash interface, all the power of Splunk is available on your mobile device.
  • SaaS model – Easy to stand up a BI environment to create prototypes. Splunk Storm provides the ability to set up an instance in less than 5 minutes.
  • Embedded BI within processes – You do not start with a blank screen. Click in a process and it passes the context to the BI application, which presents relevant info. Customers are using applications that pass the context via the Splunk API.
  • Adaptive data models – Easy to change underlying semantic layer by the user. It is simple to change the schema at search-time in Splunk.
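Several items in the list above (unlimited dimensionality, information auto discovery, adaptive data models) rest on the same idea: keep the raw events and work out the fields only when a question is asked. The Python sketch below illustrates that schema-at-search-time idea with key-value pairs. It is a toy under stated assumptions, not Splunk's search language or API: the regular expression, the helper `extract_fields` and the sample events are invented for this example.

```python
import re
from collections import Counter

# Assumed event format for this illustration: key=value pairs, where a
# value may be wrapped in double quotes if it contains spaces.
KV_PATTERN = re.compile(r'(\w+)=("[^"]*"|\S+)')


def extract_fields(raw_event):
    """Pull key=value pairs out of a raw event at query time."""
    return {key: value.strip('"') for key, value in KV_PATTERN.findall(raw_event)}


# Raw events are stored untouched; no schema is declared up front.
raw_events = [
    'ts=2011-08-01T10:00:00 action=purchase user=alice amount=30',
    'ts=2011-08-01T10:00:05 action=view user=bob page="/home page"',
    'ts=2011-08-01T10:00:09 action=purchase user=bob amount=12',
]

parsed = [extract_fields(event) for event in raw_events]

# "Auto discovery": which fields exist, and in how many events?
discovered_fields = Counter(key for event in parsed for key in event)

# The "dimension" (user) is chosen when the question is asked, not at load time.
purchases_by_user = Counter()
for event in parsed:
    if event.get("action") == "purchase":
        purchases_by_user[event["user"]] += int(event["amount"])

print(dict(discovered_fields))
print(dict(purchases_by_user))
```

Because the raw events are never altered, a later question such as views per page only needs a new aggregation over the same data, not a reload or a model change.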

A pertinent example I heard recently: Taiwan Mobile uses Splunk to look across their mobile and web delivery platforms for insights into visits across customers and visitor volume trends over time. The telco can track which content is most popular with visitors and, by correlating that with ad clicks (also tracked using Splunk), optimize delivery of ads based on value, in real-time.

Biggest challenges with Agile BI:

TDWI did a survey on the inhibitors that customers perceived with Agile or Self-service BI and here are the most popular concerns:

  • Business user skills – The number one inhibitor was the lack of appropriate business skills (59%). There is nothing more guaranteed to cause failure than presenting a segment of the business community with BI technology that is too complex or difficult to use for their skill level. Splunk is working on making it easier for new business users to do their analysis.
  • Lack of data quality, control and governance (55%) – If the information workers perceive that the data is of low or unknown quality, they may not use it. The data may not need to be perfect but it does need to be of consistent or predictable quality. Splunk supports audit requirements for its customers. In very large data sets where consistency and predictability play a role in quantitative analysis, the superset of data may have completely lost context due to normalization. Splunk preserves the original state of the data, allowing normalization at search time. It adopts a hybrid approach of using raw data (structured and unstructured) along with enriched data for higher quality of analysis.


I reassert the quote from the summit: “In Agile environments working software is the primary measure of progress.” Today, Splunk has a lot of the functionality that the BI community is asking for. The free download that installs in minutes gives the opportunity to gauge the value of Splunk with a pilot project before opening the throttle and unleashing it across the business. With its first-class search-based agile BI platform and over 2,600 customers, it’s well on its way to disrupting the BI market.
