Big Data Thoughts…

It happens to me quite a bit that I hear a song and then it keeps playing in my head.  My 4-year-old is notorious for singing the same song over and over, and then I find myself humming it during my long train ride to work.

Sometimes it happens at work: you hear about a thing and then keep hearing about it in almost every conversation.  I am sure you have had those times too.  Many of you will have had days or weeks full of discussions on “big data”.

For the last three weeks, I have had a number of conversations on the topic of big data.  Strata, eMetrics and last week’s trip to see some key customers were mostly centered on web analytics and big data. The conversations around big data primarily circle around number of servers, size of data, total data stored, etc.  Sitting on the business side of the organization, I like to hear about speeds and feeds, but for me the discussion of value from big data is key.  I have always been a believer in getting value out of data: insights that can move the needle for the business. Those few data nuggets from the terabytes of data are golden.  Some observations from the events and the conversations: 1) Organizations are investing in big data technologies like Hadoop or larger relational database implementations.  2) Very few of these organizations have been able to get their big data implementations up and running, even after many months. 3) Organizations plan to solve only a finite set of business problems with big data, with not much thought given to the changing business environment.

After a tremendous investment in technologies to capture data, organizations embark on the next frontier.  Okay, so you have got the data into Hadoop. Now what?  Getting data out of Hadoop is even harder.  There are a slew of technologies out there, and business analysts and data scientists are spending more of their time pulling the data, massaging the data and bringing structure to the data before any meaningful analysis can begin. It is painful to see that the IT team has made the data available, but the business team can’t get the much-needed value in a simple and easy way.

Staying with the big data story, secondary research firm eMarketer released an interesting article on March 19, 2012, on how marketers are struggling to link digital data with big data.

Interestingly, the top pain points are sharing, personalization, data granularity, real-time, and individual customer data.  This is very similar to what I have heard in the last few weeks. All these struggles boil down to a simple question: what is the anticipated value from the big data solution? Surely it must be to deliver insights that are actionable and move the needle for the business. Very few offerings provide an end-to-end solution to the problem.

The question often asked is: does everyone need a big data solution?  The short answer is no.  Many times organizations believe that investing in a big data solution like Hadoop means all problems are solved. Again, the answer is absolutely not.  Investing in a solution is just a means to get to the data.  Organizations need to invest in the right solution that provides easy access to the data for analysis and visualization.  Most important is the investment in the right resources that can bring value to the data.  The challenge going forward will be on the resource front.  Good analytics talent is hard to come by.  Not every organization will be able to attract the top talent.  Easy-to-use, flexible tools will ease some of the pressure on organizations by letting them get value from the data without waiting for the ninjas to drive analysis.

Real-time reporting always draws controversy.  The question that needs to be asked is: does every organization need real-time access to the data?  My take is that it depends on the big data problem you are trying to solve.  If it is security, infrastructure management, personalized targeting for mobile and social, or even in-session targeting for websites, real-time access is a must.

The final question is: is it necessary to have access to granular data?  It depends. A couple of points to consider: 1) For better optimization, and to quickly identify root causes or opportunities, access to user- and session-level data is ideal. 2) For security use cases, especially around fraud detection and analysis (Mark Seward can spend a day with you talking about the use cases), you need access to the user and the associated sessions.

To summarize, big data problems are here to stay.  Have a clear understanding of what business problems your organization is looking to solve, and find the right solution: one that is easy to use, flexible, and able to scale quickly to your data needs and your data diversity.  Lastly, invest in the right analytics talent and provide them with an end-to-end solution that will allow them to spend more time conducting high-value analytics. Assess your needs, and use time to deliver value and democratization of the data as gauges for the right investment.

Big data is both a challenge and an opportunity, and emerging big data technologies are not a panacea. Understand what business problems your organization is looking to solve, and therefore what needs to be focused on. Is it X or is it Y? Where do you want to be spending your time? How important is democratizing the data? And so on.

I hope this post is helpful and thought-provoking.  Now over to you: please share your thoughts on how you are solving, or thinking of solving, your big data problems.

Learn more about how Splunk is solving big data problems here. Visit us at GigaOM Structure in NYC this week.  Find a Splunk event near you.

Happy Splunking!

PS:  Posting from London and feeling bad about the use of “z” instead of “s” in words like organization :–)  Original draft was in San Francisco.

Rahul Deshmukh
