Before I introduce the star of the show, I thought I should introduce myself. I have two children under the age of two: one aged 18 months and the other 3 months old. Despite my wife saying she doesn't want a third, one arrived today, and it's a hunk, named Hunk 6.3! This is my first release as the Product Manager for Big Data at Splunk, and I really do think I have been given the best job in the company. My job is to look at ways frameworks such as Hadoop can augment the experience of one of the leading Big Data platforms in the world: Splunk Enterprise. What better way to start than with a blog about the great features added to my new baby, Hunk 6.3?
You are probably reading this because you love Splunk Enterprise and are interested in extending your Splunk experience to batch analytics on Hadoop. Or you're already using Hadoop and looking into exploratory analytics, which puts you in good company with organizations such as Yahoo and Vantrix. There are lots of reasons our customers want to do this: lowering their total cost of ownership by archiving Splunk Enterprise data to HDFS or Amazon S3; running queries against other data sets in HDFS in conjunction with Splunk Enterprise; or simple curiosity! Hunk dramatically abstracts the Hadoop framework away from the user. This means you get an intuitive analytics frontend, with a powerful search language plus visualization and reporting capabilities, on top of Hadoop, without having to write a single line of code!
Maps, everyone loves maps!
One of the great benefits of Hunk is that we share part of the Splunk Enterprise codebase, so Hunk can leverage the advanced data analysis and visualization provided by Splunk Enterprise. Regardless of the data source or schema, Splunk now offers a new set of visualization and analytics features targeted at the challenges of big data analysis. Anomaly detection provides a time-saving method to quickly characterize and investigate trends within large datasets. Geospatial mapping and single-value displays help analysts rapidly visualize, characterize and comprehend analysis results.
After installing and configuring Hunk (see this post for how you can install and configure Hunk in under 60 minutes), you can produce powerful data analytics and visualizations with simple search commands on data at rest in Hadoop. For example, appending the following to a search over web access data plots event counts on a map of US states:
| iplocation clientip
| stats count by Region
| geom geo_us_states featureIdField="Region"
Or how about this one, which shows the average taxi ride duration for New York neighborhoods using custom polygons!
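As a sketch of how that kind of custom-polygon choropleth is built: you first upload your polygon definitions (for example a KMZ/KML file of neighborhood boundaries) as a geospatial lookup, then reference it with the `geom` command. The lookup name `nyc_neighborhoods` and the field names below are hypothetical placeholders, not part of the product:

```
| stats avg(trip_duration) AS avg_trip_duration BY neighborhood
| geom nyc_neighborhoods featureIdField="neighborhood"
```

The choropleth map visualization then shades each polygon by the value of `avg_trip_duration`.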
Thawed buckets anyone?
In a previous job I used to refer to archived data as "dead data", banished to tape never to be seen again. Now with Hunk archiving we can keep archived data active and alive, with the goal of continuing to derive value from it. Archived data can be queried alongside fast-moving data in Splunk Enterprise.
Using the Hunk archiving feature, released in Hunk 6.2.1, customers can drive down TCO by storing historical data in Hadoop on lower cost hardware. The capability provides a simple mechanism to archive data from Splunk Enterprise into HDFS or Amazon S3. Any data in warm, cold or frozen buckets can be archived and offloaded from Splunk instead of being deleted. For historical data, customers can easily search archived buckets in Hadoop instead of going through the thawing process in Splunk Enterprise.
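For illustration, archiving is driven by virtual-index settings in indexes.conf. A minimal sketch, assuming an HDFS provider, might look like the following; the stanza names, paths and thresholds here are examples only, so check the Hunk documentation for the exact settings in your version:

```ini
# Hypothetical provider pointing at your Hadoop cluster
[provider:hadoop-archive]
vix.family = hadoop
vix.fs.default.name = hdfs://namenode:8020

# Archive index fed from the Splunk Enterprise index "main"
[main_archive]
vix.provider = hadoop-archive
vix.output.buckets.from.indexes = main
vix.output.buckets.older.than = 86400
vix.output.buckets.path = /archive/main
```

Buckets older than the threshold (in seconds) are copied from the named Splunk Enterprise index into the HDFS path, where they remain searchable through the virtual index.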
What is really cool is that with Unified Search, added in Hunk 6.3, you can search archived data in virtual archive indexes as well as the live data in the Splunk Enterprise indexes that feed those archives. This allows our users to search real-time and historical data with the same search commands, from a single familiar user interface. A great example of this in use by our customers today is security incident investigation. When a security researcher finds a new attack signature, such as an APT, in Splunk Enterprise, they can take the newly discovered features and run a search over historical archived data residing in Hadoop to see if it has previously occurred, all without leaving the Splunk search interface. Reduce your TCO without having to learn a new search language or tool!
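To make that concrete, a unified search looks just like any other Splunk search. Assuming the hypothetical archive index from the earlier example and a made-up signature field, a researcher might run something like:

```
index=main_archive earliest=-2y attack_signature="apt-xyz"
| stats count BY host, source
```

With Unified Search, results come back from both the archived buckets in Hadoop and the live Splunk Enterprise index that feeds the archive, deduplicated across the two, so recent and historical hits appear in one result set.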
Is this data in an open data format?
A question that I am often asked is whether the data collected and indexed by Splunk, in our proprietary format, is open to other applications. Well, the answer is a resounding yes on two fronts. Firstly, you can always use one of our great SDKs to search and retrieve the data. Secondly, we have released a reader for data in this format that has been moved from Splunk to Hadoop via Hunk. The Archive Record Reader allows Splunk buckets (specifically the journal.gz files) to be read by any Hadoop-based application. This gives our customers many options on what to do with this archived data.
- Leave it in the journal.gz format and allow Hunk to natively search it using either normal or Unified search.
- Use the reader as a plug-in for custom applications such as Hive or Pig, leaving the data in the journal.gz format while it is consumed by an external application.
- Use the reader to transform archived buckets into a new format that meets your corporate standard for data at rest in HDFS. The great news is that Hunk can also search this transformed data as well!
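As a sketch of the second option, a Hive external table could be pointed at archived buckets via the reader's Hadoop input format. The input format class name, table name and HDFS path below are hypothetical placeholders, so consult the Archive Record Reader documentation for the actual class and setup:

```sql
CREATE EXTERNAL TABLE splunk_archive (event STRING)
STORED AS
  -- hypothetical class name for the Archive Record Reader's input format
  INPUTFORMAT 'com.splunk.journal.hive.JournalInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/archive/main';
```

Once the table is defined, standard HiveQL queries can run directly over the journal.gz files without converting them first.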
When is Hunk 6.3 launching and how do I get it?
So the good news is that Hunk 6.3 is Generally Available now. If you're thinking "I want to try that!", head over to the Hunk download page and give it a go for free. Alternatively, if you have data in Amazon S3, give our hourly-priced Hunk on EMR AMI a go.
Product Manager, Big Data