Caching Hadoop Data with Splunk and Hunk

Update 9/27/16: As of Sept. 27, 2016, Hunk functionality has been incorporated into the Splunk Analytics for Hadoop Add-On and Splunk Enterprise versions 6.5 and later.

Although Hadoop is good at processing large amounts of data, it is not the fastest platform. Below is a list of options that Splunk and Hunk offer to speed up the retrieval of results and lower the processing overhead on Hadoop.

Each option has its own advantages:

[Screenshot: comparison of the advantages of each option]


1) Hunk Report Acceleration

This option caches the results in HDFS and keeps them fresh and current. By default, Hunk checks for new Hadoop data every 10 minutes.

Details =


2) Hunk Scheduled Searches

This option caches the results on the Hunk node and keeps them available on the search head for twice the scheduled interval. For example, if you schedule the search to run every 4 hours, the results are kept in the cache for 8 hours.

Details =
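
To make the retention rule for scheduled searches concrete, here is a minimal Python sketch of the arithmetic. The function name cached_result_lifetime and its multiplier parameter are hypothetical names chosen for illustration and are not part of any Splunk or Hunk API. As far as I know, the factor of two corresponds to the default dispatch.ttl value of 2p in savedsearches.conf, but treat that as an assumption and check it against your own configuration.

```python
from datetime import timedelta


def cached_result_lifetime(schedule_interval: timedelta, multiplier: int = 2) -> timedelta:
    """Return how long scheduled-search results stay in the cache.

    The default multiplier of 2 mirrors the "twice the schedule" rule
    described in this post. Both names are illustrative assumptions.
    """
    if schedule_interval <= timedelta(0):
        raise ValueError("schedule_interval must be positive")
    return schedule_interval * multiplier


# A search scheduled every 4 hours keeps its results for 8 hours.
lifetime = cached_result_lifetime(timedelta(hours=4))
print(lifetime)  # 8:00:00
```

If you shorten the schedule to every 30 minutes, the same rule keeps results for one hour, so a tighter schedule means fresher but shorter-lived cached results.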


3) Hunk Summary Indexing

This option allows you to create a small summary index on the Hunk node. You can then run searches and reports on this summary index.

Details =
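
The idea behind a summary index can be sketched in a few lines of Python. This is a conceptual illustration only, not how Hunk implements it: the sample events, the host names, and the functions build_summary and events_per_host are all hypothetical. The point is that a report runs against a small pre-aggregated data set instead of the raw Hadoop data.

```python
from collections import Counter
from datetime import datetime

# Hypothetical raw events: (timestamp, host) pairs standing in for Hadoop data.
raw_events = [
    (datetime(2015, 5, 5, 11, 5), "web01"),
    (datetime(2015, 5, 5, 11, 40), "web01"),
    (datetime(2015, 5, 5, 11, 54), "web02"),
    (datetime(2015, 5, 5, 12, 10), "web01"),
]


def build_summary(events):
    """Pre-aggregate raw events into (hour, host) -> event count."""
    summary = Counter()
    for ts, host in events:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        summary[(hour, host)] += 1
    return summary


def events_per_host(summary):
    """Answer a report from the summary alone, without reading raw events."""
    totals = Counter()
    for (_hour, host), count in summary.items():
        totals[host] += count
    return dict(totals)


summary = build_summary(raw_events)
print(len(summary))              # 3 summary rows instead of 4 raw events
print(events_per_host(summary))  # {'web01': 3, 'web02': 1}
```

With real data the reduction is far larger than 4 rows to 3, which is why reports on the summary avoid most of the Hadoop processing overhead.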


4) Static Reports

This option lets you generate a static report and view it without any overhead on Splunk or Hunk.

Details =



5) Hadoop Connect Import (part of the Hadoop Connect App)

This option lets you import data from HDFS into a Splunk indexer. Every time new data arrives in HDFS, it is automatically copied to Splunk.

Details =


Posted by Raanan Dagan

