TIPS & TRICKS

Splunk and Hadoop come together like PB&J. Here are two options for easily integrating Splunk with Hadoop, and the issues each one helps resolve.

Splunk provides two applications that integrate Splunk with Hadoop: Splunk Hadoop Connect and the Splunk App for HadoopOps.

These two integrations address two major pain points of Hadoop. One is that developing Hadoop applications is time-consuming: most Hadoop-related projects take a long time to develop and, once developed, still require specialized knowledge to adapt to new requirements. The other is that monitoring a Hadoop stack across multiple servers can be extremely complex and time-consuming; as a result, critical problems in Hadoop environments often recur and remain unresolved.

Together, Splunk Hadoop Connect, the Splunk App for HadoopOps, and Shuttl (which archives Splunk data to Hadoop) provide a complete integration with Hadoop.

Splunk Hadoop Connect

Splunk Hadoop Connect provides reliable integration between Splunk and Hadoop. It delivers three core capabilities: Export, Explore, and Import.

Export — enables data residing in Splunk to be copied to Hadoop. Export sends pre-processed or raw events to Hadoop in a reliable, deterministic way; a command-line sketch follows the steps below.

  1. Splunk forwarders move data to an indexer
  2. Search head pulls the data from the indexer
  3. Search head streams the data into a local directory
  4. Periodically, Splunk compresses the files and moves them into a user-specified HDFS directory
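
Once an export has run, the results are ordinary compressed files in HDFS, consumable by any Hadoop tool. A minimal command-line sketch of verifying an export, assuming a hypothetical destination directory /user/splunk/export and gzip compression (substitute whatever path and format you configured in the app):

    # List the files Hadoop Connect has exported (the path is an
    # assumption; use your configured HDFS directory)
    hadoop fs -ls /user/splunk/export

    # Exported events are plain compressed text, so any HDFS client or
    # MapReduce job can read them; here we just peek at a few lines
    hadoop fs -cat /user/splunk/export/*.gz | gunzip -c | head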

Explore — enables you to browse and navigate HDFS directories and files from the Splunk search head user interface before deciding to import data into Splunk. Drill down into a set of directories, examine files, and with a click of a button import and index the data in Splunk.
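
The Explore view is essentially a point-and-click equivalent of listing HDFS contents. For reference, the same inspection from the Hadoop command line (the directory and file names are illustrative):

    # Recursively list a candidate directory before importing it
    hadoop fs -ls -R /data/weblogs

    # Preview one file to confirm its format before indexing it
    hadoop fs -cat /data/weblogs/part-00000 | head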

Import — enables Hadoop files to be searched by Splunk regardless of their source or size; a sketch of the flow follows the steps below.

  1. Using Sqoop, HBase, or the Hadoop command line, files are moved into HDFS
  2. Splunk detects any updated or new file in the HDFS directory
  3. Splunk imports the data into Splunk indexers
  4. Splunk search head pulls the data from the indexer
  5. In Splunk you can apply access controls to the data, as well as search, report on, and visualize it
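
A minimal sketch of this flow under assumed names: step 1 uses Sqoop to land a database table in HDFS (the connection string, credentials, table, and target directory are all hypothetical), steps 2 and 3 happen inside Hadoop Connect once that directory is configured as an import source, and steps 4 and 5 are shown as a basic search from the Splunk CLI (the index name is also an assumption):

    # Step 1: move data into HDFS, e.g. with Sqoop (all names hypothetical)
    sqoop import \
      --connect jdbc:mysql://dbhost/sales \
      --username splunk_ro -P \
      --table orders \
      --target-dir /data/orders

    # Steps 2-3: once /data/orders is configured as an import source,
    # Hadoop Connect detects new or updated files there and indexes them.

    # Steps 4-5: search the imported events from the CLI
    splunk search 'index=hdfs_import source="/data/orders/*" | stats count by source'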

Splunk App for HadoopOps

The Splunk App for HadoopOps, a Hadoop distribution-agnostic app, allows you to monitor, alert on, troubleshoot, remediate, search, and analyze Hadoop nodes, HDFS, and MapReduce.

Splunk today is used to troubleshoot, monitor and analyze complex IT infrastructures – physical, virtual and in the cloud. The Splunk App for HadoopOps delivers an end-to-end Hadoop monitoring environment: a single interface for monitoring the full Hadoop stack, including the Hadoop network, switches, racks, operating systems, application servers, and databases.
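
End-to-end visibility assumes that logs from every Hadoop node actually reach Splunk, which is typically arranged with a Splunk universal forwarder on each node. A minimal sketch, assuming Hadoop logs live under /var/log/hadoop and that an index named hadoopops exists (both are assumptions):

    # On each Hadoop node, have the universal forwarder watch the
    # Hadoop log directory (path and index name are assumptions)
    $SPLUNK_HOME/bin/splunk add monitor /var/log/hadoop -index hadoopops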

The main features of Splunk App for HadoopOps are:

  • Monitoring Nodes in a cluster – Displays a complete view of all of the servers in the cluster, with key metrics for disk usage, CPU, and RAM
  • Monitoring MapReduce jobs – Displays information on the Map and Reduce tasks. Dashboards show real-time statistics on how the individual tasks are operating. Data displayed in this view helps troubleshoot MapReduce performance issues and provides the ability to drill down from JobIDs to TaskIDs.
  • Monitoring Hadoop Services – Displays information about the health of the NameNode, Secondary NameNode, and DataNodes. This view covers HDFS I/O, HDFS capacity per user, HDFS size, and the CPU and memory usage of the HDFS daemons.
  • View Hadoop Configuration – Displays information about the configuration of each node and each daemon in the Hadoop cluster.
  • Search Logs from the entire environment – Splunk distributed search and indexing allow real-time display of information from all Hadoop, Linux, database, and network log files, further enhancing end-to-end debugging of issues.
  • Alerts and Notifications – Set up alerts based on a single event, a group of events, or a given threshold or timeframe. Per-result alerting gives users granular control over the notifications received when a Hadoop node, MapReduce task, or HDFS daemon is failing (see the configuration sketch after this list).
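
As an illustration of the alerting mechanics, here is a minimal per-result alert expressed as a savedsearches.conf stanza; the stanza name, search string, sourcetype, index, and recipient are all hypothetical stand-ins for your actual Hadoop data:

    # In $SPLUNK_HOME/etc/apps/search/local/savedsearches.conf
    # (stanza name, search, sourcetype, and address are assumptions)
    [Hadoop DataNode Errors]
    search = index=hadoopops sourcetype=hadoop_datanode_log log_level=ERROR
    cron_schedule = */5 * * * *
    enableSched = 1
    counttype = number of events
    relation = greater than
    quantity = 0
    # digest_mode = 0 fires one alert per matching result
    alert.digest_mode = 0
    actions = email
    action.email.to = hadoop-admins@example.com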

Shuttl

Shuttl is an open source project that allows Splunk buckets to be archived to and retrieved from Hadoop. For full details and the software, go to http://blogs.splunk.com/2012/07/02/shuttl-for-big-data-archiving/

Splunkbase Links:

Splunk Hadoop Connect can be found at: http://splunk-base.splunk.com/apps/57216/splunk-hadoop-connect

Splunk App for HadoopOps can be found at: http://splunk-base.splunk.com/apps/57004/splunk-app-for-hadoopops

Posted by Raanan Dagan
