Splunk DB Connect & Cloudera Hive JDBC Connector

First things first. Try Hunk before you go down this path. Hunk allows you to seamlessly query your Hive tables with native SPL queries from the search interface. This gives you all the goodness of Splunk including agile reporting and analytics, role-based access controls, report acceleration and the fast time to value that you’ve come to know and love from Splunk. If you have tried Hunk and it’s just not the right fit then read on.

I recently helped a customer use Splunk DB Connect and the Cloudera Hive JDBC Connector to query tables on their Hive Server 2. Hive Server 2 is available in CDH 4.1+. You can read more about it on this Cloudera blog post.

Here are the quick and dirty steps to get this working in your environment.

  1. Download and install Splunk DB Connect
  2. Download the Cloudera Hive JDBC Connector version 2.5.6 if you’re using JRE 1.6+. If you’re using JRE 1.4 or 5.0 download the Cloudera JDBC Connector version 2.5.4 as it ships with connectors for JDBC API v3.0.
  3. Copy the correct connector zip file for your JDBC API version to your Splunk server where DB Connect is installed. This could be a search head or a Heavy Weight Forwarder depending on your setup.
  4. Extract the connector zip file into $SPLUNK_HOME/etc/apps/dbx/bin/lib
  5. Create the file $SPLUNK_HOME/etc/apps/dbx/local/database_types.conf with the following
  6. [hiveServer2]
    displayName = Hive Server 2
    # Replace jdbc4 with jdbc3 if using JRE 1.4 or 5.0
    jdbcDriverClass = com.cloudera.hive.jdbc4.HS2Driver
    defaultPort = 10000
    connectionUrlFormat = jdbc:hive2://{0}:{1}/{2}
    # Substitute <valid_table> with a table from one of your Hive databases
    testQuery = select * from <valid_table>
    defaultSchema = default
    defaultCatalogName = default
    # If you don’t want to implement a test query to validate the connection you can uncomment the following setting and comment out the testQuery line above
    # validationDisabled = true

  7. Create a new database connection in DB Connect. The database I’m using in this example is called ‘hunk’. The table, ‘ccrime’, is pointing to a 2+GB CSV file of crime in Chicago since 2001. Manually specify the database you wish to query for this connection. Warning: Do not use the Fetch database names functionality when creating the connection as this didn’t accurately pull down my databases.
  8. hive_dbx_connection


  9. Query your database to validate the setup
  10. dbx_hive_query

  11. Now you can utilize your Hive tables through any of the documented means for DB connect. You can also restrict who has access to the database through Splunk’s role-based access controls.

Scott Haskell

Posted by