ETL >> SplunkTL

Imagine this scenario:

You have been asked to prepare reports that require data from corporate infrastructure assets. These reports could be about website activity, customer behavior, employee application usage, physical equipment uptime (or downtime), network traffic volumes, or whatever else.

This could be a challenging assignment.

Here is why it is a challenge: your company has standardized on a business intelligence (BI) suite for creating and distributing reports. That BI suite communicates only with your corporate-standard relational databases (RDBMS), and only through the Structured Query Language, or SQL.

And chances are good that your company has made a large investment in an ETL (Extract, Transform, Load) tool to populate those databases. (This ETL tool may have been purchased from the BI or RDBMS vendor, or it may have been purchased separately. Or it may be one of the many popular open source offerings: free, like a free puppy.) Regardless, the ETL tool has become the gatekeeper of data into the database and, effectively, into the BI layer. And ETL tools extract data only from structured sources such as databases and delimited or fixed-length files.

You have to use the corporate standards.

But your data is largely unstructured. It is not in databases. It is in messy log files. It is in inconsistent script output. It is floating around on the network. And more.

This is where Splunk comes in.

Splunk is schema-less. It can harvest data from anywhere, in any format, store it, and then make it searchable. And because searches let you structure their results, Splunk allows you to impose structure on any and all of your unstructured data! The results can then easily be written out to a structured file that the ETL tool can consume and pass on to the RDBMS, where the data becomes available to your BI suite.
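To make the idea concrete outside of Splunk, here is a minimal Python sketch of the same pattern: pull a timestamp and key=value pairs out of loosely formatted log lines, keep only the fields a report needs, and write them to a delimited file that an ETL tool could load. The log format, the field names and the function names (`structure_events`, `write_delimited`) are hypothetical assumptions for illustration only. In Splunk itself this step would be a search (for example, one using `rex` for field extraction and `outputcsv` to write the results), not Python code.

```python
import csv
import io
import re

# Hypothetical log format assumed for this sketch:
# "YYYY-MM-DD HH:MM:SS key=value key=value ..." with keys in any order.
TIMESTAMP_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2})")
KV_RE = re.compile(r"(\w+)=(\S+)")

# Made-up sample lines standing in for messy log and script output.
SAMPLE = [
    "2012-03-14 09:15:02 host=web01 status=200 bytes=5120 uri=/index.html",
    "free-form script output that is not an event",
    "2012-03-14 09:15:07 uri=/cart host=web02 bytes=310 status=404",
    "2012-03-14 09:16:00 host=web03 status=500",
]


def structure_events(lines, fields):
    """Turn unstructured lines into dicts with a fixed set of columns."""
    for line in lines:
        ts = TIMESTAMP_RE.match(line)
        if ts is None:
            continue  # skip lines without a recognizable timestamp
        pairs = dict(KV_RE.findall(line))  # key=value pairs, in any order
        row = {"date": ts.group(1), "time": ts.group(2)}
        for name in fields:
            row[name] = pairs.get(name, "")  # missing field -> empty column
        yield row


def write_delimited(rows, out, fields):
    """Write rows as CSV with a header row, a format ETL tools can read."""
    writer = csv.DictWriter(out, fieldnames=["date", "time"] + list(fields))
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    wanted = ["host", "status", "bytes"]
    buf = io.StringIO()
    write_delimited(structure_events(SAMPLE, wanted), buf, wanted)
    print(buf.getvalue(), end="")
```

Lines without a recognizable timestamp are skipped, and fields missing from an event are written as empty values, so every output row has the same fixed columns that a database load expects.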

(Of course, depending on the reporting requirement, you may be able to do it all in Splunk, without defining the extract search, the ETL process(es) and the database schema(s). Because this data is “off limits” to the BI layer anyway, that may justify an exception to the corporate reporting standard.)

NOTE: This applies to Splunk version 4.3 and earlier only. The mechanics of this scenario could change in future versions, but the logic need not.

