Universally Indexing Business Data

By the title of this entry, you may be thinking that there is some new capability within Splunk to index other types of data. That’s not the intention. From its roots, Splunk was used to index and search on IT data. It still is. However, because the software is flexible enough to index any type of time series text data, customers using Splunk do not restrict it to indexing only IT data. From the beginning, Splunk was designed to universally index data from a variety of sources, as long as the data could eventually be represented as ASCII text.

Due to this inherent capability, Splunk can index data that has more of a business focus and is not necessarily meant for consumption by IT staff. Much industry-specific business data lies within unstructured files or is locked in proprietary applications that only allow access via an API. Indexing this data solves the problems of effective storage, time-based retrieval, and ad-hoc search. That’s just the tip of the iceberg. The lens of Splunk can be used to do more with this ocean of data. If we add functionality that includes statistical analysis, aggregate reporting, and alerts, the business value of the data increases to the point that it may affect the bottom line of the mission. All this is possible with Splunk. Because Splunk does not rely on a database that needs a schema constructed a priori to store this data, it is easy to index the data and provide the rest of the capabilities just mentioned.

Rather than continue to pontificate on the virtues of universally indexing business data, I will provide more concrete examples from my own experience, which should shed further light on why this is a worthwhile endeavor. I’ll use examples from different industries to illustrate the topic.

Financial Services

One of the most common use cases with Splunk in Financial Services has to do with monitoring trading systems. A trade logically can travel from front to middle to back office to payments in its processing history. All the information about the trade’s activity usually ends up in file after file on different servers. Splunk could be used to index this data to access it from a central console.

The first use case answers a simple question: where is the trade in the system? As the trade travels from one place to another, logically or physically, a Splunk user could simply type Trade="Some ID number" and get instant results back. With a little more effort, a Splunk power user could create a form search view where the trade ID is entered and the results come back in a tabular layout that is consistent with what a business user would like to see. Trades may be rejected for a variety of reasons beyond technical difficulties, such as an incorrect CUSIP, an unsupported commission rate, or insufficient funds. Knowing this information ahead of time through alerts or ad-hoc searches can lead to a quicker resolution of whatever is adversely affecting a trade. Here’s a form search that I use with fictitious trade data for a demo.

Trade Form Search

This opens up a number of possible use cases in the middle and back office:

  • A report on the number of accepted, in process, and rejected trades.
  • Alerts for rejected trades for high amounts.
  • Summary of the currency amount traded per day
  • Summary of the amount of securities bought or sold
  • Average number of trades per account on a given day report

Average Trades

The possibilities for business activity monitoring are only bounded by what data the system produces. Instead of spending a large amount of resources on building a sophisticated trade tracking system, which essentially can produce the same results, Splunk can easily be used to monitor trade activity that is posted in unstructured text files. Splunk users are currently indexing this type of data for their business needs.
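To make the reports listed above a little more tangible, here is a minimal Python sketch of the same aggregations over a handful of fictitious trade events. It is not how Splunk computes them and it is not my demo's search. The field names (trade_id, account, status, amount, date) and all the values are assumptions made up for illustration.

```python
from collections import Counter, defaultdict

# Fictitious trade events. Every field name and value is invented for this sketch.
TRADES = [
    {"trade_id": "T1", "account": "A100", "status": "accepted", "amount": 2500.0, "date": "2010-01-10"},
    {"trade_id": "T2", "account": "A100", "status": "rejected", "amount": 900000.0, "date": "2010-01-10"},
    {"trade_id": "T3", "account": "A200", "status": "in_process", "amount": 1200.0, "date": "2010-01-10"},
    {"trade_id": "T4", "account": "A200", "status": "accepted", "amount": 300.0, "date": "2010-01-11"},
]


def count_by_status(trades):
    """Number of accepted, in-process, and rejected trades."""
    return Counter(t["status"] for t in trades)


def large_rejections(trades, threshold):
    """IDs of rejected trades at or above a currency amount (alert candidates)."""
    return [t["trade_id"] for t in trades
            if t["status"] == "rejected" and t["amount"] >= threshold]


def amount_per_day(trades):
    """Total currency amount traded on each day."""
    totals = defaultdict(float)
    for t in trades:
        totals[t["date"]] += t["amount"]
    return dict(totals)


def average_trades_per_account(trades, date):
    """Average number of trades per active account on a given day."""
    per_account = Counter(t["account"] for t in trades if t["date"] == date)
    if not per_account:
        return 0.0
    return sum(per_account.values()) / len(per_account)
```

Each function corresponds to one bullet in the list; the point is only that these reports are simple aggregations once the trade events are searchable in one place.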


To continue my demo example, if you are monitoring trades, then it makes sense to also have the ability to monitor stock trends that may correspond with the trading activity. As part of my demo, I call a publicly available web service from a scripted input, and the output of my web service client is indexed into Splunk. Naturally, I am monitoring stock activity on the specific securities that happen to correspond to the ones being traded. You can download this add-on from Splunkbase. Here’s an average stock volume graph from my demo using real stock volumes.

Stock Volume

Sample Volume Report
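For readers who have not written a scripted input, the idea is simply a program that writes events to standard output, which Splunk then indexes. Below is a rough Python sketch of that shape. It is not the add-on on Splunkbase: fetch_quotes is a hypothetical stand-in that returns made-up numbers instead of calling a real web service, and the symbols and the key=value event layout are my assumptions for illustration.

```python
import sys
from datetime import datetime, timezone


def fetch_quotes(symbols):
    """Hypothetical stand-in for a web service client; returns made-up numbers."""
    return [{"symbol": s, "price": 10.0 + i, "volume": 1000 * (i + 1)}
            for i, s in enumerate(symbols)]


def format_event(quote, now):
    """One event per line: a leading timestamp followed by key=value pairs."""
    stamp = now.strftime("%Y-%m-%d %H:%M:%S")
    return (f'{stamp} symbol={quote["symbol"]} '
            f'price={quote["price"]:.2f} volume={quote["volume"]}')


def main(symbols, out=sys.stdout):
    """Write one event per symbol to standard output for indexing."""
    now = datetime.now(timezone.utc)
    for quote in fetch_quotes(symbols):
        out.write(format_event(quote, now) + "\n")


if __name__ == "__main__":
    main(["AAA", "BBB"])  # placeholder symbols, not real tickers
```

Writing a timestamp followed by key=value pairs is a design choice for the sketch; it tends to make the fields easy to pick out at search time, but any consistent text format would do.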


If the Splunk Dashboard is giving you statistics on trade activity and stock trends, it makes sense to also know what is causing an increase in volume activity in the market. To do this, you can start indexing RSS data into Splunk and show relevant articles on the same dashboard that is showing trade and stock activity. Again, the complete download for this input is on Splunkbase.

rss headlines

Finance headlines

At this point, by universally indexing business data that happens to have trade activity, stock trends, and related news, you now have a Splunk based application that provides business value beyond the raw data for your enterprise.


There are a number of Telco examples on how people use Splunk with this type of business data, but the one that stands out to me involves records for call events known commonly as Call Detail Records (CDR). A typical CDR event may look like this:

01-10-10 10:55:00, 4153458765, 4153455634, 34343, ...

CDRs are usually in a delimited format where each record has a number of fields separated by an ASCII delimiter. Each field represents an aspect of the call as in caller, receiver, time, duration, etc. This not only makes it appealing to index into Splunk, but also provides a capability to automatically extract the individual fields at search time giving flexibility should the format of the record change.
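As a rough illustration of what extracting fields from a delimited record means, here is a small Python sketch that splits a record like the sample above into named fields. This is not Splunk's extraction mechanism. The field names are assumptions: the sample suggests a timestamp, a caller, and a receiver, but the meaning of the fourth value is not stated, so it is kept under a generic name, and the month-day-year timestamp format is also a guess.

```python
import csv
from datetime import datetime

# Assumed names for the leading fields. The fourth value's meaning is not
# given in the sample record, so it gets a generic placeholder name.
FIELD_NAMES = ["timestamp", "caller", "receiver", "field_4"]


def parse_cdr(line, delimiter=","):
    """Split one delimited CDR line into a dict of named fields."""
    values = next(csv.reader([line], delimiter=delimiter, skipinitialspace=True))
    values = [v.strip() for v in values]
    record = dict(zip(FIELD_NAMES, values))
    # Anything beyond the named fields is kept as-is rather than guessed at.
    record["extra"] = values[len(FIELD_NAMES):]
    # Assumed format: month-day-two-digit-year, then a 24-hour time.
    record["timestamp"] = datetime.strptime(record["timestamp"], "%m-%d-%y %H:%M:%S")
    return record
```

Because the names are applied when the record is read rather than when it is stored, a change in the record format only requires changing FIELD_NAMES, which is the same flexibility that search-time extraction provides.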

Because call volumes are so high, this type of data usually involves billions of records over a given period. That makes it difficult to simply put the data in a relational database. It is just as difficult for customer support agents to explain to a customer the details of a call made 6 months ago, or why they were charged then for a 30-minute call that they would not be charged for today. Ad-hoc search with Splunk makes this simpler for the call center that needs to investigate billing inquiries.

A demo I like to give with fictitious call detail records is to show an aggregate report on the types of calls being made in a given time period, so that a cellular marketing department can decide what types of promotions to run at that time. Here’s a sample dashboard depicting this:

Calls By Type

Another use case for Splunk with CDRs, one that is more urgent in nature, concerns who is calling whom and whom else the initial recipient called. Splunk’s transaction search command can be used to group related records to provide law enforcement with this critical data to carry on their work. On a lighter note, the same search result can be used by marketing agents to upsell friends-and-family network plans based on who calls whom most often.
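To show the kind of grouping involved, here is a plain Python sketch over fictitious caller and receiver pairs. It is not the transaction command and does not reproduce its behavior; it only illustrates the two questions in the paragraph above: whom did the initial recipient go on to call, and which pairs call each other most often. All phone numbers are made up.

```python
from collections import Counter, defaultdict

# Fictitious (caller, receiver) pairs, invented for this sketch.
CALLS = [
    ("4155550001", "4155550002"),
    ("4155550001", "4155550002"),
    ("4155550002", "4155550003"),
    ("4155550002", "4155550001"),
    ("4155550001", "4155550004"),
]


def build_call_graph(calls):
    """Map each caller to the set of numbers they called."""
    graph = defaultdict(set)
    for caller, receiver in calls:
        graph[caller].add(receiver)
    return graph


def second_hop(calls, caller):
    """For each number the caller reached, list whom that number called in turn."""
    graph = build_call_graph(calls)
    return {
        receiver: sorted(graph.get(receiver, set()) - {caller})
        for receiver in sorted(graph.get(caller, set()))
    }


def most_frequent_pairs(calls, top_n=3):
    """Caller/receiver pairs ranked by how often the call occurs."""
    return Counter(calls).most_common(top_n)
```

The first function serves the investigative case and the second serves the marketing case, which shows how one set of grouped records supports both.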


My last set of use cases involves situational awareness at the physical environment level. Last year, someone who maintains a high-technology building asked me if it was possible to index building statistics via an API. The answer is, of course, yes. Splunk would run a script at set intervals to call their API and retrieve the statistics, and the standard output of that script would be indexed into Splunk.

Building sending data to Splunk

The statistics centered on the environment of the building, including temperature and humidity readings for each floor. Although this turned out to be a hypothetical use case, the implementation for it can be real. Splunk can index each floor’s readings and then provide trend analysis for temperature and humidity that can be correlated to costs. Moreover, if an API is provided to control roof and basement fans, alerts could be raised via Splunk when the median temperature reaches a threshold that would trigger fan activity. Similar alerts could be used to monitor hallway lights, fire alarm responsiveness, and sprinkler water levels. This may sound like a use case from the distant future, but the technology to implement it is here today.
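The threshold alert described above amounts to a small calculation, sketched here in Python. I am assuming the statistic is the median of recent readings per floor; the readings, the floor numbering, and the threshold are all invented, and the step that would actually call a fan-control API is deliberately left out because no such API was specified.

```python
from statistics import median

# Fictitious recent temperature readings per floor, in degrees Celsius.
READINGS = {
    1: [20.5, 21.0, 22.5],
    2: [24.0, 26.5, 27.0],
    3: [19.0, 19.5],
}


def floors_over_threshold(readings, threshold):
    """Return {floor: median temperature} for floors at or above the threshold."""
    alerts = {}
    for floor, temps in sorted(readings.items()):
        if not temps:
            continue  # no readings for this floor, nothing to evaluate
        floor_median = median(temps)
        if floor_median >= threshold:
            alerts[floor] = floor_median
    return alerts
```

With these made-up readings and a threshold of 25.0, only the second floor would raise an alert.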


A more concrete use case in the environmental arena is indexing weather reports for different cities, which is another demo I sometimes present to users who want to see the different source types that Splunk can handle. As in the previous use case, Splunk calls a web services API every few minutes to get the report for each city on a list to monitor. The weather report for each city comes back in XML format. Once again, you can download this distribution from Splunkbase. Below are some sample Splunk reports that can be derived on the fly from this data.

Maximum Temperature

Average Humidity

Average Humidity per City
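To give a feel for how reports like these fall out of XML weather data, here is a small Python sketch that computes the maximum temperature and average humidity per city. The XML layout is purely an assumption; the actual web service's schema is not shown in this entry, and the cities and values are made up.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical XML layout with invented values; the real service's schema may differ.
SAMPLE_XML = """<reports>
  <report city="San Francisco" temperature="61" humidity="70"/>
  <report city="San Francisco" temperature="65" humidity="60"/>
  <report city="New York" temperature="40" humidity="50"/>
  <report city="New York" temperature="38" humidity="54"/>
</reports>"""


def readings_by_city(xml_text, attribute):
    """Collect one numeric attribute from every report, grouped by city."""
    grouped = defaultdict(list)
    for report in ET.fromstring(xml_text).iter("report"):
        grouped[report.get("city")].append(float(report.get(attribute)))
    return grouped


def max_temperature_by_city(xml_text):
    """Highest reported temperature for each city."""
    return {city: max(v) for city, v in readings_by_city(xml_text, "temperature").items()}


def average_humidity_by_city(xml_text):
    """Mean reported humidity for each city."""
    return {city: sum(v) / len(v) for city, v in readings_by_city(xml_text, "humidity").items()}
```

Both reports reuse the same grouping step and differ only in the statistic applied, which is why they can be derived on the fly from the same indexed events.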

The business impact for this type of data is not as high as in the previous use case, unless you work in meteorology or are planning a trip, but the data itself can be used in a larger context. If you are monitoring climate change, indexing this data for further analysis would assist in ascertaining the impact on climate of external forces such as fossil fuel emissions. If other types of scientific data are also indexed into Splunk, correlations or the lack of correlations can be identified. Splunk will not solve the climate debate, but the task of universally indexing, searching, and reporting on this data provides insight that may otherwise be lying in generated files or locked systems.


I hope this non-exhaustive list of examples has provided an entertaining and informative vision for some types of data that can be indexed into Splunk. Splunk users are already indexing trade data and CDR events into Splunk today. As time moves forward, the breadth of what is being indexed by actual customer use cases will increase beyond what we imagine today. Splunk as a technology shows that not everything needs to be force fed into a database to perform analysis. As most of the world’s data is not actively stored in a search-ready manner, this opens up an opportunity to solve a problem for real business needs without having to retrofit unstructured time series text data into structured containers. If you have ideas on indexing other types of business data into Splunk, please let us know, as deriving business value from your data is a worthwhile investment.

Posted by Nimish Doshi

Nimish is Director, Technical Advisory for Industry Solutions providing strategic, prescriptive, and technical perspectives to Splunk's largest customers, particularly in the Financial Services Industry. He has been an active author of Splunk blog entries and Splunkbase apps for a number of years.