Indexing data from Saas solutions running on relational databases

As we began work on building the app, I was again face to face with a familiar challenge…a challenge that you would encounter anytime you want to ingest structured data coming from any Saas based application that is running on a back-end relational database. In such a Saas based environment, the data is usually exposed via a REST, Webservices API or similar. As you know, in a typical relational database, all data is stored in multiple tables and records are linked across tables using ID’s. For instance the Incident table in ServiceNow does not have the Username that created that ticket but has a User Identifier (long cryptic string) referencing another record in the “Users” table that includes the Username, First Name, Last Name and other information.

The problem that you would face in this scenario would be: how do you ingest that data in order to make it easier to search? How do you build these lookups in Splunk knowing that the data is distributed across multiple tables?

The first solution that would come to mind would be to have the same python script that polls the data from SFDC or ServiceNow (as a scripted input) save the data in a CSV format directly and store it under the lookup directory of the app. The app would then use that CSV file as an automatic lookup on the searches that apply. However, this approach poses some limitations:

  • In a distributed environment, the lookups CSV files have to be stored on the search head. This means that either you have to come up with a mechanism to periodically copy the CSV files to the search head from the forwarder or use the search head to pull the data in directly. Both are not good options.
  • The python code has to handle updates to the lookup tables – in other words, update existing records in the CSV file with newly polled data that matches. This is doable but can be a bit tricky and would require some python coding skills.
  • What if I want to track changes to that lookup tables from an auditing perspective? Take the example of the Users table in ServiceNow, if someone changes the First Name of a given user in ServiceNow, the python script will poll the changes and update the existing record in the CSV lookup file erasing the previous information.

After many discussions, an alternative solution presented itself. This solution relies on indexing the lookup data as opposed to storing it as a CSV. Then the app runs periodic searches to build that lookup table in Splunk and keeps it up to date.

A good example of that search would rely on the “inputlookup” command to get the existing data in and “outputlookup” command to append the new result in the CSV file.

Elias Haddad
Posted by

Elias Haddad

Elias is an Emerging Market Presales Architect working out of the Dubai office. Prior to that, he was a Product Manager responsible for Splunk data ingestion and held various pre-sales, post-sales and business development positions. Elias lives in Dubai and graduated from Purdue University with a master’s degree in computer engineering.