Semantics and Machine Data

One of the first and most beloved series of dashboards used internally at Splunk was created by the R&D and product management teams, deriving a number of statistics from downloads of Splunk software from our website. The Apache log provided the primary raw information for these dashboards. That data was enriched and used to show download activity globally, by version, platform, country, and geo. These have been the business analytics used to gain insight into the distribution of our products around the world.

Since taking on the new rollout of Splunk internally, the IT team has been working to set up a series of charts that focus more on operational metrics: the uptime of the service, the performance of the download function, and so on. It’s here we ran into our first need to consider “Semantic Logging”, that is, logging events or values explicitly for gathering analytics. In this case, we wanted the duration of each download, in order to know what experience our customers have when gaining access to the code. So the team added the time taken to serve the request (%D, which Apache reports in microseconds) to the Apache log format, and will be adding the connection status (%X) to detect abandoned downloads.
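To make the idea concrete, here is a minimal Python sketch of how the two added fields might be pulled back out of a log line. It assumes, purely for illustration, that %D and %X are appended to the end of Apache's standard combined format; the post does not give the actual LogFormat, and the sample line, the path in it, and the function name `parse_download_event` are all hypothetical.

```python
import re

# Assumed LogFormat (hypothetical; the real format is not given in the post):
#   %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" %D %X
LINE_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)" '
    r'(?P<micros>\d+) (?P<conn>[X+-])$'
)


def parse_download_event(line):
    """Return a dict with the download duration and abort flag, or None."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return {
        "host": match.group("host"),
        "request": match.group("request"),
        "status": int(match.group("status")),
        # %D is the time taken to serve the request, in microseconds.
        "duration_s": int(match.group("micros")) / 1_000_000,
        # %X is "X" when the connection was aborted before the response
        # completed, "+" for keep-alive, "-" for closed after the response.
        "aborted": match.group("conn") == "X",
    }


# Invented sample line for illustration only.
sample = (
    '203.0.113.7 - - [10/Oct/2011:13:55:36 -0700] '
    '"GET /download/example.tgz HTTP/1.1" 200 1048576 '
    '"-" "Mozilla/5.0" 4250000 X'
)
event = parse_download_event(sample)
```

Because the duration and connection status are logged as explicit fields, no inference from other parts of the event is needed; that is the point of logging them semantically.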

The first set of statistics on test data is in the table below, showing download performance by country and platform. With this we will develop a trend line and use it as a measure of service performance, ultimately to know where and when a content distribution network will be needed. These small changes to the logged event itself are an example of semantic logging leading to improved operational intelligence.
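As a rough illustration of the kind of summary described above, the following Python sketch groups download durations by country and platform. It is not how the dashboards themselves were built; the `country` and `platform` fields are assumed to have been added by an earlier enrichment step, the function name is hypothetical, and the event values are invented, not results from our data.

```python
from collections import defaultdict
from statistics import median


def duration_by_country_platform(events):
    """Summarize download durations per (country, platform) pair."""
    groups = defaultdict(list)
    for event in events:
        key = (event["country"], event["platform"])
        groups[key].append(event["duration_s"])
    return {
        key: {
            "count": len(durations),
            "median_s": median(durations),
            "max_s": max(durations),
        }
        for key, durations in groups.items()
    }


# Invented events for illustration only.
events = [
    {"country": "US", "platform": "linux", "duration_s": 2.0},
    {"country": "US", "platform": "linux", "duration_s": 4.0},
    {"country": "DE", "platform": "windows", "duration_s": 9.0},
]
summary = duration_by_country_platform(events)
```

Tracking a per-group median over time is one simple way to build the trend line; a region whose median keeps climbing would be a candidate for content distribution network coverage.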

More info here: Machine Analytics
