I had the pleasure of hosting SplunkLive Dallas late last week, with some of the best customer presentations yet from MetroPCS, Pegasus and Louisiana State University. 95 people attended, and nearly everyone stayed on for the afternoon technical sessions to dive deep into Splunk hands on.
What was remarkable about this event, to an even greater degree than other recent SplunkLives, is the extent to which Splunk deployments have evolved from simple search and break/fix – you could easily see an emerging maturity model for operational intelligence at work.
Lamar Holtzclaw, our local senior sales engineer in the Dallas area, gave a great demo showing exactly how Splunk can be used to quickly find SLA violations amongst transactions spanning multiple components, then quickly navigate to the root cause. His example was based on his real experience at a major telecom prior to Splunk, and is one of the most classic Splunk use cases around.
Pete Ehlke, principal systems engineer at Pegasus systems, then gave an updated presentation on how he’s deployed Splunk. For those of you who don’t know, Pegasus provides back end services for the hospitality industry, including the reservation systems behind nearly every online hotel reservation through familiar sites like Orbitz, and the back end systems relied on by major hotel chains such as Marriott, Fairmont and La Quinta. We’ve heard from Pete before: he baked Splunk into their award-winning RezView system several years ago, and for years Splunk has been both the main way Pete keeps tabs on the health of the service and his primary troubleshooting tool. What was new this time around is Pete’s real-time dashboard of key metrics, which gives his business owners insight into what’s happening on their systems. Here’s the dashboard he showed…
This dashboard gives him a view into a 30-minute window, updated in real time, of business metrics such as current transactions per second, net bookings, and response times for different chains and properties. The data error percent was particularly interesting: this is the percentage of requests to Pegasus’ back end systems that have data quality problems, such as asking for a hotel in a city in which it doesn’t exist. Finding these kinds of logical problems helps protect revenue and reduces the cost of handling bad requests. This is exactly the kind of problem that goes unnoticed without Splunk.
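For readers who want a feel for what a metric like the data error percent involves, here is a minimal Python sketch of a rolling 30-minute error percentage. This is not Pete’s implementation (his runs as Splunk searches behind the dashboard); the class name, the method names and the `is_data_error` flag are all assumptions made up for illustration.

```python
from collections import deque
from datetime import datetime, timedelta


class RollingErrorRate:
    """Percent of requests flagged as data errors within a sliding time window.

    Hypothetical helper for illustration only; names are not from Pegasus or Splunk.
    """

    def __init__(self, window=timedelta(minutes=30)):
        self.window = window
        self.events = deque()  # (timestamp, is_data_error), oldest first
        self.errors = 0        # count of flagged events currently in the window

    def add(self, ts, is_data_error):
        """Record one request, then drop anything that has aged out of the window."""
        self.events.append((ts, is_data_error))
        self.errors += int(is_data_error)
        cutoff = ts - self.window
        while self.events and self.events[0][0] <= cutoff:
            _, was_error = self.events.popleft()
            self.errors -= int(was_error)

    def percent(self):
        """Data error percent over the requests still in the window."""
        if not self.events:
            return 0.0
        return 100.0 * self.errors / len(self.events)
```

Feeding it one `(timestamp, flag)` pair per request and reading `percent()` after each one gives the kind of continuously updated figure described above, assuming requests arrive in timestamp order.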
If this isn’t a classic example of the evolution from break/fix, to proactive monitoring, to operational visibility and then business insights, I don’t know what is.
Following Pete Ehlke, we heard from Gregg Woodcock at MetroPCS. Metro’s an example of a customer who dived feet first into the stage of using Splunk for business analytics. Gregg gave a fascinating account of how MetroPCS’ revolutionary fixed price wireless offering created some very specific operational challenges, each requiring the visibility afforded by Splunk in order to optimize the business.
The first of these challenges was in optimizing routing – Metro earns a fixed fee per subscriber per month, but needs to pay other carriers to route calls to their destination. These routing decisions are made on the basis of routing tables, but the costs vary based on constantly updated rate tables. Metro has developed dashboards in Splunk that correlate call detail records with rate tables, and provide comparisons between actual and possible routes. Gregg said that these dashboards routinely uncover specific routing changes that individually save hundreds of thousands of dollars.
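To make the actual-versus-possible comparison concrete, here is a rough Python sketch of the idea: total the minutes actually routed per destination and carrier, price them with the current rate table, and compare against the cheapest carrier for that destination. It is an illustration under assumed names (the `destination`, `carrier` and `minutes` fields and the shape of the rate table are hypothetical), not how MetroPCS built its Splunk dashboards.

```python
from collections import defaultdict


def route_savings(cdrs, rates):
    """Rank routes by how much cheaper the best available carrier would have been.

    cdrs:  iterable of dicts with hypothetical keys "destination", "carrier", "minutes"
    rates: dict mapping (destination, carrier) -> cost per minute
    Returns (destination, actual_carrier, cheapest_carrier, saving), largest saving first.
    """
    # Total minutes actually sent over each (destination, carrier) route.
    minutes = defaultdict(float)
    for cdr in cdrs:
        minutes[(cdr["destination"], cdr["carrier"])] += cdr["minutes"]

    # Cheapest carrier per destination according to the current rate table.
    cheapest = {}
    for (dest, carrier), rate in rates.items():
        if dest not in cheapest or rate < cheapest[dest][1]:
            cheapest[dest] = (carrier, rate)

    # Compare what was paid with what the cheapest route would have cost.
    savings = []
    for (dest, carrier), mins in minutes.items():
        best_carrier, best_rate = cheapest[dest]
        actual = mins * rates[(dest, carrier)]
        possible = mins * best_rate
        if actual > possible:
            savings.append((dest, carrier, best_carrier, actual - possible))
    return sorted(savings, key=lambda row: row[3], reverse=True)
```

Because the rate tables change constantly, a comparison like this only helps if it is rerun whenever they update, which is the job the dashboards do.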
Gregg also talked about Splunk’s usefulness in uncovering abuse. Again, being a fixed price carrier, Metro’s terms of service allow unlimited personal use. But unscrupulous customers sometimes abuse this by setting up relays, selling calls, etc. Gregg has developed extensive searches and dashboards in Splunk that correlate activity to uncover the most common patterns of abuse – avoiding losses and helping keep MetroPCS affordable for the vast majority of its honest subscribers.
Following Gregg, Allie Hopkins from Louisiana State University spoke about her deployment. Allie is the manager for the server infrastructure team at LSU, a large state university with over 26,000 students and 10 separate schools and colleges. While this talk took us back to a more pure IT operational use of Splunk, it was a great reminder of the efficiencies Splunk can bring to any IT team.
Allie’s been at LSU for 15 years, and initially implemented a homegrown log search tool to “empower the helpdesk” – i.e. get herself out of the business of playing “log butler” for routine requests such as identifying why DHCP leases were failing, or who had a given IP lease. This tool was a maintenance burden and so she replaced it with Splunk several years ago. Over time, she’s empowered more and more teams with self service, role-based access to their data in Splunk, including the security team, AD logs, etc. This has streamlined operations and revealed previously hidden operational issues that were silently draining precious resources.
All in all a great SplunkLive. Now on to Boston and Baltimore…