SplunkLive! Utrecht – Het was een mooie dag

The SplunkLive show continued on to Utrecht in the Netherlands yesterday. Our new Dutch partner, SMT, made the local arrangements. It was another fantastic day of customer and partner presentations, with a lively audience of over 50 IT pros full of ideas and questions about different ways to use Splunk.

The first customer presentation was from Andrzej (“AJ”) Wolski, Unix Engineer at TomTom, the world’s leading navigation solutions provider. TomTom, known for its ubiquitous portable navigation devices used by 40+ million people worldwide, is in the process of a major move from a device orientation to a service orientation. This 3000+ person company, with over 1.5 billion euros in annual revenue, offers its full range of core navigation and location capabilities – Plus, Map Share, IQ Routes, HD Traffic, Safety Alerts, Local Search, Fuel Prices, Online Route Planner, Location Based Services – as web services. This infrastructure is growing fast, with 2 main datacenters, 800+ Linux / RHEL servers and 1000+ Citrix Xen virtual servers running primarily JBoss and WebSphere-based custom applications, and thousands of Cisco network devices.

Before Splunk, TomTom had centralized syslog. However, they had significant scaling problems doing ad hoc search and reporting with just homegrown scripts and syslog servers. Application logs were not effectively centralized, and developers didn’t have direct access to them. Moreover, they couldn’t effectively do long-term trending for lifecycle and capacity planning. These were their initial motivations to find a commercial solution.

AJ found Splunk 3.x in 2008, immediately downloaded it and started indexing his data on his own. It was clear that it would speed up analysis, and AJ’s organization decided to start a formal evaluation. The audience was really interested in knowing whether TomTom had evaluated alternatives to Splunk – AJ said they had done competitive evaluations but no other product was even close to Splunk’s capabilities. According to AJ, “no other software was as fast; reporting was easy and flexible – that caught us by surprise; we can deploy across the datacenter any hardware, os and effectively handle applications that write to logfiles not syslog – network logging just doesn’t work well for webservers, JBoss and Websphere.”

Splunk was implemented in 2009, following our 4.0 release. AJ’s overwhelming message about Splunk’s success at TomTom was one of self-service and empowerment. Troubleshooting is fast, effective and easy. With controlled self-service access, engineers and even managers at TomTom have “discovered stuff about our infrastructure we weren’t aware of before.” What has been particularly impressive is how managers have embraced self-service reporting and dashboarding. Not only are capacity planning managers getting their desired trend reports, but AJ said “It always surprises me what people are doing with Splunk.”

One story that AJ told really highlights how unknown dependencies create problems and how Splunk can speed up troubleshooting to restore critical services faster. Apparently TomTom was experiencing a spike in errors in a shopping transaction – real lost revenue. It turned out that the transaction was hitting a database that shared network storage with a non-production database in which someone was running a 1 TB delete task. This is a root cause that could only have been pinpointed effectively through ad hoc correlation across silos by time.

I’ve rarely seen so many immediate questions in response to a SplunkLive customer presentation – we finally had to cut questions short and move on to the next presentation to keep things moving. AJ was busy fielding questions during every break for the rest of the day.

After AJ, Trey Darley came up to give the next presentation. Trey is a consultant at a major international organization where he has implemented Splunk. While he could not share any specifics about the organization, he was able to describe a use case that is common among Splunk customers with critical networks.

Trey’s team is responsible for security devices – firewalls, proxies, etc. Whenever something breaks, the security devices are seen as a “black box” and blamed for having broken things by blocking necessary traffic. Fielding calls about “was this connection blocked and why?” was a major distraction before Splunk. He implemented Splunk to speed up investigations, and eventually his management decided to go one big step further – giving the other teams controlled self-service access so they could find out for themselves whether their connection was blocked. Initially, people were stuck in their ways and adoption was slow. But one day, organizational behavior suddenly shifted and he noticed that people were starting to send Splunk screenshots around in the course of discussing incidents! They were now self-service and he was no longer being distracted by routine investigations of purported network issues. The blame game had been won.

Michiel Toes, General Manager and co-founder of SMT, also presented their perspective on the evolution of systems management from fault to service and performance monitoring in the context of the same trends of growing complexity and increased business demands that we see. It’s great to see partners coming out of the service management side of things embrace Splunk. SMT has trained two specialists on Splunk and has a great vision for how to apply the technology in the local market here.

All in all, a great day with an engaged Splunk audience.

Tot Ziens, Utrecht!
