On a crisp and wet Wednesday morning, the W hotel played host to SplunkLive Atlanta 2010. One hundred attendees packed the well-appointed conference room, all eager to learn more about Splunk, get burning questions answered and connect with peers.
As with other SplunkLive events, we invited customer speakers to share their experiences with Splunk. On this occasion we had three speakers and every one of them was stellar.
John Daniely – Atlanta Journal Constitution
The first speaker was John Daniely, from the Atlanta Journal Constitution, the only major daily in Atlanta and something of an institution that draws 2.3 million visitors daily.
John and his team are responsible for all aspects of network security, including Active Directory auditing, security event monitoring, anti-virus, firewalls and URL filtering. Their environment includes several hundred Linux, Unix and Windows servers and 1,500 workstations.
John’s presentation was chock-full of examples and use cases for Splunk in security. Here are a few of them:
One of the first feeds into Splunk was their Intrusion Prevention System (IPS) logs. This enabled John and his team to track the recent ‘Conficker’ worm: they could see which systems were impacted and correlate this data with their firewall logs to quickly determine whether infected traffic had made it into the network.
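To make the correlation idea concrete, here is a minimal Python sketch. It is not John's actual setup or anything Splunk-specific: the record layout, field names (`src_ip`, `signature`, `action`) and sample addresses are all assumptions made up for illustration. It simply finds hosts flagged by the IPS and checks whether the firewall allowed any of their traffic through.

```python
from collections import defaultdict

# Hypothetical, simplified records; real IPS and firewall logs carry many more fields.
ips_alerts = [
    {"src_ip": "10.0.0.5", "signature": "Conficker"},
    {"src_ip": "10.0.0.9", "signature": "Conficker"},
    {"src_ip": "10.0.0.7", "signature": "PortScan"},
]
firewall_events = [
    {"src_ip": "10.0.0.5", "dest_ip": "192.168.1.20", "action": "allow"},
    {"src_ip": "10.0.0.5", "dest_ip": "192.168.1.21", "action": "deny"},
    {"src_ip": "10.0.0.9", "dest_ip": "192.168.1.30", "action": "deny"},
    {"src_ip": "10.0.0.8", "dest_ip": "192.168.1.40", "action": "allow"},
]


def allowed_traffic_from_infected(alerts, fw_events, signature):
    """Return {infected source IP: [destinations the firewall allowed]}."""
    # Step 1: which hosts did the IPS flag for this signature?
    infected = {a["src_ip"] for a in alerts if a["signature"] == signature}
    # Step 2: which firewall events from those hosts were allowed through?
    allowed = defaultdict(list)
    for event in fw_events:
        if event["src_ip"] in infected and event["action"] == "allow":
            allowed[event["src_ip"]].append(event["dest_ip"])
    return dict(allowed)


result = allowed_traffic_from_infected(ips_alerts, firewall_events, "Conficker")
print(result)
```

In this made-up data set, only one flagged host has traffic the firewall let through, which is the kind of short list an investigator would want to start from.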
Protect against Malware
More recently, John and his team started sending SNMP traps to Splunk from the anti-virus software running on individual workstations. John said, “when new malware is detected, all we need to do is type the name of the malware into Splunk and boom all the machines that have it pop up on one screen”.
Splunk + SIEM
John also talked about how Splunk complements their SIEM, thanks to its ability to index any type of data from any source much faster and without building custom adapters. As he put it, “Many SIEMs require time-consuming customizations”.
Other current and future use cases include change monitoring, supporting Payment Card Industry (PCI) log retention requirements and Splunking application log data.
When asked to sum up, John said, “Splunk is very fast and saves us a lot of time. Investigating from one place with the ability to cross correlate between logs eliminates a lot of manual work”. He then offered this advice to people evaluating Splunk: “anyone who’s on the fence about Splunk, I recommend go ahead and install it and you’ll be impressed”.
Tim Metz – Cox Communications
Tim is a six-year veteran of Cox Communications, the third-largest cable entertainment and broadband services provider in the country. Tim’s focus is network security, covering both the enterprise network and Cox’s high-speed broadband network.
Their environment has “every piece of gear from every vendor you can imagine!” – it’s heterogeneous and geographically distributed. Tim’s strategy is to get as many syslog data sources into Splunk as possible.
Tim’s a big fan of syslog and believes “if you’re not watching your syslog, you’re not watching your network”. He went on to say, “We’re using Splunk for more than security … if somebody has a router or switch, a Linux server, anything, we encourage them to point their syslog at Splunk, which makes it easy to troubleshoot outages and examine firewall logs all from one place.”
Splunk in the NOC
Cox has Splunk running in the NOC and, although Tim claims not to be a developer, he single-handedly created the main Splunk dashboard from the ground up in one day! He thanked the Splunk docs team for that one. Here’s the screenshot:
For Tim and the NOC team, Splunk helps fill the knowledge gaps between their existing “fancy tools”, each of which provided only a narrow view of their logs. The NOC dashboard, built on Splunk, provides them with an actionable summary view of critical information, such as the type and number of errors for different time periods (last 10 minutes, last hour, same time yesterday).
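As a rough illustration of that kind of summary view, the Python sketch below counts errors by type over the three time periods mentioned. It is not Tim's dashboard. The event layout, the error type names and the reading of "same time yesterday" as the same one-hour window 24 hours earlier are all assumptions, and the timestamp is arbitrary.

```python
from collections import Counter
from datetime import datetime, timedelta


def error_counts(events, start, end):
    """Count events by error type where start <= timestamp < end."""
    return Counter(e["type"] for e in events if start <= e["time"] < end)


def dashboard_summary(events, now):
    """Build a {window label: {error type: count}} summary."""
    windows = {
        "last 10 minutes": (now - timedelta(minutes=10), now),
        "last hour": (now - timedelta(hours=1), now),
        # Assumption: "same time yesterday" is the last-hour window shifted back 24 hours.
        "same time yesterday": (now - timedelta(days=1, hours=1), now - timedelta(days=1)),
    }
    return {label: dict(error_counts(events, start, end))
            for label, (start, end) in windows.items()}


# Arbitrary reference time and hypothetical error events.
now = datetime(2010, 1, 1, 12, 0)
events = [
    {"time": now - timedelta(minutes=5), "type": "link_down"},
    {"time": now - timedelta(minutes=30), "type": "link_down"},
    {"time": now - timedelta(minutes=45), "type": "auth_failure"},
    {"time": now - timedelta(days=1, minutes=20), "type": "link_down"},
]

summary = dashboard_summary(events, now)
for label, counts in summary.items():
    print(label, counts)
```

Putting the current window next to the same window from yesterday is what makes the numbers actionable: a count only looks alarming once you know what normal looks like.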
Splunk in the SOC
Tim also has Splunk running in the SOC to conduct security investigations quickly and on demand, 4x faster than before. One SOC use case is monitoring for SNMP offenders and working out the source: was it customers, or was it their own misconfigured tools?
They also found Splunk useful for analyzing usage patterns – to help them find time windows when they could perform major upgrades without impacting users.
Splunk also helps Cox meet their PCI requirements. For example, Splunk monitors for use of the ‘switchuser’ command on highly-sensitive machines and maps user details to each use of the command, which lets them quickly separate routine actions from anomalous ones.
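A small Python sketch of the idea follows, under stated assumptions: the host names, the user directory and the `approved` flag used to decide what counts as routine are all hypothetical, and the talk did not describe how Cox actually draws that line. The sketch only shows the general pattern of enriching each command event with user details and then splitting the results.

```python
# Hypothetical command events, sensitive host list and user directory.
events = [
    {"host": "pci-db-01", "user": "alice", "command": "switchuser"},
    {"host": "pci-db-01", "user": "mallory", "command": "switchuser"},
    {"host": "pci-db-02", "user": "alice", "command": "ls"},
    {"host": "web-01", "user": "bob", "command": "switchuser"},
]
sensitive_hosts = {"pci-db-01", "pci-db-02"}
user_details = {
    "alice": {"team": "dba", "approved": True},
    "mallory": {"team": "marketing", "approved": False},
}


def classify_switchuser(events, sensitive_hosts, user_details):
    """Split 'switchuser' events on sensitive hosts into (routine, anomalous)."""
    routine, anomalous = [], []
    for event in events:
        if event["command"] != "switchuser" or event["host"] not in sensitive_hosts:
            continue  # only the watched command on sensitive machines matters here
        details = user_details.get(event["user"], {})
        record = {**event, **details}  # map user details onto the command event
        # Assumption: users without an explicit approval are treated as anomalous.
        (routine if details.get("approved") else anomalous).append(record)
    return routine, anomalous


routine, anomalous = classify_switchuser(events, sensitive_hosts, user_details)
print("routine:", routine)
print("anomalous:", anomalous)
```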
And what about the future? Tim said it was all about new devices and more data. He’s happy with Splunk’s ability to eat syslog at the rate that it’s sent: “The great thing about Splunk is that it’s already ready for new data types and sources.”
Joseph Rinckey – BlueCross BlueShield Tennessee
Joseph is the lead VMware systems engineer for BlueCross BlueShield Tennessee. BCBST currently serves 4.3 million people across Tennessee and is part of a nationwide association of healthcare plans.
They are relatively new users of Splunk, using it to manage their VMware environment. This consists of 470 VMs, the majority of which are Windows, on 32 hosts, in 4 separate clusters spanning 50 different datastores – this covers about 43% of their total environment.
Before Joe, management’s perception of virtualization wasn’t all that great. They basically didn’t see it as production-ready (due to historic issues). Then came Joe. Armed with Splunk and a determined team of two (including himself), he turned this perception completely around by maintaining uptime and reliability and building confidence: “We don’t have a downtime on the hosts now with Splunk”.
Now management wants to virtualize 90% of their total environment in the next 12 months!
Choosing Splunk for Virtualization Management
When asked by an audience member why he chose Splunk, Joe’s answer was simple: “Splunk does more”. He went on to describe their evaluation process: “We looked at 8 other VMware third-party management tools, including big names, with some nice features, but they were too narrow and specific”. He said, “We wanted more than performance management, we wanted to consolidate our logs, to grasp what’s going on in our environment. We wanted to do more than just get notified, we wanted to be proactive.”
Getting Data Into Splunk
Currently all VMware infrastructure data is being sent to Splunk, including host logs, which cover everything in syslog plus several logs not in syslog, such as the vpxa and hostd logs. He also told the audience where they can find more information on getting non-syslog data into Splunk.
Joe is using the Splunk for VMware App, which lets him collect all vCenter logs, as well as vmware.log files, in near real time. Before Splunk, the volume and frequency of the vCenter logs made this impossible.
The end result is that they now have a clear understanding of their baseline environment:
- They know when and where something attached to their environment is failing, and can fix it
- Splunk notifies them if certain ESX processes (such as hostd) run out of memory
- They can quickly and easily visualize data in charts, for example the percentage of VMware tools that are out of date, and run analytics on server and network utilization
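The "percentage of VMware tools out of date" figure in the list above is a simple ratio over the VM inventory. Here is a minimal Python sketch of that calculation, assuming a made-up inventory with a hypothetical `tools_status` field. How BCBST actually derives the number in Splunk was not covered in the talk.

```python
def percent_out_of_date(vms):
    """Return the percentage of VMs whose tools are marked out of date."""
    if not vms:
        return 0.0  # avoid dividing by zero on an empty inventory
    outdated = sum(1 for vm in vms if vm["tools_status"] == "out_of_date")
    return 100.0 * outdated / len(vms)


# Hypothetical inventory; names and status values are illustrative only.
inventory = [
    {"name": "vm-app-01", "tools_status": "current"},
    {"name": "vm-app-02", "tools_status": "out_of_date"},
    {"name": "vm-db-01", "tools_status": "current"},
    {"name": "vm-web-01", "tools_status": "current"},
]

pct = percent_out_of_date(inventory)
print(f"{pct:.1f}% of VMs have out-of-date tools")
```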
What’s next for BCBST and Joe? “To do more with Splunk, including sending application logs running in the VMs to Splunk in order to provide a more complete view, creating management reports that provide greater visibility and best of all, enjoying how much Splunk makes me look like the Jason Bourne of Virtualization”!
What struck me about this event, and the other SplunkLive events I have attended, is the collaboration between attendees. When one person asks a question, it’s often another customer who jumps in to answer. This is why we are focusing more than ever on community at Splunk, including the return of Splunkbase, Splunk Answers and our first ever User Conference this year!