Orlando is the 5th SplunkLive of 2010 (following events in Boston, London, Vienna and Munich) and the first ever in Florida. The event drew a capacity crowd of enthusiastic customers and users.
As usual at these events, we asked customers to stand up and talk about their experience with Splunk – how it’s used, where it helps, lessons learned and the impact on their organization. On this occasion, we had two great speakers from Voxeo and Presidio.
Voxeo is the world’s largest provider of Interactive Voice Response (IVR) services, supporting over 82,000 hosted ports globally along with hundreds of on-premise deployments. Over 100,000 developers use Voxeo’s platform to integrate with their existing web applications and communications via traditional, next generation, or social networks – instant messaging, Twitter, Facebook, Skype, SMS, voice, etc.
RJ is CTO at Voxeo and responsible for bringing Splunk in to help fulfill his mission to “make communications simple”. When asked what they use Splunk for, his response was simple, “what don’t we use Splunk for!” And indeed, Voxeo showcases multiple use cases for Splunk. More on that later, but first what does Voxeo’s IT infrastructure look like?
RJ spent several minutes discussing Voxeo’s global infrastructure. Their hosted IVR platform spans 7 datacenters across North America, Europe and Asia Pacific. There are over 2000 servers across these datacenters, generating approximately 1 terabyte of raw log data per day in total. These facts alone pose significant challenges when seeking to make use of these logs and IT infrastructure data: shipping logs to a central server is not feasible for logistical, security, regulatory, legal and privacy reasons. Add to this the need to retain data for 7 years for compliance and regulatory reasons, plus Voxeo’s 100% uptime SLA to its customers, and finding a way to better manage their IT infrastructure data looked like a significant challenge.
RJ started looking for different solutions and eventually came across Splunk. Not only did Splunk’s distributed architecture and scalability characteristics match Voxeo’s requirements, it also fully addressed the different ways they wanted to use their IT infrastructure data:
Splunk’s distributed architecture is deployed across all Voxeo’s datacenters, providing secure and rapid access to logs and IT infrastructure data, whilst avoiding the need to ship data around.
Splunk’s scalability model is based on MapReduce, which scales linearly across commodity servers to absorb the growing transaction and data volumes. Splunk also integrates with Voxeo’s single sign-on architecture to provide a seamless experience for external customers and developers using Splunk. Voxeo offers Splunk’s ad hoc reporting as a value-add capability embedded in their hosted offering.
More recently, Splunk is also embedded in Voxeo’s on-premise product and integrated into their management console (Prophecy Commander), providing a replica of the hosted architecture for on-premise environments ranging from a single laptop to a large datacenter.
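To make the distributed, MapReduce-style approach described above more concrete, here is a minimal Python sketch. It is not Splunk’s implementation or API; the datacenter names, sample log lines and function names are all hypothetical. The idea it illustrates is that each datacenter searches its own logs locally (the map step) and only small summaries travel over the network to be merged (the reduce step), so raw logs never need to be shipped to a central server.

```python
from collections import Counter

# Hypothetical per-datacenter log stores. In the scenario described,
# raw logs stay in the datacenter that produced them.
DATACENTER_LOGS = {
    "dc-na-1": ["ERROR call dropped", "INFO call started", "ERROR timeout"],
    "dc-eu-1": ["INFO call started", "ERROR timeout"],
    "dc-apac-1": ["WARN high latency", "ERROR timeout"],
}


def map_local(lines, keyword):
    """Map step: runs where the data lives and returns a small summary.

    Counts matching lines per log level (the first word of each line).
    """
    counts = Counter()
    for line in lines:
        if keyword in line:
            level = line.split(" ", 1)[0]
            counts[level] += 1
    return counts


def reduce_summaries(summaries):
    """Reduce step: merge the per-datacenter summaries into one result."""
    total = Counter()
    for summary in summaries:
        total.update(summary)
    return total


def distributed_count(logs_by_dc, keyword):
    """Run the map step per datacenter, then merge the summaries.

    Here the map calls run in a simple loop; in a real deployment they
    would run remotely and in parallel, one per datacenter.
    """
    summaries = [map_local(lines, keyword) for lines in logs_by_dc.values()]
    return reduce_summaries(summaries)


if __name__ == "__main__":
    print(distributed_count(DATACENTER_LOGS, "timeout"))
```

Because each summary is only a handful of counters, the amount of data crossing datacenter boundaries stays small no matter how many raw log lines each site holds.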
Final note – RJ’s complete presentation delivered at SplunkLive Orlando is available at the following link (thanks RJ!): http://www.slideshare.net/voxeo/logging-at-the-tb-scale-voxeo-at-splunklive
Presidio, Inc. is a diversified professional and managed services firm that recently merged with Coleman Technologies, a leading IT and systems engineering firm providing, amongst other things, information technology and systems engineering services. As they put it, “We manage outsourced NOCs”. Their NOC environment includes Linux, Windows and Cisco equipment for unified communications.
David joined Coleman Technologies 5 years ago, heading up their managed services group and specifically building the NOC practice. Here’s his version of events, “if you complain enough, you eventually get responsibility and I ended up running the NOC!”
David’s most pressing issue was managing the data deluge. David used Zenoss in the NOC for fault and performance monitoring: “Zenoss is great for displaying row-by-row information on the screen, like SNMP traps, syslog and threshold alerts, but the screens didn’t scale as the NOC operations scaled.” They found that as they added more customers, more devices, more systems and more advanced technologies, important things simply got pushed off the bottom of the screen.
He said, “we simply did not have the physical real estate for eyes on glass, to see all the important messages and see what’s going on. This is a big problem”. David and his team then deployed Splunk to manage the low-level, high-volume data and find problems, which can then be surfaced via Splunk dashboards to the NOC.
Let’s explain this somewhat controversial statement. In David’s words, “when I started Splunking the data and seeing what we were missing using our traditional fault performance systems and how we could correlate it and show it in dashboards, I literally went home sick to my stomach, not being able to sleep – and then incessantly began using Splunk and finding the silliest errors that were vastly widespread in customer environments – fans and routers that were stopping, duplex mismatches, VLAN tags that were incorrect. Easy to fix problems that nobody knew about!”
David and his team use Splunk to filter out noise and map severities to different message types from custom and packaged applications. Lower-level events are Splunked, and David is now able to catch critical issues as they are building – see how frequently an issue occurs, how many locations it occurs at, and a breakdown by field extractions and line of business. By doing this, David and his team obtain actionable intelligence they can respond to quickly. He really liked how level 1 NOC operators can create custom dashboards for specific customers to monitor known issues without involving development teams.
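As a rough illustration of the workflow above (filter out noise, map message types to severities, then look at frequency and affected locations), here is a small Python sketch. It is not a Splunk search or Presidio’s actual configuration; the patterns, severity labels, site names and sample messages are all hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping from message patterns to severities.
SEVERITY_RULES = [
    ("fan failure", "critical"),
    ("duplex mismatch", "warning"),
    ("vlan", "warning"),
]

# Hypothetical patterns treated as noise and dropped entirely.
NOISE_PATTERNS = ("link up", "heartbeat")


def classify(message):
    """Return a severity for a message, or None if it is noise."""
    text = message.lower()
    if any(pattern in text for pattern in NOISE_PATTERNS):
        return None
    for pattern, severity in SEVERITY_RULES:
        if pattern in text:
            return severity
    return "info"  # anything unmatched is kept at the lowest severity


def summarize(events):
    """Summarize (location, message) pairs by severity.

    Returns {severity: {"count": int, "locations": set of locations}}.
    """
    summary = defaultdict(lambda: {"count": 0, "locations": set()})
    for location, message in events:
        severity = classify(message)
        if severity is None:
            continue  # filtered out as noise
        summary[severity]["count"] += 1
        summary[severity]["locations"].add(location)
    return dict(summary)


if __name__ == "__main__":
    sample_events = [
        ("site-a", "Fan failure detected on router"),
        ("site-b", "Heartbeat ok"),
        ("site-b", "Duplex mismatch on port 3"),
        ("site-c", "VLAN tag incorrect"),
        ("site-a", "User login"),
    ]
    for severity, info in summarize(sample_events).items():
        print(severity, info["count"], sorted(info["locations"]))
```

A summary like this is the kind of aggregate a dashboard panel would display: a short list of issue classes with counts and affected sites, rather than a scrolling screen of individual rows.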
Final word? Even power users of Splunk get value from SplunkLive events. David said that after learning more about dashboards in the product demo, he built three of them during the session!