This week we’re on the East Coast enjoying some fantastic customer presentations and roundtables at Splunk Live events in New York City, Princeton NJ and Washington DC. It’s Tuesday and we have more than 100 customers and Splunk users attending Splunk Live in midtown Manhattan. The vibe is electric as we’re being treated to awesome talks by IDT and New York Life. At lunch, long-term customers Bloomberg and AT&T joined the customer roundtable conversation.
Gabe Arnett, Senior Software Architect at Moody’s, demonstrated how Splunk is being used to monitor and troubleshoot the Moody’s Analytics platform. Gabe has more than 15 years of experience building web applications in financial services, investment banking and e-commerce. At Moody’s he’s responsible for the global development team that develops and supports the newly redesigned client-facing website – v3.moodys.com. Moody’s is a leading provider of research, data, analytic tools and related services to debt capital markets and credit risk management professionals. The company’s products and services provide the means to assess and manage the credit risk of individual exposures as well as portfolios; price and value holdings of debt instruments; analyze macroeconomic trends; and enhance customers’ risk management skills and practices.
Moody’s Splunk environment is used by 25 different users and runs on Windows 2003. Splunk gives Gabe’s developers secure access to the logs they need without touching the production devices, servers and applications. His team has built custom searches and a number of dashboards indicating the general health of their applications and services. Custom searches and alerts track errors and access, helping to guarantee a good user experience. The team also uses Splunk to understand when and where new content isn’t flowing to the v3 platform. A large part of the Moody’s user experience is delivering email alerts, and Splunk helps the team track GUIDs to ensure customers receive the alerts they’ve subscribed to.
The team recently migrated from Splunk 3 to Splunk 4, and the upgrade took 30 minutes to perform. The Splunk for Windows App has been significantly revamped in Splunk 4, and the Moody’s team is using it to monitor local server resources (disk, memory, networking) through WMI and to correlate this performance data with the Windows and Application event logs.
Shay Benjamin, CSO and SVP, Architecture at IDT, designs and implements network architectures and manages compliance, security and fraud initiatives at IDT. IDT Corporation (www.idt.net) is a holding company focused on the telecommunications and energy industries. Since 1995 they’ve been building hundreds of VOIP switches globally and assembling an international fiber optic network. IDT pioneered VOIP (Voice over Internet Protocol) to create Net2Phone, piloted the first commercial WiFi phone service in the US and has created a prepaid calling card business, which sells 12 million calling cards a month.
IDT uses Splunk primarily for VOIP Call Detail Records (CDRs). The company indexes more than 120 million CDRs per day with six mirrored Splunk server instances. CDRs are somewhat like logs, but with many fixed, delimited fields. One or more CDRs are created at each switching or routing point for every VOIP call. CDRs vary between platform devices in number of fields and contents and, unlike logs, few CDR fields contain easy-to-read key=value pairs. Although CDRs are key to maintaining service quality, billing, network quality monitoring and security forensics, working with them is labor intensive, and delays waste labor, time and money.
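To make the contrast between delimited CDRs and key=value logs concrete, here is a minimal Python sketch that maps a positional record onto named fields. The platform names, field layouts, and pipe delimiter are all hypothetical, chosen for illustration only; they are not IDT's actual CDR formats, and this is not how Splunk extracts fields internally.

```python
import csv
import io

# Hypothetical per-platform layouts. Real CDR formats differ by vendor
# and typically carry far more fields than shown here.
PLATFORM_FIELDS = {
    "switch_a": ["start_time", "caller", "callee", "src_ip", "trunk_group", "duration_s"],
    "switch_b": ["caller", "callee", "trunk_group", "src_ip"],
}


def cdr_to_event(platform, line, delimiter="|"):
    """Map one delimited CDR line onto the field names of its platform."""
    names = PLATFORM_FIELDS[platform]
    values = next(csv.reader(io.StringIO(line), delimiter=delimiter))
    if len(values) != len(names):
        raise ValueError(
            f"{platform}: expected {len(names)} fields, got {len(values)}"
        )
    event = {"platform": platform}
    event.update(zip(names, values))
    return event


def to_key_value(event):
    """Render an event as a readable key=value line."""
    return " ".join(f"{key}={value}" for key, value in event.items())


if __name__ == "__main__":
    raw = "12125550100|442075550199|TG_NYC_01|10.0.0.7"
    print(to_key_value(cdr_to_event("switch_b", raw)))
```

Keeping one layout per platform preserves each format's own fields while still producing events that can be searched with the same field names where they overlap.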
IDT needs fast searches across all fields of the CDRs and quick data loading, to allow fast retrieval of call data and cross-platform searches that unify results from different CDR formats. Historically IDT used a custom RDBMS solution with an application called Call Genius. In their RDBMS, IDT was forced to limit the fields that get indexed, because indexing CDRs with an RDBMS is costly: it takes up a lot of space and slows load times. The RDBMS also only indexes fields common to multiple platforms’ CDRs. In the RDBMS solution much of the CDR data was put into BLOBs (actually CLOBs), with multiple CDR fields mapped into a single RDBMS field to try to achieve efficiency. But BLOBs can be very difficult to search and are difficult to index effectively. The legacy Call Genius application didn’t permit searching CDR BLOBs.
Now IDT uses Splunk to index all CDR fields. There is no need to decide which fields to index, and cross-platform searches are easy without losing the resolution of each platform’s specific CDR format. There is no longer a need to create BLOBs for efficiency. Engineers and support staff are able to quickly search for any combination of:
- Phone Number
- IP address
- Trunk Group Name
Splunk naturally and easily links search terms across fields and the users just need to enter the phone number or IP and get back the CDR events and transactions.
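The idea of entering a value without naming the field it lives in can be illustrated with a toy inverted index. This is a simplified sketch of the general technique, not Splunk's implementation; the `TermIndex` class and the sample events are hypothetical.

```python
from collections import defaultdict


class TermIndex:
    """Toy inverted index: every field value is a searchable term."""

    def __init__(self):
        self.events = []
        self.postings = defaultdict(set)  # term -> ids of events containing it

    def add(self, event):
        event_id = len(self.events)
        self.events.append(event)
        # Index each value regardless of which field it came from.
        for value in event.values():
            self.postings[str(value).lower()].add(event_id)
        return event_id

    def search(self, *terms):
        """Return events that contain all of the given terms in any field."""
        if not terms:
            return []
        matches = set.intersection(
            *(self.postings.get(term.lower(), set()) for term in terms)
        )
        return [self.events[i] for i in sorted(matches)]


if __name__ == "__main__":
    index = TermIndex()
    index.add({"caller": "12125550100", "callee": "442075550199", "src_ip": "10.0.0.7"})
    index.add({"caller": "13055550123", "callee": "12125550100", "src_ip": "10.0.0.9"})
    index.add({"caller": "13055550123", "callee": "442075550199", "src_ip": "10.0.0.7"})

    # The same number matches as caller in one event and callee in another.
    for event in index.search("12125550100"):
        print(event)
```

Because the lookup is keyed on the term rather than on a column, the user does not need to know whether a phone number appears as the caller or the callee, which is the behavior described above.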
Comparing Splunk to the RDBMS solution, IDT found Splunk searches to be 50 to 100x faster than searches on non-indexed RDBMS data. Searches on indexed fields are also faster in Splunk than in the previous RDBMS solution. Splunk load times for a typical sample average 1 to 5 minutes versus 20 to 40 minutes for the RDBMS.
IDT is in the process of feeding firewall, security, router, IP network, and switch data into Splunk as well. They’re already discovering that Splunk finds errors not captured by Network Management Consoles, and it has provided valuable troubleshooting during recent datacenter migrations.
Most of all, IDT is looking forward to discovering new ways to use all the data in Splunk. Heuristic analysis and business intelligence applications are at the top of their list, including the use of Splunk to find human “Family and Friends” networks and drive the development of new commercial programs.
New York Life Insurance wrapped up the morning session presentations with Aaron Zachko, Assistant Vice President of Information Systems. New York Life’s family of companies offers life insurance, retirement income, investments and long-term care insurance. New York Life Investments provides institutional asset management and retirement plan services. The company has the highest possible financial strength ratings from all four of the major credit rating agencies.
Aaron is a senior network architect and leads the group responsible for network management, core network infrastructure and network security infrastructure. The New York Life network consists of hundreds of Cisco routers, switches, firewalls, enterprise DHCP and Network Access Control (NAC) devices. The company chose Splunk to satisfy audit and compliance requirements and support the rollout of their NAC infrastructure earlier this year. Currently the team is expanding its use of Splunk into enterprise security forensics and as a multi-component monitoring complement to their Enterprise Service Management Platform, which seems to have one of every kind of monitoring tool already.
Thousands of users a day go through NAC to access the New York Life network and Aaron’s team needed visibility into the network from a unified infrastructure and services perspective. They use Splunk to monitor failed login events and transactions and unauthorized devices on the network globally. The NAC rollout team has been able to stay in front of issues – identifying them before end users discover the problems. Their custom Splunk dashboards enable the team to easily see trends and spikes in activity across all networking components.
Operations teams at New York Life have more recently been using Splunk to troubleshoot application issues.
An application issue across multiple servers created more than 9M events across 167 different sources. Manual investigation into this kind of problem would have taken days — an extremely complex and time consuming effort. Splunk found the issue in 3 minutes. Now teams can trace transactions across systems in minutes or seconds vs. hours or days. And all without any new instrumentation – just using the artifacts they already had.
New York Life is discovering what many other Splunk users have too. Enterprise monitoring and service management platforms can tell you something is wrong but Splunk will help you figure out why and where to fix it.