Case Study: Splunk at Motorola
"Splunk eliminated hours of manual analysis per ticket. We now have proactive visibility into our SOA infrastructure and integrated knowledge across our global team." - Mike Danley, Senior Manager, Foundational Services
Solution Areas: Application Management, Change Management, Security, Server Management
The backbone can't fail
The Motorola Corporate IT team develops the integration components shared across Motorola's entire supply chain.
Before Splunk, the offshore global operations team supporting these components had to manually respond to trouble tickets from all Motorola groups and trading partners. Following up involved time-consuming manual analysis of data that could only be reached by logging on to multiple Unix hosts.
The lack of visibility sometimes complicated efforts to maintain critical infrastructure availability. Just one event stuck in an infrastructure component could have significant business impact.
This process was not only inefficient; it also created security challenges.
Splunk eliminated hours of manual analysis per ticket, enabled proactive visibility into the infrastructure, and facilitated stronger security policy.
About Motorola
Motorola is a global communications leader with over 60,000 employees worldwide. Motorola Corporate IT builds and runs integration components shared by customers across Motorola's supply chain, including Motorola business units, Motorola shared services, Motorola product groups and external trading partners.
Motorola is an integration pioneer, having deployed over 1,000 components since 1999. These components are deployed across 150 systems, some of which were the starting point for SOA across Motorola. Over 80 business applications use a common enterprise service bus (ESB).
Shared components include user provisioning, single sign-on, identity management and core directories. Motorola Corporate IT has some of the largest LDAP infrastructures in the world, processing over 250 million internal and external transactions per month.
Motorola Corporate IT capabilities also include:
- merger, acquisition and divestiture integrations
- real-time process monitoring of performance indicators (BAM)
- business process improvement and automation (BPM)
- application functional, performance, stress testing and monitoring (BAC)
Mike Danley has over 27 years of experience in IT including eBusiness technology architecture, middleware and integration platforms, B2Bi design, IT service architecture, and strategy and governance.
Challenges
Mike's key challenge is that the integration space is "neural by nature," meaning that new, disparate components are added constantly. It is a critical enabler of the supply chain and supports nearly every transaction.
As a result, problems have an exponential impact and must be identified and resolved quickly.
Extreme availability requirements
Motorola has a corporate initiative called Business Live 365 to minimize downtime across all services. Problems must therefore be found proactively and fixed immediately.
Scattered data
Finding and fixing problems was further complicated by the fact that any transaction could cross dozens, even hundreds, of hosts and many layers of infrastructure.
Responding to a trouble ticket from a partner about a reported failed integration required manually grepping up to 75 files, taking several hours. If there was an underlying problem, the issue might impact other customers while the support team was still combing through data.
Knowledge retention
24/7 support is handled by an offshore team, which made it hard for the team to retain the knowledge necessary to perform complicated reconciliations and investigate failures across multiple tiers of infrastructure. The arrangement also created a vulnerability, because this team required direct access to production systems to reach IT data for troubleshooting.
Splunk at Motorola
Given these challenges, Mike brought Splunk in soon after hearing about it. He was amazed by how quickly it was up and running: within a day it was live, with data flowing in real time from over 150 Solaris servers and a handful of Windows servers. Splunk effortlessly indexed dozens of data formats from different applications, including webMethods brokers, the Oracle database, FTP server logs and countless other data sources in Motorola's complex infrastructure.
"Within a week the support team was asking for more data sources in Splunk because it was so useful."
Trouble ticket response
The immediate and primary use of Splunk was for the Tier 1 support team to search across all the data sources in response to trouble tickets from Motorola's internal customers and supply chain partners. On the first day it was running, the Tier 1 staff were able to get an answer with a single search instead of spending hours grepping through over 75 separate files on dozens of servers.
Lockdown
With Splunk in place, the team was able to lock down the file systems and eliminate direct developer and Tier 1 support access to production. These users could now get complete visibility through Splunk's web interface in a secure, non-intrusive way.
Change verification
Motorola's outsourced partner SLA calls for tight change windows to avoid unplanned outages. With Splunk in place, the Tier III team began verifying that changes occurred within the authorized time window. In one case during the first week of using Splunk, they discovered that a system had been brought down 45 minutes outside the time window. Noticing and reacting to SLA violations like this one is critical to ensuring the availability of core components.
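The check described above comes down to comparing an event timestamp against the authorized window. The Python sketch below is only an illustration of that comparison, not Motorola's actual search: the function name, the window boundaries and the shutdown timestamp are all hypothetical values chosen to reproduce a 45-minute overrun.

```python
from datetime import datetime


def minutes_outside_window(event_time, window_start, window_end):
    """Return how many minutes event_time falls outside the window (0.0 if inside)."""
    if event_time < window_start:
        return (window_start - event_time).total_seconds() / 60
    if event_time > window_end:
        return (event_time - window_end).total_seconds() / 60
    return 0.0


# Hypothetical change window and shutdown time, for illustration only.
window_start = datetime(2008, 1, 12, 2, 0)
window_end = datetime(2008, 1, 12, 4, 0)
shutdown = datetime(2008, 1, 12, 4, 45)

overrun = minutes_outside_window(shutdown, window_start, window_end)
if overrun > 0:
    print(f"Change occurred {overrun:.0f} minutes outside the authorized window")
```

In practice the timestamp would come from the indexed log event rather than a hard-coded value.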
Transaction validation
Another frequent issue to cross the support team's desk is a request to prove that a purchase order was executed. With Splunk, the team created a canned search in a few minutes and shared it across shifts, so that any shift could prove an order was executed.
Proactive alerting
Both front-line and senior staff have set up numerous alerts based on scheduled Splunk searches to proactively look for problems as new issues arise. For example, when the issue of out-of-window restarts arose, the team quickly saved a search and set alert rules on it so they would be alerted whenever a broker was restarted. Improving monitoring is now a natural by-product of investigating issues.
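The idea behind a scheduled search with an alert rule can be sketched in a few lines of Python. This is an analogy rather than Splunk's search language or alerting API: the log line format, the pattern and the function name are assumptions made up for the example.

```python
import re

# Hypothetical pattern; real broker logs may word a restart differently.
RESTART_PATTERN = re.compile(r"broker\s+(\S+)\s+restarted", re.IGNORECASE)


def find_restarts(lines):
    """Return the broker names mentioned in restart events among the given log lines."""
    hits = []
    for line in lines:
        match = RESTART_PATTERN.search(line)
        if match:
            hits.append(match.group(1))
    return hits


# Made-up sample events standing in for the results of one scheduled run.
sample = [
    "2008-01-12 04:45:10 INFO broker b01 restarted",
    "2008-01-12 04:46:02 INFO document delivered",
]

alerts = find_restarts(sample)
for broker in alerts:
    print(f"ALERT: broker {broker} was restarted")
```

Running a check like this on a schedule, and notifying someone whenever the result is non-empty, is the essence of turning an investigative search into ongoing monitoring.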