PSCU Safeguards Reliability, Security Through VictorOps and Splunk Enterprise
PSCU is the nation’s premier payments credit union service organization, supporting more than 900 owner credit unions representing over two billion annual transactions. To better enable its credit unions to compete with banks, PSCU aimed to improve key IT performance metrics. Using Splunk Enterprise and VictorOps, PSCU has seen benefits including:
- Reductions in mean time to acknowledge (MTTA), from four hours to less than two minutes
- Stronger on-call team accountability through “single pane of glass” monitoring visibility
- More efficient security monitoring for PCI compliance
- Cost-efficient use case expansion covering enterprise operations
SPLUNK USE CASES
- Ensure product and service availability for credit unions
- Reduce MTTA and MTTR
- Aggregate disparate alerts under “single pane of glass”
- Drive greater accountability for meeting on-call responsibilities
- Protect data security, PCI compliance
BUSINESS IMPACT
- Reduces MTTA from four hours to less than two minutes
- Enables collaboration, accountability across multiple departments
- Empowers staff with mobile monitoring access to deliver support from anywhere
- Ensures PCI security compliance for both activities and transactions
- Enables 900 credit unions to conduct two billion annual transactions
- Delivers an excellent customer experience
DATA SOURCES
- Symantec security logs
- Application performance management logs
- New Relic
- Oracle Enterprise Manager
- Network devices
The PSCU advantage
As member-owned, not-for-profit financial cooperatives, credit unions exist to serve their communities. They compete with banks by offering attractive services and rates.
Here’s where PSCU comes in. Most credit unions do not have the resources to build and host their own products, so PSCU does it for them. PSCU delivers white-label applications for online bill pay, online lending, debit and credit card programs and other financial services.
“It is critical that our services and products are available for our credit union owners,” says Earl Diem, PSCU IT operations manager.
Challenge: improve MTTA/MTTR
PSCU saw the value in reducing MTTA, the average time it takes for someone to acknowledge an alert and in effect say “I’m on it.” MTTA is a key metric for reducing downtime because acknowledgment is what starts incident response, so faster acknowledgment lowers mean time to repair (MTTR).
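As a rough illustration of the metric, MTTA is the average gap between the moment an alert fires and the moment someone acknowledges it. The sketch below is not taken from PSCU, VictorOps or Splunk; the function name `mean_time_to_acknowledge` and the sample timestamps are hypothetical and chosen only to show the arithmetic.

```python
from datetime import datetime, timedelta


def mean_time_to_acknowledge(incidents):
    """Average delay between alert time and acknowledgment time.

    `incidents` is a list of (triggered_at, acknowledged_at) datetime pairs.
    This is a hypothetical helper for illustration, not a vendor API.
    """
    delays = [acked - triggered for triggered, acked in incidents]
    if not delays:
        raise ValueError("need at least one acknowledged incident")
    # Sum the timedeltas, then divide by the number of incidents.
    return sum(delays, timedelta()) / len(delays)


# Made-up sample data: one alert acknowledged after 1 minute, one after 3.
incidents = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 9, 1)),
    (datetime(2024, 1, 1, 13, 30), datetime(2024, 1, 1, 13, 33)),
]

mtta = mean_time_to_acknowledge(incidents)
print(f"MTTA: {mtta}")
```

Tracking this value over time, for example month by month, would produce the kind of downward trend line the case study describes.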
“Our people were doing the ‘rotating chair’ methodology of support, using several point tools to monitor five or six disparate systems. We recognized the need for a better alternative to give us the MTTA we sought,” Diem recalls. “We wanted to aggregate system-based alerts and gain additional traceability to more effectively manage staff accountability.”
VictorOps slashes MTTA
PSCU solved its MTTA, MTTR and accountability challenges with VictorOps, which empowers on-call teams to find and fix problems faster with automated and insightful incident management routing, collaboration and reviews. PSCU employs VictorOps as a standard solution across 110 enterprise users. Diem keeps a graph on his wall of plummeting MTTA since PSCU started using VictorOps more than three years ago.
“In 12 months with VictorOps, our mean time to acknowledge came down from four hours to 20 minutes. Now we’re three years in and we’re under two minutes,” Diem says. “Each PSCU IT department maintains an on-call schedule. VictorOps brought all the managers together with one tool. We understand what we’re doing, and we all use the same escalation schedule. It drives accountability.”
Staff members use VictorOps mobility features to perform their support jobs from anywhere. “You can interact with the system from your desktop, from a laptop, from an iPad, through your phone,” Diem says. “The alerts in VictorOps give you the supporting data from the system (that) alerted. You know what went wrong even before you look at the system.”
PSCU started using VictorOps for its production environment but has also extended it to Quality Assurance and DevOps. The organization employs offshore developers in the Asia-Pacific region and India, and it cannot allow system issues to interfere with their productivity. Now, PSCU detects performance degradations before they turn into failures.
Extending history of success with Splunk
Today PSCU has another reason to celebrate: the acquisition of VictorOps by Splunk. PSCU has long been a Splunk Enterprise customer, starting with security monitoring and Payment Card Industry (PCI) compliance, a must for financial services. PSCU’s security team uses Splunk Enterprise to aggregate and index logs from tools monitoring network and security devices. Now, PSCU has decided to push its operational logs into Splunk Enterprise as well.
Because PSCU already relied on Splunk Enterprise for PCI monitoring, “It didn’t make financial sense to maintain a separate tool for operations when Splunk can serve the whole enterprise,” Diem says.
Splunk’s machine data analytics, combined with incident response from VictorOps, creates a “Platform of Engagement” that helps DevOps teams innovate faster for better customer experiences. Outstanding vendor support is another advantage, as PSCU’s relationship with Splunk brings an active user community and educational resources.
“I’m pretty excited about VictorOps being a part of Splunk,” Diem says.
PSCU is expanding its reliance on the Splunk platform with new use cases. One issue has been delays in detecting errors in new software releases — a problem solved by VictorOps and the Splunk App for Infrastructure.
“The errors we’re not currently seeing will bubble up, alerting into VictorOps as warnings, and we’ll have a team investigate,” Diem says. “The next natural progression after that would be Splunk IT Service Intelligence for predictive insight.”
The combination of Splunk and VictorOps software gives PSCU a powerful means to fulfill its mission of satisfying customers. “No matter what you do, you’re going to have failures out there,” Diem says. “The sooner you know, the sooner you can repair it, and the better you protect your user experience.”