
Stop, Collaborate and Listen: A Three-Step Plan for Better IT Operations from Vanilla Ice

There are quite a few famous names in the history of information technology, from Charles Babbage and Ada Lovelace to Bill Gates, Linus Torvalds and Robert Matthew Van Winkle. If that last name sounds unfamiliar, you may know him better as Vanilla Ice. Not only did his semi-autobiographical ballad “Ice Ice Baby” forever change the world of music, but it also contained a timeless and vital message for IT departments everywhere: “Stop. Collaborate and listen.”

Let’s break it down, shall we?

Stop

Stop fighting in the war room! War rooms won’t always be necessary, thanks to advances like artificial intelligence for IT operations (AIOps), which predict and prevent the issues that lead to war rooms. For now, they’re still a reality, but they don’t have to be contentious. With the average critical application failure costing approximately $500,000 to $1 million per hour, the war room can be a highly emotional place of finger-pointing and defensiveness. People are trying to justify their actions and deflect blame when they should be working together to reduce Mean Time to Resolution (MTTR). Once you’ve stopped the fight, the next step is to collaborate.

Collaborate

Collaboration shortens lag times by letting teams find and remediate the problem together. A more complex alert may present as a problem with the application when the real cause is the network. (“It’s always the network.”)

Unless you have a network operations center (NOC) commander driving alerts out, they just sit in the alert monitoring tool. Who’s on call? What department? What email alias should you use if you get an out-of-office message? It’s amazing how many companies rely on a list of phone numbers kept in a spreadsheet.

All of these roadblocks can drive up your Mean Time to Acknowledge (MTTA), which can be just as important as MTTR. An AIOps-enabled system lets on-call teams find and fix problems faster with automated incident management routing, collaboration and reviews.
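To make the two metrics concrete, here is a minimal sketch of how MTTA and MTTR might be computed from incident timestamps. The `Incident` record and its field names are hypothetical, not taken from any particular tool, and MTTR is assumed here to run from the moment the alert fires to the moment service is restored; some teams measure it from acknowledgment instead.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    # Hypothetical record; field names are illustrative only.
    triggered: datetime     # when the alert fired
    acknowledged: datetime  # when a responder took ownership
    resolved: datetime      # when service was restored


def mtta(incidents):
    """Mean Time to Acknowledge: average of (acknowledged - triggered)."""
    seconds = [(i.acknowledged - i.triggered).total_seconds() for i in incidents]
    return timedelta(seconds=mean(seconds))


def mttr(incidents):
    """Mean Time to Resolution: average of (resolved - triggered).

    Assumption: the clock starts when the alert fires.
    """
    seconds = [(i.resolved - i.triggered).total_seconds() for i in incidents]
    return timedelta(seconds=mean(seconds))


# Two made-up incidents for illustration.
t0 = datetime(2024, 1, 1, 9, 0)
sample = [
    Incident(t0, t0 + timedelta(minutes=2), t0 + timedelta(minutes=30)),
    Incident(t0, t0 + timedelta(minutes=4), t0 + timedelta(minutes=60)),
]
```

With the sample data, `mtta(sample)` is three minutes and `mttr(sample)` is 45 minutes. Tracking both shows whether time is being lost before anyone picks up the alert or during the fix itself.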

PSCU, America’s premier payments credit union service organization, selected an AIOps-enabled system built on Splunk Enterprise and VictorOps. The result was a reduction in MTTA from four hours down to two minutes, taking them from “Nah, not my problem” to “I’m on it” in a fraction of the time it had previously taken. Or as Van Winkle would put it, “If there was a problem, yo, I’ll solve it.”

Listen

Van Winkle boasted of “cooking MCs like a pound of bacon.” While this may well be an appropriate reaction when encountering a sucker MC, in the aftermath of an outage, it’s vital to listen to your team. The IT labor market is tight. Keeping good people is important. Bad war rooms cause people to leave and look for kinder, gentler war rooms. Technologies that focus on collaboration foster a better experience, just as mobile technology improves the on-call experience.

Conducting post-incident reviews gives you an opportunity to calmly review the facts—like outage details and interactions between people and teams—to better assess how incidents happen, and, even more important, how they’re resolved.

Anything less than the best is a felony

An incident response system designed for developers, DevOps and operations teams can help you reduce outage time and add confidence to your high-speed DevOps delivery and operations. VictorOps takes alerts from your monitoring tools and applies on-call schedules and rules to engage the right teams and people so you can start resolving problems faster.
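The general idea of applying on-call schedules and rules to an alert can be sketched in a few lines. This is a simplified illustration only, not VictorOps's actual implementation or API: the `Shift` class, the `ROUTING_RULES` table, the team names and the `route_alert` function are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass
class Shift:
    # Hypothetical on-call shift; not modeled on any real product's schema.
    team: str
    responder: str
    start: time
    end: time  # exclusive; a shift may wrap past midnight

    def covers(self, t: time) -> bool:
        if self.start <= self.end:
            return self.start <= t < self.end
        # Overnight shift, e.g. 20:00 to 08:00.
        return t >= self.start or t < self.end


# Hypothetical rules mapping an alert's source to the owning team.
ROUTING_RULES = {"network": "netops", "application": "appdev"}
DEFAULT_TEAM = "noc"


def route_alert(source: str, fired_at: datetime, schedule: list):
    """Return the responder on call for the alert, or None if no shift covers it."""
    team = ROUTING_RULES.get(source, DEFAULT_TEAM)
    for shift in schedule:
        if shift.team == team and shift.covers(fired_at.time()):
            return shift.responder
    return None


schedule = [
    Shift("netops", "alice", time(8, 0), time(20, 0)),
    Shift("netops", "bob", time(20, 0), time(8, 0)),
]
```

In this sketch, a network alert fired at 23:00 goes to `bob` and one fired at 09:00 goes to `alice`. An alert with no covering shift returns `None`, which is the gap a real system would close with an escalation policy rather than a spreadsheet of phone numbers.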

Once your team is in a “firefight,” VictorOps makes collaborating easier and faster by engaging the right experts and teams over a native mobile app or web interface. Analytics enable your team to provide better retrospectives so you can continuously improve your team’s incident response. Collaboration and analytics drive shorter outages, fewer wasted resources, better use of your team’s “tribal knowledge” and a more empowering, collaborative and enjoyable on-call experience for your team.

You might even say that it helps you keep your composure when it’s time to get loose.

Do you want to stop, collaborate and listen?

Learn more about VictorOps and using a platform approach for IT Operations.

Posted by George Khoury

George is Splunk's Product Marketing Manager and technical evangelist in APAC, responsible for communicating Splunk's go-to market strategy in the region. He works closely with customers to help them understand how machine data reveals new insights across application delivery, business analytics, IT operations, IoT, and security and compliance. With nearly 20 years in the IT industry working with large Enterprises, Manufacturing, Government and Banking sectors, George has extensive knowledge of enterprise IT systems.
