The waves of change are certainly upon us and businesses are being forced to adapt at a record pace. Current world events have caused a jarring shift in all aspects of our lives, accelerating major changes in how we live and work. An unprecedented number of people are now working from home. Those of us working in IT Operations are no exception. Many companies are implementing a Distributed IT Operations Center (D-NOC) approach to address this new reality. For most businesses this was a tactical decision that under other circumstances would have taken detailed forethought and strategic planning.
“You can’t stop the waves, but you can learn to surf.” – Jon Kabat-Zinn
Today’s forced acceleration of change has an impact on many facets of the IT Operations Center. I would like to focus on three important dimensions:
- Visibility and Transparency
- Actionable Analytics
- Effective Collaboration and Communication
Although these three dimensions are already relevant in IT Operations, the rapid shift towards a distributed model exacerbates inefficiencies in old approaches and exposes gaps.
Visibility and Transparency
Support Engineers need end-to-end visibility into all relevant data in order to be as effective and efficient as possible and to resolve problems quickly. Separate teams and tools create "silos". IT personnel in large corporations surely know the phrase, "You've been Siloed!" Each team has its own tools, responsibilities and data: it is organizational myopia. The problem with this approach is that efficient operations and problem management require spanning these silos, and working from home only makes this harder. If we share data in a secure and accessible way, collaboration and insights become easier across space and time. This is critical for the D-NOC.
Coupling end-to-end visibility with a complete and modern data platform enables rapid and thorough root-cause analysis, problem solving, and the possibility of predictive, proactive monitoring regardless of where we are working or what device we are using.
Actionable Analytics
Now that we have "all the data", we must be able to analyze it and take action. However, we have a new challenge. The coveted "End-to-End Visibility" comes with a price: the management of large, potentially massive, volumes of data. Large data volumes also greatly amplify the noise that hides the real issues. Mismanaged, this noise can rapidly overwhelm IT Operations.
End-to-end visibility also requires that we are able to analyze the data in context so that we may accurately correlate it and easily make it actionable. Further still, we should be able to predict and avoid outages, which would reduce the strain on the D-NOC. Appropriately applied analytics and machine learning are force multipliers. They perform as additional team members that never rest.
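To make the idea of correlating data in context a little more concrete, here is a minimal sketch of one simple noise-reduction tactic: collapsing repeated alerts from the same host and check into a single group when they arrive close together in time. The `Alert` fields, the `group_alerts` function and the five-minute window are all assumptions made for illustration. They are not taken from any particular product, and real analytics platforms use far richer correlation than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """A hypothetical alert record; the field names are illustrative only."""
    timestamp: float  # seconds since some reference point
    host: str
    check: str
    message: str


def group_alerts(alerts, window_seconds=300):
    """Group alerts that share a host and check and arrive close together.

    An alert joins the most recent group for its (host, check) pair when it
    arrives within window_seconds of that group's latest alert. Otherwise it
    starts a new group. Returns a list of groups, each a list of alerts.
    """
    groups = []
    latest_group = {}  # (host, check) -> index of the newest group in `groups`
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        key = (alert.host, alert.check)
        index = latest_group.get(key)
        if index is not None and alert.timestamp - groups[index][-1].timestamp <= window_seconds:
            groups[index].append(alert)
        else:
            groups.append([alert])
            latest_group[key] = len(groups) - 1
    return groups


if __name__ == "__main__":
    sample = [
        Alert(0, "web1", "cpu", "CPU above 90%"),
        Alert(30, "db1", "disk", "Disk 85% full"),
        Alert(60, "web1", "cpu", "CPU above 90%"),
        Alert(120, "web1", "cpu", "CPU above 95%"),
        Alert(1000, "web1", "cpu", "CPU above 90%"),
    ]
    for group in group_alerts(sample):
        first = group[0]
        print(f"{first.host}/{first.check}: {len(group)} alert(s) starting at t={first.timestamp}")
```

In this sample, five raw alerts become three items for an engineer to look at. The grouping key and window length are the kind of choices a team would tune against its own data.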
All that to say: it takes a robust data management platform that can ingest, compress, and store your data securely so that relevant, modern analytics may be applied at scale, making your data actionable and valuable to your D-NOC.
Once Operations data is made actionable across teams and silos, there is a human element to consider. How do distributed teams work together to deliver efficient IT operations?
Effective Collaboration and Communication
Conveying the sense of urgency we feel when a colleague arrives at our desk with a problem, and reading the unspoken social cues of stress or excitement from teammates and managers, remain challenges for the D-NOC.
Working under pressure can be extremely difficult for remote workers who can no longer quickly and easily interact by “sneakernet”. Merely replacing sneakers with chat tools does not solve the problem completely. Having the contextual data involved in alerts and incidents, knowing who is working on what part of a problem, and being able to drill into unforeseen paths of investigation are all critical for efficient Operations.
In our digital world where systems are required to be up 24x7, it’s important to keep the people supporting these services in mind. Software engineers and IT professionals know the pains of being on-call all too well. What is needed to support a totally distributed team, especially in this suddenly new D-NOC environment? DevOps teams, developers, sysadmins and IT analysts have always had to cope with alerts from multiple services, systems and applications, but not exclusively from their homes.
Incident responders are working in an evolving, complex and stressful remote environment. They must balance the demands of being sequestered at home with the responsibilities of uptime and application performance. How does a team balance IT reliability with the potential for burnout and alert fatigue? Mostly red dashboards, constant alerts, deep incident queues and high-noise environments are not optimized for the most important element in IT Operations: humans. Effective collaboration and communication go beyond the traditional tools used in an IT Operations Center or by your technical support engineers.
We need the ability to easily escalate problems and automatically route alerts to the right resources so that software engineers and IT operations teams can work cross-functionally to triage, respond to and remediate incidents quickly. Today's D-NOC requires the ability to tightly integrate the chosen suite of collaboration tools (e.g. Slack, MS Teams, Zoom) with the monitoring and actionable analytics discussed earlier so the critical incident context is maintained.
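As a rough illustration of what routing alerts to the right resources and escalating them can look like, here is a minimal sketch. Everything in it is hypothetical: the service names, team names, severity scale, escalation chains and the 15-minute threshold are invented for the example and do not describe how any specific incident management tool works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """A hypothetical routing rule: alerts for `service` at or above
    `min_severity` go to `team`."""
    service: str
    min_severity: int
    team: str


# Rules are checked in order, so more specific rules come first.
ROUTES = [
    Route("payments", 3, "payments-oncall"),
    Route("payments", 1, "payments-team"),
    Route("network", 1, "network-team"),
]
DEFAULT_TEAM = "it-operations"

# Who takes over if the first responder does not acknowledge in time.
ESCALATION = {
    "payments-oncall": ["payments-oncall", "payments-lead", "it-operations-manager"],
}


def route_alert(service, severity, routes=ROUTES, default=DEFAULT_TEAM):
    """Return the team for the first matching rule, or the default team."""
    for route in routes:
        if route.service == service and severity >= route.min_severity:
            return route.team
    return default


def current_owner(team, minutes_unacknowledged, escalate_after=15):
    """Walk one step up the team's escalation chain for every
    `escalate_after` minutes an alert stays unacknowledged."""
    chain = ESCALATION.get(team, [team])
    level = min(int(minutes_unacknowledged // escalate_after), len(chain) - 1)
    return chain[level]


if __name__ == "__main__":
    team = route_alert("payments", severity=4)
    print(team)                     # team chosen by the routing rules
    print(current_owner(team, 20))  # owner after 20 minutes without acknowledgement
```

Keeping the rules as plain data rather than burying them in code means a distributed team can review and adjust who gets paged without having to ask the person who happened to write the script.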
Surviving and Thriving in a D-NOC World
Whether your IT Operations Center has already successfully implemented a D-NOC approach or you are in the middle of an accelerated project to do so, we at Splunk are here to help with patterns, best practices, solutions and skills. If you are looking for inspiration on thriving in a D-NOC world, check out our case study on how Blue Sentry built a modern incident response capability in a distributed context.
Splunk is focused on developing and providing solutions that allow you to optimise your IT Operations no matter what approach you take. We’d encourage you to read more on how we bring data and automation to help your engineers who are now working remotely and how Splunk brings a uniquely analytical approach to incident management.
Continue to follow Splunk Blogs for more resources on how you can evolve and improve your IT Operations and D-NOC capabilities. As COVID-19 continues to affect the world, Splunk is focused on supporting our customers, partners, employees, families and communities — ensuring we continue to remove the barriers between data and action amid this time of uncertainty.