DevOps is a well-established discipline. By now, most developers, IT engineers and site reliability engineers (SREs) have heard all about the importance of “breaking down silos” and achieving seamless communication and collaboration across all stakeholders in the continuous integration/continuous delivery (CI/CD) process, which extends from source code development through production environment management and incident response.
Yet, many teams that want to achieve the efficiency of the DevOps model face the challenge of building collaboration pathways that actually connect different teams and different workflows efficiently. It’s one thing to talk about breaking down the silos. It’s another to deploy collaboration tools that actually do that.
This isn’t because the right collaboration tools aren’t out there. They are — but they’re difficult to integrate in a way that connects all stakeholders across all of their workflows.
Achieving better collaboration, then, means taking a more effective approach to tool integration. This blog post explains how. It starts by walking through the problem with the collaboration strategies and tools that the typical team relies on today — even teams that think they are doing DevOps. It then offers practical advice on bridging collaboration gaps by integrating service desks, ChatOps and incident response tools into a seamless, end-to-end communication pipeline.
The state of collaboration tooling
Again, there is no lack of collaboration tools out there. Most businesses have long had ticketing systems and service desks in place. They’ve been leveraging incident response tools since back when pagers were a part of the process. They have slick, real-time communication platforms like Slack and Microsoft Teams that offer plenty of opportunities for building automation into workflows.
Yet despite these rich collaboration tools, day-to-day communication for the various stakeholders in the CI/CD process remains a struggle for several reasons.
One is that these collaboration tools tend to exist in silos, which reinforce silos between teams. The service desk system typically belongs to IT. Developers might use Slack to plan their next feature set or release sprint, but they don’t often bring the IT team into their channels. Engineers assigned to incident response can use their organization’s incident response platform to manage alerts, triage issues and track response status, but if it’s not your turn to be on incident response duty, you’ll probably never look at that system or its data.
Another is that some collaboration tools are designed for one-way communication. They push information down a pipeline rather than facilitating a conversation.
Take ticketing systems, for example. They’re great for recording when something goes wrong, tracking who is responding to it and measuring how long it takes to fix the problem. But if you actually need to collaborate with others on the issue, ticketing systems tend to be less efficient. You may be able to bring other engineers into the conversation, but ticketing is typically not the most convenient way to do that.
This is not a problem with the tools per se. You can use tools that are designed for one-way communication for conversational workflows, but it requires integrating the tools with external systems that facilitate a greater degree of collaboration.
Lack of data
In most cases, each communication tool is able to draw on a limited source of data by default. Your service desk can keep track of whichever data the IT team enters into it, for instance. Slack can record whichever conversations take place in it.
This means that these tools offer limited visibility into operations, which in turn constrains the team’s ability to collaborate. Maybe a conversation taking place in Slack needs to be contextualized by data from a development tool. Maybe incident response platforms should draw on log files that can help the on-call engineers determine which issues to prioritize.
All these things are possible, but they require integrations because they are not built-in features of the tools themselves.
Most communication tools are also manual by default. Although they often support automations, like the ability to send alerts from a monitoring tool directly to a Slack channel, those automations have to be configured before they do anything. Unless you take extra steps to integrate and automate your systems, you’re stuck with manual communication workflows, which hamper your team’s ability to operate efficiently and to bridge the communication gaps between developers, IT engineers and SREs.
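As a rough illustration of the kind of automation mentioned above, the following sketch formats a monitoring alert as a message for a Slack incoming webhook. It only builds the HTTP request and does not send it. The alert fields (`severity`, `service`, `summary`), the function name and the webhook URL are hypothetical placeholders, not part of any tool discussed in this post.

```python
import json
import urllib.request


def build_slack_alert_request(webhook_url: str, alert: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for a Slack incoming webhook."""
    # Slack incoming webhooks accept a JSON body with a "text" field.
    text = f"[{alert['severity'].upper()}] {alert['service']}: {alert['summary']}"
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical alert emitted by a monitoring tool.
alert = {"severity": "critical", "service": "checkout", "summary": "p99 latency above 2s"}

# Placeholder URL (assumption); a real one is issued when the webhook is created.
slack_request = build_slack_alert_request(
    "https://hooks.slack.com/services/EXAMPLE", alert
)

# To deliver the message, pass the request to urllib.request.urlopen(slack_request).
```

Keeping the request construction separate from the network call makes the formatting logic easy to check before anything is wired into a live channel.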
Toward a better world: Collaboration that actually works
What does it take to fix the collaboration shortcomings that teams face today?
It’s not more tools. Once again, the tools are not the issue.
Instead, integrations among different categories of tools need to change. As we’ve hinted above, it’s only by tightly tying together various types of tools — like service desks, ticketing systems, ChatOps solutions and incident response platforms — that communication can become truly seamless.
Integration in practice
To work well, integration requires more than just allowing two tools to share data with each other. Making it possible for your release automation suite to send notices to Slack when an application release deploys is better than running these systems in total isolation, but on its own, it doesn’t necessarily offer actionable insight. Neither does setting up generic integrations to connect your log analysis tool to your incident response platform, or making it possible to view open tickets from your service desk through the incident response portal.
Effective integration requires going further by building integrations based on the following principles.
Automation
Data sharing and collaboration should be as automated as possible. No one should have to press a button to share important alerts between your service desk system and your incident response platform, for example. Nor should an engineer have to request information manually. When relevant information is available, it should flow from one system to another automatically.
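One way to picture information flowing between systems without anyone pressing a button is a small publish/subscribe router. The sketch below is an in-memory stand-in, assuming hypothetical event names and payload fields. In a real deployment, each handler would call the API of the incident response platform or chat tool rather than append to a list.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Event:
    source: str   # system that produced the event, e.g. "service_desk"
    kind: str     # event type, e.g. "ticket_opened"
    payload: dict


class EventRouter:
    """Forwards each published event to every handler subscribed to its kind."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[Event], None]]] = {}

    def subscribe(self, kind: str, handler: Callable[[Event], None]) -> None:
        self._handlers.setdefault(kind, []).append(handler)

    def publish(self, event: Event) -> int:
        """Deliver the event and return how many handlers received it."""
        handlers = self._handlers.get(event.kind, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


# In-memory stand-ins for an incident response queue and a chat channel.
incident_queue: List[str] = []
chat_messages: List[str] = []

router = EventRouter()
router.subscribe("ticket_opened", lambda e: incident_queue.append(e.payload["id"]))
router.subscribe(
    "ticket_opened",
    lambda e: chat_messages.append(f"New ticket {e.payload['id']}: {e.payload['summary']}"),
)

# A new service desk ticket reaches both destinations with no manual step.
delivered = router.publish(
    Event("service_desk", "ticket_opened", {"id": "T-1", "summary": "Checkout latency"})
)
```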
Security and compliance
When you share data between tools, it can be easy to expose information in unintended ways. That’s why it’s crucial that tools adhere to security and compliance rules.
You don’t want sensitive information from an incident response process leaking into a publicly accessible Slack channel, for example. Likewise, you need an audit trail that allows you to determine how information was shared between systems and who had access to it.
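A minimal sketch of both concerns, assuming a simple regex-based redaction rule and an in-memory audit log: messages bound for a public destination are scrubbed before they leave, and every share is recorded along with who triggered it. The pattern, field names and `share` function are illustrative assumptions. A production setup would rely on the organization's own data classification rules and durable audit storage.

```python
import re
from datetime import datetime, timezone
from typing import Dict, List

# Illustrative pattern (assumption): catches "password=...", "token: ..." and similar.
SECRET_PATTERN = re.compile(r"(?i)\b(password|token|api[_-]?key)\s*[=:]\s*\S+")


def redact(text: str) -> str:
    """Replace the value of anything that looks like a credential."""
    return SECRET_PATTERN.sub(lambda m: f"{m.group(1)}=[REDACTED]", text)


audit_log: List[Dict[str, object]] = []


def share(message: str, source: str, destination: str,
          destination_is_public: bool, actor: str) -> str:
    """Return the message as it may be shared, and record the transfer."""
    outgoing = redact(message) if destination_is_public else message
    audit_log.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "actor": actor,
        "source": source,
        "destination": destination,
        "redacted": outgoing != message,
    })
    return outgoing


shared = share(
    "db password=hunter2 failed",
    source="incident_response",
    destination="#general",
    destination_is_public=True,
    actor="oncall-bot",
)
```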
Usability
Integrations should deliver information that is simple to access and use. If an engineer has to parse through a long list of alerts to figure out which ones are relevant or compare data from the service desk system and the incident response system to figure out which tickets correspond to which incidents, the integration is not very usable or efficient, and its value is limited.
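To make the ticket-to-incident comparison concrete, here is a small sketch that does the matching for the engineer. It assumes, purely for illustration, that tickets and incidents both carry a `service` field to join on and that a higher `severity` number means a more urgent incident. Real systems may link records through different keys.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Ticket:
    id: str
    service: str


@dataclass(frozen=True)
class Incident:
    id: str
    service: str
    severity: int  # assumption: higher number means more urgent


def correlate(tickets: List[Ticket], incidents: List[Incident],
              min_severity: int = 2) -> Dict[str, List[str]]:
    """Map each sufficiently severe incident to the tickets for the same service."""
    tickets_by_service: Dict[str, List[str]] = {}
    for ticket in tickets:
        tickets_by_service.setdefault(ticket.service, []).append(ticket.id)
    return {
        incident.id: tickets_by_service.get(incident.service, [])
        for incident in incidents
        if incident.severity >= min_severity
    }


tickets = [Ticket("T-1", "checkout"), Ticket("T-2", "search"), Ticket("T-3", "checkout")]
incidents = [Incident("INC-9", "checkout", 3), Incident("INC-10", "search", 1)]

# Only the severe checkout incident is surfaced, with its related tickets attached.
relevant = correlate(tickets, incidents)
```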
Single source of truth
Ultimately, the goal of communication tool integration should be to ensure that the tools collectively form a single source of truth. Any stakeholder in the CI/CD process — developers, test engineers, IT engineers, SREs, security specialists and so on — should be able to understand the state of the CI/CD pipeline and what it means for their roles.
They should know when a performance testing failure or security issue has caused a release to be held back. They should understand which bugs have been identified in production, what the incident response team is doing to handle them and how developers are going to fix them. They should have the visibility they need to know when one part of the software development life cycle is not going well so that they can offer solutions.
Collaboration in practice: Integrating ChatOps, service desks and incident response
To illustrate what effective integrated communication and collaboration mean in practice, consider a collaboration tool set built around the following three pillars and tool categories:
- Incident response: Incident response tools offer immediate, real-time responsiveness. When they are properly integrated with other collaboration systems, they help your team identify the most pressing issues and understand their role in fixing them.
- Service desk: By their nature, service desks enhance compliance and provide a clear audit trail for tracing operational workflows after they are complete. They mitigate the risk of compliance oversights that could lead to leaked information or other security problems.
- ChatOps: By leveraging automation to help engineers share information more quickly, as well as identifying the information that is most relevant to them, ChatOps makes communication efficient and convenient.
These are the collaboration pillars at the foundation of Splunk On-Call. By enabling a ChatOps approach to incident response, Splunk On-Call makes workflows efficient. And, because Splunk On-Call integrates with a wide selection of external tools — ranging from development tools like Rollbar and Runscope to monitoring tools like CloudWatch and Prometheus to service management platforms like ServiceNow — it maximizes your team’s ability to gain complete context on the state of the CI/CD pipeline and the problems that need to be resolved. Take incident response out of a silo and integrate it seamlessly into the broader CI/CD workflow as part of Splunk Observability Cloud.
The original version of this blog was published by Bill Emmett. This posting does not necessarily represent Splunk's position, strategies, or opinion.