The Bulkhead and Sidecar Design Patterns for Microservices & Incident Resolution

There are a lot of modern design patterns for microservices-based applications. Two that interest me right now, for how they help the people who support an application in production, are the bulkhead and sidecar design patterns. Let’s take a look.

Coding apps for incidents

How you build your application absolutely impacts the lives of those in charge of supporting it. This isn’t a connection we often make, but if you think about what happens when things break while you are building the application, everyone benefits.

Developers should be thinking about ways they can improve incident management and response through code, especially because more and more developers are on-call. When considering how your application can assist in incident response and remediation, here are the questions to ask:

  • Does this design pattern give more context?
  • Does this design pattern make resolution easier?
  • Does this design pattern improve my ability to communicate issues?
  • Does this design pattern improve my ability to bring in experts?

Now let’s turn to two patterns and show how they can support incident response and remediation.

The Bulkhead design pattern

The bulkhead pattern seeks to isolate applications and services into pools of resources. Such isolation allows some amount of failure to exist without bringing down the entire application or creating cascading issues — particularly useful in incident management.
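
To make the idea concrete, here is a minimal sketch of one common way to build a bulkhead: cap the number of concurrent calls each dependency may hold, and fail fast when a pool is full. The class names, the pool names and the limits are all assumptions for illustration, not a reference implementation.

```python
import threading


class BulkheadFullError(Exception):
    """Raised when a pool has no free slot for another call."""

    def __init__(self, pool: str):
        super().__init__(f"bulkhead '{pool}' is full")
        self.pool = pool


class Bulkhead:
    """Caps concurrent calls for one dependency (one pool)."""

    def __init__(self, pool: str, max_concurrent: int):
        self.pool = pool
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, fn, *args, **kwargs):
        # Fail fast instead of queueing, so one slow dependency cannot
        # absorb every worker in the process.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError(self.pool)
        try:
            return fn(*args, **kwargs)
        finally:
            self._slots.release()


# Hypothetical pools: each dependency gets its own budget.
payments = Bulkhead("payments", max_concurrent=2)
search = Bulkhead("search", max_concurrent=10)
```

With this shape, a stalled payments backend can exhaust only the two payments slots; calls routed through the search pool keep working, and the error that is raised already carries the name of the pool that failed.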

The bulkhead pattern has the inherent benefit of making services easier to understand and decompose team-wide. It also comes with additional benefits related to supporting the application in production:

Isolation = more context

Because each failure is confined to a pool, the alert payload can name that pool, which gives responders more context to pinpoint the issue. It also allows responders to address issues in a way that does not impact the reliability and uptime of other functionality in the application.
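
As a rough sketch of what that extra context might look like, the function below builds an alert payload that names the failing pool and lists the pools that are still healthy. Every field name here is an assumption; a real alerting tool will have its own schema.

```python
import json
import time


def build_alert(pool: str, error: Exception, healthy_pools: list[str]) -> dict:
    """Return an alert payload that names the failing pool."""
    return {
        "pool": pool,                               # where the failure is confined
        "error_type": type(error).__name__,
        "message": str(error),
        "unaffected_pools": sorted(healthy_pools),  # what responders can leave alone
        "timestamp": time.time(),
    }


# Hypothetical incident: the payments pool timed out, the others did not.
alert = build_alert("payments", TimeoutError("upstream timed out"), ["search", "catalog"])
print(json.dumps(alert, indent=2))
```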

Involving humans is easier

If a human needs to be in the loop, pools can equate to teams, and teams generally serve as buckets for subject matter experts and alert destinations. So, the pools can assist in determining:

  • Who/what to alert
  • Who might be an expert to add as a responder

This is especially true when developers are expected to be on-call for their code. When there’s a failure, whoever is paged can use the pool as an indicator of which currently on-call developers are relevant to the issue, rather than reaching out to anyone tagged as a backend or frontend developer.
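
One way to picture the pool-to-team mapping is a small routing table keyed by pool name. This is only a sketch: the pools, teams and people below are invented, and in practice the lookup would likely live in your on-call or incident management tool rather than in application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    team: str
    on_call: str          # who gets paged first
    experts: tuple = ()   # who could be added as responders


# Hypothetical mapping from pool to owning team.
ROUTES = {
    "payments": Route("payments-team", "dana", ("lee",)),
    "search": Route("search-team", "sam", ("ari", "kim")),
}
FALLBACK = Route("platform-team", "on-call-primary")


def route_alert(pool: str) -> Route:
    """Pick the alert destination from the pool that failed."""
    return ROUTES.get(pool, FALLBACK)
```

A failure in the payments pool would then page the payments on-call developer and suggest that team’s experts, while a pool nobody has claimed falls back to a default team.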


The Sidecar design pattern

When I imagine the sidecar pattern, I think of something a little more parasitic. This pattern is a great way to keep complementary components attached logically — but technically separate. This offers a variety of advantages for application support.

Primary function stays up

The sidecar pattern prevents service-related code from taking down the primary function of the service itself. It also lets you roll back the service or the service-related code independently, so that a rollback of one does not impact the other.

At the same time, the pattern gives the service and its companion app a direct connection, which makes it easier for each to consume I/O from the other. For example, perhaps a service has a platform abstraction layer entry point. This layer could be separated so that, if it becomes unresponsive under high load, users of the service aren’t directly impacted.
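
Here is a minimal sketch of the service side of that connection, assuming the sidecar listens on a local HTTP port (the URL in the usage note is made up). The service asks the sidecar with a short timeout and falls back to a default when the sidecar is slow or down, so the primary request path keeps working.

```python
import urllib.request


def call_sidecar(url: str, fallback: str, timeout: float = 0.5) -> str:
    """Ask the sidecar for a value; return `fallback` if it is slow or down."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    except OSError:
        # urllib.error.URLError and socket timeouts are OSError subclasses,
        # so refused connections, timeouts and HTTP error statuses land here.
        return fallback
```

A call such as call_sidecar("http://127.0.0.1:9000/config", fallback="{}") would return the sidecar’s response when it is healthy and the empty default when it is not. The short timeout is the design choice that matters: it bounds how long an unresponsive sidecar can hold up the service.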

Sidecars can host monitoring tools

Besides abstractions, a sidecar is often used to run monitoring tooling for the service. The benefit is that issues with a monitoring tool cannot impact application functionality in the service it is attached to. It also means incident responders do not lose access to data from the monitoring tool if the service does go down.

This is a large shift from most architectures, where even monitoring for microservices applications is done in a monolithic way. Monitoring tools also often have agents and can benefit from being tightly coupled with the application. You don’t want those agents to bring down the service if they have an issue. But you do want data coming off the service to remain accessible for as long as the service can produce it, even if it’s not functioning.
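
One way to keep an agent from taking the service down with it is to make metric emission fire-and-forget. The sketch below assumes a monitoring agent running as a sidecar and listening for UDP datagrams on localhost; the default port and the name:value line format are placeholders for illustration, not any particular tool’s protocol.

```python
import socket


def emit_metric(name: str, value: float, addr=("127.0.0.1", 8125)) -> bool:
    """Send one metric line to a local agent over UDP, fire-and-forget."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.sendto(f"{name}:{value}".encode("utf-8"), addr)
        return True
    except OSError:
        # A broken or missing agent must never break the request path.
        return False
```

Because the service never waits for a reply, a hung or crashed agent costs it nothing. Whatever the agent has already received sits in the sidecar rather than in the service process, which is what lets responders still read it if the service itself goes down.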

Logical & technical isolation

And, like the bulkhead pattern, the isolation is both logical and technical. The logical benefit is that responders can stay focused on where the issue occurs and more easily bring in support where it’s needed. This matters most when developers are on-call and need deeper access to monitoring tooling than would normally be the case for the broader application.

More design patterns

The list of modern design patterns is growing so rapidly that it’s hard to keep up. Many other patterns bear directly on supporting the application and addressing failures, whether automatically or manually. I’m a big fan of the sidecar and bulkhead patterns as tools to improve application production support.

This article was written by Chris Riley. Chris is a technologist and DevOps advocate for Splunk who has spent more than a decade helping organizations transition from traditional development practices to a modern set of culture, processes and tooling.

This posting does not necessarily represent Splunk's position, strategies or opinion.

Posted by Stephen Watts

Stephen Watts works in growth marketing at Splunk. Stephen holds a degree in Philosophy from Auburn University and is an MSIS candidate at UC Denver. He contributes to a variety of publications including CIO.com, Search Engine Journal, ITSM.Tools, IT Chronicles, DZone, and CompTIA.