The SignalFx team just got back from Seattle after an amazing week at DockerCon. We spent three days learning about advancements in container orchestration, networking, security, and administration, and catching up with thousands of other friendly members of the extended Docker family. This was our third DockerCon, and we can’t believe how much the community has grown in just two years.
The best part of DockerCon, of course, is hearing about all the use cases for ephemeral architecture and how monitoring plays into everyone’s production strategies. Our top takeaway after hundreds of conversations was that a successful Docker strategy is a function of a well-conceived and thoughtfully-executed microservices regime. For most teams, operationalizing containers is less an end goal in itself than a means of moving decision-making closer to production.
One of our favorite presentations came from Capital One, whose talk focused on defining and monitoring microservices. The speaker emphasized the dangers of a too-stringently-defined single responsibility principle. Although a good microservice does one thing well, owns its own data, and has enough independence that it can be managed and updated discretely, microservices also have relevant couplings and are more like a complex web of interdependencies than a series of free-standing silos. Separating microservices too completely would cause excessive chatter between services and likely introduce unwelcome latency and resource demands. In any microservices architecture, monitoring plays a key role in ensuring that the services in production don’t steadily degrade as changes roll out. In addition to relying on log data to explain a problem after the fact, an analytical approach to aggregating metrics and alerting against patterns and rates of change can save a services-oriented architecture from encountering the very types of problems that containers set out to help prevent in the first place.
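To make the rate-of-change idea concrete, here’s a minimal sketch of alerting on how fast a metric is climbing rather than on its absolute value. The function names and the sample error-rate numbers are hypothetical, not from any SignalFx API:

```python
from collections import deque

def rate_of_change(samples, window=5):
    """Average per-interval change over the last `window` samples."""
    recent = list(samples)[-window:]
    if len(recent) < 2:
        return 0.0
    return (recent[-1] - recent[0]) / (len(recent) - 1)

def should_alert(samples, threshold, window=5):
    """Fire when the metric is climbing faster than `threshold` per interval."""
    return rate_of_change(samples, window) > threshold

# Hypothetical error-rate samples (errors/sec) polled from a service.
errors = deque([1, 1, 2, 5, 11, 23], maxlen=60)

# The absolute value (23 errors/sec) might still be under a static
# threshold, but the rate of change (5.5/interval) signals trouble early.
print(rate_of_change(errors))           # 5.5
print(should_alert(errors, threshold=4.0))  # True
```

A static threshold on the raw value would fire only after the service is already in trouble; the derivative-style check catches the trend while there is still time to react.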
Another key theme throughout the presentations was Docker’s burgeoning support for stateful services backed by data volumes. When a container drops or dies, the new container that takes its place, most likely in another location, has to be able to find and reattach to the data volume the old one used. Storage is growing increasingly distributed, and it’s becoming a requirement not only to locate those volumes from different containers, but also to closely monitor and alert on storage performance, capacity, and availability like any other stateful service, rather than being limited to host status notifications, which may come too late.
A third favorite presentation came from ADP, an $11 billion/year company moving to a DevOps regime in order to ship faster and respond more aggressively to customer needs. Like many large enterprises, their architecture is still largely monolithic. They’re using Docker to make it easier to manage and deploy parts of their apps that change most frequently and need both flexibility and structure. The metaphor ADP uses is that microservices are chicken nuggets, but they’re starting off with a whole chicken. Having a hybrid strategy saves the disruption of a rushed transition and allows them to pick up speed where it matters most, rather than living under the constraints of dependencies for every part of their applications.
Along with those business cases, we enjoyed getting a chance to share our own experience running a Docker infrastructure at scale for the past few years. SignalFx’s own Docker expert, Maxime Petazzoni, started the week by publishing a guide to getting started with monitoring for containers and why we love collectd. We ran into loads of SignalFx users (and some future users) who credit collectd with their ability not only to scale their monitoring strategy but also to set the path toward operationalizing their microservices strategy. Max outlined four key questions to ask about your Docker and microservices objectives, and gave guidance on how the answers should shape not only how you monitor but how you operate your containerized environment as you set goals for production.
- Do you want to track application-specific metrics or just system-level metrics?
- Is your application placement static or dynamic? (i.e., Do you use a static mapping of what runs where or do you use dynamic container placement, scheduling, and binpacking?)
- If you have application-specific metrics, do you poll those metrics from your application, or are they being pushed to some external endpoint? If you poll the metrics, are they available through a TCP port you’re comfortable exposing from your container?
- Do you run lightweight, bare-bones, single-process Docker containers or heavyweight images with supervisord (or something similar)?
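Whatever the answers, most container monitoring ultimately draws on the per-container statistics Docker itself exposes. As one illustrative sketch, here is how CPU utilization is derived from two consecutive samples of Docker’s stats endpoint; the field names follow the Docker Engine `/containers/<id>/stats` API response, while the helper function itself is our own illustration, not part of any library:

```python
def cpu_percent(stats):
    """Compute CPU utilization from a Docker stats sample.

    Mirrors the calculation the `docker stats` CLI performs: the delta
    in container CPU time over the delta in total system CPU time,
    scaled by the number of CPUs. `precpu_stats` holds the previous
    sample, as in the Docker Engine stats API.
    """
    cpu_delta = (stats["cpu_stats"]["cpu_usage"]["total_usage"]
                 - stats["precpu_stats"]["cpu_usage"]["total_usage"])
    system_delta = (stats["cpu_stats"]["system_cpu_usage"]
                    - stats["precpu_stats"]["system_cpu_usage"])
    if system_delta <= 0 or cpu_delta < 0:
        return 0.0
    ncpus = stats["cpu_stats"].get("online_cpus", 1)
    return cpu_delta / system_delta * ncpus * 100.0

# Hypothetical sample: the container consumed 2% of total system CPU
# time between polls, on a host reporting 2 online CPUs.
sample = {
    "cpu_stats": {
        "cpu_usage": {"total_usage": 1_020_000},
        "system_cpu_usage": 101_000_000,
        "online_cpus": 2,
    },
    "precpu_stats": {
        "cpu_usage": {"total_usage": 1_000_000},
        "system_cpu_usage": 100_000_000,
    },
}
print(cpu_percent(sample))  # 4.0
```

Polling this endpoint (or letting a collectd plugin do it for you) answers the system-level half of the first question above; application-specific metrics still need their own export path.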
Best of all, we got to share the best practices we follow for monitoring Docker in production at SignalFx today. We use SignalFx’s Host & Container Navigator to get instant visibility into the status of all our Docker containers, with a real-time, continuous survey of container status across our environment. From the Host & Container Navigator view, we can drill into different perspectives of our environment, for example by availability zone, plugin, or microservice. This gives us a starting point for quickly determining where our attention may be required. Finally, the recent release of the SignalFx Insights feature lets us explore correlations between metrics and dimensions across all the system- and application-level data flowing out of our environment. We can home in on one infrastructure metric, such as high memory utilization, and see which dimensions are most common across that group of containers.
Even though DockerCon is over, we’re planning to keep the discussion going. We love working with the Docker team as a member of the Ecosystem Technology Partner Program, and we’re eager to hear more from the community about what’s required to make cloud monitoring and intelligent alerting a fundamental part of your production strategy for Docker. To learn more, check out our webinar with Zenefits on operating Docker and orchestrating microservices, in which Max talks about his experience operating Docker at scale in a high-performance environment.