Monitoring Docker at Scale with SignalFx

Over the past two years, Docker has revolutionized how distributed applications are packaged, managed and deployed by making containers more accessible and easier to build, manage and execute. When we started building the SignalFx platform, using Docker was a no-brainer: it gave us a unifying technology across our development, testing and production environments.

We are extremely happy with our early adoption of Docker as a key technology in our build, deployment and application management systems. Along the way, we have contributed bug fixes and feedback to the Docker project, co-maintained the docker-py project (the Python client library for Docker) and created MaestroNG, a Docker container orchestration tool for multi-host environments.

At SignalFx, we of course use SignalFx to monitor our own infrastructure and applications. Of the various elements of our stack, our Docker containers were the last remaining component we couldn’t get metrics from. This all changed with the 1.5 release of Docker and its stats API.

The Docker Stats and Resource Usage API

Docker’s /stats API exposes a stream of metrics about a specified running container. Every second, the Docker daemon writes a JSON object to the open HTTP connection with the latest values of the metrics Docker tracks for that container: CPU, memory, network I/O and block device usage.

Here’s a stripped-down example; for the complete JSON output, see the documentation of the stats API.

   {
       "blkio_stats": {
           "io_service_bytes_recursive": [
               { "major": 8, "minor": 0, "op": "Read",  "value": 238698496 },
               { "major": 8, "minor": 0, "op": "Write", "value": 5234688 },
               { "major": 8, "minor": 0, "op": "Sync",  "value": 5230592 },
               { "major": 8, "minor": 0, "op": "Async", "value": 238702592 },
               { "major": 8, "minor": 0, "op": "Total", "value": 243933184 }
           ]
       },
       "cpu_stats": {
           "cpu_usage": {
               "percpu_usage": [ ... ],
               "total_usage": 120445182045,
               "usage_in_kernelmode": 18000000000,
               "usage_in_usermode": 30740000000
           },
           "system_cpu_usage": 57260280000000
       },
       "memory_stats": {
           "limit": 8373014528,
           "max_usage": 866680832,
           "stats": { ... },
           "usage": 858525696
       },
       "network": {
           "rx_bytes": 8981142,
           "rx_dropped": 0,
           "rx_errors": 0,
           "rx_packets": 115713,
           "tx_bytes": 6132481,
           "tx_dropped": 0,
           "tx_errors": 0,
           "tx_packets": 61151
       },
       "read": "2015-06-04T20:44:59.901797792Z"
   }

All we had to do was to capture this output for all our running containers, and stream it to SignalFx! Since we already use CollectD on all our servers, the easiest integration path was to use a CollectD plugin that would, on each host, gather and send the metrics of the containers running on that host.
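The CPU counters in the stats JSON are cumulative, so a value worth charting usually comes from the difference between two consecutive samples. Here is a minimal sketch of that arithmetic, assuming the field layout of the example payload. The function names `cpu_percent` and `memory_percent` are hypothetical, and the second sample's values are invented for illustration; this is one common way to derive a percentage, not necessarily what any particular tool does.

```python
def cpu_percent(previous, current):
    """Share of total host CPU time the container used between two samples."""
    prev_cpu = previous["cpu_stats"]
    cur_cpu = current["cpu_stats"]
    container_delta = (cur_cpu["cpu_usage"]["total_usage"]
                       - prev_cpu["cpu_usage"]["total_usage"])
    system_delta = cur_cpu["system_cpu_usage"] - prev_cpu["system_cpu_usage"]
    if system_delta <= 0 or container_delta < 0:
        # Counters did not advance (or were reset); no meaningful ratio.
        return 0.0
    return 100.0 * container_delta / system_delta


def memory_percent(sample):
    """Current memory usage as a percentage of the container's limit."""
    memory = sample["memory_stats"]
    if not memory["limit"]:
        return 0.0
    return 100.0 * memory["usage"] / memory["limit"]


# First sample: values taken from the example payload.
first = {
    "cpu_stats": {
        "cpu_usage": {"total_usage": 120445182045},
        "system_cpu_usage": 57260280000000,
    },
    "memory_stats": {"limit": 8373014528, "usage": 858525696},
}

# Second sample: hypothetical values, invented for illustration only.
second = {
    "cpu_stats": {
        "cpu_usage": {"total_usage": 120545182045},
        "system_cpu_usage": 57264280000000,
    },
    "memory_stats": {"limit": 8373014528, "usage": 858525696},
}

print(cpu_percent(first, second))
print(round(memory_percent(second), 2))
```

Because `system_cpu_usage` covers all cores, the result is a share of the whole host; multiplying by the number of cores would express it on a per-core scale instead.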

Bringing it Together with CollectD

By the time we started looking at integrating Docker container statistics with SignalFx, Docker 1.5 had already been out for a couple of weeks. Surely, someone in the community must have started working on a CollectD plugin. As it turned out, Sylvain Baubeau had! His work gave us a good starting point for a CollectD plugin, implemented in Python, that would read Docker’s /stats output via docker-py.
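To give a feel for what reading /stats from Python looks like, here is a rough sketch that pulls a few samples from every running container. It assumes the current Docker SDK for Python (the `docker` package, successor to docker-py), whose `containers.list()` and `Container.stats(stream=True, decode=True)` calls differ from the client API available when this was written. The function name `sample_running_containers` is hypothetical, and this is not the plugin's code.

```python
import itertools


def sample_running_containers(client, samples_per_container=1):
    """Return {container name: [stats dict, ...]} for every running container.

    `client` is expected to behave like docker.DockerClient; it is passed in
    so the function can also be exercised with a stand-in object.
    """
    collected = {}
    for container in client.containers.list():  # running containers by default
        # With stream=True and decode=True, stats() yields one dict per sample.
        stream = container.stats(stream=True, decode=True)
        collected[container.name] = list(
            itertools.islice(stream, samples_per_container))
    return collected


# Typical use against a local daemon (requires the `docker` package):
#   import docker
#   stats = sample_running_containers(docker.from_env())
#   for name, samples in stats.items():
#       print(name, samples[0]["memory_stats"]["usage"])
```

Note that this reads the containers one after another, so the total time grows with the number of containers on the host.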

We ran into some issues of our own when collecting stats for containers at scale, related to CollectD’s polling interval and the time the Docker /stats endpoint took to return its first JSON block of data. So we dug into the code and contributed a bunch of improvements over a couple of pull requests, all of which were merged (here and here)! The plugin is ready for general consumption and available at or
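One way to keep a slow first response from the /stats endpoint out of the polling path is to read each container's stream on a background thread and let the poller pick up only the most recent sample. The sketch below illustrates that pattern under assumptions: the class name `LatestSample` is hypothetical, the stream can be any iterable of decoded stats dicts, and this is an illustration of the idea rather than the plugin's actual implementation.

```python
import threading


class LatestSample:
    """Consume a stats stream in the background and remember the newest sample."""

    def __init__(self, stream):
        self._stream = stream
        self._lock = threading.Lock()
        self._latest = None
        # Daemon thread: it must not keep the process alive on shutdown.
        self._thread = threading.Thread(target=self._consume, daemon=True)
        self._thread.start()

    def _consume(self):
        # Blocks on the stream; each new sample replaces the previous one.
        for sample in self._stream:
            with self._lock:
                self._latest = sample

    def latest(self):
        """Return the newest sample without waiting on the stream (None if none yet)."""
        with self._lock:
            return self._latest

    def join(self, timeout=None):
        """Wait for the stream to end, e.g. when the container stops."""
        self._thread.join(timeout)


# In a periodic read callback, each container would have its own reader and
# the callback would only call reader.latest(), which returns immediately.
```

With this arrangement the polling cycle costs the same no matter how long a given stream takes to produce its first object.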

Installing the plugin is quick and simple, so you can be up and collecting your Docker container statistics in no time.

Monitoring Your Containers with SignalFx

As soon as the docker-collectd-plugin is installed, and assuming your CollectD is configured to send metrics to SignalFx, the plugin will start reporting metrics for the containers running on that host, with any new container being picked up automatically. Once you log in to SignalFx, you should see a new, automatically created Docker page with a dashboard showing your container statistics, all in real time!

Here are some of the metrics reported by the plugin that you’ll find in SignalFx and may find useful to monitor your containers:

  • CPU:
    • cpu.usage.total: jiffies of CPU time used by the container
    • cpu.usage.kernelmode: CPU time spent in kernel-space code
    • cpu.usage.usermode: CPU time spent in user-space code
    • cpu.percpu.usage.cpuX: usage per core, where X is the core number
  • Network:
    • network.usage.rx_bytes: bytes received
    • network.usage.rx_dropped: incoming packets dropped
    • network.usage.rx_errors: incoming packet errors
    • network.usage.rx_packets: incoming packets received
    • network.usage.tx_bytes: bytes sent
    • network.usage.tx_dropped: transmitted packets dropped
    • network.usage.tx_errors: transmitted packet errors
    • network.usage.tx_packets: transmitted packets
  • Memory:
    • memory.usage.limit: memory limit of the container
    • memory.usage.max: maximum memory usage seen in the life of the container
    • memory.usage.total: current memory usage of the container
  • Block device I/O:
    • blkio.io_service_bytes_recursive-MAJOR-MINOR.read: bytes read from block device (MAJOR, MINOR)
    • blkio.io_service_bytes_recursive-MAJOR-MINOR.write: bytes written to block device (MAJOR, MINOR)
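To make the naming scheme concrete, here is a small sketch that flattens one decoded stats object into metric names following the patterns in the list above. The function name `flatten_stats` is hypothetical, only names that follow the listed patterns are produced, and the `.read` suffix for block device reads is assumed by analogy with `.write`. It illustrates the mapping and is not the plugin's code.

```python
def flatten_stats(sample):
    """Map one decoded stats object to {metric name: value}."""
    metrics = {}

    cpu = sample.get("cpu_stats", {}).get("cpu_usage", {})
    for suffix, key in (("kernelmode", "usage_in_kernelmode"),
                        ("usermode", "usage_in_usermode")):
        if key in cpu:
            metrics["cpu.usage." + suffix] = cpu[key]
    # One metric per core: cpu.percpu.usage.cpu0, cpu1, ...
    for core, value in enumerate(cpu.get("percpu_usage", [])):
        metrics["cpu.percpu.usage.cpu%d" % core] = value

    # rx_bytes, tx_packets, ... keep their JSON field names.
    for field, value in sample.get("network", {}).items():
        metrics["network.usage." + field] = value

    memory = sample.get("memory_stats", {})
    for suffix, key in (("limit", "limit"), ("max", "max_usage")):
        if key in memory:
            metrics["memory.usage." + suffix] = memory[key]

    blkio = sample.get("blkio_stats", {})
    for entry in blkio.get("io_service_bytes_recursive", []):
        op = entry["op"].lower()
        if op in ("read", "write"):  # ".read" is an assumption, see lead-in
            name = "blkio.io_service_bytes_recursive-%d-%d.%s" % (
                entry["major"], entry["minor"], op)
            metrics[name] = entry["value"]

    return metrics
```

Applied to the example payload, this would yield entries such as `network.usage.rx_bytes` and `blkio.io_service_bytes_recursive-8-0.write`.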

All the data reported by the docker-collectd-plugin is automatically tied to the originating server via the host dimension, and all time series will have a plugin:docker dimension. Additionally, the container name is sent as the plugin_instance dimension. For examples of how to use these metrics, don’t hesitate to look at how the charts of the Docker Containers dashboard are built!
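Dimensions are what make these time series easy to slice, for example by host or by container. The sketch below shows the idea with plain dictionaries. The datapoint shape and the names `tag` and `sum_by` are hypothetical and do not represent the SignalFx wire format or API; only the dimension names `host`, `plugin` and `plugin_instance` come from the description above.

```python
from collections import defaultdict


def tag(metric, value, host, container_name):
    """Attach the three dimensions described above to a single value."""
    return {
        "metric": metric,
        "value": value,
        "dimensions": {
            "host": host,
            "plugin": "docker",
            "plugin_instance": container_name,
        },
    }


def sum_by(datapoints, metric, dimension):
    """Total one metric per distinct value of the given dimension."""
    totals = defaultdict(int)
    for point in datapoints:
        if point["metric"] == metric:
            totals[point["dimensions"][dimension]] += point["value"]
    return dict(totals)


# Hypothetical values, invented for illustration only.
points = [
    tag("memory.usage.max", 100, "host-a", "web"),
    tag("memory.usage.max", 50, "host-a", "db"),
    tag("memory.usage.max", 70, "host-b", "web"),
]

print(sum_by(points, "memory.usage.max", "host"))
print(sum_by(points, "memory.usage.max", "plugin_instance"))
```

Grouping by `host` gives a per-server view, while grouping by `plugin_instance` follows a container name across servers.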

We’re very excited to join the Docker Ecosystem Technology Partner (ETP) program today and bring the power of SignalFx’s advanced monitoring and streaming analytics capabilities to all Docker users.


Maxime has been a software engineer for over 15 years. At SignalFx, Max is the architect behind our Microservices APM offering, and spent several years working on the core of SignalFx: its real-time, streaming SignalFlow™ Analytics. He is also the creator of MaestroNG, a container orchestrator for Docker environments.