Recently at Splunk Digital Experience Monitoring Engineering we’ve started an initiative to increase visibility into our team. We wanted a way to display our KPIs (key performance indicators) for all to see. We created an engineering dashboard using a Raspberry Pi, Dashing, and Heroku.
Dashing is an open source Ruby + CoffeeScript framework created by the team at Shopify for creating dashboards. There are many pre-made widgets already available, and custom widgets are easily built to consume data from APIs.
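In Dashing, a custom widget is typically fed by a small scheduled job that pulls from an API and pushes a hash to the widget. As a minimal sketch (the field names and widget ID here are hypothetical, and the scheduler call is shown in a comment because it only runs inside Dashing itself):

```ruby
require 'json'

# Shape a raw API payload into the hash a Dashing widget expects.
# 'open_count' and the current/status keys are illustrative only.
def build_widget_payload(api_json)
  data = JSON.parse(api_json)
  open_count = data.fetch('open_count', 0)
  { current: open_count, status: open_count > 0 ? 'red' : 'green' }
end

# Inside a real Dashing job (e.g. jobs/alerts.rb), this would run on
# a schedule and push the payload to a widget named 'alerts':
#
# SCHEDULER.every '60s' do
#   send_event('alerts', build_widget_payload(fetch_from_api))
# end
```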
We knew it would look great on the 50” wall-mounted TV located in our engineering space and visible to everyone on the Splunk Digital Experience Monitoring team. A Google search of “raspberry pi dashing” returns thousands of results, so the setup was pretty easy.
What Our Internal Dashboard Monitors
- GitHub branches and pull requests with build status of each project and branch from Semaphore
- Sprint progress (points closed vs. total points planned, and sprints fully closed vs. our quarterly goal)
- Number of days since last deploy
- Unacknowledged alerts from OpsGenie
- Lifetime of oldest jobs in all of our worker queues
- RSS feed to Cloud Host Status
Branches, Pull Requests, Build Statuses, and Days Since Last Deploy
Continuous integration is a battle for many teams, and ours is not exempt. We are always pushing for better code coverage, getting things green, and merging in branches as soon as we can so they don’t go stale.
Projects with broken builds turn red, serving as a constant reminder that something needs to be fixed. The number of days since the last deploy turns yellow after 3 days and red after 6, encouraging more frequent deploys with fewer changes in each.
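The deploy-age thresholds above can be sketched as a small helper (a hypothetical function, and we're reading "after 3 days" as strictly more than 3):

```ruby
# Map days since the last deploy to a dashboard color:
# yellow after 3 days, red after 6, green otherwise.
def deploy_age_color(days_since_deploy)
  if days_since_deploy > 6
    'red'
  elsif days_since_deploy > 3
    'yellow'
  else
    'green'
  end
end
```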
Sprint Progress
After many iterations of our development process, we’ve found that one-week sprints work best for us. Our sprints go from Wednesday to Wednesday, with planning done on Tuesdays. We end each sprint with a show and tell of what we’ve done in the past week.
On the dashboard we display the number of points closed vs. points planned for the current sprint (points estimate the complexity of a task) as a percentage, as well as the number of sprints we have fully closed this quarter.
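The sprint-progress number is a straightforward percentage; a minimal sketch (the function name is ours, not from the dashboard code):

```ruby
# Points closed as a percentage of points planned for the sprint,
# rounded to a whole number for display on the dashboard.
def sprint_progress_pct(points_closed, points_planned)
  return 0 if points_planned.zero?
  (100.0 * points_closed / points_planned).round
end
```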
The goal is not to punish ourselves for missing sprint deadlines, but to encourage ourselves to estimate and plan accurately (not taking on more than we can realistically do).
Unacknowledged Alerts
While most of the dashboard has a green or blue color (informational squares are blue, status squares are green/red), the “unacknowledged alerts” square pulls data from OpsGenie and turns red when there are alerts. This allows us (and others!) to easily see if there is something that needs attention.
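Counting unacknowledged alerts from a polled response might look like the following sketch. The response shape here (a "data" array with an "acknowledged" flag per alert) is an assumption loosely modeled on OpsGenie's alert list API; adjust field names to the real payload.

```ruby
require 'json'

# Count alerts in an alert-list response that have not yet been
# acknowledged. Field names are illustrative assumptions.
def unacked_alert_count(response_body)
  alerts = JSON.parse(response_body).fetch('data', [])
  alerts.count { |alert| !alert['acknowledged'] }
end
```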
Lifetime of Oldest Job
A quick measure of worker health is the number of items in its queue, along with the lifetime of the oldest job in that queue. These numbers give us a rough idea of how each product area is doing. We plan to add the number of jobs processed per unit of time (job rate), as well as the ratio of job errors to completed jobs.
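Given each queued job's enqueue timestamp, the oldest-job lifetime is simply the age of the earliest one. A minimal sketch (the data shape is hypothetical; real queue backends expose this differently):

```ruby
# Age in seconds of the oldest job in a queue, given the enqueue
# times of its jobs. Returns 0 for an empty queue.
def oldest_job_age_seconds(enqueued_at_times, now = Time.now)
  return 0 if enqueued_at_times.empty?
  (now - enqueued_at_times.min).round
end
```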
Cloud Host Status
Sometimes things aren’t our fault, but they are still our problem. We use our cloud host status aggregator to get a high-level overview of how cloud providers are doing. If any issues are publicly announced, we surface that information directly on our dashboard.
After using the dashboard for only a few weeks, we already feel like we’ve made vast improvements to our engineering team. We have begun to hold ourselves to a higher standard because this data is viewable by everyone on the Splunk Digital Experience Monitoring team. This dashboard will be an ongoing project that will continue to change as our team evolves.
We are, at the core, a performance company. We believe that measuring performance and holding teams accountable should be a top priority for any organization.