Chapter 12 | SRE Conclusions & Next Steps

Conclusion

Designing, building, operating, and improving the VictorOps software as a service, including the underlying physical and virtual infrastructure, is critical to the future of the business. Our quest is to explore new methods of both delivering quality software sooner and receiving feedback from real usage faster.

Developing better methods of delivering software as a service to meet the changing needs of our user base will be a constant journey. We expect our own internal SRE efforts and philosophies to change dramatically, even by this time next year. The path we took produced results for us. It may not for others, especially large enterprise organizations distributed globally. Organizational structure and culture are strong contributing factors to the success or failure of these types of changes. As we grow, our processes, our technology, and even our people will keep evolving. In fact, in just the last few months, our SRE concern submission process has changed.

Making Changes and Improvements

The formal process of submitting SRE issues has already evolved. Engineers are now identifying concerns on their own and taking the initiative to establish observability around them, as well as implementing service level indicators and objectives. Conversations around concerns still take place in council meetings, but the process of vetting and creating epics has shifted to the teams. They are empowered and responsible for identifying concerns, instrumenting visibility, and prioritizing the work on their own.

Minor improvements are taking place company-wide simply because people are asking questions and having more conversations about reliability. Dozens of large TVs have popped up all over the office, showing everything from the current health of the system to the SRE-related work that is in flight or coming up. Links to various dashboards are passed around in our group chat rooms. Awareness is amplified and individuals are empowered to implement improvements, especially if they help to explore more “unknown unknowns”.

Just by having more conversations about reliability and creating more visibility, we gain more confidence in how things work, or at least in how we think they work versus the often surprising reality. This, in turn, creates more questions. And so on. If you recall, our SRE journey, much like others before us, started by asking questions. There’s always more to discover. In complex systems, things are always changing. You can never know all of it.

Your Journey

We hope that following the VictorOps SRE journey will spark questions for you and your teams. Who is your customer? What are you doing well (or not well) to enable the customer? What keeps you up at night about the availability of your systems? Can you quickly and safely introduce changes to your system? Can you answer these questions?

Good luck on your journey towards SRE and always keep reaching, stretching, and exploring. That’s where the good stuff is!

              
