Standardizing through the Golden Tech framework
Standardizing technology took several complementary pieces. As part of the Golden Tech framework, Spotify built a tech radar tool that defines a set of technology standards for everyone to build on, which they called golden technologies. Spotify also standardized how it builds and deploys different types of software through golden paths, which are self-paced guidebooks. To further support standardization, the team implemented software templates within the Golden Tech framework that streamline the creation of back-end services, pipelines, websites, or any other type of software.
Measuring progress and adoption
To ensure company-wide adoption of Golden Tech and unify its technology stack, Spotify developed a tool that measures the quality, reliability, and alignment of the software ecosystem. The tool incentivizes teams to reach certification levels by passing checks for their software, progressing toward the "golden state." The closer teams get, the more automated, free maintenance updates they receive.
Beyond individual teams, the tool provides department-wide visibility into overall tech health and certification levels. It also tracks each department's progress and how well it aligns with organizational goals.
This tool and the others, such as the Spotify tech radar, software templates, and CI/CD plugins, are built as plugins for Backstage. Backstage is Spotify’s developer portal, which serves as a central hub for tools and creates a single pane of glass for platform engineering solutions. This interoperability unlocks compound value across the organization by simplifying the developer experience.
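As a rough illustration of how certification checks like those described above might be modeled, the following Python sketch groups checks into levels and rolls the results up across a department. The level names, check names, and component fields are hypothetical and are not taken from Spotify's tool.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List

Component = Dict[str, object]

# Hypothetical level names, ordered from lowest to highest.
LEVELS = ["bronze", "silver", "gold"]
# Hypothetical set of approved ("golden") frameworks.
GOLDEN_FRAMEWORKS = {"golden-service-framework"}


@dataclass(frozen=True)
class Check:
    name: str
    level: str  # the certification level this check belongs to
    passes: Callable[[Component], bool]


CHECKS: List[Check] = [
    Check("has-owner", "bronze", lambda c: bool(c.get("owner"))),
    Check("uses-golden-framework", "silver",
          lambda c: c.get("framework") in GOLDEN_FRAMEWORKS),
    Check("has-tests", "gold", lambda c: c.get("has_tests") is True),
]


def certification_level(component: Component) -> str:
    """Return the highest level whose checks, and all lower levels' checks, pass."""
    achieved = "none"
    for level in LEVELS:
        level_checks = [ch for ch in CHECKS if ch.level == level]
        if all(ch.passes(component) for ch in level_checks):
            achieved = level
        else:
            break  # a failed level blocks every level above it
    return achieved


def department_summary(components: List[Component]) -> Dict[str, int]:
    """Count how many components sit at each certification level."""
    return dict(Counter(certification_level(c) for c in components))
```

In this model a component cannot skip a level, which mirrors the idea of progressing step by step toward the golden state.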
Fleet Shift: Automating large-scale code changes
Once Spotify standardized and centralized its tech ecosystem, the next step was automating maintenance. Enter Fleet Shift, a tool designed to perform large-scale code changes across the fleet.
How Fleet Shift works
Fleet Shift automates updates across repositories by executing Shifts, which are scripts that modify code at scale. Here’s how it works:
- Engineers define a Shift, i.e. a set of instructions for code changes, and package it as a Docker container.
- Fleet Shift executes the Shift as a Kubernetes job, cloning repositories, applying transformations, and creating pull requests.
- Fleet Shift listens for errors or success. Automated checks ensure that changes are safe, and PRs can be merged automatically if CI/CD tests pass.
This automation has dramatically reduced migration timelines. For example, upgrading Spotify’s internal service framework used to take 200 days — Fleet Shift reduced this to less than seven days.
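The steps above can be sketched in Python under some simplifying assumptions. Here a Shift is modeled as a plain function that edits a checked-out repository in place, and the runner works on local directories; cloning, the Docker container, the Kubernetes job, and pull request creation are deliberately left out. The names `bump_framework_version`, `run_shift`, and `service-framework` are hypothetical and do not describe Fleet Shift's actual interface.

```python
import re
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, List

# A shift is modeled as a function that edits a checkout in place
# and returns the files it changed.
Shift = Callable[[Path], List[Path]]


def bump_framework_version(repo: Path) -> List[Path]:
    """Hypothetical shift: pin 'service-framework' to 2.0.0 in requirements.txt."""
    target = repo / "requirements.txt"
    if not target.exists():
        return []
    old = target.read_text()
    new = re.sub(r"^service-framework==.*$", "service-framework==2.0.0",
                 old, flags=re.MULTILINE)
    if new == old:
        return []  # already up to date, or the dependency is not used
    target.write_text(new)
    return [target]


@dataclass
class ShiftResult:
    repo: str
    status: str  # "changed", "unchanged", or "error"
    detail: str = ""


def run_shift(shift: Shift, checkouts: List[Path]) -> List[ShiftResult]:
    """Apply one shift to every checkout, isolating failures per repository."""
    results = []
    for repo in checkouts:
        try:
            changed = shift(repo)
        except Exception as exc:  # one broken repository must not stop the run
            results.append(ShiftResult(repo.name, "error", str(exc)))
            continue
        if changed:
            # A real runner would commit the change and open a pull request here.
            names = ", ".join(p.name for p in changed)
            results.append(ShiftResult(repo.name, "changed", names))
        else:
            results.append(ShiftResult(repo.name, "unchanged"))
    return results
```

The shift in this sketch is idempotent: running it a second time against the same checkout reports "unchanged", which is a useful property when the same change is retried across many repositories.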
Technical requirements for fleet management
Achieving fleet management at scale requires a robust technical foundation that includes the following.
Declarative infrastructure
Spotify transitioned to declarative infrastructure to simplify configuration management across thousands of services. To apply automated changes to the infrastructure, the configuration needed to be data (e.g., JSON, YAML) rather than code (e.g., TypeScript, HCL). Declarative infrastructure enables:
- Automated updates across all services
- Consistency in provisioning and configuration
- Easier automation of infrastructure changes
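To illustrate why configuration expressed as data is easier to change automatically, here is a small Python sketch. It assumes each service has a JSON configuration document; the field names (`service`, `spec`, `runtime`, `replicas`) and the service names are hypothetical.

```python
import json
from typing import Dict


def set_runtime(config_text: str, runtime: str) -> str:
    """Parse a JSON service config, set spec.runtime, and serialize it again."""
    config = json.loads(config_text)
    # Create the "spec" section if the service does not have one yet.
    config.setdefault("spec", {})["runtime"] = runtime
    return json.dumps(config, indent=2, sort_keys=True)


def update_fleet(configs: Dict[str, str], runtime: str) -> Dict[str, str]:
    """Apply the same change to every config; keys are service names."""
    return {name: set_runtime(text, runtime) for name, text in configs.items()}


fleet = {
    "playlist-api": '{"service": "playlist-api", "spec": {"runtime": "java17", "replicas": 3}}',
    "search-api": '{"service": "search-api"}',
}
updated = update_fleet(fleet, "java21")
```

Because the configuration is plain data, the change is a parse, an edit, and a serialize, and untouched fields such as `replicas` are preserved. Making the same change to configuration written as code would generally require parsing and reasoning about program logic.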
Version control and dependency management
Managing dependencies across thousands of services requires:
- Centralized version control and dependency management to track updates. An example of this is Spotify's BOM, which helps manage the versions of Spotify's most important dependencies.
- Automated dependency upgrades to ensure that security patches and framework updates are applied seamlessly.
- Golden Tech adoption, where standardized libraries and frameworks reduce fragmentation.
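The idea behind a BOM (bill of materials) can be sketched in a few lines of Python: one shared mapping pins versions, services list their dependencies without versions, and an upgrade becomes a single change to the mapping. This is an illustrative model only, not the format of Spotify's BOM; the dependency names and versions are made up.

```python
from typing import Dict, List

# Hypothetical BOM: the single place where versions are pinned.
BOM: Dict[str, str] = {"http-client": "4.2.0", "json-lib": "2.15.1"}


def resolve(dependencies: List[str], bom: Dict[str, str]) -> Dict[str, str]:
    """Look up the pinned version for each dependency a service declares."""
    missing = [d for d in dependencies if d not in bom]
    if missing:
        raise KeyError(f"not managed by the BOM: {', '.join(missing)}")
    return {d: bom[d] for d in dependencies}


def find_drift(pinned: Dict[str, str], bom: Dict[str, str]) -> Dict[str, str]:
    """Return locally pinned dependencies whose version differs from the BOM."""
    return {d: v for d, v in pinned.items() if d in bom and bom[d] != v}
```

A service that declares only `["http-client"]` picks up whatever version the BOM currently pins, while `find_drift` highlights services that have pinned their own versions and would therefore miss a centrally managed upgrade.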
CI/CD at scale
With hundreds to thousands of migrations per year, Spotify reimagined its CI/CD platform to support:
- Massive build and deployment volumes
- Automated testing pipelines to validate changes
- Continuous integration, ensuring services remain in a golden state
Cultural shift: Trusting automation
One of the biggest challenges wasn’t building the software — it was changing the culture. Engineers had to trust automated updates happening without their direct involvement.
To make this work, Spotify emphasized strong engineering practices. Test coverage and automation are non-negotiable. Engineers must write tests to catch future issues, an internal principle known as the Beyoncé Rule: "If you liked it, then you should have put a test on it."
Additionally, the infrastructure platform team had to shift from passive to active product ownership. Instead of merely introducing new platform capabilities, it now owns adoption and uses fleet management to roll out changes that delight the engineers who use its platform.
The results: Automation at scale
Fleet Shift has authored over 1.8 million contributions, with a 3:1 ratio of bot contributions to human contributions, and climbing.
Shifts range from simple version bumps to complex multi-repo transformations. For example, when Spotify discovered a critical security vulnerability in Log4j, Fleet Shift updated 80% of the fleet in less than 11 hours — a task that would have taken weeks or months manually.
The future of fleet management at Spotify
Spotify continues to refine fleet management, focusing on:
- Cleaning up production environments by retiring experimental and deprecated software to reduce cloud costs and security risks
- Exploring monorepos, moving from a polyrepo world to a monorepo structure for better dependency management
- Leveraging LLMs and AI, researching how AI can further reduce engineering toil
Lessons for technology leaders
Spotify’s journey offers useful insights for organizations managing software at scale, including:
- Standardization accelerates automation. Define a Golden Tech framework.
- Declarative infrastructure simplifies updates. Use data-based configurations.
- CI/CD must scale with automation. Optimize pipelines for frequent deployments.
- Test automation is non-negotiable. Ensure reliability across all changes.
- Cultural change is just as important as technical change. Be disciplined about product thinking to earn the engineers’ trust in automation.
- Invest in developer experience. Tools that centralize resources and streamline workflows make it easier for engineers to adopt new standards and practices. A great developer experience accelerates adoption and reduces resistance to change.
Spotify’s fleet management journey is a testament to the power of standardization, automation, and cultural change in managing software at scale. By reducing engineering toil and empowering developers, you can build a foundation for long-term innovation.
If there’s one takeaway from Spotify’s story, it’s this: Automation isn’t just a tool — it’s a mindset. Embracing it can transform how your organization builds, maintains, and innovates at scale.
To learn more, check out the first article in this series, Fleet First for Better Developer Experience and Faster Software Delivery.