Always in vogue: Availability and 99.5% uptime
As a circular logistics operation, Rent the Runway has had a unique business model since its startup days. “In traditional e-commerce, you send out the goods and they don’t come back,” says Stephanus Meiring, VP of engineering for Rent the Runway. “The majority of what we send out comes back to us, and we then need to get it ready to go out again to another customer.”
To keep operations running seamlessly, Rent the Runway relies on dozens of complex services across its multi-cloud architecture, tracking everything from user journeys on the brand’s website to garments that need repairs or stain treatment. Instant visualizations on Splunk dashboards give teams a one-stop shop for critical metrics across the company’s sprawling environment, so they can identify and fix problems before those problems affect the customer experience.
“If a customer can’t get a dress or is having issues checking out due to a bug in the experience, we need to know so we can address it quickly,” says staff software engineer Shane Ryan. “We’ve leveled up our monitoring game in the last four to five years with Splunk. We no longer have to wait for our user systems to alert us to issues, as we’ve seen the value of alerting across our front and backend systems. Now we can get ahead of the issues — and when there is an incident, it doesn’t have to be all hands on deck.”
Previously, it wasn’t uncommon to need two dozen developers on a call when an incident occurred. Aki Yamada, a staff engineer who has been with Rent the Runway since its early days, recalls: “Before we started using Splunk, every resolution was bespoke — logging into production machines to analyze logs and run scripts — but Splunk enables us to answer questions about application history with simple queries.”
Now with full visibility across warehousing and consumer apps, teams can monitor what they need to manage and involve fewer people for incident resolution. The result: increased customer satisfaction — and an improved employee experience. “I don’t remember the last time someone has woken up over Thanksgiving to deal with an outage,” says Meiring. “Holidays used to be tumultuous from a tech perspective due to increased customer demand. Since we’ve upped our usage and adoption of Splunk, we haven’t had a single major outage, and the last critical incident was resolved in less than 15 minutes.”