In this post, John Rousseau, Operations Team Lead at Onshape, talks to us about how they’re using SignalFx to become data-driven across the company.
Onshape is the first and only full-cloud 3D CAD system that lets everyone on a design team work together using any web browser, phone, or tablet. Onshape was built from scratch for the way today’s engineers, designers and manufacturers really work, giving them secure and simultaneous access to a single master version of their CAD data without the hassles of software licenses or copying files.
Can you tell us about your team?
The operations team here does a bit more than a traditional operations team; it is more like a platform or SRE organization. We handle larger-scale issues like server-side scalability development, database scaling and performance, release engineering, security services, internal tools development, and platform work on AWS.
Can you tell us a little bit about the nuts and bolts of your application?
To create our disruptive CAD-as-a-Service product and make it scale with the demands of a global user base creating designs on every kind of device, Onshape has been built 100% on AWS. Spread across three regions, the app runs on hundreds of instances. The infrastructure is divided between Dev/Test, Staging, and Production clusters, with roles representing the microservices that make up all of Onshape. Teams are organized around their services and supported by an operations organization focused on broader issues like release engineering, security services, scaling infrastructure, building platforms, and providing tooling.
What kind of challenges do you face with monitoring?
The main challenges we’ve had have been:
- Providing self-service access to metric creation, visualization, and analytics so both engineering and non-engineering teams can track and analyze their own KPIs and get visibility throughout the software development lifecycle, via a consumable interface that doesn’t require learning new query languages or DSLs
- Handling dynamic data so that charts, dashboards, or alerts don’t need to be modified or refreshed every time a service scales, customers are added, or a different time range needs to be examined
- Having real-time, interactive analysis of our data as it streams in, with the ability to pivot on metadata like customer type or service, so teams can explore and alert on behavioral changes at every level of the app right when they occur, instead of minutes or hours later
- Spending our limited engineering resources on building Onshape instead of cobbling together and customizing open source components into a metrics platform that we would then have to scale and maintain
Why did you choose SignalFx?
We originally chose a different metrics monitoring product, before SignalFx had even launched, but found it too expensive to roll out to all parts of the product development pipeline, unable to deal with dynamically changing data, impossible to use for interactive analytics, and too slow to catch problems before they impacted our customers.
We love that SignalFx provides an easy-to-use, self-service metrics capability, with instrumentation, visualization, and analytics that can be used not just by operations but also by the developer, product, and UX teams, without making anyone learn a new query language or DSL.
SignalFx was the only solution that we could reasonably roll out to the entire Onshape infrastructure. Host-based pricing models force us to decide whether every new host is worth paying an additional fee for, but SignalFx’s usage-based pricing gives us the flexibility to monitor metrics from every system we care about.