In a recent post we mentioned that API Checks could be used to monitor performance related to SLAs. In this post we’ll expand on that example and show what it looks like to monitor APIs for SLAs in practice.
What is an SLA?
A Service Level Agreement (commonly called an SLA) is an agreement between two parties about what services one party will provide to the other. In a broad sense this agreement could cover any number of services – everything from customer support reply times to product delivery.
Often when SLAs are established between two technology or software providers the agreement will outline both:
- Availability: What uptime percentage can be guaranteed by the partner? How much time in advance is required to notify a partner of planned downtime or maintenance?
- Responsiveness: How quickly can my system expect a reply from a partner’s system?
And one party might be entitled to a credit, refund, or freedom to back out of a contract depending on whether those SLAs are met and upheld.
For example, let’s imagine there’s a Ride-sharing Web App that relies heavily on data from a third party that specializes in mapping. When this Ride-sharing Web App agrees to work exclusively with one excellent Mapping Provider, that Mapping Provider may guarantee, “Your ride-sharing app will have access to our map data 99.99% of the time and we will notify your team at least three weeks in advance of upcoming planned maintenance.”
The team managing the Ride-sharing Web App might say, “Nice! That sounds like a great deal. Our users tolerate some glitchiness and they never complain about it on Twitter, so 99.99% uptime is more than enough. Three weeks is plenty of time to let our customers know about upcoming downtime. We agree to the terms, but if your API is available less than 99.99% of the time we’ll need to be refunded in full.”
Everyone signs on the dotted line and shakes hands, and the Ride-sharing Web App builds a new feature that hooks into the Mapping Provider’s API to pull data.
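For a sense of scale, it helps to translate the 99.99% figure from this agreement into an actual downtime budget. The short Python sketch below does that arithmetic; the function name and the 30-day and 365-day reporting windows are our own assumptions for illustration, since a real contract would define its own measurement period.

```python
from datetime import timedelta


def downtime_budget(uptime_percent: float, period: timedelta) -> timedelta:
    """Return the maximum downtime allowed in `period` at the given uptime target."""
    allowed_fraction = 1 - uptime_percent / 100
    return timedelta(seconds=period.total_seconds() * allowed_fraction)


# Hypothetical reporting windows; the agreement itself would specify these.
budget_30d = downtime_budget(99.99, timedelta(days=30))
budget_year = downtime_budget(99.99, timedelta(days=365))

print(f"30 days:  {budget_30d.total_seconds() / 60:.2f} minutes of downtime allowed")
print(f"365 days: {budget_year.total_seconds() / 60:.2f} minutes of downtime allowed")
```

At 99.99%, the budget works out to roughly 4.3 minutes in a 30-day window and under an hour across a full year, which is why both sides have a strong interest in measuring availability carefully.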
How to Ensure That You’re Upholding Your SLAs
As business owners for the Mapping Provider we might say, “Hey – we need some data to get ahead of issues so that we can make sure that we’re upholding our end of the bargain. And, it would be nice if we could share that data publicly with our partners at the Ride-sharing Web App so they know they can trust our service.”
We could rely on the internal monitoring of our application, but that might only give us part of the picture. How do we know whether our map data is available from our API to the end user outside of our system? How do we confirm that data isn’t just available but in the right format?
We can build a synthetic, external monitor to test pulling data from our own API and put alerting in place so that our engineers know right away if there’s any type of issue that might be putting us close to breach of our agreement.
The example above shows an alert from an API Check in Splunk Synthetic Monitoring.
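To make the idea of an external check concrete, here is a minimal Python sketch of one run: request the endpoint, confirm the status code, and confirm the body is in the expected format. This is an illustration only, not how Splunk Synthetic Monitoring implements API Checks. The URL, the `lat` and `lng` field names, and every function name here are hypothetical.

```python
import json
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Fetcher = Callable[[str], Tuple[int, bytes]]


@dataclass
class CheckResult:
    ok: bool
    status: int        # 0 means no HTTP response was received
    elapsed_ms: float
    reason: str


def http_get(url: str, timeout: float = 10.0) -> Tuple[int, bytes]:
    """Fetch a URL with the standard library and return (status, body)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still carry a status code worth reporting.
        return err.code, err.read()


def run_check(url: str, fetch: Fetcher = http_get,
              required_keys: Sequence[str] = ("lat", "lng")) -> CheckResult:
    """Run one check: is the API reachable, and is the data in the right format?"""
    start = time.monotonic()
    try:
        status, body = fetch(url)
    except Exception as exc:  # DNS failure, timeout, refused connection, ...
        elapsed_ms = (time.monotonic() - start) * 1000
        return CheckResult(False, 0, elapsed_ms, f"request failed: {exc}")
    elapsed_ms = (time.monotonic() - start) * 1000

    if status != 200:
        return CheckResult(False, status, elapsed_ms, "unexpected status")
    try:
        payload = json.loads(body)
    except ValueError:
        return CheckResult(False, status, elapsed_ms, "body is not valid JSON")
    if not isinstance(payload, dict):
        return CheckResult(False, status, elapsed_ms, "body is not a JSON object")
    missing = [key for key in required_keys if key not in payload]
    if missing:
        return CheckResult(False, status, elapsed_ms, f"missing fields: {missing}")
    return CheckResult(True, status, elapsed_ms, "ok")


# Stub fetchers stand in for the network so the sketch runs anywhere.
url = "https://maps.example.com/v1/geocode"  # hypothetical endpoint
healthy = run_check(url, fetch=lambda _url: (200, b'{"lat": 40.7, "lng": -74.0}'))
malformed = run_check(url, fetch=lambda _url: (200, b"<html>oops</html>"))
print(healthy.ok, healthy.reason)
print(malformed.ok, malformed.reason)
```

Passing the fetcher in as a parameter keeps the validation logic testable without a live API. In practice a hosted synthetic monitoring service would also run the check on a schedule, from outside your own network, and handle the alerting.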
With the proactive data that simulates real end users interacting with our mapping system we can 1) get ahead of performance issues before they affect our real users, and 2) share reports with our partner to demonstrate that our uptime is exactly what we promised.
The above emailed report from Splunk Synthetic Monitoring includes a line item for SLA % that shows the Uptime reportable to a third party based on set alerting criteria.
One neat feature that Splunk Synthetic Monitoring has available for Real Browser Checks, Uptime Checks, and API Checks is the ability to receive a secondary line item for SLA % in each daily, weekly, or monthly performance report. The SLA % number is based only on the failures recorded after notifications have been sent to your team, rather than on all failures recorded. Your SLA % might represent your reportable uptime, whereas the standard Uptime % would include temporary connection problems or outliers that aren’t relevant for reporting to anyone outside of your organization.
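The difference between the two numbers is easier to see with a small worked example. The Python sketch below is only an approximation of the idea, not the product's actual calculation: it assumes a hypothetical alerting rule where a notification goes out on the second consecutive failed run, and it counts a failure toward SLA % only from that run onward.

```python
from typing import Sequence, Tuple


def uptime_and_sla(results: Sequence[bool], notify_after: int = 2) -> Tuple[float, float]:
    """Return (uptime_percent, sla_percent) for a series of check runs.

    `results` holds True for a passing run and False for a failing one.
    Assumption: a notification fires on the `notify_after`-th consecutive
    failure, and only failures from that run onward count against SLA %.
    """
    total = len(results)
    if total == 0:
        raise ValueError("need at least one check run")

    all_failures = 0
    reportable_failures = 0
    streak = 0
    for passed in results:
        if passed:
            streak = 0
            continue
        all_failures += 1
        streak += 1
        if streak >= notify_after:
            reportable_failures += 1

    uptime = 100 * (total - all_failures) / total
    sla = 100 * (total - reportable_failures) / total
    return uptime, sla


# 20 runs: one isolated blip, then a three-run outage.
runs = [True] * 6 + [False] + [True] * 5 + [False, False, False] + [True] * 5
uptime, sla = uptime_and_sla(runs)
print(f"Uptime %: {uptime:.1f}")
print(f"SLA %:    {sla:.1f}")
```

Here the isolated blip and the first failure of the outage lower Uptime % but not SLA %, so the run series reports 80.0 for Uptime % and 90.0 for SLA %.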
How to Enforce SLAs with Your Partners
As business owners of the Ride-sharing Web App we may say, “Hey – that’s nice that you’re giving us these reports, Mapping Provider friends, but we really need to do our own due diligence and compare some external data to your reports.” We could use external API Checks to monitor the performance of the mapping API and confirm that the mapping API is in fact up 99.99% of the time.
In the event that we see availability fall under the SLA or if we see prolonged downtime that wasn’t communicated three weeks in advance we could use our reports to start a conversation with our partner about rectifying the breach of the agreement. We may also use API Checks to better understand how issues with our partner’s API might affect our real users.
Remember: API Checks can be used to monitor both availability and responsiveness. Any use cases we build for API Checks should match our SLAs. If our agreement is based on availability alone, we can configure a check to simply hit the endpoint and then rely on the Uptime %. If our agreement is based on how quickly an API returns data, then we can build a multi-step check that pulls data from the API and then compares the Average Response Time to our agreed standards.
In the above example we can see a multi-step API Check on the Mapping Provider’s API that confirms that data falls in an expected range and tracks the availability of these requests.
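The response-time comparison can be sketched in a few lines of Python. The sample timings and the 300 ms threshold below are made up for illustration; a real agreement would supply its own standard, and the function name is hypothetical.

```python
from statistics import mean
from typing import Sequence


def response_time_report(samples_ms: Sequence[float], max_avg_ms: float) -> dict:
    """Compare the average response time of check runs to an agreed limit."""
    if not samples_ms:
        raise ValueError("need at least one response-time sample")
    average = mean(samples_ms)
    return {
        "average_ms": average,
        "limit_ms": max_avg_ms,
        "within_sla": average <= max_avg_ms,
    }


# Hypothetical response times (in milliseconds) from five check runs.
report = response_time_report([180, 220, 240, 200, 910], max_avg_ms=300)
print(report)
```

In this example, four of the five runs are comfortably under the limit, but a single slow response pulls the average to 350 ms and over the threshold. That sensitivity to outliers is worth keeping in mind when an agreement is written around an average.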
API Checks can be used by partners on both sides of service level agreements to confirm that the agreement is being met according to the terms. For API owners, proactive monitoring can help you catch issues before they impact your partners and empower you with reporting that’s easy to share with your partners. For API end-users, proactive monitoring can alert you to third-party issues affecting your users and also help offer an extra level of confidence that your partners are upholding their agreement.