By Chapman Lever
Businesses are increasingly relying on APIs to power their applications. So, whether they rely on internal or third-party APIs, it is more important than ever to ensure that those APIs are available, functional, and performing as expected.
API monitoring gives continuous visibility into how both internally created and third-party APIs are performing. As such, Splunk Synthetic Monitoring’s API Checks can be used to monitor performance against Service Level Agreements (SLAs). In this blog we will dive into how this is done in practice.
What is an SLA?
First, a bit of context. As you might know, a Service Level Agreement (commonly called an SLA) is an agreement between two parties about what services one party will provide to the other. It can cover any number of services, from customer support reply times to product delivery.
Oftentimes, SLAs between two technology or software providers outline availability and responsiveness.
Availability is usually specified as an uptime percentage. But there are other factors to consider, too. For instance, an API could return a 200-level response but still not operate the way it should. As such, we argue that status codes alone aren’t a sufficient measurement of availability.
Responsiveness outlines how quickly an API responds. As with availability, it’s important to ensure that the API responsiveness components outlined in SLAs are appropriate to your use case.
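To make the availability point concrete, here is a minimal Python sketch of a check that does not trust the status code on its own. The function name `is_available` and the `tiles` key are hypothetical and only for illustration; a real check would look for whatever fields your own API promises to return.

```python
import json


def is_available(status_code, body, required_keys=("tiles",)):
    """Count a response as 'available' only if it is 2xx AND the payload is usable.

    `required_keys` is an assumption for this sketch: the fields a healthy
    response is expected to contain.
    """
    if not 200 <= status_code < 300:
        return False
    try:
        payload = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    if not isinstance(payload, dict):
        return False
    return all(key in payload for key in required_keys)
```

With this definition, a 200 response that carries an HTML error page or an empty JSON object counts as downtime, which is usually closer to what users actually experience.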
If an SLA is not met, one party might be entitled to a credit, a refund, or the freedom to back out of the contract.
For example, a prepaid Parking-reservation Web App relies heavily on data from a third party that specializes in mapping. When this Parking-reservation Web App agrees to work exclusively with one Mapping Provider, that Mapping Provider may guarantee, “Your app will have access to our map data 99.99% of the time, and we will notify your team at least three weeks in advance of upcoming planned maintenance.”
The team managing the Parking-reservation Web App might say, “Nice! That sounds like a great deal. Our users tolerate some glitchiness and they never complain about it on Twitter, so 99.99% uptime is more than enough. Three weeks is plenty of time to let our customers know about upcoming downtime. We agree to the terms, but if your API is available less than 99.99% of the time we’ll need to be refunded in full.”
Everyone signs on the dotted line and shakes hands, and the Parking-reservation Web App builds a new feature that pulls data from the Mapping Provider’s API.
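It can help to translate an uptime percentage like 99.99% into an actual downtime budget. The short Python sketch below does that arithmetic; the function name is ours, and it assumes the SLA is measured over calendar time with no exclusions for planned maintenance, which a real contract may well define differently.

```python
def allowed_downtime_minutes(uptime_percent, period_days):
    """Minutes of downtime permitted in a period at a given uptime percentage."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - uptime_percent / 100)


print(f"{allowed_downtime_minutes(99.99, 30):.2f} minutes per 30 days")   # 4.32
print(f"{allowed_downtime_minutes(99.99, 365):.2f} minutes per 365 days")  # 52.56
```

In other words, under these assumptions the Mapping Provider has a little over four minutes of slack in a 30-day month before the refund clause applies.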
How to Ensure That You’re Upholding Your SLAs
In the above example, the Mapping Provider might decide that they want to gather data about their API’s performance and stay ahead of issues to ensure that they are upholding the agreement.
They might also say, “It would be nice if we could share that data publicly with our partners at the Parking-reservation Web App so they know they can trust our service.”
They have a few options available to meet that goal:
- They could rely on the internal monitoring of the application, but that might only give part of the picture. For instance, it wouldn’t show whether the map data is actually available from the API to an end user outside of the system.
- They could build a synthetic, external monitor to test pulling data from their API and put alerting in place so that their engineers know right away if there’s any type of issue that might put them close to breaching the SLA.
Because it uses proactive data that simulates user interactions with the mapping system, the second solution provides a few benefits, such as catching performance issues before they impact real users and making it possible to create reports to share with partners.
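For readers who want a feel for what the second option involves, here is a rough, standard-library-only Python sketch of an external probe with simple alerting. It is not how Splunk Synthetic Monitoring is implemented; the names `probe`, `ConsecutiveFailureAlerter`, and `run_monitor`, and the rule of alerting after three consecutive failures, are all assumptions made for illustration.

```python
import http.client
import time
import urllib.request


def probe(url, timeout=5.0):
    """Request `url` once from outside the system; return (ok, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()  # make sure the body can actually be downloaded
            ok = 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, and socket timeouts are all OSError subclasses
        ok = False
    return ok, time.monotonic() - start


class ConsecutiveFailureAlerter:
    """Notify once when `threshold` checks in a row have failed."""

    def __init__(self, threshold=3, notify=print):
        self.threshold = threshold
        self.notify = notify
        self.streak = 0

    def record(self, ok):
        """Record one result; return True if this result triggered an alert."""
        if ok:
            self.streak = 0
            return False
        self.streak += 1
        if self.streak == self.threshold:
            self.notify(f"ALERT: {self.streak} consecutive failed checks")
            return True
        return False


def run_monitor(url, rounds, interval_seconds=60.0, alerter=None):
    """Probe `url` `rounds` times, feeding each result to the alerter."""
    alerter = alerter or ConsecutiveFailureAlerter()
    for _ in range(rounds):
        ok, _elapsed = probe(url)
        alerter.record(ok)
        time.sleep(interval_seconds)
```

Requiring several failures in a row before notifying is a deliberate choice in this sketch: it keeps a single dropped connection from paging an engineer, at the cost of a slightly slower alert.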
The example above shows an alert from an API Check in Splunk Synthetic Monitoring.
One neat feature that Splunk Synthetic Monitoring has available for Real Browser Checks, Uptime Checks, and API Checks is the ability to receive a secondary line item for SLA % in each daily, weekly, or monthly performance report. The SLA % number is based on the failures after notifications have been sent to your team, instead of all failures recorded. Your SLA % might represent your reportable uptime, whereas the standard Uptime % would include temporary connection problems or outliers that aren’t relevant for reporting to anyone outside of your organization.
This emailed report from Splunk Synthetic Monitoring includes a line item for SLA % that shows the Uptime reportable to a third-party based on set alerting criteria.
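The difference between the two numbers is easier to see with a small worked example. The Python sketch below is only our reading of the description above, not Splunk’s actual formula: it assumes a notification is sent after a configurable number of consecutive failures (`notify_after`, a hypothetical parameter) and that only the failure that triggers the notification and the ones that follow it in the same outage count against SLA %.

```python
def uptime_and_sla(results, notify_after=2):
    """Return (uptime_percent, sla_percent) for a list of check results.

    `results` is a sequence of booleans (True = check passed).
    Uptime % counts every failure; SLA % counts only failures that occur
    once a notification would have been sent (an assumption of this sketch).
    """
    total = len(results)
    if total == 0:
        raise ValueError("no check results to summarize")
    all_failures = 0
    reportable_failures = 0
    streak = 0
    for ok in results:
        if ok:
            streak = 0
            continue
        all_failures += 1
        streak += 1
        if streak >= notify_after:
            reportable_failures += 1
    uptime = 100 * (total - all_failures) / total
    sla = 100 * (total - reportable_failures) / total
    return uptime, sla


# One isolated blip, then a three-check outage, out of 20 checks.
results = [True] * 6 + [False] + [True] * 3 + [False] * 3 + [True] * 7
print(uptime_and_sla(results))  # (80.0, 90.0)
```

Here the isolated blip lowers Uptime % but not SLA %, which mirrors the idea that temporary connection problems are not relevant for external reporting.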
How to Enforce SLAs with Your Partners
On the other hand, the business owners of the Parking-reservation Web App may want to do their own due diligence and compare some external data to the reports generated by the Mapping Provider.
To do so, the Parking-reservation Web App can use external API Checks to monitor the performance of the mapping API and confirm that the mapping API is in fact up 99.99% of the time.
If the availability falls under the SLA or if there is prolonged downtime that wasn’t communicated in accordance with the SLA, then the Parking-reservation Web App can use their own reports to start a conversation about rectifying the breach of the agreement. Furthermore, API Checks can also help them better understand how issues with the partner’s API might affect their actual users (because, let’s not forget, users oftentimes don’t know that a third party might be responsible for downtime).
As a reminder, API Checks can be used to monitor both availability and responsiveness, so each API Check should match the terms of the SLA it supports. For instance, if the SLA is based only on availability, the check configuration should reflect that. This can be done by configuring a check to simply hit the endpoint and then relying on the Uptime %. If the agreement is based on how quickly an API returns data, then we can build a multi-step check that pulls data from the API and then compare the Average Response Time to our agreed standards.
In the above example, we can see a multi-step API Check on the Mapping Provider’s API that confirms that data falls in an expected range and tracks the availability of these requests.
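The logic of a multi-step check like this one can be sketched in a few lines of Python. This is an illustration under our own assumptions rather than the product’s configuration format: the `Step` type, the `run_check` function, and the stubbed responses are all hypothetical, and each step is expected to return its payload together with the time the request took.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Any, Callable, List, Tuple


@dataclass
class Step:
    name: str
    call: Callable[[], Tuple[Any, float]]  # returns (payload, elapsed_seconds)
    validate: Callable[[Any], bool]        # True if the payload looks right


def run_check(steps: List[Step], max_avg_seconds: float) -> dict:
    """Run steps in order; fail on bad data or a slow average response time."""
    timings = []
    for step in steps:
        payload, elapsed = step.call()
        timings.append(elapsed)
        if not step.validate(payload):
            return {"passed": False, "failed_step": step.name,
                    "avg_seconds": mean(timings)}
    avg = mean(timings)
    return {"passed": avg <= max_avg_seconds, "failed_step": None,
            "avg_seconds": avg}


# Stubbed steps standing in for real HTTP requests to a mapping API.
steps = [
    Step("geocode", lambda: ({"lat": 40.7, "lon": -74.0}, 0.2),
         lambda p: -90 <= p["lat"] <= 90 and -180 <= p["lon"] <= 180),
    Step("nearby_lots", lambda: ({"lots": 12}, 0.4),
         lambda p: p["lots"] >= 0),
]
print(run_check(steps, max_avg_seconds=0.5)["passed"])  # True
```

Keeping data validation and the response-time threshold as separate conditions makes it easy to tell from a failed run whether the availability term or the responsiveness term of the SLA was at risk.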
Gaps in API performance don’t only impact user experience; they also disrupt workflows, hurt brand reputation, and can lead to significant revenue loss. SLAs provide accountability and create an incentive to keep APIs performing properly.
API Checks can be used by partners on both sides of service level agreements to confirm that the agreement is being met according to its terms. For API owners, proactive monitoring can help you catch issues before they impact your partners and empower you with reporting that’s easy to share with them. For API end users, proactive monitoring can alert you to third-party issues affecting your users and offer an extra level of confidence that your partners are upholding their agreement.