In my years here at Splunk, I’ve lost count of how many customers have asked me why they’re getting the dreaded “Throttling Exception, Rate Exceeded” error from AWS. Historically, there has been no mechanism for a customer to see how many API calls they’re making and which API endpoint they’re hitting. With the recent release of API usage metrics from AWS, that problem is now solved.
In today’s blog, co-authored by superstar Splunker Nic Stone, we’ll first walk you through how to ingest these new metrics via the Splunk Add-on for AWS. Once the input has been configured, we’ll walk through a simple search-and-visualize exercise to help you immediately gain insight from your newly ingested metrics. As an added bonus, we’re including a dashboard that you can copy and paste right into your Splunk deployment.
Splunk can show users how often they’re using any of the newly supported API calls. As of this writing, the list of supported metrics is limited to the following 10 operations:
Getting the Data In
Splunk has fantastic support for Amazon CloudWatch, so ingesting these new metrics is a breeze. It’s as simple as setting up a new custom Amazon CloudWatch metric within the Splunk Add-on for AWS.
First, open the Splunk Add-on for AWS and click “Create New Input” from the Inputs tab, and select CloudWatch.
On the next screen, as with any CloudWatch input, you’ll need to configure the input name, which AWS account or role you’ll use for fetching these metrics, and the AWS region(s) that you want to collect the metrics from.
Now it’s time to configure the new data source for your API usage. In the Metrics Configuration section, click on “(Edit in advanced mode)” to get to the custom metric configuration screen.
By default, Splunk will assume that you want to ingest all of the most common AWS metrics. In this example, the goal is to ingest only the new API-related metrics, so all of the defaults need to be deleted. You can do that by simply clicking on the “X” next to each metric namespace. Once that’s finished, there should be a blank canvas to work from. (Note: After deleting all of the Namespace entries, you may end up with dimension values and metrics on the right-hand side. If you do, please delete those as well by clicking the “X” on the right-hand side of each row.)
To ingest the new metrics, click “+ Add Namespace” at the top left, enter “AWS/Usage” into the text box that appears, and press Enter. On the right side, click “+ Add Another”, which will present you with three text boxes. Each of those three boxes should be filled in according to the table below.
Once the Dimension Value box has been populated, you’ll notice that Splunk auto-configures the Dimension field. Next, click “OK” on the bottom right of the configuration page, which will return you to the input configuration. The only remaining steps are to ensure the sourcetype is set to “aws:cloudwatch” and to select the index that you’re using for CloudWatch data. On my test system, the final input configuration looks like this:
Searching and Visualizing Your New Metrics
Usually within 10 minutes, you will be able to search for the newly configured CloudWatch usage metrics in Splunk. We'll start with a basic search to see the events and what they look like, then we'll share an example dashboard showing your request usage against the default request limits from AWS.
To confirm you have successfully ingested events, run the following search:
Now we’ll create a search to find our maximum utilization as a percentage of our quota for one request type. Let’s use GetMetricStatistics as an example.
First we want to search for metrics from the GetMetricStatistics resource.
Now let’s find the maximum utilization. (Use the time picker to set the time range that the search covers.)
source="*:AWS/Usage" metric_dimensions="Class=[None],Resource=[GetMetricStatistics],Service=[CloudWatch],Type=[API]" | stats max(Sum)
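The metric_dimensions value in that search packs four key/value pairs into one string. As a rough illustration of how that string breaks down, here is a minimal Python sketch; the function name `parse_metric_dimensions` is our own invention for this example and is not part of Splunk or the add-on.

```python
import re


def parse_metric_dimensions(value: str) -> dict:
    """Split a 'Key=[Value],Key=[Value]' string into a {key: value} dict.

    Hypothetical helper for illustration only; it assumes the format seen
    in the search above, where every value is wrapped in square brackets.
    """
    return dict(re.findall(r"(\w+)=\[([^\]]*)\]", value))


dims = parse_metric_dimensions(
    "Class=[None],Resource=[GetMetricStatistics],Service=[CloudWatch],Type=[API]"
)
print(dims["Service"], dims["Resource"])
```

Reading it this way makes it clear that Resource names the API operation and Service names the AWS service being called, which are the two parts you would change to look at a different request type.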
Let’s bring in the quota for this resource (found here) so we can see our maximum utilization percentage. Note that quotas are per second, so we’ll multiply the quota by our metric period of 60 seconds.
index="aws-cloudwatch" source="*:AWS/Usage" metric_dimensions="Class=[None],Resource=[GetMetricStatistics],Service=[CloudWatch],Type=[API]" | stats max(Sum) as max max(period) as period | eval quota=400*period | eval percent_utilized=max/quota
This search tells us the maximum utilization of our quota for a particular request. Note that percent_utilized comes back as a fraction of the quota (0.25 means 25%), so multiply it by 100 if you want a true percentage. This is the starting point for a variety of use cases. For example, we could use this search to configure alerts that warn when utilization reaches a certain threshold.
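To make the arithmetic concrete, here is a small Python sketch of the same calculation. The function names and the sample call count of 6,000 are made up for illustration; only the 400 requests per second quota and the 60 second period come from the text above.

```python
def quota_utilization(max_sum: float, quota_per_second: float, period_seconds: int) -> float:
    """Return peak utilization as a fraction of the quota for one metric period.

    The quota is expressed per second, so it is scaled by the period to match
    the Sum statistic, which counts calls over the whole period.
    """
    if quota_per_second <= 0 or period_seconds <= 0:
        raise ValueError("quota and period must be positive")
    return max_sum / (quota_per_second * period_seconds)


def over_threshold(utilization: float, threshold: float = 0.8) -> bool:
    """Hypothetical alert condition: True once utilization reaches the threshold."""
    return utilization >= threshold


# Sample values: 6,000 calls in the busiest minute is an invented number.
utilization = quota_utilization(max_sum=6000, quota_per_second=400, period_seconds=60)
print(f"{utilization:.1%}")  # 6000 / (400 * 60) = 0.25, printed as 25.0%
print(over_threshold(utilization))
```

The 0.8 default threshold is an arbitrary choice for the sketch; pick whatever level gives you enough warning before throttling starts.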
Notice we set quota to 400*period in the previous search. In the case of GetMetricStatistics, the default is 400 requests per second. If you have requested limit increases from AWS or are unsure of your request limits, you can view them via the AWS console. Alternatively, you can access your service quotas using the AWS CLI tool by following these instructions.
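If you would rather script the quota lookup, something along the following lines may work with the AWS SDK for Python. Treat it as a sketch under assumptions: the helper name `list_quota_values` is ours, the service code "monitoring" for CloudWatch is something you should confirm in your own account, and the commented usage requires boto3 and valid AWS credentials.

```python
from typing import Any, Dict


def list_quota_values(client: Any, service_code: str) -> Dict[str, float]:
    """Collect quota name -> value for one service, following NextToken pages.

    `client` is expected to behave like boto3.client("service-quotas"),
    whose list_service_quotas call returns a "Quotas" list per page.
    """
    quotas: Dict[str, float] = {}
    kwargs: Dict[str, str] = {"ServiceCode": service_code}
    while True:
        page = client.list_service_quotas(**kwargs)
        for quota in page.get("Quotas", []):
            quotas[quota["QuotaName"]] = quota["Value"]
        token = page.get("NextToken")
        if not token:
            return quotas
        kwargs["NextToken"] = token


# Usage against a real account (assumes boto3 is installed and configured):
#   import boto3
#   client = boto3.client("service-quotas")
#   for name, value in list_quota_values(client, "monitoring").items():
#       print(name, value)
```

Whatever value you find for your request type is the number to substitute for 400 in the search above.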
We can also visualize these usage metrics in a customizable dashboard. We’ve included a SimpleXML copy that you can use to get started.
This dashboard uses some macros from the Splunk App for AWS, and therefore has the app as a dependency.
This example dashboard shows your request usage for each supported usage metric, compared to the default request limits from AWS.
We’ve provided this dashboard for you to download and install into your Splunk App for AWS.
— Bill Bartlett and Nic Stone