How to Optimize Your Cloud Spend Using Observability

The rise of public cloud services has enabled businesses to innovate faster, scale effortlessly, and adopt advanced technologies more easily than ever before. However, public cloud services have a dark side: complexity and cost. They can scale to handle almost any workload, but in doing so they can quickly generate unpredictable costs for your business.

Do you understand your cloud infrastructure spend? How do you control it and make sense of why you’re spending what you’re spending? The tools from cloud providers generally offer little insight into why the bill is what it is, and the wide array of services required to operate an app in the cloud can easily lead to unwelcome surprises when the invoice arrives at the end of the month.

We’ve discussed at length how complexity is a fact of life in modern operations and how observability helps cut through it. Observability can also give you insight into your cloud and IT spending and help you optimize it. Let’s look at some examples of how:


Want to skip the examples and see for yourself? Start a free trial of Splunk Observability Cloud instantly, no credit card required.

Easily View Cloud Spend

Splunk Infrastructure Monitoring can provide you with an out-of-the-box, easy-to-read dashboard that shows you how much your application costs to operate. The dashboard below shows example usage data:

In addition to the headline numbers (how many instances are running, how much they’ve cost over the past day, and how much money is wasted on unused reservations), you can also see cost per instance type and availability zone, plus trends for instance count, cost, and reservations.
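Breaking cost down by instance type or availability zone is essentially a group-and-sum over per-instance cost records. The following is a minimal Python sketch of that idea, not Splunk's implementation; it assumes you already have per-instance daily cost data, and the `InstanceCost` record, its field names, and the sample values are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class InstanceCost:
    # Hypothetical per-instance billing record; field names are illustrative only.
    instance_id: str
    instance_type: str
    availability_zone: str
    cost_last_24h: float  # USD spent on this instance over the past day


def cost_by(records, dimension):
    """Sum the past day's cost for each distinct value of one attribute (dimension)."""
    totals = defaultdict(float)
    for record in records:
        totals[getattr(record, dimension)] += record.cost_last_24h
    return dict(totals)


# Made-up sample data for illustration; these are not real account figures.
records = [
    InstanceCost("i-001", "m5.large", "us-east-1a", 2.30),
    InstanceCost("i-002", "m5.large", "us-east-1b", 2.30),
    InstanceCost("i-003", "r5d.8xlarge", "us-east-1a", 55.30),
    InstanceCost("i-004", "c5.xlarge", "us-east-1b", 4.08),
]

print("instance count:", len(records))
print(f"total cost, last 24h: ${sum(r.cost_last_24h for r in records):.2f}")
for dimension in ("instance_type", "availability_zone"):
    for value, total in sorted(cost_by(records, dimension).items()):
        print(f"{dimension}={value}: ${total:.2f}")
```

Because `cost_by` takes the attribute name as a parameter, the same function covers any dimension your records carry.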

Find the Most Expensive Components of Your App and Get Insights on Saving Money

Further, this data can be sliced and diced by application, Kubernetes pod, region, instance size, or practically any other dimension. You can see how expensive each part of your app is to run and how well you’re utilizing the compute and other resources you’re paying for. Using correlated data from Splunk Application Performance Monitoring, filtered by (for example) instance type, you can determine whether you really need to run a particular app on an r5d.8xlarge and see how usage trends over time. You can also view the service logs in the same tool to get additional insight into what’s driving increased resource usage.
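Slicing by a dimension and checking utilization can be sketched the same way. This hypothetical example groups made-up per-instance samples by application and flags apps whose average CPU utilization falls below an assumed 20% cutoff as rightsizing candidates. The sample data, tuple layout, and cutoff are illustrative assumptions, and a real rightsizing decision would also weigh memory, disk, and network usage.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical samples: (app, instance_type, daily_cost_usd, avg_cpu_percent).
samples = [
    ("checkout", "r5d.8xlarge", 55.30, 9.0),
    ("checkout", "r5d.8xlarge", 55.30, 11.0),
    ("search", "c5.xlarge", 4.08, 71.0),
    ("search", "c5.xlarge", 4.08, 65.0),
]

LOW_UTILIZATION_PCT = 20.0  # assumed cutoff; tune it for your own workloads


def summarize_by_app(rows):
    """Group rows by app; return daily cost, mean CPU, and a rightsizing flag per app."""
    grouped = defaultdict(list)
    for app, instance_type, cost, cpu in rows:
        grouped[app].append((instance_type, cost, cpu))
    summary = {}
    for app, items in grouped.items():
        avg_cpu = mean(cpu for _, _, cpu in items)
        summary[app] = {
            "daily_cost": sum(cost for _, cost, _ in items),
            "avg_cpu": avg_cpu,
            "instance_types": sorted({t for t, _, _ in items}),
            # Low average CPU on large instances suggests a smaller size may suffice.
            "rightsizing_candidate": avg_cpu < LOW_UTILIZATION_PCT,
        }
    return summary


for app, info in sorted(summarize_by_app(samples).items()):
    print(app, info)
```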

Get Suggestions on Cost Savings, and Alerts When Costs Are Too High

In addition to showing what the costs are, Splunk Infrastructure Monitoring has a built-in cost optimizer tool that gives you actionable insights into utilization and cost-saving opportunities. The tool is accessible with just a few clicks and is fully integrated with the rest of Splunk Observability Cloud, which means you can easily set alerts when billing goes outside thresholds you’ve set. You can also see the trend in your spending and tie it to deployments or other business metrics.
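The exact alert configuration depends on your Splunk Observability Cloud setup, which this post doesn't show. As a hedged illustration of the logic only, the sketch below flags a day's spend when it crosses a fixed budget or jumps more than an assumed 25% above the trailing seven-day average. `check_spend` and its parameters are hypothetical names, not part of any Splunk API.

```python
from statistics import mean


def check_spend(daily_spend, threshold, window=7, max_growth=0.25):
    """Return alert messages for the most recent day's spend.

    daily_spend: daily totals in USD, oldest first (hypothetical input).
    threshold:   fixed budget ceiling for a single day.
    window:      number of preceding days used as the baseline.
    max_growth:  allowed fractional increase over the baseline average.
    """
    alerts = []
    today = daily_spend[-1]
    # Static check: did today's spend cross the fixed ceiling?
    if today > threshold:
        alerts.append(f"daily spend ${today:,.0f} exceeds threshold ${threshold:,.0f}")
    # Trend check: compare today against the average of the preceding days.
    history = daily_spend[-window - 1:-1]
    if history:
        baseline = mean(history)
        if today > baseline * (1 + max_growth):
            alerts.append(
                f"daily spend ${today:,.0f} is more than {max_growth:.0%} above "
                f"the {len(history)}-day average ${baseline:,.0f}"
            )
    return alerts


spend = [1000, 1020, 980, 1010, 990, 1005, 995, 1400]  # made-up daily totals
for alert in check_spend(spend, threshold=1200):
    print("ALERT:", alert)
```

With this made-up series both checks fire. Combining a fixed ceiling with a trend check catches both absolute overspend and sudden jumps that stay under budget.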

Determine the Efficiency of Your Deployment and Save More with Reservations

You can even determine whether your apps are making efficient use of your instances. For example, check out this dashboard:

This tool shows you how many on-demand instances are running in your environment. In most cloud environments, on-demand instances are the most expensive option, and you can save significant amounts of money by using reserved instances instead. In this example, $32,000 a day could be saved by switching the current on-demand instances to reserved instances, and only 14% of this sample infrastructure is running on reserved instances. That is a large savings opportunity, and I was able to size it with just a couple of clicks.

Splunk Infrastructure Monitoring will also tell you if you have the opposite problem, too many reserved instances for your workload, and will calculate how much money is lost on unused reserved instances. Finally, you can see when your reserved instances expire, their amortized monthly cost, and more. All of this is enabled by a single integration with AWS.
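The arithmetic behind a savings estimate like this is straightforward. This minimal sketch assumes a single instance type with one on-demand hourly rate and one effective reserved hourly rate; the function name and the rates in the example are hypothetical, and real pricing (term length, payment option, region, instance size flexibility) is more complicated.

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365


def reservation_report(running, reserved, on_demand_hourly, reserved_hourly):
    """Estimate reservation coverage, savings, and waste for one instance type.

    running:          instances currently running
    reserved:         reservations purchased for this instance type
    on_demand_hourly: assumed on-demand price per instance-hour (USD)
    reserved_hourly:  assumed effective reserved price per instance-hour (USD)
    """
    covered = min(running, reserved)  # running instances billed at the reserved rate
    on_demand = running - covered     # running instances still billed on demand
    unused = reserved - covered       # reservations paid for but not used
    # Savings if every on-demand instance were moved onto a reservation.
    daily_savings = on_demand * (on_demand_hourly - reserved_hourly) * HOURS_PER_DAY
    return {
        "coverage": covered / running if running else 0.0,
        "on_demand_instances": on_demand,
        "potential_daily_savings": daily_savings,
        "potential_annual_savings": daily_savings * DAYS_PER_YEAR,
        "daily_unused_reservation_cost": unused * reserved_hourly * HOURS_PER_DAY,
    }


# Made-up example: 100 instances running, 14 reservations, illustrative rates.
report = reservation_report(running=100, reserved=14, on_demand_hourly=1.00, reserved_hourly=0.60)
for key, value in report.items():
    print(f"{key}: {value:,.2f}")
```

With these made-up inputs the sketch reports 14% coverage and 86 on-demand instances; passing more reservations than running instances surfaces the opposite problem as a nonzero unused-reservation cost. The same annualization applies to the example above: $32,000 a day × 365 days is about $11.7 million a year.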

Customer Evidence Shows the Savings Are Real

In this sample infrastructure, leveraging reserved instances could save over $11 million a year. Great story, but maybe a bit unrealistic. Let’s look at a real story from one of Splunk’s customers, Acquia. Acquia provides a digital experience platform that its customers rely on, and as the company grew, so did its cloud compute costs. After Acquia integrated Splunk Infrastructure Monitoring*, the cost optimizer tool discussed above saved it over $600,000 per year on AWS costs. Acquia also reported a 26% reduction in average time spent per incident and over $1 million in annual productivity gains.

Knowing how much your application costs should be considered table stakes for observability. Getting insight into how to reduce that spending and make the most of it is what unlocks business value. Cloud spending can easily spiral out of control if you don’t monitor it closely, so you need an observability platform that not only knows how much you’re spending and on what, but can also suggest how to fix it. You need Splunk Observability Cloud: start an instant free trial now.

* Splunk Infrastructure and Application Performance Monitoring products were previously known as SignalFx.

Posted by Greg Leffler

Greg heads the Observability Practitioner team at Splunk, and is on a mission to spread the good word of Observability to the world. Greg's career has taken him from the NOC to SRE to SRE management, with side stops in security and editorial functions. In addition to Observability, Greg's professional interests include hiring, training, SRE culture, and operating effective remote teams. Greg holds a Master's Degree in Industrial/Organizational Psychology from Old Dominion University.