April 01, 2022

A Primer for Monitor as Code: How to use Splunk Observability Cloud with Terraform

By Splunk

Managing the complexities of today’s cloud native infrastructure has increased the need for observability. As cloud adoption continues to grow, delivering a better customer experience, scaling efficiently, and increasing momentum on innovation have never been more important. Two technologies are helping organizations deliver on these goals faster: Monitoring-as-Code and Infrastructure-as-Code.

This blog post will cover how Monitoring-as-Code and Infrastructure-as-Code work hand in hand and how these technologies bring efficiency across your CI/CD development pipeline. I’ll also walk you through a few simple steps for setting up Splunk’s Terraform provider to easily make monitoring and observability part of your application code.

What is Infrastructure as Code?

Traditionally, deploying IT infrastructure was daunting and involved multiple teams and manual provisioning steps. Any error made during this lengthy, complex process could cause the application deployment to fail or introduce performance issues, possibly affecting your customer experience. Infrastructure-as-Code was born in response to the many problems with manual provisioning, and large-scale systems came to be declared as code in configuration files. An infrastructure-as-code system lets users specify what the final infrastructure setup should look like and then trust the tool to do the work needed to bring the running infrastructure to that desired state. The golden example of an infrastructure-as-code platform is Terraform, from HashiCorp. When changes are required, simple modifications to your Terraform configuration are quickly reflected in the running infrastructure.
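To make the declarative model concrete, here is a minimal sketch. It uses the hashicorp/local provider only because it runs without a cloud account; the resource name, file name, and content are illustrative and do not come from the original example. The configuration states what should exist, and Terraform works out whether anything has to change.

```hcl
terraform {
  required_providers {
    local = {
      source = "hashicorp/local"
    }
  }
}

# Desired state: a file with exactly this content should exist.
# (Hypothetical resource for illustration; resources from any provider
# follow the same declare-then-apply pattern.)
resource "local_file" "motd" {
  filename = "${path.module}/motd.txt"
  content  = "Provisioned by Terraform\n"
}
```

Applying this twice in a row should report no changes the second time. Editing the content and applying again brings the file back in line with the configuration.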

Where does Monitoring as Code Fit?

Setting up monitoring for your infrastructure and applications can create many of the same challenges as manually provisioning your infrastructure. These challenges typically appear after the initial implementation of your infrastructure and become noticeable as the complexity of your monitoring deployment increases. When you use Monitoring-as-Code, your monitoring configuration is closer to your application and development workflows. In fact, it’s literally in those same workflows: checked in to your version control system and changed as part of code deployment by your CI/CD system. Whether your infrastructure lives in the cloud or on-premises, the monitoring assets you need to properly observe your application are never left behind.

It is important to remember the difference between Monitoring-as-Code and using monitoring or observability tools independently. While observability tools like Splunk Observability Cloud provide full-fidelity monitoring and troubleshooting across infrastructure, applications, and users in real time, they can’t automatically determine your business logic or which specific detectors are most vital to your business metrics. Monitoring-as-Code manages the entire approach to how data is collected from your application and then used to help you solve problems. Let’s take a look at an example.

How to implement Monitoring-as-Code

HashiCorp Terraform is a tool for building, changing, and versioning infrastructure safely and efficiently. Terraform can be extended with providers, including the Splunk Terraform provider. This provider interacts with resources supported by Splunk Observability Cloud and lets us build a configuration that we can include as part of our infrastructure configuration’s HCL, or as a separate Terraform deployment, using an API token for authentication.

In this example, we have deployed a microservices-based application using Kubernetes. The application is built using four different services. We will use Terraform to deploy a detector that will alert in a critical state if one of the four microservices is down.

To begin, we will need an API authentication token to authenticate with Splunk Observability Cloud. Navigate to the account settings and click Access Tokens.

Click on New Token, provide a name for your access token and select API Token as the permission.

With your API token created, you can now click on the Show Token link to view and use the token as part of your code when using Terraform.
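Because the token is a credential, one option is to keep it out of the configuration file entirely. The sketch below assumes a variable named signalfx_auth_token (a hypothetical name, not from the original example) and relies on Terraform's standard TF_VAR_ environment variable convention to supply the value at run time. It is a lighter-weight alternative to a full secret manager, not a replacement for one.

```hcl
# Hypothetical variable name. Supply the value outside version control,
# for example by exporting TF_VAR_signalfx_auth_token before running Terraform.
variable "signalfx_auth_token" {
  description = "Splunk Observability Cloud API token"
  type        = string
  sensitive   = true # redacts the value in plan and apply output (Terraform 0.14+)
}

provider "signalfx" {
  auth_token = var.signalfx_auth_token
  api_url    = "https://api.us1.signalfx.com" # replace with your realm's API URL
}
```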

With our API token in place and Terraform installed, we can use the Terraform CLI to deploy the following code. (Note: SignalFx is the former name for certain components of Splunk Observability Cloud.)

terraform {
  required_providers {
    signalfx = {
      source  = "splunk-terraform/signalfx"
      version = "6.8.0"
    }
  }
}

provider "signalfx" {
  # It is strongly recommended to use a secret management Terraform provider such as Vault, but for this example we include the token here.
  auth_token = "API TOKEN HAS BEEN OMITTED"
  api_url    = "https://api.us1.signalfx.com" # use your custom Splunk Observability realm URL
}

# Referenced by the detector below. The original listing does not show this
# declaration, so its exact form here is an assumption.
variable "pod_amount" {
  description = "Alert when the counted series drop below this number"
  type        = number
}

# Detector that goes critical when fewer series than expected are reporting.
resource "signalfx_detector" "movieappspods_notready" {
  name        = "One or more Movie microservice pods are not ready"
  description = "This alert will trigger in the event a microservice pod for the movie applications is in a non-ready state."

  # SignalFlow program: count the matching series and compare against the threshold.
  program_text = <<-EOF
    A = data('k8s.container.ready', filter=filter('metric_source', 'kubernetes') and filter('app', 'movies', 'actors', 'dashboard', 'directors'), rollup='count').count().publish(label='A')
    detect(when(A < threshold(${var.pod_amount}))).publish('Movies Application Microservices Pods')
  EOF

  rule {
    description   = "One or more movie application microservices pods are not ready"
    severity      = "Critical"
    detect_label  = "Movies Application Microservices Pods" # must match the label published by detect()
    notifications = ["Email,you@example.com"]
  }
}

When inspecting the code, you can see that we provide the fields the provider needs to authenticate with Splunk Observability Cloud (auth_token, api_url) and the resource (signalfx_detector) that creates the detector. Detectors are declared by placing SignalFlow in the program_text field. For the proper SignalFlow syntax, use the Developer Guide for Splunk Observability Cloud, or, within the Splunk Observability Cloud GUI, select “Show SignalFlow” from the ellipsis menu of any existing detector or chart. SignalFlow also supports manipulations, aggregations, and other operations run on the data in real time, so your monitoring-as-code deployment gives you the same robust real-time data and analytics that a detector set up in the GUI would.
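As an illustration of the aggregation support, the following sketch groups the same metric by service so that each microservice is evaluated on its own. The resource name, labels, and threshold are assumptions made for this example, and it presumes the app dimension identifies the service and that k8s.container.ready reports 1 for a ready container and 0 otherwise.

```hcl
# Hypothetical variant of the detector: one signal per service instead of a single total.
resource "signalfx_detector" "movieapp_service_down" {
  name        = "A Movie microservice has no ready containers"
  description = "Illustrative variant that evaluates each service separately."

  program_text = <<-EOF
    A = data('k8s.container.ready', filter=filter('app', 'movies', 'actors', 'dashboard', 'directors')).sum(by=['app']).publish(label='A')
    detect(when(A < 1)).publish('Movie service without ready containers')
  EOF

  rule {
    description  = "At least one movie microservice has no ready containers"
    severity     = "Critical"
    detect_label = "Movie service without ready containers"
  }
}
```

One caveat with this design: if a service stops reporting the metric altogether, its grouped series disappears rather than dropping below 1, so a count-based check like the one above may still be the safer choice for detecting a complete outage.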

Next, complete the Terraform deployment by running terraform init followed by terraform apply. Then navigate to the detectors within Splunk Observability Cloud, where your detector is ready to alert you in the event of a microservice outage.
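To make the new detector easier to find after an apply, an output block can surface its identifiers on the command line. This is a small sketch that assumes the provider version in use exports id and url attributes on signalfx_detector; the output names are hypothetical, so check the provider documentation for your version.

```hcl
# Printed at the end of terraform apply (output names are illustrative).
output "movie_detector_id" {
  value = signalfx_detector.movieappspods_notready.id
}

# Assumes the resource exports a url attribute pointing at the detector in the UI.
output "movie_detector_url" {
  value = signalfx_detector.movieappspods_notready.url
}
```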

Conclusion

While the UI makes it simple for anyone to create Observability resources like charts and detectors without learning a new language, we recognize that advanced organizations want to be able to store their monitoring setup along with their application code. Storing all the required data in one source of truth (your source repo) makes it easy to ensure that essential monitoring is always deployed with your application and is always up to date.

Additionally, Terraform Cloud agents emit data in OpenTelemetry format (metrics and traces), so you can even analyze their performance and troubleshoot issues in Splunk Observability Cloud. You can, of course, also use Terraform to deploy the OpenTelemetry collector, all controlled through one centralized location.
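One way to deploy the collector from Terraform is through the Helm provider and the Splunk OpenTelemetry Collector chart. Treat the following as a sketch under several assumptions: a 2.x Helm provider (the nested set blocks use its syntax), a local kubeconfig, the us1 realm, and hypothetical variable and cluster names.

```hcl
terraform {
  required_providers {
    helm = {
      source  = "hashicorp/helm"
      version = "~> 2.0" # the nested set blocks below use the 2.x syntax
    }
  }
}

provider "helm" {
  kubernetes {
    config_path = "~/.kube/config" # assumption: credentials come from a local kubeconfig
  }
}

# Hypothetical variable name; supply the token outside version control.
variable "splunk_access_token" {
  type      = string
  sensitive = true
}

resource "helm_release" "splunk_otel_collector" {
  name       = "splunk-otel-collector"
  repository = "https://signalfx.github.io/splunk-otel-collector-chart"
  chart      = "splunk-otel-collector"

  set {
    name  = "splunkObservability.realm"
    value = "us1" # match the realm used in api_url
  }

  set {
    name  = "clusterName"
    value = "movies-cluster" # illustrative cluster name
  }

  set_sensitive {
    name  = "splunkObservability.accessToken"
    value = var.splunk_access_token
  }
}
```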

Get Started with Monitoring as Code with Splunk

You can sign up to start a free trial of the suite of products – from Infrastructure Monitoring and APM to Real User Monitoring and Log Observer. Learn more about Terraform and the Splunk Terraform Provider and get started with Monitoring-as-Code today!

----------------------------------------------------
Thanks!
Johnathan Campos

Splunk

The world’s leading organizations trust Splunk to help keep their digital systems secure and reliable. Our software solutions and services help to prevent major issues, absorb shocks and accelerate transformation. Learn what Splunk does and why customers choose Splunk.
