
Enabling the Self Driving Cloud with Splunk Observability Cloud and GKE Autopilot

In 2021, whenever you access a web service, whether through a website or an app, chances are high that the backend is running on Kubernetes. Hundreds of thousands of organizations rely on Kubernetes to power and manage their mission-critical services every day, and the reliability and scalability benefits it offers have been felt across the industry.

However, there is still room for improvement. High-scale services are far easier to manage with Kubernetes than with the widely adopted alternatives that preceded it, but the experience is far from perfect. Managing a large service deployment requires the developers and ops personnel who work on it to have excellent observability into the system. Even with Kubernetes’ autoscaling capabilities, sizing a cluster can be a difficult balance between cost efficiency and potential resource exhaustion. Security is also a major concern.

Splunk Observability Cloud, Splunk Cloud Platform, and Google Kubernetes Engine (GKE) Autopilot fill the gaps that many Kubernetes users experience today. Splunk Observability Cloud provides developers and operators with deep visibility into the composition, state, and ongoing issues within a cluster, and GKE Autopilot provides built-in security hardening and automatically manages a cluster’s resources for maximum efficiency.

Example

Imagine the case of a large e-commerce company running a set of 50 different services in a Kubernetes cluster. In a standard Kubernetes deployment, Kubernetes would handle the creation and scale-out of workloads within the cluster’s predefined size limits, the restarting of containers that are failing basic health checks, and the internal network and external endpoints for each workload. This e-commerce company still has a lot of management ahead of them:

  • Kubernetes (or their cloud provider) provides some visibility into the CPU and memory consumption of each pod, container, etc., but not enough for DevOps personnel to gain a true understanding of the services deployed to the cluster, how they interact, how they depend on each other, and whether they’re failing to meet customer performance expectations.
  • Securing a Kubernetes cluster using practices like shielded nodes or workload identity authentication can be cumbersome and time consuming, and is not always possible on managed Kubernetes services.
  • Someone will have to regularly fine-tune the cluster’s scalability profile in response to its changing performance characteristics as new features are developed and new services are deployed. The cluster’s ability to rapidly scale up must be balanced against spending too much on unnecessary compute resources.
  • If an attack occurs, Kubernetes has no built-in functionality or analysis systems that detect this and let users know.
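
To make the scaling trade-off in the list above concrete, here is a minimal sketch of the ratio rule that the Kubernetes Horizontal Pod Autoscaler documentation describes, clamped to a workload's predefined limits. The function and parameter names are illustrative assumptions, and the real autoscaler also accounts for things this sketch ignores, such as stabilization windows and missing metrics.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas, max_replicas, tolerance=0.1):
    """Suggest a replica count from the ratio of observed to target load.

    Simplified, assumption-laden sketch: all names here are hypothetical.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")

    ratio = current_metric / target_metric

    # Inside the tolerance band, keep the current size to avoid flapping.
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)

    # Never scale outside the limits the operator has configured.
    return max(min_replicas, min(max_replicas, desired))
```

Setting `max_replicas` too low risks resource exhaustion during a traffic spike, while setting it too high risks paying for idle capacity, which is the balance someone has to keep tuning by hand.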
     

By using Splunk Observability Cloud, Splunk Cloud Platform, and GKE Autopilot, this same e-commerce company can:

  • Deploy software more quickly and achieve higher-quality releases, because their developers can observe in Splunk Observability Cloud how each service performs and interacts with other services, how individual requests made within client applications or websites are processed all the way down to the database level, and how service and infrastructure performance relate.
  • Quickly track down and fix the root cause when issues do occur, thanks to Splunk Observability Cloud’s infrastructure and application analysis capabilities.
  • Be confident that attackers can’t gain access to their cluster through unsafe Kubernetes features or by impersonating a workload.
  • Stop worrying about perfecting their scaling logic: GKE Autopilot handles this without any additional effort, and customers only pay for what they actually use.
     

Getting Started

Splunk’s (and OpenTelemetry’s) support for GKE Autopilot is available in OpenTelemetry Collector version 0.41 and above. Users can create a GKE Autopilot cluster and, within seconds, have it send data to an existing Splunk Observability Cloud or Splunk Cloud Platform environment using the Splunk OpenTelemetry Collector. Both destinations are available as options during agent installation, depending on your specific use case. Please refer to our documentation for more information on installation.
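
As a rough illustration of choosing a destination at install time, the sketch below assembles a values mapping of the kind that could be passed to the Splunk OpenTelemetry Collector Helm chart. The key names (`clusterName`, `cloudProvider`, `distribution`, `splunkObservability`, `splunkPlatform`) are assumptions based on our reading of the chart's values file and should be checked against the installation documentation. The helper function, the realm `us0`, and the token placeholder are hypothetical.

```python
import json


def build_collector_values(cluster_name, realm=None, access_token=None,
                           hec_endpoint=None, hec_token=None):
    """Assemble Helm values for a GKE Autopilot install (assumed key names)."""
    values = {
        "clusterName": cluster_name,
        "cloudProvider": "gcp",
        # Assumed chart setting that selects Autopilot-compatible defaults.
        "distribution": "gke/autopilot",
    }

    # Destination 1: Splunk Observability Cloud.
    if realm and access_token:
        values["splunkObservability"] = {
            "realm": realm,
            "accessToken": access_token,
        }

    # Destination 2: Splunk Cloud Platform, via an HTTP Event Collector endpoint.
    if hec_endpoint and hec_token:
        values["splunkPlatform"] = {
            "endpoint": hec_endpoint,
            "token": hec_token,
        }

    if "splunkObservability" not in values and "splunkPlatform" not in values:
        raise ValueError("configure at least one destination")
    return values


if __name__ == "__main__":
    example = build_collector_values(
        "demo-cluster", realm="us0", access_token="<ACCESS_TOKEN>")
    # JSON is a subset of YAML, so this output can be saved as a values file.
    print(json.dumps(example, indent=2))
```

Either destination can be configured on its own, or both together, which mirrors the choice offered during agent installation.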

If you’re interested in trying out Splunk Observability Cloud on Google Cloud Platform or anywhere else, you can get started today. Splunk Observability Cloud’s support for GKE Autopilot is available now.

Thanks to Splunk Cloud Platform's deep analysis of logs and other machine data, teams can also continually inspect for and respond to attempted security breaches or other notable activity.

Posted by Morgan McLean

Morgan McLean is a director of product management at Splunk focused on the Splunk Observability Cloud, along with Splunk’s contributions to OpenTelemetry and the agent and ingestion unification between Splunk Observability Cloud and Splunk Enterprise. Additionally, he is the co-founder of OpenCensus and OpenTelemetry, the latter now the second-largest CNCF project behind only Kubernetes. Prior to Splunk, Morgan spent five years as a product manager at Google Cloud Platform working on DevOps and observability initiatives, along with over three years at Microsoft as a program manager designing and implementing e-commerce services. Morgan has a BASc in Engineering Physics and a BA in Economics from the University of British Columbia.