An Insider’s Guide to Splunk on Containers and Kubernetes | Splunk

Attending Red Hat Summit? Find Splunk at booth 314 — we’ll have shirts and swag, and will be showcasing demos and answering any questions you may have about this blog post.

Our Splunk Enterprise and Universal Forwarder container images on DockerHub are pulled millions of times each month. In October, we responded to their popularity by beginning to support single-instance deployments of Splunk in containers. Since then, many customers have expressed a desire to learn more about our thoughts and plans for containers, and especially running Splunk on Kubernetes.

Our mutual experience has identified a number of additional details and complexities involved in running Splunk on Kubernetes, including security, persistent storage, high availability and performance. These are top of mind for us as we think about our future direction and will need careful consideration by customers pursuing their own deployments.

Container Security

As a software vendor, containers create some unique challenges. One challenge is the increased responsibility for security. Traditionally, the bits we shipped interfaced with lots of other bits shipped by operating system vendors. There was a clear line delimiting the attack surface we were responsible for protecting (our own code and the libraries we embedded into it), versus what operating system vendors were responsible for (everything else). Now that containers bundle the application and operating system bits together, we're starting to see that line move. For the first time, we have to worry about vulnerabilities in more external projects like glibc and bash.

No one today could question the exceptional importance of security, but honestly we were not ready for this line to move. We started out using Debian for our base container images and had grown accustomed to only focusing on the security of "our" bits. I regularly use a variety of Linux distributions, and love them all for different reasons, but Red Hat clearly excels when it comes to security.

We especially recognized this when we added security scanning to our container CI/CD pipelines and started experimenting with different base images. Our containers built on top of the latest and greatest Debian images always got flagged with unpatched vulnerabilities, while the containers we built on top of CentOS (a popular Red Hat derivative) were almost always pristine. It probably shouldn't be a big surprise: that subscription fee for Red Hat’s Enterprise Linux (RHEL) buys you (among many other things) a fast turnaround on CVEs. To illustrate this point, here are the results of recent scans we ran on our Splunk Enterprise 7.2.6 images:

  • Debian 9 (stretch-slim): 12 high, 33 medium, 19 low, 46 negligible, 2 unknown
  • Debian 10 (buster-slim): 1 high, 8 medium, 4 low, 36 negligible, 1 unknown
  • Red Hat 8 (ubi-minimal): 0, zip, nada!
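To make the pipeline idea concrete, here is a hedged sketch of what such a scan gate can look like. The post doesn't name the scanner we use, so this example stands in the open-source Trivy tool, and the GitLab CI stage name is hypothetical:

```yaml
# Illustrative CI stage (GitLab CI syntax); the scanner choice and
# stage/job names are assumptions, not Splunk's actual pipeline.
scan-image:
  stage: test
  script:
    # Fail the build if the image contains HIGH or CRITICAL vulnerabilities
    - trivy image --severity HIGH,CRITICAL --exit-code 1 splunk/splunk:7.2.6
```

Gating on severity like this is what surfaces the base-image differences above: the same application bits pass or fail depending solely on the OS layer underneath them.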

The problem we had in the past with Red Hat's container images was that their license restricted redistribution and limited their use to RHEL host machines. While a great many of Splunk's customers are also RHEL customers, many are not. This presented a few unfavorable options:

  • Forgo Red Hat certification and try to fix operating system vulnerabilities ourselves (basically, this is what we were doing by default)
  • Publish and support two images, while still fixing operating system vulnerabilities ourselves (this would require more resources than we had)

Red Hat’s Universal Base Images

I've been a longtime user and fan of Red Hat (and various RPM cousins) ever since I started running their pre-enterprise releases back in the '90s. Yesterday, at Red Hat Summit in Boston, my fedora-sporting friends solved the container security problem for us with their launch of Universal Base Images (UBIs). These include the latest security patches we’ve grown to expect from Red Hat, released under a more permissive license that allows them to run on any host operating system (although we still recommend using RHEL).

Consequently, I'm pleased to announce the beta release of new Splunk Enterprise and Universal Forwarder container images built with Red Hat inside. You can download these now from DockerHub by appending "-redhat" to our image tags. For example:

docker pull splunk/splunk:redhat
docker pull splunk/splunk:7.2.6-redhat
docker pull splunk/universalforwarder:redhat
docker pull splunk/universalforwarder:7.2.6-redhat

In the future, we plan to replace our Debian based images, making Red Hat the default. We expect this change will be transparent to our customers. We're also working together with Red Hat to certify these images, and hope to publish them soon in the Red Hat Container Catalog.

Splunk and Kubernetes

I'm thrilled to see more and more of our customers wanting to run Splunk on Kubernetes. We believe that Kubernetes is incredibly powerful and important, and envision a future in which it helps to greatly simplify deployment, monitoring and administration of Splunk.

Currently, we only support single-instance deployments of Splunk on Kubernetes. While some experts have been quite successful running clustered deployments, there are many hurdles that make this challenging. Splunk by nature is very stateful, while Kubernetes was initially built for stateless microservices. Kubernetes is also less a finished product and more a cloud development toolkit: it provides immense flexibility, lots of bells and whistles, and miles of rope to hang yourself with.

We previously published a collection of YAML templates that can be used to deploy both single-instance and clustered deployments of Splunk on Kubernetes. We have also been experimenting with a few projects internally that could one day help make Splunk and Kubernetes best of friends. I'll whet your appetite by discussing these below, but please be aware we may never release them to the general public. If you're getting your Red Hat on in Boston this week, please stop by Splunk booth 314 for a demo.

Kubernetes Operators

YAML templates quickly become quite complex for anything other than the most basic of applications. They are great for simple microservices, not so great for complex applications like Splunk. The introduction of Custom Resources opened up a whole new world for Kubernetes, enabling developers to go beyond the built-in object types and create their own. A popular use case for Custom Resources is creating higher-level concepts that combine multiple things together, abstracting out underlying complexity. If you’re a programmer, you could think about it like an object that provides a simpler interface by using many other objects underneath its hood. An operator just takes this one step further: it's a persistent service running inside Kubernetes that listens for any changes made to a set of Custom Resources.

Imagine if any deployment of Splunk could be expressed as a single SplunkEnterprise object. All the things a customer may want to configure could be declared in the spec for that object (or a corresponding ConfigMap). Customers wouldn't have to worry about things like Pods, StatefulSets, PersistentVolumes, etc. because the operator would manage all of these for them.
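For readers unfamiliar with Custom Resources, here is a minimal sketch of how such a type could be registered with Kubernetes. The group and version names below are hypothetical illustrations, not our actual schema:

```yaml
# Hypothetical CRD registering a SplunkEnterprise type.
# The group "enterprise.example.com" is illustrative only.
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  # CRD names must take the form <plural>.<group>
  name: splunkenterprises.enterprise.example.com
spec:
  group: enterprise.example.com
  version: v1alpha1
  scope: Namespaced
  names:
    kind: SplunkEnterprise
    plural: splunkenterprises
    singular: splunkenterprise
```

Once a CRD like this is applied, `kubectl get splunkenterprises` works just like any built-in resource type, and an operator can watch those objects and reconcile the cluster to match them.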

Operators especially differentiate themselves from other application packaging approaches (such as Service Bundles and Helm charts) for day 2 operations. Operators can be used to seamlessly manage data volumes, handle upgrades, scale up and down, perform common maintenance tasks, etc. Stateful applications like Splunk require many processes to be performed in a specific order. As a software vendor, we can leverage operators to automate all the best management practices for our products using code, rather than requiring staff to manually execute playbooks. I believe this will make it significantly easier for our customers to deploy, monitor and manage Splunk.

The Splunk Operator

If you manage to swing by the Splunk booth at Red Hat Summit this week, make sure to get a demo of an operator POC that we built internally. To deploy a single instance of Splunk, you only need to run a simple kubectl apply command on a template that looks like this:

apiVersion: enterprise.splunk.com/v1alpha1
kind: SplunkEnterprise
metadata:
  name: single
spec:
  splunkPassword: helloworld456
  splunkStartArgs: --accept-license
  standalones: 1

This automatically creates PersistentVolumeClaims (PVCs) and mounts them to /splunk/etc and /splunk/var, so that if your server dies and Kubernetes moves the pod elsewhere, all your data moves with it.

$ kubectl apply -f singleinstance.yml
$ kubectl get pods
NAME                                        READY   STATUS    RESTARTS   AGE
splunk-operator-79cfbd8746-bgv7f            1/1     Running   0          5d1h
splunk-standalone-single-5f865d6646-h2mpz   1/1     Running   0          44s

$ kubectl get pvc
NAME                               STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
pvc-etc-splunk-standalone-single   Bound    pvc-cfdbeac0-6f9d-11e9-92ac-0acd14da47d0   1Gi        RWO            gp2            53s
pvc-var-splunk-standalone-single   Bound    pvc-cfdce5eb-6f9d-11e9-92ac-0acd14da47d0   50Gi       RWO            gp2            53s

Cool, huh? How about a cluster:

apiVersion: enterprise.splunk.com/v1alpha1
kind: SplunkEnterprise
metadata:
  name: example
spec:
  splunkPassword: helloworld456
  splunkStartArgs: --accept-license
  indexers: 3
  searchHeads: 3

You can create a cluster in minutes by just changing the topology parameters. All the manual steps normally required to set up search head clustering, indexer clustering, join everything together, etc. are handled for you by the operator.

$ kubectl apply -f cluster.yml
$ kubectl get pods
NAME                                             READY   STATUS    RESTARTS   AGE
splunk-cluster-master-example-59666cd544-6dpfc   1/1     Running   0          4m24s
splunk-deployer-example-69bf676c7d-pgxp9         1/1     Running   0          4m24s
splunk-indexer-example-0                         1/1     Running   0          4m24s
splunk-indexer-example-1                         1/1     Running   0          3m46s
splunk-indexer-example-2                         1/1     Running   0          2m13s
splunk-license-master-example-59db6c48ff-2vqrf   1/1     Running   0          4m24s
splunk-operator-79cfbd8746-bgv7f                 1/1     Running   0          5d1h
splunk-search-head-example-0                     1/1     Running   0          4m24s
splunk-search-head-example-1                     1/1     Running   0          3m45s
splunk-search-head-example-2                     1/1     Running   0          2m9s

By default, this uses our splunk/splunk:latest image. You can use our new Red Hat UBI images instead by adding a splunkImage parameter to the spec. What if you set this to splunk/splunk:7.2.6-redhat and want to upgrade to 7.3.0 after it is released? All you need to do is change splunkImage to splunk/splunk:7.3.0-redhat and re-run kubectl apply. Upgrade complete, even if your topology spans hundreds or even thousands of servers.
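For example, the upgrade could be expressed by editing a single line of the cluster spec. This is a sketch assuming the same SplunkEnterprise schema used in the earlier examples:

```yaml
apiVersion: enterprise.splunk.com/v1alpha1  # illustrative, matching our POC examples
kind: SplunkEnterprise
metadata:
  name: example
spec:
  # Changing this value and re-running `kubectl apply` triggers the rolling upgrade
  splunkImage: splunk/splunk:7.3.0-redhat
  splunkPassword: helloworld456
  splunkStartArgs: --accept-license
  indexers: 3
  searchHeads: 3
```

The declarative model is what makes this work: you describe the desired end state, and the operator computes and executes the steps to get there.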

Beyond Remote Volumes

You may be wondering where those magical PersistentVolumes come from. Normally, you would need a Kubernetes cluster configured with a default StorageClass that mounts remote volumes over your network (EBS volumes, NFS mounts, etc.). This works great if you have it, and if your storage tier has enough capacity to handle the demand Splunk places on it. But Splunk is usually used to manage Big Data, and by Big, we mean very large volume and very high velocity. Our largest customers are managing petabytes of data generated every day. At Big Data scale, your storage tier can quickly become a performance bottleneck.

The latest 1.14 Kubernetes release introduced Persistent Local Volumes as a GA feature. This lets you use local disks attached to your servers to store Splunk data. This is extremely fast, possibly as fast as running Splunk on bare metal. It’s great until one of your servers dies. Persistent Local Volumes are bound to a specific server, so if you lose the server you lose all the data stored on it. The good news is that you can use Splunk's replication feature to protect against data loss, so at least this is no worse than what you get today, without Kubernetes.
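If you want to experiment with local volumes, the Kubernetes documentation describes a StorageClass like the one below. Note that local PersistentVolumes must currently be created manually or by an external provisioner; there is no dynamic provisioning built in:

```yaml
# StorageClass for local volumes, following the Kubernetes 1.14 documentation
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner   # no dynamic provisioning for local volumes
volumeBindingMode: WaitForFirstConsumer     # delay binding until a pod is scheduled
```

The WaitForFirstConsumer mode matters here: it ensures the volume is bound on whichever node the scheduler actually places the pod, rather than binding first and constraining scheduling.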

Next Generation Storage

A growing number of options are emerging for persistent storage in Kubernetes. Some of these pool the local disks attached to your Kubernetes servers into a virtual storage mesh, exposed as a StorageClass that backs your PersistentVolumes. One of the most promising products we’ve been working with lately is Robin Storage.

PersistentVolumes created using Robin automatically replicate blocks across multiple disks in your cluster. If any server fails, Kubernetes automatically restarts your pods elsewhere, and any volumes they use move along with them, without data loss. Unlike remote volumes and object stores, your data stays inside your cluster, as close as possible to your compute. This can reduce performance bottlenecks by shortening the network path. It also eliminates reliance on external storage services, which can be especially attractive for on-premises deployments of Splunk.

Robin Storage can encrypt and compress your volumes, create zero-copy snapshots and clones, and back up and restore the state of entire clusters. It moves your data close to the pods that frequently access it, potentially yielding significant performance benefits. They claim it typically performs within 5-10% of bare metal local disks.

Deploying Robin is easy since they package it as an operator: run a kubectl apply command and you're pretty much done. Of course, you need to have raw disks available for it to use, and there are many additional parameters you can tweak afterwards. Our POC Splunk Operator lets you select the StorageClass for it to use via a storageClassName parameter. We were able to get a Splunk cluster up and running using Robin Storage on Red Hat OpenShift in minutes by just adding that to our YAML examples above.
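Putting that together, pointing our POC operator at Robin is one extra line in the spec. Again, this is a sketch; the field names follow the earlier examples:

```yaml
apiVersion: enterprise.splunk.com/v1alpha1  # illustrative, matching our POC examples
kind: SplunkEnterprise
metadata:
  name: example
spec:
  splunkPassword: helloworld456
  splunkStartArgs: --accept-license
  indexers: 3
  searchHeads: 3
  storageClassName: robin   # back all PVCs with Robin's storage mesh
```

The PVC listing below shows the result: every etc and var volume in the cluster bound to the robin StorageClass.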

$ kubectl get pvc
NAME                                    STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
pvc-etc-splunk-cluster-master-example   Bound    pvc-51d9dbba-6f9e-11e9-92ac-0acd14da47d0   1Gi        RWO            robin          4m41s
pvc-etc-splunk-deployer-example         Bound    pvc-51e12b99-6f9e-11e9-92ac-0acd14da47d0   1Gi        RWO            robin          4m41s
pvc-etc-splunk-indexer-example-0        Bound    pvc-51eae60e-6f9e-11e9-913c-026cfa37367a   1Gi        RWO            robin          4m41s
pvc-etc-splunk-indexer-example-1        Bound    pvc-69048c78-6f9e-11e9-913c-026cfa37367a   1Gi        RWO            robin          4m3s
pvc-etc-splunk-indexer-example-2        Bound    pvc-a04342fa-6f9e-11e9-913c-026cfa37367a   1Gi        RWO            robin          2m30s
pvc-etc-splunk-license-master-example   Bound    pvc-51d39a4b-6f9e-11e9-92ac-0acd14da47d0   1Gi        RWO            robin          4m42s
pvc-etc-splunk-search-head-example-0    Bound    pvc-51f1f858-6f9e-11e9-913c-026cfa37367a   1Gi        RWO            robin          4m41s
pvc-etc-splunk-search-head-example-1    Bound    pvc-69937886-6f9e-11e9-913c-026cfa37367a   1Gi        RWO            robin          4m2s
pvc-etc-splunk-search-head-example-2    Bound    pvc-a2b86a7a-6f9e-11e9-913c-026cfa37367a   1Gi        RWO            robin          2m26s
pvc-var-splunk-cluster-master-example   Bound    pvc-51db8a61-6f9e-11e9-92ac-0acd14da47d0   50Gi       RWO            robin          4m41s
pvc-var-splunk-deployer-example         Bound    pvc-51e33300-6f9e-11e9-92ac-0acd14da47d0   50Gi       RWO            robin          4m41s
pvc-var-splunk-indexer-example-0        Bound    pvc-51e909ae-6f9e-11e9-913c-026cfa37367a   200Gi      RWO            robin          4m41s
pvc-var-splunk-indexer-example-1        Bound    pvc-6905b8a3-6f9e-11e9-913c-026cfa37367a   200Gi      RWO            robin          4m3s
pvc-var-splunk-indexer-example-2        Bound    pvc-a0445066-6f9e-11e9-913c-026cfa37367a   200Gi      RWO            robin          2m30s
pvc-var-splunk-license-master-example   Bound    pvc-51d47106-6f9e-11e9-92ac-0acd14da47d0   50Gi       RWO            robin          4m42s
pvc-var-splunk-search-head-example-0    Bound    pvc-51f0df75-6f9e-11e9-913c-026cfa37367a   50Gi       RWO            robin          4m41s
pvc-var-splunk-search-head-example-1    Bound    pvc-699475b7-6f9e-11e9-913c-026cfa37367a   50Gi       RWO            robin          4m2s
pvc-var-splunk-search-head-example-2    Bound    pvc-a2b99b41-6f9e-11e9-913c-026cfa37367a   50Gi       RWO            robin          2m26s

The Road Ahead

We’re focused on delivering an enterprise-class experience and partnering with the ecosystem to bring agility and cost savings to multi-cloud deployments. We’re working hard to have our Splunk Operator graduate from a POC into a supportable product and are still experimenting to see what works best. If you're interested in learning and developing along with us, please send me a tweet or message on LinkedIn! We always welcome customer insights and the opportunity to try new things out in real-world environments. One thing that seems certain is Kubernetes is here to stay, and the future holds many exciting new things.

Posted by Mike Dickey