Splunk is a machine data platform with advanced analytic capabilities that allows anyone to get valuable insights from their data. SPL lets you run analytics across a very wide range of use cases, and it has supported native machine learning capabilities for some time now. All you have to do is install the Splunk Machine Learning Toolkit (MLTK) and you are good to start predicting!
Splunk customers are already collecting vast amounts of data in Splunk, and they want to run more advanced analytics on it. ML is a perfect match: it extends the powerful SPL with machine learning commands and algorithms so you can get deeper insights from your data.
As you know, Splunk use cases are only limited by your imagination, and for the very demanding ones machine learning is not enough. It was only natural that the Splunk platform would extend its capabilities for those who want more power, and the Splunk team has been working to make deep learning as easy to use as possible. During Splunk .conf19, Anthony Tellez and Philipp Drieger announced the Deep Learning Toolkit (DLTK) for Splunk!
In a typical enterprise deployment scenario, a Splunk cluster will run in a distributed environment powered by either bare metal servers or virtualized infrastructure. Deep learning can provide you extreme model performance when run on specialized hardware like GPUs.
So how can you run deep learning in a GPU-less environment? No need to worry: DLTK is designed to make it easy to extend the processing capabilities of the platform to specialized hardware that is separate from your standard enterprise environment.
DLTK is designed to give you the ability to connect to an external deep learning docker image! The recipe is really simple!
This article describes how to integrate your Splunk DLTK search head with a GPU-enabled dockerized environment running on AWS and run your deep learning analytics there.
At this stage we assume you have already installed and configured the Splunk Deep Learning Toolkit to run with your local docker. If you have not already installed DLTK you need to follow these steps. Once you have DLTK up and running you can configure it to connect to an external docker image as described in this article.
Note: The Splunk Machine Learning Toolkit (MLTK) app is a prerequisite for DLTK. MLTK permissions sharing must be set to all apps (system).
Note: GPU instances are considerably more expensive than the standard ones, so don't keep them running if you don't need them!
Step 1: Deploy Deep Learning AMI
First you need to spin up the required AWS instance. Once you select Launch new instance from your AWS management console, you are taken to the AMI selection wizard. There, search for Deep Learning AMI (Ubuntu 18.04) and select the Ubuntu-based AMI: Deep Learning AMI (Ubuntu 18.04) Version 30.x. This image comes preinstalled with popular frameworks such as TensorFlow, MXNet, PyTorch, Chainer, and Keras, along with NVIDIA driver version 440.33.01.
Select the “Launch Instance” action from your AWS console while you are in the region where you wish to launch the instance. Search for “Deep Learning AMI (Ubuntu 18.04)”.
Once you click the “Select” button you will be taken to the screen where you select the required instance type. In this case we are using a G4 instance from the AWS GPU instance family. Amazon EC2 G4 instances provide the latest generation NVIDIA T4 GPUs, each with more than 2K NVIDIA CUDA cores and 320 Turing Tensor Cores. For our initial setup we will use the most affordable instance type (g4dn.xlarge), which comes with 1 GPU and 4 vCPUs.
Once you have selected the appropriate instance type, click Next until you reach Step 4: Add Storage. By default this AMI comes with a 90 GB root disk. Change this to 150 GB, as we will be downloading and using some large docker images and you don't want to run out of space. If you are setting this up for production you can customize/optimize it to your needs.
Click Next until you reach Step 6: Configure Security Group. Here we will create a specific security group for accessing our instance. For the time being just enable the below ports.
- TCP 22 : Remote access via SSH
- TCP 32768: Container SPL application server
- TCP 2375: Remote Docker API
- TCP 8888: Jupyter Labs DEV image
- TCP 6006: Tensorboard DEV image
In our setup, because it's only for testing, we allow any source IP to connect. It's best, however, to restrict the AWS security group rules to your DLTK/ML search head IP and your own IP only. To do that, add your public IP for ports 22, 8888 and 6006. For the remaining ports (32768 and 2375), allow only the IP of the Splunk search head on which DLTK is installed.
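The restricted variant of these rules can be sketched with the AWS CLI. The block below only prints the `aws ec2 authorize-security-group-ingress` calls so you can review them before running anything. The security group ID and both IP addresses are placeholders chosen for illustration, not values from this setup.

```shell
# Print (do not run) the AWS CLI calls that would open the five ports above.
# All three arguments are placeholders you must replace with your own values.
print_sg_rules() {
  sg_id=$1   # security group ID (hypothetical example below)
  my_ip=$2   # public IP of your own workstation
  sh_ip=$3   # public IP of the Splunk search head running DLTK

  # Ports you reach from your own machine: SSH, Jupyter, TensorBoard
  for port in 22 8888 6006; do
    echo "aws ec2 authorize-security-group-ingress --group-id $sg_id --protocol tcp --port $port --cidr $my_ip/32"
  done

  # Ports only the search head needs: container API endpoint, Docker remote API
  for port in 32768 2375; do
    echo "aws ec2 authorize-security-group-ingress --group-id $sg_id --protocol tcp --port $port --cidr $sh_ip/32"
  done
}

print_sg_rules sg-0123456789abcdef0 203.0.113.10 198.51.100.20
```

Printing the commands first is a deliberate choice: you can paste the lines you agree with into a shell that has AWS credentials configured.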
Step 2: Verify Your Environment Has GPU Drivers Working
Once your instance is up and running, ssh to the machine, as we need to carry out some configuration and verify that everything is working.
Note: A dev script was written to automate the setup process of the AMI. It can be found at the end of this post under "Quick Setup".
ssh -i keys/id_rsa ubuntu@AWS_PUBLIC_IP
Note: Replace keys/id_rsa with your key file and AWS_PUBLIC_IP with the public IP of your AWS instance.
You log in to the remote instance via ssh as the user “ubuntu”. Once you are logged in you can see that all relevant frameworks are already installed! Even better, the NVIDIA Container Toolkit is also already installed!
Normally we would first verify that the NVIDIA drivers are working properly by using the nvidia-smi command. For our purposes, though, we are going to jump straight to verifying that the NVIDIA Container Toolkit is properly installed and that docker can see our GPU. One quick way to test this is to launch nvidia-smi (the NVIDIA System Management Interface) inside a docker container and ask docker to use the GPU on our machine. If this works then you can start any docker image you want and pass the same drivers to the image when it runs (which is what we will do for DLTK).
Run the following command and if you see your Tesla T4 GPU in the output then you are good to go!!
docker run --runtime=nvidia --rm nvidia/cuda:10.2-base nvidia-smi
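If you want to script this check rather than read the table by eye, a minimal sketch could look like the following. The function name `check_gpu_output` is made up for this example, and the string passed to it at the bottom is illustrative text rather than captured output.

```shell
# check_gpu_output: succeed if the text passed in mentions a T4 GPU.
# (Hypothetical helper, not part of DLTK or the NVIDIA tooling.)
check_gpu_output() {
  case "$1" in
    *T4*) echo "GPU visible: ok" ;;
    *)    echo "no T4 found in output"; return 1 ;;
  esac
}

# On the instance you would feed it the real container output, for example:
#   out=$(docker run --runtime=nvidia --rm nvidia/cuda:10.2-base nvidia-smi)
#   check_gpu_output "$out"
check_gpu_output "Tesla T4"
```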
I usually install nvtop, which is like htop but for GPU devices. Just follow the build process below. Sometimes you might get an error similar to “Could not get lock file …” and the apt-get command will fail. This happens while the automatic update process is still holding the apt lock; the simplest fix is to wait a few minutes for it to finish and then try again.
sudo killall apt.systemd.daily; sudo apt-get update; sleep 5; sudo apt-get install -y libncurses5-dev
git clone https://github.com/Syllo/nvtop.git
mkdir -p nvtop/build && cd nvtop/build
cmake ..
make
sudo make install
If this works you should see your GPU as in the image below.
Step 3: Enable Remote API to Docker
The Deep Learning Toolkit app allows you to use docker containers, either locally or remotely, to host your deep learning development or production environment. It also supports Kubernetes (K8s) clusters, which is more of a production setup, but we don't cover that in this post. Docker provides a remote API which is not enabled by default. Once enabled, the docker API is not secured by default, and there is a separate process to follow to enable SSL and client certificates.
In the image below you can see our logical setup, where Splunk is interacting with the development image in our docker environment.
For our purposes we will just use the insecure way to connect to the remote API.
Note: This is not recommended for production as anyone can interact with your docker server and also all data is transmitted in clear text.
Run the commands below to enable the docker API first.
sudo sed -i 's/containerd.sock/containerd.sock -H=tcp:\/\/0.0.0.0:2375/g' /lib/systemd/system/docker.service
sudo systemctl daemon-reload; sleep 2; sudo service docker restart
Test it with:
If you see the output as above then you are all set!
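To see what the substitution in this step does before touching the real unit file, you can try it on a sample line. The `ExecStart` line below is an assumption about what a typical docker.service contains, so check your own file. The commented-out `curl` call uses the Docker Engine API `/version` endpoint as one possible way to confirm the listener after the restart.

```shell
# Sample ExecStart line (assumed; your docker.service may differ).
sample='ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock'

# Same sed expression as in Step 3, applied to the sample instead of the file.
rewritten=$(printf '%s\n' "$sample" | sed 's/containerd.sock/containerd.sock -H=tcp:\/\/0.0.0.0:2375/g')
echo "$rewritten"

# After restarting docker on the instance, one way to confirm the API answers:
#   curl -s http://localhost:2375/version
```

The rewritten line simply gains a second `-H` listener on TCP port 2375 next to the existing socket.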
Download the Golden Image!
So how do we magically run deep learning jobs from Splunk? In this repo you will find all the relevant docs and the current prebuilt image that we use to develop or run the different algorithms that show up in the content section of DLTK. The prebuilt docker image comes with all the frameworks, tools, apps, algorithms and SPL integrations required to kick-start your development work. This is what you get:
- Natively call any custom algorithm from SPL by using fit and apply commands
- Send staging data from your Splunk instance (usually for training) ready to be consumed by your model
- Other relevant SPL commands for model evaluation
- You can pass parameters to your model from SPL directly. For example hidden layers, epochs and any other parameter you might support
Finally if you develop your own algorithm, you are able to use that when you execute the relevant SPL command (for example fit). In this case Splunk will send the relevant data set, run the model you developed and finally return the results back to your Splunk instance like any other dataset generating SPL command.
I usually download the image manually by logging in to the AWS instance, so I can follow the progress more easily, since the first pull takes a bit of time. To download manually, log in to the AWS instance as the ubuntu user and run the command below:
docker pull phdrieger/mltk-container-golden-image-gpu:latest
You can do the same from the DLTK UI by going to “Configuration -> Containers” and selecting “Golden Image GPU (3.x)” with GPU Runtime set to nvidia.
For production setup it will be common to build your own image with only the frameworks required and your specific notebooks and algorithms.
All we have to do now is tell DLTK where the remote docker API is and you are all set! The setup is fast and easy and will leave you plenty of time to focus on developing models for your own data. From there on DLTK can send commands to docker for creating, starting and stopping containers and for getting status information. All this information is then presented in the DLTK UI and sometimes written to the relevant Splunk configuration files.
Go to “Configuration -> Setup” and fill in the details as shown below. Replace AWS_INSTANCE_GPU_IP with your AWS GPU instance public IP.
- Docker Host: tcp://AWS_INSTANCE_GPU_IP:2375
- Endpoint URL: AWS_INSTANCE_GPU_IP
- External URL: AWS_INSTANCE_GPU_IP
Once you have successfully connected Splunk to the remote docker instance go to Configuration -> Containers and start the golden image with the GPU runtime option set to nvidia (if you haven't done so already).
When the container is up and running you can click on the “Container Model Status” result and you can then see details for the running container. Initially there are no models shown as we have not created any yet.
You can also use the below commands from the cli of the AWS instance (for troubleshooting).
docker ps # List running containers
docker logs --follow CONTAINER_ID # Follow a container's log output
Step 5: Deep Learning with Splunk!
Before we start having fun (if you want to skip this section go to Step 6) we'll give a more detailed overview of what happens behind the scenes when we run SPL commands. For full documentation on how to use the DLTK to develop your own models you should read the User Guide in the DLTK app.
Let's take as an example the “Neural Network Classifier DGA-Pytorch” use case found in the content library. For this to work you need to have installed and configured the DGA app previously as we need to have the generated features from the original dataset. If you don't have the DGA app installed and want to have a quick ride you can use the “Multiclass Neural Network Classifier-Pytorch”.
Once you open the use case you can see the relevant dashboard, which is configured to run the SPL fit command on our full DGA dataset. The command is broken down below.
This is our dataset containing the relevant features generated in the DGA app.
| inputlookup dga_domains_features.csv
By running our dataset via the “fit MLTKContainer” command we produce a model called pytorch_nn_dga_nn_classifier.pt based on the pytorch_nn.ipynb Jupyter notebook. We can also pass parameters to the model (key/value pairs) such as the epochs, hidden layers etc.
| fit MLTKContainer algo=pytorch_nn epochs=$epochs$ batch_size=$batch_size$ hidden_layers=$hidden_size$ class from PC_1 PC_2 PC_3 ut_consonant_ratio ut_digit_ratio ut_domain_length ut_meaning_ratio ut_shannon ut_vowel_ratio into app:dga_nn_classifier
If we want to do development work on the same notebook, we can add the parameter mode=stage, which does not run the model but only places the requested data on the container. Without it, the model is trained, then applied to the dataset transferred to the container, and finally all the results are returned to Splunk. The model is stored inside the container under /srv/app/model/data.
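The `$epochs$`, `$batch_size$` and `$hidden_size$` parts of the command are dashboard tokens that are replaced with concrete values before the search runs. As a rough sketch, the helper below assembles the resulting search string, optionally with `mode=stage`. The function name is made up, and the batch size and hidden layer values are arbitrary examples, not recommendations.

```shell
# Build the fit search with concrete values in place of the dashboard tokens.
# build_fit_search is a hypothetical helper used only for illustration.
build_fit_search() {
  epochs=$1
  batch_size=$2
  hidden=$3
  mode=${4:-}   # optional: "stage" only places the data on the container

  search="| inputlookup dga_domains_features.csv | fit MLTKContainer"
  if [ -n "$mode" ]; then
    search="$search mode=$mode"
  fi
  search="$search algo=pytorch_nn epochs=$epochs batch_size=$batch_size hidden_layers=$hidden"
  search="$search class from PC_1 PC_2 PC_3 ut_consonant_ratio ut_digit_ratio ut_domain_length"
  search="$search ut_meaning_ratio ut_shannon ut_vowel_ratio into app:dga_nn_classifier"
  printf '%s\n' "$search"
}

build_fit_search 100 32 16          # full run: train, apply, return results
build_fit_search 100 32 16 stage    # development: stage the data only
```

You can paste either printed line into the Splunk search bar of the DLTK app to run it by hand.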
Step 6: Running Our Model on a GPU
At this stage we can just use native SPL commands to run our datasets against predefined algorithms from the development container image. Go to the DLTK app and click the “Content” menu option. Scroll down to the “Classifier” section and select the use case you want to run. In the following sections we look more closely at the “Neural Network Classifier DGA - Pytorch”. However this requires that you have installed and set up the Splunk DGA App.
If you don't have the DGA app set up and working, you can choose the “Neural Network Classifier-Pytorch”, which uses the well-known Iris dataset available with DLTK.
For the problem at hand we will use a simple neural network classifier developed with the PyTorch framework and apply it to our DGA dataset. The same classifier could be used on different datasets of the same nature.
If you don't have the DGA app installed, the Iris classification is a similar problem and the same approach applies. The parameters in the SPL command allow us to feed the algorithm with the relevant data and define model parameters, and we leave the rest to the DLTK framework!
Option 1: Run the Neural network classifier for IRIS species classification
To run this experiment all you have to do is click on the “Neural Network Classifier- Pytorch” and just try running the model with different epoch values to see the difference.
Option 2: Run the Neural network classifier for DGA domains (Setup DGA App first)
First we run the DGA use case with 100 epochs and wait to see the results. In the results we can also see how the model is performing. Once again native SPL commands allow you to display relevant model parameters and metrics for its performance. Once we run with the 100 epochs we can see that the model performance is fairly poor. However it only took a few seconds to actually run the model. To get better performance we increase the number of passes through the dataset (increase epochs). Remember here that each pass is going through the entire dataset of DGA which is 100K records. You can minimize the time by using a subset of the data utilizing SPL commands like “sample”.
Running with 10K epochs we can see the GPU working at full load, and our model performance has improved significantly. The overall process (sending data, running the model and getting the results back) took less than 3 minutes, with the model runtime under a minute! Running this on CPU would take at least an hour!
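To put the two runs in perspective, here is a back-of-the-envelope count of how many records the model sees in total, using the 100K-record dataset size quoted above. It ignores batching and any sampling you might apply.

```shell
# Total records visited = dataset size x number of epochs.
record_passes() {
  records=$1
  epochs=$2
  echo $((records * epochs))
}

records=100000   # size of the DGA dataset quoted above
for epochs in 100 10000; do
  echo "epochs=$epochs -> $(record_passes "$records" "$epochs") records visited"
done
# epochs=100 -> 10000000 records visited
# epochs=10000 -> 1000000000 records visited
```

The hundredfold jump in work between the two runs is where the GPU earns its keep.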
And this is the end of this blog post! So it's just a matter of minutes to get GPU infrastructure up and running, configure DLTK and operationalize our data with deep learning power.
This setup is for development/testing only. You should further protect your docker API by restricting access via the AWS security group rules to your search head IP and your own IP only.
If you need to run this in production, contact your local Splunker to sort you out!
Quick Setup
1. Follow Step 1 from above to spin up an AWS deep learning AMI
2. Run the below command on the AWS image cli (login via ssh)
bash <(wget -qO- https://gist.githubusercontent.com/dlamspl/06c539dc3dbce52fcf9f37382ebf7e32/raw/29e03e2dfe564e21609bcca76185e86ce1b0d1ee/dltk_config_ami_clear.sh)
3. Configure your container environment (Replace with the IP of your AWS instance)
Hurray!!! Melt that GPU.