Platform

January 21, 2020

4 Minute Read

Creating a Custom Container for the Deep Learning Toolkit: Splunk + Rapids.ai

By Splunk

The Deep Learning Toolkit (DLTK) was launched at .conf19 with the intention of helping customers leverage additional Deep Learning frameworks as part of their machine learning workflows. The app ships with four separate containers: Tensorflow 2.0 - CPU, Tensorflow 2.0 GPU, Pytorch and SpaCy.

All of the containers provide a base install of Jupyter Lab & Tensorboard to help customers develop and create neural nets or custom algorithms. The DLTK was built with extensibility in mind, as there are many different Deep Learning Frameworks available to data scientists. This blog will cover the open sourced framework and how to build a custom container with pre-built libraries.

https://www.rtinsights.com/top-deep-learning-tools/

DLTK Container Repository

The DLTK container repository can be accessed on github: https://github.com/splunk/splunk-mltk-container-docker and the first step in creating a custom container is to copy this repository to your local development environment.

$ git clone https://github.com/splunk/splunk-mltk-container-docker

Reviewing the README.md provides context on the different flavors available for building a base image. These are sorted into tags like tf-cpu, pytorch and match up with the build.sh located in the root of the splunk-mltk-container-docker directory.

There are some key things to note:

The bases in the build.sh use official images from tensorflow/tensorflow on DockerHub
The Dockerfile uses pip to install new libraries to customize the image

Creating an image uses the following syntax:

$ ./build.sh tf-cpu your_local_docker_repo/

Creating a Custom Image:

In this example guide we are going to create a custom container to install the Nvidia Rapids Framework [rapids.ai]. You can think of these libraries as similar to the libraries that ship with the Machine Learning Toolkit, but capable of running on Nvidia GPUs.

In order to create this rapids container, we have to modify a few files in the repository.

Specifically:

Build.sh to support a new tag
Dockerfile to use conda install instead of pip
Bootstrap.sh to adjust the startup of the container for virtual environments

The change to build.sh is fairly simple, insert the following line after the nlp section:

rapidsai)

base="rapidsai/rapidsai:cuda10.0-base-ubuntu18.04"

;;

Changes to the Dockerfile

Remove the RUN pip install
Replace with conda’s syntax:

# Installing packages

RUN pip install Flask

RUN pip install h5py

RUN pip install pandas

RUN pip install scipy

RUN pip install scikit-learn

RUN pip install jupyterlab

RUN pip install shap

RUN pip install lime

RUN pip install matplotlib

RUN pip install networkx

RUN conda install -n rapids jupyterlab flask h5py tensorboard nb_conda

Note: The rapids container ships with specialized environments, in this case we want jupyterlab, etc to be installed into the rapids python path, -n rapids is required to specify this.

Changes to bootstrap.sh

Fix shell settings in the first line #!/bin/sh
Make rapids environment active

#!/bin/bash

source activate rapids

Building the Custom Image

If the Splunk installation is on the same machine it is not required to set up DockerHub. If the machine you plan to deploy DLTK & the custom image is in the cloud or a different environment it’s recommended to configure docker hub to easily create, manage and deploy docker images.

You can use the following guide to configure DockerHub [https://docs.docker.com/docker-hub/] in your development environment.

You can use a variation of the following command to create your own custom image:

$ ./build.sh rapidsai docker_repo_name/

Example:

$ ./build.sh rapidsai anthonygtellez/

The created image will exist in your local server and can be deployed once you configure the DLTK. Skip to the configuration step if you’re deploying locally.

The following command is used to push the built image to your DockerHub repository:

$ docker push anthonygtellez/mltk-container-rapidsai:latest

The push refers to repository [docker.io/anthonygtellez/mltk-container-rapidsai]

8116c2c543aa: Pushed

aea568d7d2e7: Pushed

a4fd893c10d9: Pushed

b887d71bb9c6: Pushed

301da5ed4c59: Pushed

983a89b54d3e: Pushing [================>                                  ] 1.462GB/4.375GB

4d642a1129a1: Pushed

78db50750faa: Layer already exists

805309d6b0e2: Layer already exists

2db44bce66cd: Layer already exists

digest: sha256:46278436db8c3471f246d2334cc05ad9c8093ab7f98011fc92dc7d807faf4047 size: 2418

The existing layers in rapidsai will not be uploaded, only the changed layers, once the image is uploaded to DockerHub you can pull this image to your environment where DLTK is hosted

be sure to take note of the checksum if you need to rebuild or decide to redeploy the container later.

Configuring the DLTK

The DLTK exists in the app directory in a folder named mltk-container to add custom configurations such as images to the app you need to create a new file named images.conf in the local directory.

Below is an example images.conf configuration for the Rapids container:

[conda]

title = Conda Rapids

image = mltk-container-rapidsai

repo = anthonygtellez/

runtime = none,nvidia

Once the configuration file is in place restart Splunk and open the container management page of the DLTK:

The new image will now be available via the container image dropdown. Be aware the first time you launch the container might take some time, as the docker agent will need to download the image from DockerHub for deployment, subsequent deployments will be much quicker.

Once the container is up and running you can access Jupyter notebook and make use of the newly installed libraries to develop and configure algorithms.

Hopefully with this new framework customers are able to make use of the DLTK to extend their machine learning pipelines.

For more background on the DLTK if you missed the session check out the .conf website:

FN1409 - Advances in Deep Learning with the MLTK Container for TensorFlow 2.0, PyTorch and Jupyter Notebooks

https://conf.splunk.com/watch/conf-online.html?search=%20Tellez%20Deep%20Learning#/

----------------------------------------------------
Thanks!
Anthony Tellez

Creating a Custom Container for the Deep Learning Toolkit: Splunk + Rapids.ai

Creating custom containers for the Deep Learning Toolkit

Platform 6 Min Read

Addition of Syslog in Splunk Edge Processor Supercharges Security Operations with Palo Alto Firewall Log Reduction

Platform 2 Min Read

Dashboard Studio Tips: What's New in 8.2.2106

You asked, we answered. The Dashboard Studio release in Splunk Cloud Platform 8.2.2106 comes with improvements requested by you: UI to add data sources to inputs, hiding the Edit or Open in Search buttons, a brand new markdown visualization, and more!

About Splunk

The Splunk platform removes the barriers between data and action, empowering observability, IT and security teams to ensure their organizations are secure, resilient and innovative.

Founded in 2003, Splunk is a global company — with over 7,500 employees, Splunkers have received over 1,020 patents to date and availability in 21 regions around the world — and offers an open, extensible data platform that supports shared data across any environment so that all teams in an organization can get end-to-end visibility, with context, for every interaction and business process. Build a strong data foundation with Splunk.

Learn more about Splunk

Creating a Custom Container for the Deep Learning Toolkit: Splunk + Rapids.ai

DLTK Container Repository

Creating a Custom Image:

Building the Custom Image

Configuring the DLTK

Related Articles

Creating a Custom Container for the Deep Learning Toolkit: Splunk + Rapids.ai

Addition of Syslog in Splunk Edge Processor Supercharges Security Operations with Palo Alto Firewall Log Reduction

Dashboard Studio Tips: What's New in 8.2.2106

About Splunk

Subscribe to our blog

Connect with Splunk on X

Connect with Splunk on Instagram