May 07, 2020

Deep Learning Toolkit 3.1 - Examples for Prophet, Graphs, GPUs and DASK

By Philipp Drieger

In part 1 of this release blog series, we introduced Deep Learning Toolkit (DLTK) 3.1, which enables you to connect to Kubernetes and OpenShift. On top of that, a brand new “golden image” is available on Docker Hub to support even more interesting algorithms from the world of machine learning and deep learning! Over the past few months, our customers’ data scientists have asked for various new algorithms and use cases they wanted to tackle with DLTK. The four new examples below are a subset of those and should also be helpful starting points for others.

Dask for Distributing Machine Learning Workloads

Machine learning workloads can consume a lot of compute time, especially with larger or more complex datasets. It’s no secret that distributing such workloads helps speed up training or tasks like hyperparameter search. In the Python ecosystem, Dask provides advanced parallelism for analytics, enabling performance at scale. Dask also provides some distributed machine learning algorithms via Dask-ML. The example below shows how a parallel implementation of K-Means can be integrated into Splunk using the Deep Learning Toolkit, then developed and monitored in JupyterLab.

DGA App for Splunk

Device Agnostic PyTorch Example for CPU and GPU

When you connect the Deep Learning Toolkit to a GPU-enabled Docker or Kubernetes environment, you can accelerate model training significantly. Benchmarking the dataset from the DGA App for Splunk, we achieved a speedup of over 40x when we ran the example neural network classifier on a GPU compared to the CPU baseline. To put this into perspective: a training job that took over 30 minutes on CPU was cut down to a total of 45 seconds on a GPU, including data transfer overhead. That makes data science iterations much more agile and speeds up model creation.

Neural Network Classifier DGA

Luckily, PyTorch easily allows you to write device-agnostic code that runs both on CPU and GPU using the .to(device) magic with minimal impact on your model code. We have added examples that show this functionality for a simple multiclass neural network classifier to get you started quickly.
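
As a rough illustration of the pattern, the sketch below trains a tiny multiclass classifier on whichever device is available. It is not the DLTK example itself: the random toy data, layer sizes, and training settings are assumptions made up for this snippet.

```python
import torch
import torch.nn as nn

# Pick the GPU when one is visible to PyTorch, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

torch.manual_seed(0)

# Hypothetical toy data: 256 samples, 16 features, 3 classes.
features = torch.randn(256, 16)
targets = torch.randint(0, 3, (256,))

# A small multiclass classifier; the layer sizes are arbitrary.
classifier = nn.Sequential(
    nn.Linear(16, 32),
    nn.ReLU(),
    nn.Linear(32, 3),
).to(device)  # move the parameters to the chosen device

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(classifier.parameters(), lr=0.01)

# Move the tensors once; the training loop itself never mentions the device.
features = features.to(device)
targets = targets.to(device)

for epoch in range(50):
    optimizer.zero_grad()
    logits = classifier(features)
    loss = loss_fn(logits, targets)
    loss.backward()
    optimizer.step()

# Bring predictions back to the CPU, for example before converting to NumPy.
predictions = classifier(features).argmax(dim=1).cpu()
print(f"trained on {device}, final loss {loss.item():.3f}")
```

The only device-specific lines are the one that selects `device` and the `.to(device)` calls, so the same script runs unchanged in a CPU-only container and in a GPU-enabled one.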

Forecasting with Prophet

Built by Facebook’s Core Data Science team, Prophet is a library for forecasting time series data based on an additive model where non-linear trends are fit with annual, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend and typically handles outliers well.

Example Forecast with Prophet

Although the forecast (green line) on the dashboard above is far from perfect, it can serve as an example to get started quickly with experimentation. However, it also clearly shows that not every time series dataset is perfectly suited for Prophet, so don’t forget to check other robust forecasting methods like the StateSpaceForecast in Splunk’s Machine Learning Toolkit, which can be easily applied with the Smart Forecasting Assistant.

Graph Analysis with NetworkX

You may have read about the latest possibilities for graph analytics in Splunk using the freely available 3D Graph Network Topology Visualization app from Splunkbase. My colleague Greg recently published two articles on how those techniques can be used for understanding and baselining network behavior.

When it comes to quickly developing code or experimenting with graph models, the graph analysis example in Deep Learning Toolkit 3.1 should help you get started and explore more advanced modelling techniques with graphs.

Graph Analysis
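
As a small taste of what such experimentation can look like, the following sketch builds a graph with NetworkX and computes two simple measures. The host names and edges are hypothetical stand-ins for source and destination pairs you might retrieve from a Splunk search; this is not the code of the DLTK example.

```python
import networkx as nx

# Hypothetical edge list, e.g. source and destination hosts from a search result.
edges = [
    ("host-a", "host-b"),
    ("host-a", "host-c"),
    ("host-a", "host-d"),
    ("host-e", "host-f"),
]

graph = nx.Graph()
graph.add_edges_from(edges)

# Degree centrality: the fraction of all other nodes each node is connected to.
centrality = nx.degree_centrality(graph)
hub = max(centrality, key=centrality.get)

# Connected components reveal groups of hosts that never talk to each other.
components = list(nx.connected_components(graph))

print(hub, round(centrality[hub], 2), len(components))
```

Measures like these can be written back to Splunk as additional fields per node, which is one way to feed a baseline of normal network behavior.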

Big Thanks to the Community

Recently, a DLTK user in Japan built an extension that applies the GiNZA NLP library to Japanese-language text, making the NLP example work for Japanese. Luckily, we were able to get his contribution merged into the DLTK 3.1 release. I’m really happy to see this community mindset and I want to thank you, Toru Suzuki-san, for your contribution. ありがとうございました!

Last but not least I would like to thank so many colleagues and contributors who have helped me finish this release. A special thanks again to Anthony, Greg, Pierre and especially Robert for his continued support on DLTK and making Kubernetes a reality today!

With the upcoming .conf20 and the recently opened 'Call For Papers' I want to encourage you to submit your amazing machine learning or deep learning use cases by May 20. Let me know in case you have any questions!

Happy Splunking,
Philipp

