
Introducing the Splunk App for Data Science and Deep Learning 5.0

Exciting news: The Deep Learning Toolkit App for Splunk (DLTK) will be renamed the Splunk App for Data Science and Deep Learning (DSDL). It’s slightly lengthy, but a better-suited name because the app is useful for both deep learning and data science operations. Beyond the new name and new features, the Splunk App for Data Science and Deep Learning 5.0 is officially a “Splunk Supported” app. This means that customers who download the app can resolve issues quickly by filing a support case, confident that the app is covered by Splunk Support.

A little bit of history

For me personally, this is a huge step forward. When I started working on this project in 2018, it was mainly technical curiosity that led me to work with Docker containers to run deep learning models on GPUs and connect them with Splunk. The MLTK Container for TensorFlow was announced at .conf in 2018, and the Deep Learning Toolkit App for Splunk has been freely accessible on Splunkbase since .conf 2019. In 2020, a completely reworked app architecture was made available as open source as DLTK 4.0. A massive thanks to my friend and former colleague Robert Fujara for his invaluable effort, which shaped the future of this app. Now, after almost 5 years and more than 7000 Splunkbase downloads, a small app prototype has grown into a widely used app for integrating advanced data science, machine learning and deep learning approaches with Splunk. This was made possible by the contributors and supporters who provided everything from ideas to bits of source code. A big thank you to you all. I’m incredibly grateful for every collaboration, so if you’d like to work together, please reach out!

The Future is Now

While most Splunk users prefer to work within Splunk’s user interfaces like the search bar and dashboards, there are still many data scientists who would rather conduct experimentation and modeling in Jupyter notebooks. DSDL allows both interfaces to come together as a seamless workflow that enables faster and easier operationalisation of data science and research efforts within Splunk.

Splunk’s Search Processing Language (SPL) can be used to prepare data for specific data science tasks, which then run in Jupyter with the Python frameworks and libraries of your choice, including visualizations with matplotlib, seaborn or others. This offers the best of both worlds: Splunk’s core SPL with freely extendable Python for data science tasks in Jupyter. The basic architecture of DSDL hasn’t changed much, but we are constantly improving and extending the interfacing parts.


With version 3.9, two new helpful functions were introduced: an interactive Splunk search bar in Jupyter, and a standard way of logging and sending data directly back from Jupyter or Python code to Splunk via HEC. Have a look at the barebone_template.ipynb Jupyter notebook in your development container to see these functions in action. On the algorithms side, a new anomaly detection example was introduced using the powerful PyOD library. For improved MLOps, automated instrumentation of all DSDL-based container models is available, and if you add your container environment metrics you can get a pretty extensive picture of your data science, machine learning or deep learning operations in Splunk’s Observability Suite. And lastly, additional configuration options for ingress in Kubernetes deployments were added to make in-cluster use of Splunk and DSDL containers, e.g. with the Splunk Operator for Kubernetes, a more seamless experience.
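To make the HEC path more concrete, here is a minimal sketch of what sending an event from Python to Splunk's HTTP Event Collector can look like, using only the standard library. It is not the helper that ships in barebone_template.ipynb, which may look quite different. The function names, the host, the token and the sourcetype "dsdl:notebook" are all hypothetical placeholders. What the sketch does rely on is the standard HEC contract: a POST to /services/collector/event with an "Authorization: Splunk <token>" header and a JSON body carrying an "event" field.

```python
import json
import urllib.request


def build_hec_request(base_url, token, event, sourcetype="dsdl:notebook", index=None):
    """Build (but do not send) a request for Splunk's HTTP Event Collector.

    base_url, token and the default sourcetype are illustrative assumptions;
    replace them with the values of your own HEC input.
    """
    payload = {"event": event, "sourcetype": sourcetype}
    if index is not None:
        # Only set an index when asked to; otherwise the HEC token's default applies.
        payload["index"] = index
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/services/collector/event",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Splunk " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_to_splunk(request, timeout=5):
    """Send a prepared HEC request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Build an example request without touching the network.
example = build_hec_request(
    "https://splunk.example.com:8088",        # hypothetical HEC endpoint
    "00000000-0000-0000-0000-000000000000",   # hypothetical HEC token
    {"model": "demo", "epoch": 3, "loss": 0.12},
)
print(example.full_url)
print(example.data.decode("utf-8"))
```

Keeping the request construction separate from the network call makes it easy to inspect the payload in a notebook cell before anything is sent; calling send_to_splunk(example) would then post it to a reachable HEC endpoint.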

Now is the Future

With version 5.0 we also took on a few challenges to further secure and streamline operations with DSDL. First and foremost, documentation was created and added to the official Splunk docs pages. I personally want to thank Emma Lauder and her team for the amazing work here. I truly believe she understands how DSDL works better than I do. Having documentation also led to some restructuring of content in the app to make its structure cleaner.

In this new version, all container images got a refresh with the latest updates and fixes. Since we heard quite often from customers that they needed to build their own container images, we introduced a new UI-guided, one-click image building experience based on Docker. With this feature, users can now easily create their own container images and adapt them to their favorite Python data science libraries. We also received customer feedback asking for a way to build neural networks with less effort, so in version 5.0 we have introduced the first iteration of a Neural Network Designer. This predefined workflow allows Splunk users to easily define, create, train, apply and score neural networks on their Splunk data. I hope that you’ll find the new features useful, but if you have a suggestion, feel free to submit an idea.

Advances in Cybersecurity Data Science and Deep Learning

Last but not least, we added a new way to apply advanced data science methods in cybersecurity. My colleague Josh Cowling has contributed a Jupyter notebook and an example Splunk dashboard in DSDL that demonstrate how host systems can be clustered using UMAP dimensionality reduction on JA3 signatures. The goal is to better understand host behavior and to surface associated anomalies, which are interesting pointers for investigations in the context of supply chain attacks as described by Splunk’s SURGe team. Thanks Marcus, Ryan and team. Watch out for Josh’s talk at the SANS CyberThreat conference and join him in London!

Finally, I’m so excited to see a new contribution to detect DGA domains using a pretrained model in DSDL in Splunk’s Enterprise Security content updates. This is a new approach to detecting DGAs with the help of a wide and deep neural network. I’m truly excited to see the advances in cybersecurity with state-of-the-art deep learning techniques demonstrated here. A big shout out to Namratha Sreekanta, Kumar Sharad, Glory Avina and the extended security threat research team for your work to push the limits! And did I hear it right that DSDL got integrated into Attack Range too?! Thanks Patrick Bareiss, Jose Hernandez and team for all of your fruitful collaborations!

As you can see, the app and the community around DSDL are growing. So if you have not explored the app yet, just head over to Splunkbase and download or install it for free. If you are already using DSDL or the former DLTK, feel free to update the app. Please make sure to back up and test things properly, especially if you are running critical operations with DSDL in production use cases.

Happy Splunking and keep innovating with DSDL,

Philipp

Special thanks to Emma Lauder, Namratha Sreekanta, Kumar Sharad, Glory Avina, Patrick Bareiss, Jose Hernandez, Josh Cowling, Marcus LaFerrera, Ryan Kovar, Greg Ainslie-Malik and all other Splunk colleagues for your support and contributions to DSDL. Many thanks to Judith Silverberg-Rajna, Katia Arteaga, Mina Wu and Carleanne O'Donoghue for your support in editing and publishing this blog post.

 

Posted by Philipp Drieger

Philipp Drieger works as a Principal Machine Learning Architect at Splunk. He accompanies Splunk customers and partners across various industries in their digital journeys, helping them achieve advanced analytics use cases in cybersecurity, IT operations, IoT and business analytics. Before joining Splunk, Philipp worked as a freelance software developer and consultant focussing on high performance 3D graphics and visual computing technologies. In research, he has published papers on text mining and semantic network analysis.