Introducing the Security Datasets Project

When I first started using Splunk back in the version 4.2 days, I had a hard time understanding what I would actually do with it for security. Sure, it was neat to have "Google for my data," but I did not immediately get why Splunk had all my peers in a frenzy.

Finally, I worked through some issues and got Bro IDS data installed, and "boom," I got it. It was awesome. HOW HAD I EVER LIVED WITHOUT SPLUNK?! Sadly, I think this problem still exists... With that in mind, I've created something I call the Splunk Security Dataset Project.

So what is this "Project," you ask? Well, to be frank, it's a little bit of everything. If you're brand new to Splunk, we're giving you access to data that is relevant to what you do for a living; moreover, we walk you through how to use it. If you're a seasoned Splunk veteran and just want a sandbox to polish your hunting, learn a new skill from one of our tutorials, or try one of the techniques from our Hunting with Splunk blogs, this is a great place to do it. Finally, when the next "EternalBlue" or "Petya" or even NOT "Petya" occurs, don't have a "Meltdown": we're going to be working with our Security Research Team to begin hosting the telemetry of those exploits in the Splunk Security Dataset Project.

So What Does It Look Like?

First, register here; you'll then be forwarded to the "Fun with Datasets" app that Splunk hosts in the cloud. It will look something like the image below. Each dataset has a quick description, followed by some metadata about the types of logs and scenarios it contains.

For example, the MACCDC dataset consists of Bro and Snort logs from the 2012 MACCDC that Matt Sconzo at Security Repo was kind enough to host. We slightly modified them (added headers) and put them into an index. However, that wasn't good enough, so we added a guided tutorial for each dataset where we teach you how to find some awesome stuff in the logs using the power of Splunk. :-)
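If you want to try the "added headers" step on your own headerless, tab-separated logs before indexing them, here is a minimal sketch of one way to do it. This is not the script we used. The function name `add_header`, the field list `EXAMPLE_FIELDS`, and the sample row are all made up for illustration, so swap in the real column names for your log type.

```python
from typing import Iterable, List

# Hypothetical column names for illustration only; use the real field
# names for whatever log type you are preparing.
EXAMPLE_FIELDS = ["ts", "uid", "src_ip", "src_port", "dest_ip", "dest_port", "proto"]


def add_header(rows: Iterable[str], fields: List[str], sep: str = "\t") -> List[str]:
    """Return the rows with a single header line prepended.

    Rows whose column count does not match the header raise ValueError,
    so a wrong field list is caught before the data is indexed.
    """
    out = [sep.join(fields)]
    for row in rows:
        line = row.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if len(line.split(sep)) != len(fields):
            raise ValueError(f"expected {len(fields)} columns: {line!r}")
        out.append(line)
    return out


if __name__ == "__main__":
    # Made-up sample row, not taken from the MACCDC data.
    sample = ["1331901000.0\tCabc\t10.0.0.1\t50463\t10.0.0.2\t80\ttcp\n"]
    for line in add_header(sample, EXAMPLE_FIELDS):
        print(line)
```

With a header line in place, the file can be treated as delimited data with named columns instead of anonymous positions.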

The long-term goal of this project is to add a new dataset every three months. We're looking at everything from "intrusion to explosion" with ICS/SCADA, to atomic red teaming toolsets from Red Canary. Every dataset will have a blog post, an educational tutorial, and lots of associated material where we go over the "strategic" aspects of how to collect and use the sort of data that is in the dataset.


I am very excited by this project and hope that you get as much use out of it as I think you will! If you have any suggestions for new datasets, feel free to send us an email at bots [@]

Ryan Kovar

NY. AZ. Navy. SOCA. KBMG. DARPA. Splunk.