CI/CD Detection Engineering: Dockerizing for Scale, Part 4

Who Are You Again?

Splunk builds innovative tools that enable users, their teams, and their customers to gather millions of data points per second from an ever-growing number of sources. Splunk helps users leverage that data to deliver, monitor, improve, and secure systems, networks, data, products, and customers with industry-leading solutions and expertise.

The Splunk Threat Research Team (STRT) is responsible for identifying, researching, understanding, and detecting threats — from the critical vulnerabilities that dropped on Twitter to those suspicious PowerShell scripts that just ran on the Domain Controller — and building detections customers can run today on their Splunk Enterprise servers. The STRT believes in the power of community contributions, the power of transparency, and the value of “showing your work.” That’s why the STRT makes all of its detections and its nightly testing framework freely available to anyone at research.splunk.com and through the Enterprise Security Content Update App on Splunkbase. Today, the STRT builds on that transparency in the culmination of the Detection Testing Blog Series.

How Did STRT Get Here?

Readers following the series have watched our progress towards building a more complete tool to aid in the generation of attack datasets and the development and validation of threat detections. The team’s basic goal is simple — a flexible, scalable, automated detection testing pipeline:

In pursuing that goal, STRT built a set of tools and documented them in a series of blog posts. They’re all worth the read, but in summary:


In the EC2 workflow, testing could get stuck, take days, or the environment could be in an indeterminate state - Courtesy https://eol.jsc.nasa.gov/SearchPhotos/photo.pl?mission=ISS064&roll=E&frame=48480, by NASA, Public Domain (with edits)

Jump to Summer 2021. The STRT had grown, and so had the number of detections being written and updated. At that time, the STRT actively maintained over 600 Splunk analytics under Splunk Security Content. In response to this growth, a few changes were made to speed up the testing and development workflow. Most notably, instead of regenerating data every time a test was run, raw data was generated once, captured, and stored for replay in the Attack Data repo. The team released and presented the initial idea for Attack Data during Splunk .conf20; this repo has become a powerful tool for STRT testing and a great resource for customers, too! It catalogs gigabytes of freely-available, organized, curated attack data that can be used for learning, testing, and writing novel detections for running on Splunk or other tools. While this change cut detection testing time from 30 minutes per detection to several minutes per detection, there was still room for improvement:

With a fresh look at the strengths and weaknesses of the current system, the STRT decided to iterate one more time!

A Call to Action(s)

The first “aha!” moment occurred during the migration from the STRT’s legacy CI/CD solution, CircleCI, to GitHub Actions. GitHub Actions is powerful, flexible, and free (for public repositories). GitHub Actions can be configured to run when almost anything happens in a repo: pushes, pull requests, comments, issues, and even scheduled events. When an Action runs, it receives full control of a fresh VM called a Runner that exists for the duration of the Action. This is critical for a number of reasons:

  1. Jobs are free to break things! If something doesn’t work (or worse), don’t worry - that Runner will be destroyed when the test completes.
  2. The architecture of GitHub Actions makes it possible to safely compile, run, and test PRs from external forks before merging. External PRs run in their own environment without any access to the target repository’s secrets or other non-public data, reducing exposure of private API keys.
  3. It allows the STRT to treat testing infrastructure as code, rebuilding the entire environment from scratch on each test.

A Whale of a Good Time

For years, Splunk has published a simple-to-use Splunk Enterprise Docker container suitable for testing and production environments. Most configuration options, including downloading and installing Splunkbase Apps, can be passed via command-line arguments. The detailed documentation for this container can be found here. A fully configured Splunk container will start in minutes on a local machine or in GitHub Actions.
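The container configuration described above can be expressed in a single command. A minimal sketch, following the public docker-splunk documentation: the image tag, port mappings, password, and app URL below are illustrative placeholders, not the exact STRT configuration.

```shell
# Start a standalone Splunk Enterprise container for detection testing.
# SPLUNK_APPS_URL can point at one or more app packages to install at
# boot; the password and URL here are placeholders.
docker run -d --name splunk-test \
  -p 8000:8000 -p 8089:8089 \
  -e SPLUNK_START_ARGS='--accept-license' \
  -e SPLUNK_PASSWORD='ChangeMe123!' \
  -e SPLUNK_APPS_URL='https://example.com/escu_package.tar.gz' \
  splunk/splunk:latest
```

Once the container reports healthy, the web UI is available on port 8000 and the management API (used for running searches) on port 8089.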


Breaches are for whales, not your data. Start validating security detections today with Splunk Docker Containers - Courtesy https://unsplash.com/photos/JRsl_wfC-9A, by Mike Doherty (with edits)

Running the Show

The Splunk Docker container makes it easy to start, configure, and destroy Splunk Enterprise servers on demand, but something was needed to tie the pieces together, so the docker-detection-tester.py tool was built. Specifically, this tool does the following:

  1. ESCU package generation
  2. Container setup
  3. Attack data replay
  4. Detection search execution

Since each test runs independently and all the heavy lifting occurs inside the containers themselves, the attack data replays and detection searches on different containers never interfere with one another! The diagram provides a logical walkthrough of how the tool runs a test.
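The per-detection test loop can be sketched in a few lines. Everything here is illustrative: the function names and the pass criterion are assumptions for the sketch, not the actual internals of docker-detection-tester.py, and `execute_spl` stands in for the REST call that runs a search on the container.

```python
"""Illustrative sketch of a per-detection test loop (hypothetical, not
the real docker-detection-tester.py). Container setup and attack data
replay are abstracted behind the `execute_spl` callable."""

def wrap_detection_search(search: str) -> str:
    """Wrap a detection's SPL so it returns a single result count.

    Saved searches conventionally omit the leading `search` keyword
    (unless they start with a generating command like `| tstats`),
    so normalize before appending a count."""
    search = search.strip()
    if not search.startswith(("search ", "|")):
        search = "search " + search
    return search + " | stats count"

def detection_passed(result_count: int) -> bool:
    """A detection passes if the replayed attack data produced at least one match."""
    return result_count > 0

def run_test(detection: dict, execute_spl) -> dict:
    """Run one detection and record a pass/fail result.

    `execute_spl` abstracts the search execution against the container,
    keeping the bookkeeping logic pure and testable."""
    count = execute_spl(wrap_detection_search(detection["search"]))
    return {"name": detection["name"], "passed": detection_passed(count)}

# Example with a fake executor standing in for a live container:
fake_executor = lambda spl: 3  # pretend the search matched 3 events
print(run_test({"name": "demo", "search": "Processes.process=*mimikatz*"},
               fake_executor))  # → {'name': 'demo', 'passed': True}
```

In the real workflow the executor would replay the detection's attack data first, then poll the search job until it completes.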

True Portability

By eliminating AWS Batch and moving from EC2 VMs to Docker containers for testing, the STRT achieved true detection testing portability. The testing setup can be customized to meet almost any need. For example, with minimal setup, tests can run on:

Parallelizing GitHub Actions Jobs

While testing in GitHub Actions worked well for a small number of detections, it could not handle a very large number of them. Splunk Security Content currently has over 600 detections. Even at just 60 seconds per detection, a single Runner would hit the GitHub Actions maximum job execution time of 6 hours after only about 360 detections. The STRT found a better, faster way to scale testing using the GitHub Actions Matrix Configuration. This feature is primarily used to test builds against multiple configurations, such as different application or operating system versions. For example, a developer may want to test a Python library against Python 2.7, 3.9, and 3.10 on Ubuntu 20.04, Windows Server 2022, and macOS Big Sur. The matrix can start up to 256 Runners in parallel.

The GitHub Actions Matrix makes it possible to scale the testing framework by increasing the number of tests executing in parallel. For example, dynamically splitting 600 detections into 10 parallel detection test jobs means just 60 detections per job. This lets detection testing complete in 1/10th of the time and avoids the 6-hour maximum job execution time limit.
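A ten-way split like the one described might look like the following workflow fragment. This is a sketch: the job name, manifest filenames, artifact name, and tester invocation are placeholders, not the actual Splunk security_content workflow.

```yaml
jobs:
  test-detections:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        # One entry per Test Manifest generated in the distribute step;
        # GitHub starts one Runner per entry, up to 256 in parallel.
        manifest: [manifest_0.json, manifest_1.json, manifest_2.json,
                   manifest_3.json, manifest_4.json, manifest_5.json,
                   manifest_6.json, manifest_7.json, manifest_8.json,
                   manifest_9.json]
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: test-manifests
      - name: Run assigned detections
        # Placeholder invocation; the real tool's flags may differ.
        run: python docker-detection-tester.py --manifest ${{ matrix.manifest }}
```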


10 GitHub Actions Runners means 1/10th the time

To enable parallel testing for scalability, the GitHub Actions Workflow was broken down into three parts:

  1. Distribute the Detections - The first step enumerates the detections that have been added or modified and generates an ESCU Package containing all of the latest detections and the material to support them. Then, 10 Test Manifests are created, distributing the detections evenly among them. Finally, these 10 Test Manifests and the ESCU Package are uploaded as artifacts in GitHub Actions (artifacts can be accessed by the user and by subsequent GitHub Actions jobs).
  2. Run the Tests - The second step uses the GitHub Actions Matrix functionality to start 10 Runners. The values in the matrix are the filenames of the Test Manifests generated in step 1. Each Runner downloads the Manifest and ESCU Package artifacts generated in step 1 and executes its assigned tests. The results of these tests are written to a file which is also uploaded as an artifact.
  3. Merge the Results - Finally, the results artifacts generated by all 10 Matrix Runners are downloaded and merged into a single file called Summary.json. This file has detailed information about all the tests that were run as well as the configuration of the Splunk server (including the version of the server and the installed Apps/TAs). The Summary.json file is uploaded as an artifact. If all the tests pass, then the workflow is marked as successful. If one or more tests fail, then the workflow fails, generating an additional file which is uploaded as an artifact called DetectionFailureManifest.json. This Manifest file contains only failed detection searches. Users can download this file and run it locally, making it easy to interactively debug any failures!
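The distribute and merge bookends (steps 1 and 3) are mostly bookkeeping. Here is a minimal sketch, assuming a simple round-robin split and a per-result pass/fail field; the actual Summary.json layout may differ.

```python
def split_into_manifests(detections: list, n_jobs: int = 10) -> list:
    """Distribute detections evenly (round-robin) across n_jobs Test Manifests."""
    manifests = [[] for _ in range(n_jobs)]
    for i, detection in enumerate(detections):
        manifests[i % n_jobs].append(detection)
    return manifests

def merge_results(result_files: list) -> dict:
    """Merge per-Runner result lists into a single summary structure.

    A failing run would additionally emit a manifest containing only
    the failed detection searches, for local re-runs and debugging."""
    all_results = [r for results in result_files for r in results]
    failures = [r["name"] for r in all_results if not r["passed"]]
    return {
        "total": len(all_results),
        "failures": failures,
        "success": not failures,
    }

detections = [f"detection_{i}" for i in range(619)]
manifests = split_into_manifests(detections)
print([len(m) for m in manifests])
# → [62, 62, 62, 62, 62, 62, 62, 62, 62, 61]
```

With 619 detections and 10 jobs, the first nine Runners each take one extra detection, which is why the longest job (not the average) bounds the total wall-clock time.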


The final GitHub Actions Workflow - 619 detections in under 50 minutes! Notice the presence of the SummaryTestResults and DetectionFailureManifest files

Final Results

Below is a table summarizing the results of each iteration of the CI/CD testing system: how long each system took to start, to test 1 detection, and to test 600 detections, plus its cost and use case.

| Test System | Startup Time | Time to Test 1 Detection | Time to Test 600 Detections | Cost | Use Case |
|---|---|---|---|---|---|
| Before AWS Batch | N/A | Manual | N/A | N/A | Deprecated |
| AWS Batch | N/A | 5 minutes | 2 days | $0.50 per hour (always running)* | Legacy solution |
| Docker-Based (GitHub Actions, 1 runner) | 5 minutes | 1 minute | 600 minutes (exceeds the 360-minute max job time!) | Free (for public repos)** | Test new or changed detections per commit / PR |
| Docker-Based (GitHub Actions, 10 runners) | 5 minutes | 6 seconds (average) | 50 minutes | Free (for public repos)** | Nightly testing of all detections in repository |
| Docker-Based (Local Machine, 1 container) | 5 minutes | 1 minute | 600 minutes | Free (plus electricity) | Initial detection development and troubleshooting |
| Docker-Based, 32 containers (AWS c6i.32xlarge: 128 vCPU, 256 GB RAM, io2 storage) | 5 minutes | 1.5 seconds (average) | 17 minutes | $5.44 per hour (on-demand)* | On-demand, rapid testing of large changes or new baselines |

* https://calculator.aws/#/
** https://docs.github.com/en/billing/managing-billing-for-github-actions/about-billing-for-github-actions

What’s Next?

The STRT is proud of its progress towards ensuring detections are easy to use and work as expected. Using the new testing framework, the STRT has already improved a large number of detections and gained further confidence in the Splunk Security Content delivered to customers. The STRT will continue to improve its quality assurance work by:
