Security

December 07, 2022

2 Minute Read

Visualising a Space of JA3 Signatures With Splunk

By Josh Cowling

One common misconception about machine learning methodologies is that they completely remove the need for humans to understand the data they are working with. In reality, these methods often place a greater burden on an analyst or engineer to ensure that their data meets the requirements, cleanliness and standardization that the chosen methods assume. However, when the complexity of the data becomes significant, how is a human supposed to keep up? One approach is to use ML itself to find ways to keep a human in the loop!

Dimensionality reduction methods such as PCA, tSNE and UMAP allow us to take complex, encoded datasets and reduce them down to diagrams that allow us to bring human intuition and understanding back into our processes.
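To make the idea concrete, here is a minimal, dependency-free sketch of the simplest of these methods, PCA, computed with power iteration. The function name `pca_2d` and the toy binary vectors are made up for illustration. In practice a library implementation would be the sensible choice, and tSNE and UMAP work quite differently under the hood.

```python
import math
import random


def pca_2d(rows, iterations=200, seed=0):
    """Project rows (lists of numbers) onto their top two principal components."""
    n, d = len(rows), len(rows[0])
    # Centre each column so the components describe variance, not the mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
           for i in range(d)]

    rng = random.Random(seed)
    components = []
    for _ in range(2):
        v = [rng.uniform(-1, 1) for _ in range(d)]
        for _ in range(iterations):
            # Power iteration step: multiply the vector by the covariance matrix.
            v = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            # Remove anything lying along components that were already found.
            for c in components:
                dot = sum(a * b for a, b in zip(v, c))
                v = [a - dot * b for a, b in zip(v, c)]
            norm = math.sqrt(sum(a * a for a in v)) or 1.0
            v = [a / norm for a in v]
        components.append(v)
    # Coordinates of each centred row along the two components.
    return [[sum(a * b for a, b in zip(r, c)) for c in components] for r in centred]


# Toy data: two groups of six-dimensional binary vectors (illustrative only).
group_a = [[1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0]]
group_b = [[0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 0], [0, 0, 0, 0, 1, 1]]
points = pca_2d(group_a + group_b)

for x, y in points:
    print(f"{x:+.3f} {y:+.3f}")
```

Each six-dimensional row comes out as a pair of coordinates that can be drawn on a scatter plot, and the two groups land on opposite sides of the first axis.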

In January at SANS CyberThreat2022(3), I will explain how these techniques can be applied to JA3 TLS Signatures. Collecting TLS signatures can help you to keep track of known, unknown and malicious software. In addition to this presentation, I'm working with the SURGe team at Splunk to build on our work of investigating the use of JA3 signatures to mitigate Supply Chain attacks.

In short, these dimensionality reduction techniques allow us to take a set of JA3 hashes, along with some of the information that makes up these signatures, and turn them into a map showing the space of software communications in a dataset.
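As a rough illustration of what "the information that makes up these signatures" can look like, a JA3 string has five comma-separated fields (TLS version, ciphers, extensions, elliptic curves, elliptic curve point formats), with the values inside each field joined by dashes, and the JA3 hash is the MD5 digest of that string. The sketch below parses two made-up JA3 strings, hashes them, and turns them into multi-hot vectors. The encoding scheme and all function names are assumptions for illustration, not necessarily the encoding used for the plots in this post.

```python
import hashlib

FIELDS = ("version", "ciphers", "extensions", "curves", "point_formats")


def parse_ja3(ja3_string):
    """Split a JA3 string into its five fields, each a list of value strings."""
    parts = ja3_string.split(",")
    if len(parts) != len(FIELDS):
        raise ValueError("expected 5 comma-separated fields")
    return {name: [v for v in part.split("-") if v]
            for name, part in zip(FIELDS, parts)}


def ja3_hash(ja3_string):
    """The JA3 hash is the MD5 hex digest of the JA3 string."""
    return hashlib.md5(ja3_string.encode("ascii")).hexdigest()


def build_vocabulary(parsed_signatures):
    """Collect every (field, value) pair seen, in a stable sorted order."""
    return sorted({(field, value)
                   for sig in parsed_signatures
                   for field, values in sig.items()
                   for value in values})


def encode(sig, vocabulary):
    """Multi-hot vector: 1 where the signature contains that (field, value) pair."""
    present = {(field, value) for field, values in sig.items() for value in values}
    return [1 if token in present else 0 for token in vocabulary]


# Made-up JA3 strings, for illustration only.
ja3_strings = [
    "771,4865-4866-4867,0-10-11,29-23,0",
    "771,4865-49195,0-10-11-35,29-23-24,0",
]

parsed = [parse_ja3(s) for s in ja3_strings]
hashes = [ja3_hash(s) for s in ja3_strings]
vocabulary = build_vocabulary(parsed)
vectors = [encode(sig, vocabulary) for sig in parsed]

for h, vec in zip(hashes, vectors):
    print(h, vec)
```

Vectors like these, one row per unique signature, are the kind of input a dimensionality reduction method can work with. The hashes alone are not, since two nearly identical JA3 strings produce completely unrelated MD5 digests.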

In applying tSNE to generate this Petri dish-like representation of JA3 signatures from the dataset available at ja3er.com, we see a number of structures that emerge when we plot these signatures in a 2D space. Every blue point in this diagram is a unique signature. Many signatures together form the clouds and clusters seen in this diagram. Signatures that are similar are close together and those that are different are forced apart, creating a simple and intuitive 2D representation of a very complicated dataset!

By pulling in some labels for this space, we can start to identify regions of this map where malicious software congregates and use this as a visual aid when threat-hunting or observing new and recurring traffic in our environment. This diagram shows some labeled malicious JA3 signatures (red) against the ja3er.com dataset.

So, if we see lots of activity near these malicious points in the future, that might be worth examining, since those communications will share a lot of the same structure and features as these malicious communications.
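One simple way to act on this, sketched below, is to measure how far each observed signature sits from the nearest labelled malicious point on the 2D map and flag anything within a chosen radius. The coordinates, labels, radius and function names are all hypothetical. Distances in a tSNE or UMAP map are only a rough guide to similarity, so treat a hit as a prompt to investigate rather than a verdict.

```python
import math


def nearest_malicious(point, malicious_points):
    """Return (distance, label) for the closest labelled malicious point."""
    return min((math.dist(point, coords), label)
               for label, coords in malicious_points.items())


def flag_nearby(observed, malicious_points, radius):
    """List observed signatures that fall within `radius` of a malicious point."""
    flagged = []
    for name, point in observed.items():
        distance, label = nearest_malicious(point, malicious_points)
        if distance <= radius:
            flagged.append((name, label, round(distance, 3)))
    return flagged


# Hypothetical 2D map coordinates, for illustration only.
malicious = {"mal_a": (2.0, 3.0), "mal_b": (-4.0, 1.0)}
observed = {"sig_1": (2.3, 3.4), "sig_2": (10.0, 10.0), "sig_3": (-4.0, 0.2)}

flagged = flag_nearby(observed, malicious, radius=1.0)
for name, label, distance in flagged:
    print(f"{name} is {distance} away from {label}")
```

Here `sig_1` and `sig_3` are flagged because they sit close to a labelled point, while `sig_2` is far from both and is left alone.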

It’s also possible to generate maps of smaller spaces where we compare and contrast the behaviors of multiple hosts. The following example uses UMAP to visualize the clusters of behaviors seen across five different hosts on a single day. Points in clusters or close to others represent either identical or very similar JA3 signatures, and we can clearly see anomalous behavior on the green host, which sits in its own separate cluster. Could it be that this host is using different, unpatched, out-of-date or malicious software? Time to investigate!
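A much cruder, non-visual version of the same comparison is to ask what fraction of each host's JA3 hashes appear on no other host. The sketch below does this with plain sets. The host names, placeholder hash values and function name are invented for illustration, and because it only matches identical hashes it misses the "very similar" signatures that the UMAP map brings together.

```python
def unique_fraction(host_signatures):
    """For each host, the share of its JA3 hashes seen on no other host."""
    result = {}
    for host, sigs in host_signatures.items():
        # Union of every other host's signatures.
        others = set().union(*(s for h, s in host_signatures.items() if h != host))
        result[host] = len(sigs - others) / len(sigs) if sigs else 0.0
    return result


# Placeholder hash names stand in for real JA3 hashes (illustrative only).
hosts = {
    "blue": {"hash_a", "hash_b", "hash_c"},
    "orange": {"hash_a", "hash_b", "hash_d"},
    "green": {"hash_x", "hash_y", "hash_a"},
}

fractions = unique_fraction(hosts)
most_unusual = max(fractions, key=fractions.get)
print(fractions)
print("Most unusual host:", most_unusual)
```

A host with a high fraction is a reasonable first candidate for a closer look on the map.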

OK, cool. But what can I do with this in Splunk?

I’ve included an example of using JA3 signatures to classify host TLS behaviors in the latest version of Splunk’s App for Data Science and Deep Learning (DSDL), so feel free to grab it and take a look.

However, I believe that these sorts of advanced dimensionality reduction techniques are likely to be useful well beyond this simple example. We can hopefully take some of the more general but very complex datasets we often see in security and make them far more accessible. If you’d like to dig in further or just chat about what’s possible, please feel free to reach out to me on LinkedIn.

Josh Cowling

Josh is a technologist, consultant, and entrepreneur based in London. Holding a PhD from Durham University's School of Engineering and Computing Sciences, he has wide experience spanning start-ups and enterprises in research, engineering, consulting, and pre-sales roles. While his background includes research, Josh is primarily focused on understanding, developing, and deploying new technologies that solve real problems and deliver tangible value. Connect with Josh on LinkedIn, especially if you have an interesting challenge in domains like cybersecurity, Splunk, data science, or machine learning.
