Detecting the Unknowns with Phantom and Splunk

Even before Phantom joined Splunk, it was one of the few companies that made me think to myself, "Gee, this company is going to make a difference." The platform provides tangible steps and integrations to help you do something with your data rather than lock it in a silo. It reminded me of the value statement every Splunker holds dear: make your machine data accessible, usable and valuable to everyone. So when the news came around that the Phantom team was joining us, I was naturally ecstatic.

As a security and Splunk enthusiast, I have a home lab which—before Phantom—consisted of the following:

  • Splunk
  • Firewall with a SPAN port
  • Bro, and
  • Splunk Stream

So where did I begin with my home environment and Security Orchestration, Automation and Response (SOAR)?

A Hypothesis

My hypothesis is that threat actors with interests in my geographical region will compromise hosts in my region to mask their true source location, and so evade simple detection rules and geo-blocking.

Known Caveats

  • If a SYN scan is performed then it is possible to have spoofed IP addresses in the results 

  • IPs are not immutable and can change with new leases, resulting in several IPs potentially being the same actor 

  • The API services that I am querying may not have complete information to determine the risk of an IP 

  • Two weeks may not be long enough to capture all threat actor activity groups

The Beginning

If I am trying to detect unknown hosts (hosts that have no record on intelligence platforms) scanning my network, I need to remove from my results any host that has been seen before (what I define as “noise”), meaning it meets one of the following criteria:

  • Is present on a threat list (SANS, Cymon) or
  • Is a known scanner (Greynoise)

At a high level, what I’m trying to achieve is this: for every IP that makes an inbound connection to my firewall, check whether it is confirmed “noise”; if it is, add the IP to a lookup table in Splunk so that future results exclude it; and finally, index the results of each API call in Splunk for later reporting.
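The flow described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the playbook itself: `triage_inbound`, `is_noise` and `index_event` are hypothetical names, the lookup table is modelled as a plain set, and the real checks would call the threat list and scanner services rather than a local function.

```python
def triage_inbound(ips, noise_lookup, is_noise, index_event):
    """Return the IPs that are still unknown after filtering out noise.

    ips          -- iterable of source IPs seen on the firewall
    noise_lookup -- set standing in for the Splunk lookup table (assumption)
    is_noise     -- hypothetical callable: ip -> bool (threat list or known scanner)
    index_event  -- hypothetical callable that records each result for reporting
    """
    unknown = []
    for ip in ips:
        # Already confirmed as noise on an earlier run: exclude it.
        if ip in noise_lookup:
            continue
        noisy = is_noise(ip)
        # Every check is recorded, whatever the outcome.
        index_event({"ip": ip, "noise": noisy})
        if noisy:
            noise_lookup.add(ip)
        else:
            unknown.append(ip)
    return unknown
```

Keeping the lookup as something that is updated on every run is what stops the same noisy addresses from being checked again and again.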

My First Playbook

How did I start with automation and orchestration in Phantom? Playbooks! Phantom has an excellent visual playbook editor that allows you to click, drag and connect action blocks to craft Playbooks that can perform complex sequences of operations. Below is my first attempt.

Notice the simplicity of the playbook. It scans the incoming IP with Greynoise, formats the output as JSON and posts the data to Splunk. If the IP is detected as “noise”, it closes the ticket; if it isn’t “noise”, it keeps the ticket open.

The First Problem

What I soon found out was that a lot of IPs are unknown to this service. To fix this, I had to orchestrate other services to provide additional context. This led me to improve my playbook many times over, ending with the fully automated playbook detailed below.

So What Does It Do?

  • Perform an IP reputation lookup with Greynoise
  • Regardless of the result, post the output from Greynoise back to Splunk
  • If noise
      • Run a Splunk query to update a lookup table
      • Close the event
  • If not noise
      • Sleep for a short time to avoid exceeding API limits
      • In parallel
          • Perform an IP lookup query with Cymon and DShield and post the output back to Splunk
          • Perform an IP lookup in Shodan and post the output to Splunk
      • If either DShield or Cymon has detected attacks from this IP before, mark the IP as noise and run a Splunk query to update a lookup table
      • If neither of the two has seen any attacks or scans, mark the IP as noise=false and update a Splunk lookup
      • Finally, close the event
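
The decision flow in the steps above can be sketched in plain Python. This is a minimal sketch and not the Phantom playbook code: `run_playbook` and every callable passed to it are hypothetical stand-ins for the real actions, and the `noise` and `attacks` result fields are assumptions about what each service returns.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_playbook(ip, greynoise, cymon, dshield, shodan,
                 post_to_splunk, update_lookup, delay_seconds=0.0):
    """Classify one IP as noise or not, following the steps listed above.

    All callables are hypothetical: each lookup takes an IP and returns a dict.
    """
    gn = greynoise(ip)
    post_to_splunk("greynoise", gn)  # posted regardless of the result
    if gn.get("noise"):
        update_lookup(ip, True)
        return {"ip": ip, "noise": True, "status": "closed"}

    # Pause briefly so the follow-up services are not hit too quickly.
    time.sleep(delay_seconds)

    # Query the remaining services in parallel.
    services = {"cymon": cymon, "dshield": dshield, "shodan": shodan}
    with ThreadPoolExecutor(max_workers=len(services)) as pool:
        futures = {name: pool.submit(fn, ip) for name, fn in services.items()}
        results = {name: fut.result() for name, fut in futures.items()}

    for name, result in results.items():
        post_to_splunk(name, result)

    # Noise if either threat list has seen attacks from this IP before.
    noise = bool(results["cymon"].get("attacks")) or bool(results["dshield"].get("attacks"))
    update_lookup(ip, noise)
    return {"ip": ip, "noise": noise, "status": "closed"}
```

Note that the Shodan result is recorded but plays no part in the noise decision here, which matches the steps as listed.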

The Results

After two weeks of collecting data, what were the results?

73K unique source IPs

How many hosts were unknown to any threat intelligence service or information service?

To paint the picture, I am scanning each connection to my firewall on the external interface in real-time. How could this alter the results? Hosts that are unknown at the time of scanning could become known as the scanning services start to collect additional information on the IP address. To confirm this theory, I ran a subsequent playbook which scanned the IPs I had documented as unknown an additional time. This resulted in 253 hosts changing their “noise” status from false to true.
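
The second pass described above amounts to re-checking only the IPs still marked as unknown and counting how many flip. A rough sketch, assuming the statuses are held in a dict of IP to noise flag and that `is_noise` is a hypothetical stand-in for the service lookups:

```python
def rescan_unknowns(noise_status, is_noise):
    """Re-check IPs marked noise=False and return those that changed to True.

    noise_status -- dict of ip -> bool, updated in place (assumed structure)
    is_noise     -- hypothetical callable: ip -> bool
    """
    flipped = []
    for ip, was_noise in noise_status.items():
        if was_noise:
            continue  # already known noise, nothing to re-check
        if is_noise(ip):
            noise_status[ip] = True
            flipped.append(ip)
    return flipped
```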

Out of the 532 distinct IPs that we deemed to be unknown to the internet, 378 did not exist on Shodan. Why is this an important fact? A host that exists on Shodan is likely to have a publicly accessible port, so it has the potential to be compromised and is more likely to be a bot scanning my range than a targeted attack by a threat group.

So you can start to get a picture of the real unknowns of the internet. 378 is roughly 0.5% of the 73K distinct source IPs seen in total, and about 71% of the 532 that we classify as unknown. Crazy!

How much time did I save by automating my research?

Why is this important, Mickey? If you look at the resolved events below and consider the following:

  • Phantom is working for me 24 hours a day, 7 days a week
  • Reviewing and acting on 10,876 events would take a massive team to perform
  • Even if you had unlimited humans in front of the console responding to these events, mistakes would be made

Let's continue with some of the other facts I found out along the way...

TCP Flags
What were the TCP flags used by unknown hosts connecting to me?

Top ASNs ordered by Distinct IPs
Out of the unknown hosts, which ASN had the most distinct IP addresses scanning my network?

Top Ports / Protocol
What were the most popular ports being scanned by these hosts?


The average number of block attempts on my firewall per day (all traffic).

Repeat Offenders
How many hosts that were marked as unknown scanned my environment from the same address numerous times over separate days?

Distinct IPs By Geographic Region
A choropleth map showing connections from distinct IP addresses by geographic location. Dark blue: 0-50 distinct IPs; green: 50-100 distinct IPs; orange: 100-150 distinct IPs.

Interesting Headers
As I was querying each IP against Shodan as it connected to my environment, I was able to grab the application header information. Here are some of the more interesting hosts. :)



So whilst this seems like a simple research project, you can quickly see why Phantom made it more successful and, most importantly, more accurate. With the help of Phantom I was able to:

  • Shorten the time of my research effort 

  • Improve the accuracy of my research 

  • Respond at machine speed, and
  • Adapt and change my methodology and research tactics as issues arose.

What Next?

Like what you see and want this capability? Reach out to your local Splunk team for a discussion on next steps with Phantom and Splunk.

Domenico “Mickey” Perre