Anonymous? Not so much.

So you want to track who’s coming to your site or network via an anonymous proxy? Not an easy task. Perhaps you even want to block anonymous traffic from visiting your website to thwart attacks? Fear not, Splunk is here.

I knew Splunk had all the mechanisms necessary to support this, but I was missing an up-to-date list of anonymous proxy server IP addresses. Then I remembered that I had recently met a few smart guys from Aplura at a Splunk Live! event, where they showed off a custom command they built: getwatchlist. As they put it, getwatchlist “is a custom search command for Splunk which will return a CSV formatted list from a URL”. This is the first piece of the puzzle: an easy-to-use mechanism for getting remote text-based data. The question was… is there a public list of known anonymous proxies?

A little Googling goes a long way. Within seconds, I had found a site which contains quite a few text-based lists of known proxy servers. Oh, and they’re updated daily! Perfect! Now how do I make the two work together?


Step 1: Install the getwatchlist custom command

  1. Download it here:
  2. Install the app
  3. Restart (or start) Splunk

Step 2: Retrieve the anonymous proxy list via getwatchlist

Let’s first make sure we can bring in the list via getwatchlist:

| getwatchlist

You should see something like the following. Notice that there’s some noise we need to eliminate in order to get to a true list of known anonymous IP addresses.

Getwatchlist makes this extremely straightforward. We can simply instruct it to ignore the first line and any line prefixed with #’s and also remove the port numbers (:nnnn):

| getwatchlist ignoreFirstLine=t comment=* delimiter=:

.. which now returns:

Voila! An up-to-date list of all known anonymous proxy server IP addresses.
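
If you’re curious what those three options amount to, here is a minimal Python sketch of the same clean-up: drop the first line, skip comment lines, and keep only what comes before the delimiter. This is not getwatchlist’s actual code; the function name, the “#” comment prefix and the sample data are all made up for illustration.

```python
def parse_watchlist(text, ignore_first_line=True, comment="#", delimiter=":"):
    """Return the first delimited column of every non-comment line.

    Hypothetical helper that mimics the clean-up described above;
    it is not how getwatchlist is implemented.
    """
    lines = text.splitlines()
    if ignore_first_line:
        lines = lines[1:]          # drop the header/banner line
    values = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(comment):
            continue               # skip blank lines and comments
        # Keep the part before the delimiter, i.e. the IP without ":port"
        values.append(line.split(delimiter, 1)[0])
    return values


# Made-up sample in the "ip:port" shape the post describes
sample = """Proxy list (hypothetical sample data)
# this line is a comment
192.0.2.10:8080
198.51.100.7:3128
"""

print(parse_watchlist(sample))
```

On the sample above, this prints a list containing just the two IP addresses, with the banner line, the comment and the port numbers gone.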

Step 3: Leveraging Splunk lookups

Now that we have a list of IPs, we need to be able to use it against our existing data. In Splunk, that means creating a lookup:

  1. Let’s take the same search, add a field that denotes this is an anonymous proxy (is_anonymous=true) and output its results to a CSV file for use in a lookup (outputlookup):
| getwatchlist ignoreFirstLine=t comment=* delimiter=: is_anonymous=true | outputlookup anonymous_ips.csv
  2. Go to Manager > Lookups > Lookup table files. You should see an entry for the newly created “anonymous_ips.csv” file. If not, something went wrong; leave me a comment below.
  3. Go to Manager > Lookups > Lookup definitions > New. Give it a name like “anonymous_proxy_ips”, select File-based and select your “anonymous_ips.csv” file. It should look something like this:


Step 4: Show me the money!

You should now be able to run a search similar to this one, where we explicitly tell Splunk to look up data:

sourcetype="access_combined" | lookup anonymous_proxy_ips ip_address AS clientip | where is_anonymous="true"

It’s very possible that you have no matches, so what I like to do is plant a known IP address in the anonymous_ips.csv file to verify that everything works. If you ran the search from the Search app, you should find that file under %SPLUNK_HOME%\etc\apps\search\lookups (Windows) or $SPLUNK_HOME/etc/apps/search/lookups (*NIX).
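
Planting the test IP can be done by hand in any text editor, but here is a small Python sketch that does it safely, assuming the lookup file has ip_address and is_anonymous columns (the field names used in the search above). The helper name is hypothetical, and the demo runs against a throwaway temp file rather than your real lookup.

```python
import csv
import os
import tempfile


def plant_test_ip(csv_path, ip, ip_field="ip_address", flag_field="is_anonymous"):
    """Append a row for `ip` to the lookup CSV unless it is already listed.

    Assumes the CSV already has `ip_field` and `flag_field` columns.
    Returns True if a row was added, False if the IP was already there.
    """
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = list(reader)
    if any(row[ip_field] == ip for row in rows):
        return False
    new_row = {name: "" for name in fieldnames}  # leave other columns empty
    new_row[ip_field] = ip
    new_row[flag_field] = "true"
    rows.append(new_row)
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return True


# Demo on a temporary copy; point csv_path at the real file at your own risk.
demo_path = os.path.join(tempfile.mkdtemp(), "anonymous_ips.csv")
with open(demo_path, "w", newline="") as f:
    f.write("ip_address,is_anonymous\n192.0.2.10,true\n")

print(plant_test_ip(demo_path, "203.0.113.5"))  # first call adds the row
print(plant_test_ip(demo_path, "203.0.113.5"))  # second call finds it already present
```

Use an address you know appears in your access logs, then remember to take it out again once you’ve confirmed the lookup matches (the next scheduled refresh of the list would overwrite it anyway).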

Step 5: Splunk-a-licious

If you’re a Splunk expert, you’ll know that there’s plenty you can do from this point on. If not, here are a few pointers:

  • Automatically ensure that you have the latest anonymous proxy IP list by scheduling a saved search to run every day at midnight:
| getwatchlist ignoreFirstLine=t comment=* delimiter=: is_anonymous=true | outputlookup anonymous_ips.csv
  • Instead of explicitly telling Splunk to do lookups, you can turn on automatic lookups. You can do this from Manager > Lookups > Automatic Lookups > Add New. It should look something like the screenshot below (I used sourcetype=access_combined).

Now you can skip the “lookup” command and just search for:

sourcetype="access_combined" is_anonymous="true"
  • Alert on or automate the way you deal with anonymous proxy-based traffic. Splunk saved searches can be used to alert via email, create an RSS feed or even kick off a shell script which can do things like block those IPs on a firewall, lock a user account or have Splunk create a Remedy ticket. I’ve done a POC where we output the data to a CSV file which gets picked up by another tool, which in turn blocks those IPs on hundreds of edge devices. The possibilities are endless!
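
To give a flavor of that last idea, here is a rough Python sketch of the “other tool” side: it reads a CSV of offending IPs and turns each valid, unique IPv4 address into a firewall rule. This is not the tool from my POC; the function name, the ip_address column and the choice of iptables are assumptions for illustration, and the script only prints the commands rather than running them.

```python
import csv
import io
import ipaddress


def block_commands(csv_file, ip_field="ip_address"):
    """Build one iptables DROP rule per unique, valid IPv4 address.

    `csv_file` is any open text file with a header row containing `ip_field`.
    Invalid or duplicate entries are skipped silently.
    """
    commands = []
    seen = set()
    for row in csv.DictReader(csv_file):
        raw = (row.get(ip_field) or "").strip()
        try:
            ip = ipaddress.IPv4Address(raw)  # rejects anything that is not a valid IPv4 address
        except ValueError:
            continue
        if ip in seen:
            continue
        seen.add(ip)
        commands.append(f"iptables -A INPUT -s {ip} -j DROP")
    return commands


# Made-up export: one duplicate and one junk row to show the filtering
export = io.StringIO(
    "ip_address,count\n"
    "192.0.2.10,42\n"
    "not-an-ip,1\n"
    "192.0.2.10,7\n"
    "198.51.100.7,3\n"
)

for command in block_commands(export):
    print(command)
```

Validating each value before it ends up in a command matters here: the CSV comes from log data, so you don’t want a malformed field turning into something your firewall script executes.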

Hope you’ve found this useful. Remember, these techniques can be used in a variety of ways for all sorts of use cases. Feel free to share your experiences with us!

As always, Happy Splunking!
