Splunking 1 million URLs

Do you love URLs? I do! They are a great way to gain insight into behaviors, catch malware, and help classify what is going on in a network.

I also have a secret: I collect them. The more I have, the happier I am! So what’s better than Splunk to analyze them?

This is the first in a series of posts on what one can do with URLs and Splunk. Please share your war stories in the comments, or anything you are doing with Splunk and URLs, so I can enrich the upcoming posts.

First, you need to grab the Alexa list, which contains the top 1 million URLs in a CSV you can download.

We add the new data source to Splunk:


Splunk automagically discovers the CSV type, and we can start searching for our URLs right away.

Now we need an app to parse our URLs properly. Fortunately, Splunkbase has many.







If we start looking at our data, we can run a search such as

source="" | rex field=_raw "\d+,(?<url>\S+)"

The rex command creates a url field from our regex, and the lookup then parses those URLs and extracts useful new fields.
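To check what that regex captures without going through Splunk, here is a minimal sketch in Python. It assumes the rows follow the rank,domain layout of the Alexa CSV. The function name and the sample rows are made up for illustration. Note that Python writes named groups as (?P<name>...) where rex uses (?<name>...).

```python
import re

# Same pattern as the rex command, in Python's named-group syntax.
ROW = re.compile(r"\d+,(?P<url>\S+)")


def extract_url(raw_line):
    """Return the url captured from a 'rank,domain' row, or None if no match."""
    match = ROW.search(raw_line)
    return match.group("url") if match else None


# Hypothetical sample rows, not taken from the real list.
print(extract_url("1,google.com"))      # prints google.com
print(extract_url("not a csv row"))     # prints None
```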


We can now look at the top count for domains without the attached TLD:


In this case, Google comes out on top with a count of 145. That means Google appears in the top 1 million most visited URLs multiple times, under various TLDs.
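The idea behind that count can be sketched in a few lines of Python: strip the public suffix from each host, then count what is left. This is only an illustration, not how the Splunk app does it. The SUFFIXES set is a tiny hand-picked stand-in for the real Public Suffix List, and the sample hosts are invented.

```python
from collections import Counter

# Tiny stand-in for the Public Suffix List (assumption for this sketch).
SUFFIXES = {"com", "fr", "de", "co.uk", "com.br"}


def domain_without_tld(host):
    """Return the label just left of the longest known suffix, or None."""
    labels = host.lower().split(".")
    # Start with the longest candidate suffix so "co.uk" wins over "uk".
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in SUFFIXES:
            return labels[i - 1]
    return None


# Hypothetical hosts, not taken from the real list.
hosts = ["google.com", "google.fr", "google.co.uk", "example.com"]
counts = Counter(domain_without_tld(h) for h in hosts)
print(counts.most_common(1))  # prints [('google', 3)]
```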

If we now look at the top TLDs, it is easy to see com as a top TLD:



Among the extracted fields is “url_url_type”, which can take values such as ipv6, ipv4, no_tld, unknown_tld, and mozilla_tld.
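To make those labels more concrete, here is a rough Python sketch of how such a classification could work. It reflects my reading of the label names and is not the app's actual logic: a host with no dot is treated as no_tld, and a host whose suffix is missing from the list as unknown_tld. KNOWN_SUFFIXES is a tiny stand-in for the Public Suffix List, and the function name is hypothetical.

```python
import ipaddress

# Tiny stand-in for the Public Suffix List (assumption for this sketch).
KNOWN_SUFFIXES = {"com", "org", "ru", "co.uk"}


def url_type(host):
    """Guess a type label for a host, loosely modelled on the field values."""
    try:
        ip = ipaddress.ip_address(host)
        return "ipv4" if ip.version == 4 else "ipv6"
    except ValueError:
        pass  # Not an IP address, so treat it as a host name.
    labels = host.lower().split(".")
    if len(labels) < 2:
        return "no_tld"
    # Check every possible suffix, longest first.
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in KNOWN_SUFFIXES:
            return "mozilla_tld"
    return "unknown_tld"


for sample in ["192.0.2.1", "2001:db8::1", "localhost", "example.com"]:
    print(sample, url_type(sample))
```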


The mozilla_tld value only indicates that the TLD is present in the Public Suffix List. So whenever an entry is flagged as “unknown_tld” and is also in Alexa's top 1 million URLs, it starts to get interesting:


According to Wikipedia, this TLD is romanized as rf, and it does appear in the Mozilla Public Suffix List, as follows:

// xn--p1ai ("rf", Russian-Cyrillic) : RU



However, the entry does not use the same encoding, which points to an improvement that could be made in the lookup: adding a Unicode to Punycode conversion.
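As a minimal sketch of what such a conversion could look like, here it is in Python rather than in the lookup itself. It relies on Python's built-in idna codec, which implements the older IDNA 2003 rules. The two helper names are made up for illustration.

```python
def to_punycode(name):
    """Convert a Unicode host name or TLD to its ASCII (Punycode) form."""
    return name.encode("idna").decode("ascii")


def from_punycode(name):
    """Convert an ASCII (Punycode) host name or TLD back to Unicode."""
    return name.encode("ascii").decode("idna")


print(to_punycode("рф"))          # prints xn--p1ai
print(from_punycode("xn--p1ai"))  # prints рф
```

With both forms normalized to the same encoding before the comparison, an entry like this one would match the Public Suffix List line quoted above.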


Splunk offers a variety of apps, some of which can help analysts get more out of the insight that URLs provide. Happy Splunking!










Posted by Sebastien Tricaud