Random Words on Entropy and DNS

In my last blog post, I mentioned that I would delve into how to detect subdomains with relatively high entropy. But first, I think it is important to discuss WHAT entropy is, WHY you should care whether a domain or subdomain has high entropy, and finally, HOW you can use entropy in Splunk to find potentially bad things.

What?

So, what does entropy mean? For the purposes of computer science, I tend to use the definition of entropy as “… a measure of uncertainty in a random variable” [1]. In most computer science contexts, entropy is calculated with the Shannon entropy formula, developed by Claude Shannon:

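H(X) = -\sum_{i=1}^{n} P(x_i) \log_2 P(x_i)

where P(x_i) is the probability of the i-th possible value x_i. For a string, that is the relative frequency of each distinct character.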

In other words (since, if you are still reading this section, that formula meant as much to you as it did to me), the more random a string is, the higher its entropy. The result of this calculation is often referred to as an entropy “score”. To illustrate what this “measure of uncertainty” looks like in the real (ha!) cyber world, let's compare the Shannon entropy of three domains: aaaaa.com, google.com, and a long, random-looking domain.

As these examples show, domains with lower levels of randomness (aaaaa.com and google.com) have correspondingly lower entropy scores than the long random domain A00wlkj--(-a.aslkn-C.a.2.sk.esasdfasf1111)-890209uC.4.com.
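To make the formula concrete, here is a minimal Python sketch of the same calculation. It is not URL Toolbox's implementation: it simply treats each character's relative frequency in the string as its probability p and sums p * log2(1/p) over the distinct characters, which is the quantity in the formula above. The scores URL Toolbox's ut_shannon macro reports could differ if it counts characters differently.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of a string in bits per character.

    Each character's probability p is its relative frequency in the
    string, and the result is sum(p * log2(1 / p)) over distinct characters.
    """
    if not text:
        return 0.0
    length = len(text)
    return sum(
        (count / length) * math.log2(length / count)
        for count in Counter(text).values()
    )


if __name__ == "__main__":
    examples = [
        "aaaaa.com",
        "google.com",
        "A00wlkj--(-a.aslkn-C.a.2.sk.esasdfasf1111)-890209uC.4.com",
    ]
    for domain in examples:
        print(f"{domain}: {shannon_entropy(domain):.3f}")
```

With this sketch, aaaaa.com scores about 1.880 and google.com about 2.646, while the long random domain scores well above both.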

Why?

Why should you care about entropy? One good reason is that it can help you detect malware and web exploits that use domains (and subdomains) created with a domain generation algorithm (DGA). As I discussed in my previous blog post, malicious actors use a DGA to create random-looking domains and subdomains from some sort of “key” or “salt” that only they know, so only they can predict which domains will be generated. These DGA domains can then be used in future malicious campaigns. Many varieties of malware [2] and other threats to your network use DGA domains; some of the most famous include worms like Conficker [3] and web exploits like the Blackhole Exploit Kit [4]. Because these domains are randomly generated (and may only be up for a short time) [5], they are extremely difficult for network defenders to block with traditional methods like blacklists.
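To see why blacklists struggle, here is a toy DGA in Python. It is purely hypothetical and is not the algorithm used by Conficker, Blackhole, or any real malware family: the function name toy_dga, the seed string, and the output format are all illustrative. It hashes a shared seed plus the current date with SHA-256 and maps the digest bytes onto letters and digits, so anyone holding the seed can compute a given day's domains in advance, while a defender without it sees a fresh set of random-looking names every day.

```python
import hashlib
from datetime import date

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"


def toy_dga(seed: str, day: date, count: int = 3, length: int = 16,
            tld: str = ".com") -> list[str]:
    """Hypothetical DGA: derive `count` domains from a shared seed and a date."""
    domains = []
    for i in range(count):
        # Anyone who knows the seed can recompute these digests ahead of time.
        material = f"{seed}:{day.isoformat()}:{i}".encode()
        digest = hashlib.sha256(material).digest()
        # Map each digest byte onto the alphabet (SHA-256 gives at most 32 bytes).
        label = "".join(ALPHABET[b % len(ALPHABET)] for b in digest[:length])
        domains.append(label + tld)
    return domains


if __name__ == "__main__":
    for day in (date(2015, 9, 28), date(2015, 9, 29)):
        print(day, toy_dga("example-seed", day))
```

Calling toy_dga with the same seed and date always returns the same list, and changing the date returns a completely different one, which is exactly what makes a static block list go stale.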

How?

When Anton Chekhov, the famous Russian playwright, said “only entropy comes easy”, I believe it is safe to assume he was an avid user of the Splunk app “URL Toolbox” [6]*. URL Toolbox can split a URL or DNS query into its parts (for example, ut_domain and ut_subdomain) and calculate Shannon entropy on any of the resulting fields in Splunk. Since you can't use traditional block lists to detect DGA domains (the domains are constantly changing), calculating entropy on those fields helps you surface possibly malicious domains that would otherwise get lost in the data. Note that this is not a perfect method. Some legitimate domains (especially content delivery network (CDN) domains, news sites, streaming video sites, and Facebook) can be extremely long and have high entropy. When you start running these searches, spend some time manually reviewing your data and add those legitimate domains to a whitelist lookup table so that you can filter them out of your results.

Now let's look at a couple of example queries that use URL Toolbox to find high-entropy domains and subdomains:

*Please note that I am not a literature professor, and this may not be an accurate statement.

Domains:

tag=dns
| `ut_parse(query)`
| lookup FP_entropy_domains domain AS ut_domain
| search NOT FP_entropy=*
| `ut_shannon(ut_domain)`
| search ut_shannon > 4.0
| stats count by query ut_shannon

This search looks at Common Information Model (CIM) compliant DNS queries via the “tag” field, but you could also run it against Stream, Bro, or host DNS sourcetypes, or even against HTTP requests from a proxy log. First, ut_parse splits each query into fields such as ut_domain and ut_subdomain. Then (as discussed above) we remove false positives: known-good domains are listed in the “FP_entropy_domains” lookup table, which has an FP_entropy column, so matching events pick up an FP_entropy value and search NOT FP_entropy=* drops them. Next, ut_shannon calculates the entropy of the field “ut_domain” (the base domain of the query that ut_parse extracted) and stores the score in the field ut_shannon. Finally, we tell Splunk to keep only domains with an entropy score higher than 4.0 and count them by query and score. This threshold is arbitrary: I chose it for this data set, and you may need to adjust it for your environment (a lower threshold means more false positives, and a higher one means more false negatives). Try to find the crossover error rate (CER), the point where the false positive and false negative rates are equal, that works best for your network!
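For readers who want to prototype the logic outside Splunk, the domain search can be sketched in plain Python. This is an assumption-heavy sketch, not what URL Toolbox does internally: split_query and hunt_domains are hypothetical names, split_query naively treats the last two labels as the registered domain (a real parser needs a public suffix list to handle names like example.co.uk), and the allowlist set stands in for the FP_entropy_domains lookup.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy in bits per character (relative character frequencies)."""
    if not text:
        return 0.0
    n = len(text)
    return sum((c / n) * math.log2(n / c) for c in Counter(text).values())


def split_query(query: str) -> tuple[str, str]:
    """Naive split into (subdomain, registered domain).

    Assumes the last two labels form the registered domain; a real parser
    needs a public suffix list to handle names like example.co.uk.
    """
    labels = query.rstrip(".").split(".")
    return ".".join(labels[:-2]), ".".join(labels[-2:])


def hunt_domains(queries, allowlist, threshold=4.0):
    """Count queries whose registered domain scores above `threshold`.

    Mirrors the SPL steps: parse, drop allowlisted (false positive) domains,
    score with Shannon entropy, keep scores > threshold, count by query and score.
    """
    hits = Counter()
    for query in queries:
        _, domain = split_query(query)
        if domain in allowlist:  # analogous to `search NOT FP_entropy=*`
            continue
        score = shannon_entropy(domain)
        if score > threshold:
            hits[(query, round(score, 3))] += 1
    return hits


if __name__ == "__main__":
    sample = ["www.google.com", "qwrtyuipasdfghjklzxvbn.com", "x1.cloudfront.net"]
    print(hunt_domains(sample, allowlist={"cloudfront.net"}))
```

In this example, only the 26-character domain made of all-distinct characters clears the 4.0 threshold (its entropy is log2(26), about 4.70); google.com scores about 2.65, and cloudfront.net is skipped because it is allowlisted.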

[Screenshot: Splunk search results for domains with high entropy]
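One way to look for that crossover point is to label a sample of your own queries by hand and sweep a few candidate thresholds. The Python sketch below is hypothetical: error_rates and crossover_threshold are illustrative names, the labeled samples are something you would supply from your own data, and it simply picks the candidate threshold where the false positive and false negative rates are closest together.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy in bits per character (relative character frequencies)."""
    if not text:
        return 0.0
    n = len(text)
    return sum((c / n) * math.log2(n / c) for c in Counter(text).values())


def error_rates(samples, threshold):
    """Return (false_positive_rate, false_negative_rate) for `score > threshold`.

    samples: iterable of (domain, is_malicious) pairs labeled by hand.
    """
    fp = fn = benign = malicious = 0
    for domain, is_malicious in samples:
        flagged = shannon_entropy(domain) > threshold
        if is_malicious:
            malicious += 1
            fn += not flagged  # malicious but missed
        else:
            benign += 1
            fp += flagged  # benign but flagged
    return fp / max(benign, 1), fn / max(malicious, 1)


def crossover_threshold(samples, candidates):
    """Pick the candidate where FPR and FNR are closest (the CER point)."""
    def gap(t):
        fpr, fnr = error_rates(samples, t)
        return abs(fpr - fnr)
    return min(candidates, key=gap)
```

For example, with google.com and aaaaa.com labeled benign and a long random domain labeled malicious, a sweep over 1.0, 3.0, and 5.0 would choose 3.0, the only candidate that separates the two groups. With real data the rates rarely meet exactly, so a finer sweep over more candidates is usually worthwhile.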

Subdomains:

tag=dns
| `ut_parse(query)`
| lookup FP_entropy_domains domain AS ut_domain
| search NOT FP_entropy=*
| `ut_shannon(ut_subdomain)`
| search ut_shannon > 4.5
| stats count by query ut_shannon

This search is identical to the one above, except that instead of looking for domains with high entropy, it looks for subdomains with high entropy (scoring ut_subdomain against a higher threshold of 4.5). You could also have some fun by combining the dynamic DNS lookup table from my last blog post with this search! That combination would be especially good at finding APT malware that beacons home to dynamic DNS providers.
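The subdomain variant can be sketched the same way. As before, this Python is a hypothetical illustration rather than URL Toolbox's behavior: the naive split assumes a one-label TLD, allowlist stands in for FP_entropy_domains, and the optional ddns_domains set is a stand-in for the dynamic DNS lookup table from the previous post, restricting results to queries under those providers.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy in bits per character (relative character frequencies)."""
    if not text:
        return 0.0
    n = len(text)
    return sum((c / n) * math.log2(n / c) for c in Counter(text).values())


def split_query(query: str) -> tuple[str, str]:
    """Naive split into (subdomain, registered domain); assumes a one-label TLD."""
    labels = query.rstrip(".").split(".")
    return ".".join(labels[:-2]), ".".join(labels[-2:])


def hunt_subdomains(queries, allowlist, threshold=4.5, ddns_domains=None):
    """Count queries whose subdomain entropy exceeds `threshold`.

    allowlist: registered domains to ignore (stand-in for FP_entropy_domains).
    ddns_domains: optional set of dynamic DNS provider domains; when given,
    only queries under those providers are kept (hypothetical stand-in for
    the dynamic DNS lookup table).
    """
    hits = Counter()
    for query in queries:
        sub, domain = split_query(query)
        if domain in allowlist:
            continue
        if ddns_domains is not None and domain not in ddns_domains:
            continue
        score = shannon_entropy(sub)  # score the subdomain, not the domain
        if score > threshold:
            hits[(query, round(score, 3))] += 1
    return hits


if __name__ == "__main__":
    sample = ["www.google.com", "q8w2r7t1y9u3i5p0asdfghjk.example.com"]
    print(hunt_subdomains(sample, allowlist=set()))
```

Here the subdomain "www" scores 0.0 because it repeats one character, while the 24-character random subdomain scores log2(24), about 4.58, and clears the 4.5 threshold.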

[Screenshot: Splunk search results for subdomains with high entropy]

Conclusion:

Domains and subdomains with relatively high entropy are great indicators of potentially malicious behavior on your network. Take these searches and start hunting! I am almost certain you will find something. Happy Hunting :-)

[1]
Jayasree, N., and P. P. Amritha. “A Model for the Effective Steganalysis of VoIP.” In Artificial Intelligence and Evolutionary Algorithms in Engineering Systems, Advances in Intelligent Systems and Computing, 2014, 379-87.

[2]
“Domain Generation Algorithms (DGA) in Stealthy Malware – Damballa.” Damballa. March 5, 2012. Accessed September 28, 2015. https://www.damballa.com/domain-generation-algorithms-dga-in-stealthy-malware/.

[3]
“Introduction.” Conficker Working Group. Accessed September 28, 2015. http://www.confickerworkinggroup.org/wiki/pmwiki.php/ANY/Introduction.

[4]
“OpenDNS Security Research: Blackhole Exploit Kit DGA Analysis.” OpenDNS Blog. July 10, 2012. Accessed September 28, 2015. https://blog.opendns.com/2012/07/10/opendns-security-team-blackhole-exploit/.

[5]
Bilge, Leyla, Sevil Sen, Davide Balzarotti, Engin Kirda, and Christopher Kruegel. “Exposure.” ACM Transactions on Information and System Security (TISSEC), 2014, 1-28.

[6]
Le Roux, Cedric. “Documentation.” URL Toolbox. Accessed September 28, 2015. https://splunkbase.splunk.com/app/2734/.
