Benford's Law With Splunk

When you look at numbers in a familiar, categorized set of data, you have a ballpark idea of how large each number should be. For instance, when we think of the size of cities in square miles, anywhere from 10 to 100 seems plausible. If someone tells you their city covers 8,000 square miles, you instantly suspect the statement is false, and perhaps even fraudulent. In that regard, what if you could look at the first digit of each number in a large set and judge whether the set is valid or artificially manipulated? This blog discusses a well-known way to stop fraud in its tracks with Splunk: applying Benford's law to establish a baseline of expected distributions for the first digit of a set of numbers.

What is Benford’s Law?

Splunker Jeffrey Walzer reminded some of us involved in fraud detection at Splunk about Benford’s Law and how it applies to financial services fraud use cases. To recap, if you take the first digit of each number in any large set of naturally occurring numbers, such as sizes of rivers (in any unit of measurement) or baseball statistics, the digits will not be equally represented. You might expect each of the digits 1 through 9 to appear about 11.1% of the time, but that is not the case. In fact, the first digit is far more likely to be a 1, 2, or 3 than an 8 or 9. The astronomer Simon Newcomb noticed this pattern in 1881, and the physicist Frank Benford, whose name it carries, published statistics from multiple sources in 1938 to support his “law.” The law also holds in number bases other than base 10, and it can be extended to predict the distribution of the 2nd and 3rd digits of each number in the set. What is the expected distribution of the first digit of a set? Here it is, from a Wikipedia article.

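| Digit | Expected Distribution From Benford’s Law |
|-------|------------------------------------------|
| 1     | 30.1%                                    |
| 2     | 17.6%                                    |
| 3     | 12.5%                                    |
| 4     | 9.7%                                     |
| 5     | 7.9%                                     |
| 6     | 6.7%                                     |
| 7     | 5.8%                                     |
| 8     | 5.1%                                     |
| 9     | 4.6%                                     |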
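The percentages in the table follow from the standard Benford formula, in which the expected share of leading digit d is log10(1 + 1/d). The short Python sketch below recomputes them; the function and variable names are my own, chosen for illustration.

```python
import math


def benford_expected(digit: int) -> float:
    """Expected share (0 to 1) of a leading digit under Benford's Law."""
    if not 1 <= digit <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)


# Percentages rounded to one decimal place, as in the table.
BENFORD_PERCENT = {d: round(100 * benford_expected(d), 1) for d in range(1, 10)}

for digit, percent in BENFORD_PERCENT.items():
    print(f"{digit}: {percent}%")
```

Because the nine shares sum to 1, the same function can serve as a theoretical baseline when no historical data is available.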

Financial Crime

This helps us find indications of financial crime: if the first digits of a dataset normally follow Benford’s law, then a distribution that is totally different may have been artificially manipulated. For instance, in the United States, financial institutions must report cash transactions above $10,000 to the federal government by filing a Currency Transaction Report with FinCEN. If a group of people at the same bank are constantly making transactions in the $8,000 to $9,999 range, they may be trying to avoid triggering that report. If the first digits of their transactions are heavily skewed towards 8 and 9, compared with the usual distribution for the population at hand, this may be artificial manipulation. On the other hand, it could also be a false positive: the people involved may receive paycheck deposits in that range and regularly move that money out to other institutions. This is why Benford’s law is not a law in the sense of a law of physics. It describes the probability of an expected distribution, not a guarantee.

Applying This in Splunk

Before applying Splunk commands to your current data, please look at past data in terms of months or even years to get a baseline of distribution patterns for the first digit of a transaction.

The Splunk eval command can be used to get the first character of any string (for example, with its substr function), and the top command returns the count and percentage for each value of that field. You can also use the convert command to turn this character into a number, but that is not needed for this purpose. I tried this out with some data sets, and here are the results.

Artificial Sample Payments

In this dataset, the distribution not only favors the higher digits, it omits the digit 1 entirely. I admit that the sample size is small, but more importantly, this is definitely an artificially manipulated dataset, because I created it. Obviously this is not fraud, since I built it to illustrate how Splunk shows distribution percentages, but it does show how easy it is to track the percentages.

Sample ATM Transactions

In this example of ATM transactions, the lower digits dominate the distribution, which is more in line with Benford’s law. This dataset is a couple of orders of magnitude larger than the previous one, and its distribution is closer to what you would see in real life.
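For readers who want to check the arithmetic outside Splunk, the two steps behind both examples (take the first digit of each amount, then compute the percentage for each digit) can be sketched in Python. This is only an illustration, not the searches used above; the function names and the sample amounts are invented for the example.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def first_digit(value) -> Optional[str]:
    """Return the first non-zero digit of a value as a character, or None if there is none."""
    for ch in str(value):
        if ch in "123456789":
            return ch
    return None


def first_digit_distribution(values: Iterable) -> Dict[str, float]:
    """Percentage of values starting with each digit, most common digit first."""
    counts = Counter(d for d in map(first_digit, values) if d is not None)
    total = sum(counts.values())
    return {d: round(100 * n / total, 1) for d, n in counts.most_common()}


# Hypothetical payment amounts, invented for illustration only.
sample_payments = [8200, 9150.75, 8999, "9400", 250, 8700, 0.85, 9990]
distribution = first_digit_distribution(sample_payments)
print(distribution)
```

Skipping leading zeros and decimal points means an amount such as 0.85 counts towards digit 8, which matches how Benford's law defines the first significant digit.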

Conclusion

What this really shows is that, whether or not you rely on Benford’s Law itself (and it is wise to keep it in mind, as there are mathematical explanations of why it holds for many datasets), taking a regular snapshot of the first-digit percentage distribution of your transactions in Splunk gives you a baseline of expected behavior. When the percentages change radically from the baseline and there is no obvious explanation, it is worth considering whether the data has been artificially manipulated, which could indicate fraud. Applying Benford’s law, or your own baselines, can feed into your risk scores and lead to higher-fidelity fraud detection.
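One simple way to put a number on "change radically from the baseline" is to find the largest gap, in percentage points, between the observed and expected share of any digit. The Python sketch below assumes that measure and a threshold of 10 points; both are illustrative choices of mine, not a recommended standard, and the Benford percentages could be swapped for a baseline built from your own historical data.

```python
import math
from typing import Dict


def benford_baseline() -> Dict[str, float]:
    """Expected percentage for each leading digit under Benford's Law."""
    return {str(d): 100 * math.log10(1 + 1 / d) for d in range(1, 10)}


def max_deviation(observed: Dict[str, float], baseline: Dict[str, float]) -> float:
    """Largest absolute gap, in percentage points, across the baseline digits."""
    return max(abs(observed.get(d, 0.0) - pct) for d, pct in baseline.items())


def deviates_from_baseline(
    observed: Dict[str, float],
    baseline: Dict[str, float],
    threshold_points: float = 10.0,  # hypothetical threshold; tune on your own data
) -> bool:
    """True when any digit's share is further from the baseline than the threshold."""
    return max_deviation(observed, baseline) > threshold_points


# Hypothetical snapshot skewed towards 8 and 9, invented for illustration.
skewed_snapshot = {"8": 50.0, "9": 37.5, "2": 12.5}
print(deviates_from_baseline(skewed_snapshot, benford_baseline()))
```

A flag from a check like this is a prompt for investigation, not proof of fraud, for the false-positive reasons discussed earlier.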
