Yes, Virginia, There is a -Santa Claus- Way to Detect Unemployment Fraud

Fraud rates for Unemployment Insurance Benefits (UIB) and Pandemic Unemployment Assistance (PUA) are out of control. In May 2020, Brian Krebs of KrebsOnSecurity published two articles detailing fraud occurring in several states’ UIB portals. These states had been warned by the US Secret Service to be on the lookout for it. The common theme in the articles is that many states are missing rudimentary controls for combating fraud. PUA simply exacerbated the problem by offering money to those who formerly didn’t qualify for UIB, giving criminals even more opportunities to make money illegally.

This is not a new topic. A fellow Splunker, Chris Perkins, took a detailed look at PUA fraud back in August in his LinkedIn post, Pandemic Unemployment Assistance Fraud. It is now January 2021 and, in case you aren’t watching the news, the states and their constituents continue to be victims of massive fraud attacks.

Even though many states may be behind in their fraud detection techniques, UIB fraud looks very similar to banking fraud. We can take some cues from bank fraud detection and prevent much of this fraud.

Fraud detection, like security, should take a layered approach. We don’t skip a layer just because it is easy to bypass; the simple layers stop a lot of problems. Glass doors with locks on them do keep out many bad guys, even though a rock would effectively nullify that locked door.

Also, by slowing down the bad guy (find a rock or move on) and forcing them to adapt, we slow the rate of attack, allowing our analysts to do more.

Detecting Unemployment Fraud in Splunk

Let’s look at some simple searches Splunk Enterprise can run against the data you already have to detect fraud. Splunk makes this easy with built-in time bucketing and counting operations. Here we bucket the data into one-day time spans and look for multiple usernames coming from the same IP address, using the dc (distinct count) function of the stats command:

(NOTE: All data is fictional. Any resemblance to actual persons or IP addresses, living or dead, or actual events is purely coincidental.)

| inputlookup login-data3
| bin span=1d _time
| stats dc(username) as "unique usernames" by ip_address, _time
| sort - "unique usernames"
| table "unique usernames", ip_address

Splunk - Detecting Unemployment Fraud - Logins Per Day

Splunk lets us click on data in our reports and drill down. By clicking on the IP address at the top of the list, I can see all the different usernames associated with it.

Splunk - Detecting Unemployment Fraud - Log in per day drill down

I would implement this as a scheduled search, and have it ready as a report for analysts to review every morning. I would refine it over time to ignore IP addresses that are known to have multiple users (libraries for example) behind a single IP.
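For readers who want to see the logic outside of a Splunk search, here is a minimal Python sketch of the same daily report, with an allowlist for known shared IPs. The event tuples, the `KNOWN_SHARED_IPS` set, and the function names are all hypothetical and made up for illustration; in practice the data would come from your own login logs.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical login events: (timestamp, ip_address, username). All values are fictional.
LOGINS = [
    ("2021-01-04 08:01", "203.0.113.7", "user01"),
    ("2021-01-04 08:03", "203.0.113.7", "user02"),
    ("2021-01-04 08:09", "203.0.113.7", "user03"),
    ("2021-01-04 09:15", "198.51.100.20", "user04"),
    ("2021-01-04 09:20", "198.51.100.20", "user05"),
    ("2021-01-04 10:00", "192.0.2.44", "user06"),
    ("2021-01-05 08:00", "203.0.113.7", "user01"),
]

# Hypothetical allowlist of IPs known to front many legitimate users (a library, say).
KNOWN_SHARED_IPS = {"198.51.100.20"}


def usernames_per_ip_per_day(logins, ignore_ips=frozenset()):
    """Return {(day, ip): set of usernames}, skipping allowlisted IPs."""
    seen = defaultdict(set)
    for ts, ip, user in logins:
        if ip in ignore_ips:
            continue
        day = datetime.strptime(ts, "%Y-%m-%d %H:%M").date()
        seen[(day, ip)].add(user)
    return seen


def daily_report(logins, ignore_ips=frozenset()):
    """Return rows of (unique_usernames, ip, day), highest count first."""
    seen = usernames_per_ip_per_day(logins, ignore_ips)
    rows = [(len(users), ip, day) for (day, ip), users in seen.items()]
    return sorted(rows, key=lambda row: (-row[0], row[1], row[2]))


report = daily_report(LOGINS, KNOWN_SHARED_IPS)
for count, ip, day in report:
    print(day, ip, count)
```

The allowlist is applied before counting, so a shared IP never reaches the analyst's morning report. Whether to drop such IPs entirely or just raise their threshold is a judgment call for your own environment.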

We know that an IP address is not always a good data point, as addresses are often reused or shared, so let’s use device identification as another layer of fraud detection. If you are not familiar with “Device ID”, check out commercial offerings by iovation (now a TransUnion company) or open source solutions using clientjs or fingerprint.js. All of these offer mechanisms for identifying a device (computer, phone, tablet) that visits your site, regardless of its IP address.

We use similar search commands, looking for any device ID with more than 2 users associated with it:

Splunk - Detecting Unemployment Fraud - Device per day

Drilling down on the device ID, we see that the IP addresses have changed. Searching on a single IP address may not have discovered all the related accounts.

Splunk - Detecting Unemployment Fraud - Device drill down

We also see that most logins have failed, but one did succeed. The successful login should be scrutinized more closely as this is an account that actually gained access. Although the above searches are looking at logins, we could easily use IP address or Device ID to catch fraudsters applying for benefits with lists of identities.
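To make the device ID layer concrete, here is a small Python sketch of the same idea: group events by device, keep only devices used by more than two usernames, and list the IP addresses and successful logins for each. The event layout, device IDs, and function name are assumptions invented for this example, not the fields from the search shown in the screenshots.

```python
from collections import defaultdict

# Hypothetical events: (day, device_id, ip_address, username, outcome). All fictional.
EVENTS = [
    ("2021-01-04", "dev-a1", "203.0.113.7", "user01", "failure"),
    ("2021-01-04", "dev-a1", "203.0.113.9", "user02", "failure"),
    ("2021-01-04", "dev-a1", "198.51.100.3", "user03", "success"),
    ("2021-01-04", "dev-b2", "192.0.2.44", "user04", "success"),
    ("2021-01-04", "dev-b2", "192.0.2.44", "user05", "success"),
]


def devices_with_many_users(events, threshold=2):
    """Summarize devices used by more than `threshold` distinct usernames in a day."""
    users = defaultdict(set)
    ips = defaultdict(set)
    succeeded = defaultdict(set)
    for day, device, ip, user, outcome in events:
        key = (day, device)
        users[key].add(user)
        ips[key].add(ip)
        if outcome == "success":
            succeeded[key].add(user)
    return {
        key: {
            "users": sorted(names),
            "ips": sorted(ips[key]),
            "succeeded": sorted(succeeded[key]),
        }
        for key, names in users.items()
        if len(names) > threshold
    }


flagged = devices_with_many_users(EVENTS)
for (day, device), summary in flagged.items():
    print(day, device, summary)
```

Keeping the successful usernames separate gives the analyst a short list of accounts that actually gained access and deserve a closer look first.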

IP geolocation is another useful tool; it maps an IP address to an approximate physical location. In the current climate, not everyone is living where we expect, but IP geolocation can show us hot spots of activity we would not expect. Splunk has IP geolocation built in and also includes the free version of a popular geolocation database (customers can always install their own version). Our search takes some log data with IP addresses; the iplocation and geostats commands bring the magic. Here we limit ourselves to only IP addresses in the USA and plot those on a map:

| inputlookup login-data2
| eval _time=strptime(_time."-0700","%m/%d/%y %H:%M%z")
| iplocation ip_address | where Country="United States"
| geostats latfield=lat longfield=lon count

Splunk - Detecting Unemployment Fraud - IP GEOLOCATION

Looking at the map, the State of Ohio may expect people to be applying from elsewhere, but why the large bubble in Washington, DC? Hovering over a bubble or drilling down gives more information. Clicking on the bubble for Washington, DC, we get more details on all activity from that area:

Splunk - Detecting Unemployment Fraud -  IP GEOLOCATION DRILL DOWN
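The hot spot idea can also be sketched in a few lines of Python. This example assumes the IP addresses have already been geolocated (the sketch does no lookup itself), and the records, the `min_count` threshold, and the function name are hypothetical values chosen for illustration.

```python
from collections import Counter

# Hypothetical, already-geolocated login records: (ip, country, region, city). All fictional.
RECORDS = [
    ("192.0.2.1", "United States", "Ohio", "Columbus"),
    ("192.0.2.2", "United States", "Ohio", "Columbus"),
    ("192.0.2.3", "United States", "Ohio", "Cleveland"),
    ("198.51.100.1", "United States", "District of Columbia", "Washington"),
    ("198.51.100.2", "United States", "District of Columbia", "Washington"),
    ("198.51.100.3", "United States", "District of Columbia", "Washington"),
    ("198.51.100.4", "United States", "District of Columbia", "Washington"),
    ("203.0.113.1", "United States", "Texas", "Austin"),
    ("203.0.113.2", "Canada", "Ontario", "Toronto"),
    ("203.0.113.3", "Canada", "Ontario", "Toronto"),
]


def us_hot_spots(records, home_region, min_count=3):
    """Count US logins per (region, city) and return busy places outside the home region."""
    counts = Counter(
        (region, city)
        for _, country, region, city in records
        if country == "United States"
    )
    return [
        (place, n)
        for place, n in counts.most_common()
        if place[0] != home_region and n >= min_count
    ]


hot_spots = us_hot_spots(RECORDS, home_region="Ohio")
for (region, city), n in hot_spots:
    print(region, city, n)
```

A fixed threshold is the simplest possible rule. A real deployment would more likely compare each location against its own historical baseline.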

Finally, an interesting investigation method that our customers are discovering is link analysis. There are many ways to perform link analysis in Splunk, but here is one example that is currently being used for UIB investigations.

Notice our user “azhatley” at the center. They are connected to others in multiple ways: a shared phone number with one group, a shared IP address with another group, and a shared destination bank account with still others. There are also two smaller groupings on the right side.

Splunk - Detecting Unemployment Fraud - Link ANALYSIS
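Under the hood, this kind of link analysis boils down to grouping accounts that share any attribute. Below is a minimal Python sketch using a union-find structure. It is not how the Splunk visualization above was built, and every phone number, IP address, and bank account in it is invented for the example.

```python
from collections import defaultdict

# Hypothetical claims: username -> attributes. All values are fictional.
CLAIMS = {
    "azhatley": {"phone": "555-0100", "ip": "203.0.113.7", "bank": "acct-111"},
    "user02": {"phone": "555-0100", "ip": "192.0.2.10", "bank": "acct-222"},
    "user03": {"phone": "555-0133", "ip": "203.0.113.7", "bank": "acct-333"},
    "user04": {"phone": "555-0144", "ip": "192.0.2.11", "bank": "acct-111"},
    "user05": {"phone": "555-0155", "ip": "192.0.2.12", "bank": "acct-555"},
    "user06": {"phone": "555-0166", "ip": "192.0.2.12", "bank": "acct-666"},
    "user07": {"phone": "555-0177", "ip": "192.0.2.13", "bank": "acct-777"},
}


def linked_groups(claims):
    """Group users who share a phone, IP, or bank account, directly or through others."""
    parent = {user: user for user in claims}

    def find(user):
        # Walk up to the root, shortening the path as we go.
        while parent[user] != user:
            parent[user] = parent[parent[user]]
            user = parent[user]
        return user

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # (attribute kind, value) -> first user seen with that value
    for user, attrs in claims.items():
        for kind, value in attrs.items():
            key = (kind, value)
            if key in first_seen:
                union(first_seen[key], user)
            else:
                first_seen[key] = user

    groups = defaultdict(set)
    for user in claims:
        groups[find(user)].add(user)
    # Keep only groups with at least two linked users, largest first.
    linked = [sorted(members) for members in groups.values() if len(members) > 1]
    return sorted(linked, key=len, reverse=True)


groups = linked_groups(CLAIMS)
for members in groups:
    print(members)
```

In this made-up data, “azhatley” ends up in one group with three other accounts even though no single attribute links all four, which is the kind of connection that a search on one field alone can miss.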

I go into more detail on link analysis in my .conf20 presentation “Fraud, the Missing Link,” and in the coming weeks I will be publishing more in-depth blog posts on using Splunk for link analysis.

This is just the starting point when using Splunk for fraud detection. Most customers grow these searches organically, finding more fields to build searches on or combining the output of these searches to create higher-fidelity alerts. They then layer in enrichment data (Splunk can make API calls to third-party services) and machine learning (ML), which is free with Splunk, to detect fraud via supervised ML or to detect outliers and odd clusters via unsupervised ML. Finally, they add automation via Splunk Phantom to let the humans focus on the investigation instead of the mundane, repetitive tasks.

Happy Splunking!


Posted by Andrew Morris