“Who you gonna call?” - Ghostbusters
Before we get to the main topic of this blog, I would like to provide some historical background on how I got to the topic. Hopefully, this context will be both informative and useful as we try to use call detail records (CDR) for doing good things.
When I started working for Splunk, my first customer use case, at a telecommunications provider, was monitoring application servers’ time series data for unusual events such as error rate storms or CPU spikes. The application being monitored provisioned new customers for cell service. Having worked in a telephony provisioning environment years ago, I understood that multiple steps are involved and any one of them could go wrong due to application server issues, infrastructure issues, or just plain bad customer data input. Even assigning a new phone number or using local number portability has its own nuances. Fortunately, the monitoring was the easy part: Splunk ingested the application logs and the infrastructure events for machine utilization in one central place, where investigative search and alerting reduced the time to correct issues by an order of magnitude.
My very next customer was also in telephony, but in this case, law enforcement was involved. In a bygone era when people used calling cards and pay phones, the combination of the two allowed criminals to make demands of their victims, supposedly in stealth mode. Unbeknownst to the criminal, the calling card’s calling number could be traced to a switch location or, later on, as pay phones disappeared, to a cell tower location, and the card issuing company could help law enforcement by searching the CDRs with Splunk for the location. This works because the CDR is usually a rich event with many fields, and among them a location field is either present or can be correlated with another event using the phone number. Let me provide you with a sample search to get started:
index=call_logs sourcetype="switch_cdr" Caller="238092638766918" |table _time Caller place
This is a simple search that takes a phone number and gives you the switch or cell tower address it went through, as that is part of the CDR. Time is of the essence here, as the criminal can easily move out of the location, so Splunk’s speed in searching for this data at any given time, with millions of other CDRs in the system, is important. Although Splunk indexes every word separated by punctuation, it still needs to look at an index data structure to rule out whether an event is in a given bucket of data. A faster way to rule out whether a given string is in a sea of events is to hash the index’s metadata into a compact structure using a computer science technique called a Bloom filter. A Bloom filter is probabilistic: it tells you whether a string is possibly present or definitely absent. It can return an occasional false positive, but it will never give a false negative, so a bucket that the filter rules out can be skipped safely. This makes searching for the needle in the haystack, which in this case is a phone number, extremely fast, even on hardware that is over a decade old, as in this use case. By design, Bloom filters are turned on in the background by default within Splunk Enterprise and Splunk Cloud, making the search faster and more scalable for the high volume of anticipated events.
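To make the "possibly present or definitely absent" behavior concrete, here is a minimal Python sketch of a Bloom filter. It is an illustration only, not Splunk's implementation: the class name, bit-array size, number of hashes, and the extra phone numbers are all assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """Tiny illustrative Bloom filter: a bit array plus k salted hashes."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, term):
        # Derive k bit positions by salting one hash function with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{term}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, term):
        for pos in self._positions(term):
            self.bits[pos] = 1

    def might_contain(self, term):
        # False means "definitely absent"; True means only "possibly present".
        return all(self.bits[pos] for pos in self._positions(term))


# Hypothetical bucket holding three phone numbers.
bucket_filter = BloomFilter()
for number in ["238092638766918", "238092638700001", "238092638700002"]:
    bucket_filter.add(number)

print(bucket_filter.might_contain("238092638766918"))  # True: added terms are never missed
```

A search for a number can skip any bucket whose filter answers False, and only needs to open the buckets that answer True, a few of which may turn out to be false positives.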
Another Historical Use Case
About a year later, my knowledge of using CDRs with Splunk led to a new request from a Systems Integrator. This was also for law enforcement, but a little more subtle. What they wanted to do was use the CDR to find out who called whom, for how long, and how often in any given time period. I will show you a sample search for this below, but the real request was to prove that this scaled for millions of events.
In those days, Splunk did not have a general purpose event generator on Splunkbase like it does today. At first, I wrote a Java program to generate demo CDRs on demand, but quickly learned that scaling an event generator is a project in itself. I enlisted the help of Splunk software engineer Vainstein, who quickly wrote an industrial strength CDR generator that had multiple options for time, repeatability, and distributions of data. With that, we were able to show the basic requirement for who called whom. In the search below, I’ll show you another simple example where we want to find out if a person has called another phone for a short duration and if they’ve done it at least twice. I’ve left out details from the data and search to make the point easier to understand.
index=call_logs sourcetype=cdr |eval Time=_time|convert ctime(Time)|stats count values(Time) as time by Caller, Receiver|where count>1|fields - count
Notice that the native Splunk _time field needs to be converted to human readable format, instead of using epoch time. The rest of the search is simply grouping using stats for a pair of callers and receivers, counting, and listing initial time stamps of the calls. What would be the use for this? It would be a way to track fraudsters, embezzlers, block leave abusers, ransom requesters, illicit movement of goods between criminal agents, etc. The basic capability to do this at scale was the use case.
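For readers who think more easily in a general purpose language, the grouping logic of that search can be sketched in Python. This is only an analogy under stated assumptions: the record layout, the phone numbers, and the function name repeated_pairs are made up for illustration, and nothing here reflects how Splunk executes stats internally.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical demo CDRs: (epoch seconds, caller, receiver).
cdrs = [
    (1700000000, "5550001", "5550002"),
    (1700000600, "5550001", "5550002"),
    (1700001200, "5550003", "5550004"),
]


def repeated_pairs(records, min_calls=2):
    """Group calls by (caller, receiver) and keep pairs seen at least min_calls times."""
    calls = defaultdict(list)
    for epoch, caller, receiver in records:
        # Convert epoch time to a readable timestamp, as convert ctime() does.
        stamp = datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
        calls[(caller, receiver)].append(stamp)
    # Count events per pair, then list the unique timestamps for the survivors.
    return {
        pair: sorted(set(stamps))
        for pair, stamps in calls.items()
        if len(stamps) >= min_calls
    }


for (caller, receiver), stamps in repeated_pairs(cdrs).items():
    print(caller, receiver, stamps)
```

Only the pair that appears twice survives the filter, which mirrors the where count>1 clause in the search.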
Today’s Use Case
I hope that was an informative preamble to build up to today’s use case. In a real situation, a criminal or a group of criminals could commit crimes one after another in neighboring locations.
Rather than mention anything morbid, let’s make it a little less dramatic and just see how we can track something like a criminal robbing three stores in three neighboring towns, one after another, in a short period of time, such as 30 minutes or 1 hour. Most likely, they would have a cell phone in their pocket or car, and it would signal its whereabouts to the nearest cell tower. If we look at the cell towers near the robberies and find cell numbers that appear at all three towers, each of those numbers is a suspect. It does not mean the owner of the phone is the culprit, as someone could be driving from location to location in that time period for other reasons. However, the list of suspects can be narrowed down quickly, and other means, such as a prior history of such acts, can be used to identify the criminal with more certainty. Let’s first look at this with a picture.
In this rather simple diagram, we are showing which cell phones were associated with the cell towers in question within the time of the robberies. As I said before, this leads to suspects, as not everyone here is guilty. Using the information from the cell tower logs, Splunk can be used to correlate which cell towers had common phones attached to them. For our example, I’ve used the fictitious tower places Greenboro, Noontown, and Roseville. Next, let’s write the search and see the results.
index=cell_tower_logs sourcetype="switch_cdr" |lookup switch_place location OUTPUT place|stats values(place) as towers dc(place) as distinguished_places by Caller |makemv delim="," towers |where distinguished_places=3 AND towers = "Greenboro Noontown Roseville"
Each cell tower would have a cryptic name and we use a lookup file to map the name to an actual place. Next, we use the Splunk stats command to get a list of unique values for the places where the towers are located, get distinct counts (dc) of the number of towers as we are interested in only 3, and use the makemv command to make the list of cell towers into a multi-value field. Finally, we use a where clause to see if the phone number was in three unique locations for the time period and they are the locations where the crimes were committed. Our results show in the table, which has narrowed it down to three cell phones. Because this uses demo data, I refer to the cell number as Caller, although no call has been made.
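The same correlation can be sketched outside Splunk to make the steps explicit: map each cryptic tower name to a place, collect the unique places per phone, and keep the phones whose places are exactly the three crime scenes. The tower IDs, phone numbers, the fourth place, and the function name suspects below are all hypothetical and exist only for this illustration.

```python
from collections import defaultdict

# Hypothetical lookup table: cryptic tower ID -> place name.
SWITCH_PLACE = {
    "TWR-0191": "Greenboro",
    "TWR-0207": "Noontown",
    "TWR-0344": "Roseville",
    "TWR-0410": "Elsewhere",
}
CRIME_SCENES = {"Greenboro", "Noontown", "Roseville"}

# Hypothetical tower events: (caller, tower ID).
events = [
    ("5550001", "TWR-0191"), ("5550001", "TWR-0207"), ("5550001", "TWR-0344"),
    ("5550002", "TWR-0191"), ("5550002", "TWR-0207"),
    ("5550003", "TWR-0191"), ("5550003", "TWR-0207"),
    ("5550003", "TWR-0344"), ("5550003", "TWR-0410"),
]


def suspects(tower_events, lookup, scenes):
    """Return callers whose set of distinct places is exactly the crime scenes."""
    places_by_caller = defaultdict(set)
    for caller, tower_id in tower_events:
        place = lookup.get(tower_id)  # the lookup step
        if place is not None:
            places_by_caller[caller].add(place)  # unique values per caller
    # Exact match: three distinct places, and they are the three scenes.
    return sorted(c for c, places in places_by_caller.items() if places == scenes)


print(suspects(events, SWITCH_PLACE, CRIME_SCENES))  # ['5550001']
```

Like the search, this sketch requires an exact match, so a phone that also appeared at a fourth place in the period is excluded. Relaxing the comparison to a superset test would keep such phones on the list.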
The one flaw in this approach is that it uses an absolute 30 or 60 minute period, chosen with Splunk’s time picker, to look for the information. How do we know in which exact 30 or 60 minute period this occurred? It is not as if the criminal punched a time clock to indicate his or her presence in each town. The way around that is to use Splunk’s streamstats command, which is a lot like stats, but accepts a running time window as an argument, so that any 30 or 60 minute stretch within a longer search period can produce a match. In this manner, since we would know the approximate start and end times of the robberies in question, we could select a one hour and 15 minute time period and let streamstats find what we are looking for within a moving window.
index=cell_tower_logs sourcetype="switch_cdr" |lookup switch_place location OUTPUT place|streamstats time_window=30m values(place) as towers dc(place) as distinguished_places by Caller |makemv delim="," towers |where distinguished_places=3 AND towers = "Greenboro Noontown Roseville"|table Caller towers distinguished_places
Notice the time_window argument to streamstats. As streamstats is not a transforming command in Splunk, we use the table command to shape the results like the ones from the previous search.
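The running window idea can be sketched in Python as well. The sketch below keeps, for each phone, only the sightings from the last 30 minutes and checks after every event whether they cover all three places. It is an approximation for illustration: the event layout, phone numbers, and function name are hypothetical, and the treatment of events exactly on the window boundary is my assumption rather than documented streamstats behavior.

```python
from collections import defaultdict, deque

CRIME_SCENES = {"Greenboro", "Noontown", "Roseville"}
BASE = 1700000000  # arbitrary epoch start for the hypothetical demo data

# Hypothetical events: (epoch seconds, caller, place).
events = [
    (BASE, "5550001", "Greenboro"),
    (BASE + 600, "5550001", "Noontown"),
    (BASE + 1500, "5550001", "Roseville"),   # all three within 25 minutes
    (BASE, "5550002", "Greenboro"),
    (BASE + 1100, "5550002", "Noontown"),
    (BASE + 3000, "5550002", "Roseville"),   # spread over 50 minutes
]


def callers_in_window(tower_events, scenes, window_seconds=30 * 60):
    """Return callers seen at exactly the given places within any running window."""
    recent = defaultdict(deque)  # caller -> sightings still inside the window
    hits = set()
    for ts, caller, place in sorted(tower_events):  # process in time order
        window = recent[caller]
        window.append((ts, place))
        # Drop sightings that have aged out relative to the current event.
        while window and ts - window[0][0] > window_seconds:
            window.popleft()
        if {p for _, p in window} == scenes:
            hits.add(caller)
    return sorted(hits)


print(callers_in_window(events, CRIME_SCENES))  # ['5550001']
```

The second phone visits the same three places, but never within a single 30 minute stretch, so it is not reported.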
Splunk’s ability to ingest any human readable time series data, index it, apply Bloom filters on top of the index, and analyze the data quickly with powerful search commands makes it a useful platform for law enforcement when ingesting CDRs. Telephony forensics becomes a powerful use case on the Splunk platform. If you are not in law enforcement, but have access to CDR events via a call center or other means, consider indexing the events into Splunk for many types of use cases that are easy to implement.