Hey Baby, Where’s Your Number?

Just when you thought you were safe from bad pickup lines, along comes another blog post. This time I will answer the question, “How do you map phone numbers to geographic locations?” Using a simple lookup in conjunction with the Splunk amMap or Google Maps App, we will plot telephone numbers on a map.

Introducing the location of the Splunk North American Sales Team by their Area Code

Since AT&T and Bell Laboratories introduced them over half a century ago, we have come to know and love area codes. Some of us would even consider ourselves area code snobs. For example, as a citizen of Los Angeles, would you answer the phone if your caller ID showed a number outside the following area codes?

310 — Santa Monica, Malibu, Beverly Hills (The Westside)
323 — Hollywood
213 — Downtown LA
818 — The Valley

Maybe not. What was once a mapping system for geography and population density has evolved to encompass even social status.

What does this mean for your IT data? Let’s find out.

The Setup

I am providing a lookup table with entries like these:

areacode,city,region,country,latitude,longitude
415-200,San Francisco,CA,US,37.7750000,-122.4183333
646-200,New York City,NY,US,40.7141667,-74.0063889
786-200,Miami,FL,US,25.7738889,-80.1938889
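To make the lookup semantics concrete, here is a minimal Python sketch (not part of the Splunk setup) that loads rows with the header above into a dictionary keyed by the 6-digit areacode and returns the first match, roughly mirroring the max_matches = 1 setting used later. The sample rows are the three shown above; the function names load_table and lookup are hypothetical.

```python
import csv
import io

# The three sample rows shown above, in the same column layout.
SAMPLE_CSV = """areacode,city,region,country,latitude,longitude
415-200,San Francisco,CA,US,37.7750000,-122.4183333
646-200,New York City,NY,US,40.7141667,-74.0063889
786-200,Miami,FL,US,25.7738889,-80.1938889
"""


def load_table(text):
    """Build a dict keyed by areacode; the first row for a key wins (like max_matches = 1)."""
    table = {}
    for row in csv.DictReader(io.StringIO(text)):
        table.setdefault(row["areacode"], row)
    return table


def lookup(table, areacode):
    """Return (city, region, lat, lon) for a key, or None when the key is absent."""
    row = table.get(areacode)
    if row is None:
        return None
    # Coordinates may be blank for some records, so convert only when present.
    lat = float(row["latitude"]) if row["latitude"] else None
    lon = float(row["longitude"]) if row["longitude"] else None
    return row["city"], row["region"], lat, lon


table = load_table(SAMPLE_CSV)
print(lookup(table, "646-200"))  # ('New York City', 'NY', 40.7141667, -74.0063889)
print(lookup(table, "999-999"))  # None
```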

To configure the lookup for any data containing phone numbers:

  1. Download the amMap or Google Maps App.
  2. Download and unzip the areacode-latitude-longitude lookup table into the lookups folder of your app or etc/system/lookups.
  3. Apply the Splunk configuration below (in etc/system/local or in a separate app in etc/apps).
  4. Make the configuration accessible, either with a .meta file in your app or through Manager.
  5. Restart Splunk.
  6. If you are using Google Maps, run the search from the app just like any other search.
  7. If you are using amMap, populate the Flash map with the search provided below.

Entry for local/props.conf

# The lookup and field extraction are tied to a sourcetype
# called "phonedata" in this example.  Your data likely has
# a different sourcetype, so please adjust the stanza below.

[phonedata]
LOOKUP-ac = GeoAreaCodeLookup areacode OUTPUTNEW latitude as _lat, longitude as _lng, region, city, country
REPORT-ac = getareacode

Entries for local/transforms.conf

[GeoAreaCodeLookup]
filename = areacode_latitude_longitude.csv
max_matches = 1
min_matches = 1

# The entry below assumes phone numbers have the format 123-456-7890
# and are preceded by the term "phone_no=".  The extraction captures the first
# 6 digits (123-456) and creates the field "areacode".  Please adjust the REGEX
# to fit your data.  This extraction is required for the mapping to work.
# Note that the keys in the lookup table use a hyphen (e.g. 415-200), so a
# number written as 123.456.7890 is extracted as 123.456 and will not match.
[getareacode]
REGEX = phone_no="?(\d{3}[-.]\d{3})
FORMAT = areacode::$1
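One way to sanity-check the REGEX before deploying it is to run the same pattern outside Splunk. The Python sketch below applies the [getareacode] pattern to a few hypothetical event strings. As an assumption that goes beyond the configuration above, it also rewrites a dot separator to a hyphen so the key lines up with the hyphenated keys in the lookup table; extract_areacode is a hypothetical name.

```python
import re

# Same pattern as the [getareacode] REGEX: an optional quote after
# "phone_no=", then three digits, a hyphen or dot, and three more digits.
AREACODE_RE = re.compile(r'phone_no="?(\d{3}[-.]\d{3})')


def extract_areacode(event):
    """Return the 6-digit key in the lookup's 123-456 form, or None if absent."""
    match = AREACODE_RE.search(event)
    if match is None:
        return None
    # Normalization step (an assumption, not in the Splunk config):
    # the lookup table keys use a hyphen, so turn 646.200 into 646-200.
    return match.group(1).replace(".", "-")


# Hypothetical sample events.
for event in ['phone_no="415-200-1234" status=ok',
              'phone_no=646.200.5678 status=busy',
              'caller=unknown']:
    print(extract_areacode(event))
```

In Splunk itself you would need an equivalent normalization (or a hyphen-only pattern) if your data mixes separators.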

Sample default.meta (if configuring as an App):

[]
access = read : [ * ], write : [ * ]
export = system

[props]
export = system

[transforms]
export = system

[lookups]
access = read : [ * ], write : [ admin ]
export = system

Search to populate amMap:

… | stats count by areacode
| eval count_label="Event"
| eval iterator="areacode"
| eval iterator_label="Event"
| eval zoom="zoom=\"234%\" zoom_x=\"53.03%\" zoom_y=\"-57.61%\""
| eval movie_color="#FF0000"
| eval output_file="home_threat_data.xml"
| eval app="amMap"
| lookup GeoAreaCodeLookup areacode OUTPUT latitude as client_lat, longitude as client_lon, city as client_city, region as client_region, country as client_country
| fillnull value="United States" client_country
| fillnull value="San Francisco" client_city
| fillnull value="CA" client_region
| mapit
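For readers less familiar with SPL, the core of the search above (count events per areacode, attach the lookup fields, then fill missing location fields with defaults) can be approximated in Python. This only illustrates the data flow and does not replace the mapit step; the event list, the small LOOKUP dictionary, and the summarize function are hypothetical.

```python
from collections import Counter

# Hypothetical lookup: areacode -> (city, region, country, lat, lon).
LOOKUP = {
    "415-200": ("San Francisco", "CA", "US", 37.7750000, -122.4183333),
    "786-200": ("Miami", "FL", "US", 25.7738889, -80.1938889),
}

# Defaults mirroring the three fillnull commands in the search.
DEFAULTS = {"client_country": "United States",
            "client_city": "San Francisco",
            "client_region": "CA"}


def summarize(areacodes):
    """Rough equivalent of: stats count by areacode | lookup ... | fillnull ..."""
    rows = []
    for areacode, count in sorted(Counter(areacodes).items()):
        row = {"areacode": areacode, "count": count}
        match = LOOKUP.get(areacode)
        if match is not None:
            city, region, country, lat, lon = match
            row.update(client_city=city, client_region=region,
                       client_country=country, client_lat=lat, client_lon=lon)
        # fillnull only touches the listed fields; coordinates stay missing.
        for field, value in DEFAULTS.items():
            row.setdefault(field, value)
        rows.append(row)
    return rows


for row in summarize(["415-200", "415-200", "786-200", "212-555"]):
    print(row)
```

Note that an unmatched area code picks up the San Francisco defaults for its labels but still has no coordinates.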

I will work on packaging this as an add-on on SplunkBase.

Behind the Scenes

I was not able to find a free area code-to-city listing on the web. Several commercial products are licensed with guarantees of accuracy and regular updates. As demand for mobile phones, DSL lines and VOIP endpoints increases, area codes will keep splitting and new ones will be added, so I can understand why this data comes at a premium. It is a mess and must be a bear to maintain and update.

The data for this exercise was compiled from a free online listing, so you should expect some holes; these are discussed below. Regardless, the listing provides cities for 163,619 area codes of the 6-digit variety (the 3-digit area code plus the 3-digit exchange, e.g. 415-200). In addition, I’ve incorporated the basic area code-to-region mapping for occasions when only the 3-digit area code is needed. For these records, U.S. entries have the state in the region field and all others have the country. These records do not contain latitude or longitude.
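Because the table holds both 6-digit and 3-digit records, a consumer can try the more specific key first and fall back to the bare area code. The Python sketch below shows that idea; the fallback order, the sample records, and the resolve function are assumptions for illustration. Per the description above, the 3-digit record carries a region but no coordinates.

```python
# Hypothetical records in the combined table's style: a 6-digit prefix with
# coordinates, and a bare 3-digit area code that carries only a region.
TABLE = {
    "415-200": {"city": "San Francisco", "region": "CA",
                "latitude": 37.7750000, "longitude": -122.4183333},
    "415": {"city": "", "region": "CA", "latitude": None, "longitude": None},
}


def resolve(number):
    """Look up a 123-456-7890 style number, trying the most specific key first.

    Assumes the digit groups are separated by a hyphen or a dot.
    """
    groups = number.replace(".", "-").split("-")
    if len(groups) < 2:
        return None, None
    prefix6 = f"{groups[0]}-{groups[1]}"  # area code + exchange
    area3 = groups[0]                     # area code only
    for key in (prefix6, area3):
        if key in TABLE:
            return key, TABLE[key]
    return None, None


for number in ["415-200-1234", "415.999.1234", "212-200-1234"]:
    key, record = resolve(number)
    print(number, "->", key, record["region"] if record else None)
```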

The city-coordinate listing is available as a free resource. MaxMind World Cities Database to the rescue! This database is updated yearly and contains data for major cities.

With two free listings, you’d think I could finish up and wash my hands. Not so fast. Splunk does not currently support transitive lookups, so I was not able to configure it to use the results from Lookup A as input to Lookup B for the same data source. Instead, somewhere along the way a Python script was written to finagle all the data into a single table.

Lookup A: area code -> city
Lookup B: city -> latitude + longitude
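Since Splunk could not chain the two lookups, the join has to happen before the data reaches Splunk. The following is a minimal Python sketch of that idea, not the actual script used for this post: it joins hypothetical Lookup A rows to a hypothetical Lookup B keyed on (city, region) and writes the combined columns, leaving the coordinates blank when no city match is found.

```python
import csv
import io

# Hypothetical Lookup A rows: areacode -> city.
LOOKUP_A = [
    {"areacode": "415-200", "city": "San Francisco", "region": "CA", "country": "US"},
    {"areacode": "786-200", "city": "Miami", "region": "FL", "country": "US"},
    {"areacode": "999-200", "city": "Nowhere", "region": "TX", "country": "US"},
]

# Hypothetical Lookup B: (city, region) -> (latitude, longitude).
LOOKUP_B = {
    ("San Francisco", "CA"): ("37.7750000", "-122.4183333"),
    ("Miami", "FL"): ("25.7738889", "-80.1938889"),
}

FIELDS = ["areacode", "city", "region", "country", "latitude", "longitude"]


def combine(rows_a, coords_b):
    """Precompute the transitive join A -> B into a single CSV table."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for row in rows_a:
        # Unmatched cities keep empty coordinates rather than being dropped.
        lat, lon = coords_b.get((row["city"], row["region"]), ("", ""))
        writer.writerow({**row, "latitude": lat, "longitude": lon})
    return out.getvalue()


print(combine(LOOKUP_A, LOOKUP_B))
```

The exact-match dictionary lookup is where the real trouble starts, since the two listings rarely spell a city the same way.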

In any case, these files have nuances that require conditional logic to sort out, which would have made the transitive lookup infeasible anyway. Tragic? No. Painful? A little. The variations in city names were enough to make me want to eat my atlas. Did I say eat? I meant shred. Here are a few examples of variations between the listings:

St -> Saint
D’ -> d’
New York City -> New York
West/East/North/South/Central Foo City -> Foo City
Foo City Bridge/Island/Beach/Meadows/Airport -> Foo City
Livingston (Sumter) -> Livingston
Ansonia-Derby -> Ansonia
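The variations above suggest a matching strategy: rather than rewriting a name once, generate a few candidate spellings and take the first one the coordinate listing recognizes. Below is a minimal Python sketch of that approach, assuming the rules are exactly the ones listed; it is not the original script, and candidates and match_city are hypothetical names. The hyphen rule is tried last because, applied blindly, it would also truncate French Canadian names such as Sainte-Lucie.

```python
import re

DIRECTIONS = ("West ", "East ", "North ", "South ", "Central ")
SUFFIXES = (" Bridge", " Island", " Beach", " Meadows", " Airport")


def candidates(name):
    """Return the original name followed by variants produced by the rules above."""
    seen = []

    def add(value):
        value = value.strip()
        if value and value not in seen:
            seen.append(value)

    add(name)
    add(re.sub(r"\bSt\.? ", "Saint ", name))      # St -> Saint
    add(name.replace("D'", "d'"))                 # D' -> d'
    if name == "New York City":                   # New York City -> New York
        add("New York")
    for prefix in DIRECTIONS:                     # West Foo City -> Foo City
        if name.startswith(prefix):
            add(name[len(prefix):])
    for suffix in SUFFIXES:                       # Foo City Airport -> Foo City
        if name.endswith(suffix):
            add(name[:-len(suffix)])
    add(re.sub(r"\s*\(.*\)$", "", name))          # Livingston (Sumter) -> Livingston
    if "-" in name:                               # Ansonia-Derby -> Ansonia (last resort)
        add(name.split("-", 1)[0])
    return seen


def match_city(name, known):
    """Return the first candidate present in the coordinate listing, else None."""
    for candidate in candidates(name):
        if candidate in known:
            return candidate
    return None
```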

And then there were the many personalities of French Canadian cities:

Sainte-Lucie-de-Doncaster
Sainte Lucie de Doncaster
Sainte-Lucie-Doncaster
Sainte Lucie Doncaster
Sainte-Lucie
Sainte Lucie

Oh, those Canadians. Gotta love ’em, eh? As you can imagine, all these variations made for some interesting scripting.
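For the French Canadian variants, a different trick can help: reduce every spelling to a canonical key by lowercasing, treating hyphens and spaces alike, and dropping "de"-style particles. The Python sketch below does that and, as a fallback heuristic, maps a longer name to the longest known name that is a word-prefix of it (so "Sainte Lucie Doncaster" can still land on "Sainte-Lucie"). The particle list, the fallback rule, and the function names are assumptions for illustration.

```python
import re

# Particles assumed to be optional in the listings (an assumption).
PARTICLES = {"de", "du", "des"}


def canonical(name):
    """Collapse hyphen/space variants and drop 'de'-style particles."""
    words = re.split(r"[\s-]+", name.strip().lower())
    return " ".join(w for w in words if w and w not in PARTICLES)


def match_french(name, known_names):
    """Match on the canonical key; else fall back to the longest known word-prefix."""
    index = {canonical(k): k for k in known_names}
    key = canonical(name)
    if key in index:
        return index[key]
    # Fallback heuristic: "sainte lucie doncaster" -> "sainte lucie".
    prefixes = [k for k in index if key.startswith(k + " ")]
    if prefixes:
        return index[max(prefixes, key=len)]
    return None
```

Prefix matching trades precision for coverage, so it is best used only after the exact canonical match fails.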

Limitations

While we now have a free combined lookup table, it is neither perfect nor complete. Merging the area code listing with the city latitude/longitude listing was not lossless: over 1% of the cities in the area code listing could not be found in the city-latitude/longitude database (many of them in Canada and Texas). This means events containing area codes without coordinate data will not be graphed. To close these gaps, you can complete the table with automated or manual queries of free web-based coordinate lookup services. This is an exercise left to the reader. :)
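A first step toward that exercise is simply listing the rows that will not be graphed. The Python sketch below scans a hypothetical excerpt of the combined table for 6-digit rows with a blank latitude or longitude; it skips 3-digit records, since those carry no coordinates by design. The sample data and the missing_coordinates name are hypothetical.

```python
import csv
import io

# Hypothetical excerpt of the combined table: two complete 6-digit rows,
# one 6-digit row with no coordinates, and one 3-digit record.
SAMPLE = """areacode,city,region,country,latitude,longitude
415-200,San Francisco,CA,US,37.7750000,-122.4183333
786-200,Miami,FL,US,25.7738889,-80.1938889
999-200,Nowhere,TX,US,,
415,,CA,US,,
"""


def missing_coordinates(text):
    """Return 6-digit rows whose latitude or longitude is blank."""
    gaps = []
    for row in csv.DictReader(io.StringIO(text)):
        if "-" not in row["areacode"]:
            continue  # 3-digit record: no coordinates by design
        if not row["latitude"].strip() or not row["longitude"].strip():
            gaps.append(row)
    return gaps


gaps = missing_coordinates(SAMPLE)
print(len(gaps), [row["city"] for row in gaps])  # 1 ['Nowhere']
```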

Also, this exercise only covers numbers in NANP, the North American Numbering Plan, which covers the United States, Canada and much of the Caribbean. Other regional numbering plans exist, such as ETNS (European Telephony Numbering Space), but they are not incorporated here.
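Because the lookup only covers NANP numbers, it can help to screen numbers before looking them up. The Python sketch below checks the basic NANP shape (NPA-NXX-XXXX, where the first digit of both the area code and the exchange is 2 through 9) and builds the hyphenated 6-digit key. The accepted separators, the optional +1 prefix, and the nanp_key name are assumptions about how numbers might appear in your data, and the check does not confirm that a code is actually assigned.

```python
import re

# Basic NANP shape: NPA-NXX-XXXX, where N is a digit 2-9 and X is any digit.
# An optional leading +1 / 1 and common separators are allowed (an assumption).
NANP_RE = re.compile(
    r"^(?:\+?1[-. ]?)?\(?([2-9]\d{2})\)?[-. ]?([2-9]\d{2})[-. ]?(\d{4})$"
)


def nanp_key(number):
    """Return the 6-digit lookup key (e.g. 415-200) for a NANP-shaped number, else None."""
    match = NANP_RE.match(number.strip())
    if match is None:
        return None  # not NANP-shaped, e.g. a European number
    return f"{match.group(1)}-{match.group(2)}"


for number in ["415-200-1234", "+1 (646) 200-5678", "+44 20 7946 0958"]:
    print(number, "->", nanp_key(number))
```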

Finally, several factors dilute the value of this kind of mapping. As you’ve probably already grasped, with the advent of mobile phones, number portability laws and VOIP, many telephone numbers are less likely to be used in the actual geographic location of their area code. I confess to being an area code snob myself, and I plead guilty to keeping a 415 San Francisco area code while living in and conducting Splunk operations from Los Angeles. I’m sure you can think of plenty of other reasons an area code might wander from its geographic mapping. This wanderlust undoubtedly introduces an unknown margin of error into telephone mapping analysis.

Like, Whatever

So what do Ludacris, rap and maps have in common? Area codes! While this shouldn’t be a surprise, what is intriguing is the depth of insight you can get by analyzing a map of this relationship. Warning: this content is not suitable for those who find gangsta rap uncool. My point is, despite inherent problems and potential inaccuracies, a map is just another lens on your data and may produce some useful insights. It’s simple to set up and try out (as are many similar lookups and mappings). Let me know what you discover! Just don’t call me from a dubious area code. 😉

Vi Ly