Splunking Foursquare

I tend to travel quite a bit in my role at Splunk. The other day I was wondering how far I had traveled in the last week, the last month, the last year. It just so happens that I am a Foursquare user, not because I like to hoard mayorships across the globe, but because I use Foursquare checkins to help me remember where I have been. Now you can see where I am going with this, because “where have I been” actually means “a lot of cool location metadata” that I can have fun with.

I was looking around online for a simple tool that could hook into Foursquare, tell me how far I have traveled and where I have been, and plot this on a map for me. Nothing that I tried really appealed. Fortunately, I have all the tools at my disposal to do what I want myself, very simply.

Getting at the Foursquare checkin data

Foursquare has a comprehensive REST API that makes it easy to get at your data, in particular your checkins.

In order to poll your checkin events, you first need to:

1) register a Foursquare App

This will generate your CLIENT ID and CLIENT SECRET which you’ll need in the next step.

2) acquire your OAUTH2 token

You will need the returned token when you set up the REST input in Splunk.
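To make the token's role concrete, here is a minimal Python sketch of the kind of endpoint URI you end up polling. It assumes the Foursquare v2 `users/self/checkins` endpoint with the `oauth_token`, `v` (API version date) and `limit` query parameters; the helper name `checkins_url` and the default values are my own, purely for illustration.

```python
from urllib.parse import urlencode

# Assumption: base URL of the Foursquare v2 REST API.
API_BASE = "https://api.foursquare.com/v2"


def checkins_url(oauth_token, version="20130101", limit=250):
    """Build the URI for the authenticated user's checkin history.

    Hypothetical helper: the parameter names follow the v2 API as I
    understand it, the default values are placeholders.
    """
    query = urlencode({"oauth_token": oauth_token, "v": version, "limit": limit})
    return f"{API_BASE}/users/self/checkins?{query}"


# Substitute the token returned by the OAuth2 flow in step 2.
print(checkins_url("YOUR_OAUTH2_TOKEN"))
```

In the Splunk input itself the token is supplied as its own setting, so treat this only as a quick way to test the endpoint outside Splunk.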

Setting up the REST input in Splunk

To poll the Foursquare REST API from Splunk I am using the REST API Modular Input, freely available on Splunkbase Apps.

This is a completely generic modular input for polling any REST API, so we can use it with Foursquare.

Setting up the input stanza is very simple. You just need to provide the Endpoint URI and OAUTH2 token.

I have also specified a custom response handler. The reason for this is that the JSON response that Foursquare returns is a single document with all the checkin events aggregated. I want to parse this JSON document and separate each checkin out into an individually indexed event in Splunk.

The REST API Modular Input provides an extension mechanism for adding any custom request/response functionality you may require over and above the generic functionality provided out of the box. The custom response handler I have added for Foursquare checkin responses is very simple.
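The core of such a handler can be sketched in a few lines of Python. This is not the handler shipped with the modular input, just the splitting logic on its own; it assumes the checkins sit under `response.checkins.items` in the Foursquare JSON document, and the function name `split_checkins` is hypothetical.

```python
import json


def split_checkins(raw_response):
    """Split one aggregated Foursquare response into one JSON string per checkin.

    Assumption: the checkins are listed under response -> checkins -> items.
    """
    document = json.loads(raw_response)
    items = document.get("response", {}).get("checkins", {}).get("items", [])
    return [json.dumps(item) for item in items]


# Made-up sample document with two checkins, for illustration only.
sample = json.dumps({
    "response": {
        "checkins": {
            "count": 2,
            "items": [
                {"id": "a1", "createdAt": 1370000000},
                {"id": "b2", "createdAt": 1370003600},
            ],
        }
    }
})

for event in split_checkins(sample):
    print(event)  # each string would be handed to Splunk as a separate event
```

Inside the modular input, each of these strings would be written out as its own event instead of being printed.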

Searching over the checkin data

Now, if everything has gone to plan, you will have all your Foursquare checkin events being indexed in Splunk in JSON format.

If we drill down on the venue.location field, we can see the geo location data, which is what I am interested in getting at.

Calculate the distance between checkin events

With this latitude and longitude data, I can calculate the distance between 2 geo locations. But how can I perform the underlying trigonometry that takes into account the earth’s curvature (a spherical equation) to derive an accurate distance metric? Well, there is already a well known algorithm for this, known as the Haversine formula.

Wouldn’t it be nice if there was a Splunk search command that used the Haversine formula to take 2 points on the globe in latitude/longitude format and gave me the distance as an output field?

Bazinga! The Splunk community has spoken, and there is a freely downloadable add-on on Splunkbase that does just this.
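For reference, the Haversine formula in its standard textbook form is written out below. The symbols are my own choice of notation: the two points have latitudes \varphi_1, \varphi_2 and longitudes \lambda_1, \lambda_2 (in radians), R is the earth's radius (roughly 3,959 miles or 6,371 km, treating the earth as a sphere), and d is the great-circle distance.

```latex
a = \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right)
    + \cos\varphi_1 \, \cos\varphi_2 \,
      \sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right)

d = 2R \arcsin\!\left(\sqrt{a}\right)
```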

Let’s get visual

In this search I am extracting the latitude and longitude from the current event and the previous event, then applying the haversine search command to output the distance between these 2 events. You’ll notice that I am also “deduping”, just as a safeguard in case my polling from Foursquare has overlapped and pulled in some duplicate checkin events.

index=main sourcetype="4sq_checkins" | dedup id | sort - createdAt | rename venue.location.lat as "lat" | rename venue.location.lng as "long" | streamstats current=f global=f window=1 first(lat) as next_lat first(long) as next_long first(venue.name) as pointB_venue by sourcetype | strcat lat "," long pointA | haversine originField=pointA units=mi inputFieldLat=next_lat inputFieldLon=next_long outputField=distance_miles | strcat next_lat "," next_long pointB | rename venue.name as "pointA_venue" | table pointA pointA_venue pointB pointB_venue distance_miles | where distance_miles > 0

Now I can easily calculate my total distance traveled over a given period of time by piping through into a stats command to sum the distances up.

... | stats sum(eval(ceil(distance_miles))) as "Total Distance Travelled (Miles)"
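If you want to sanity check the numbers outside Splunk, the same logic can be sketched in plain Python. This is my own approximation of what the searches above do, not the haversine add-on's code: it keeps one checkin per id, sorts newest first, measures each consecutive pair, drops zero-distance legs and sums the rounded-up miles. The earth radius constant and the sample checkins are assumptions for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumption: mean earth radius, spherical model


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two lat/long points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # min() guards against tiny floating point overshoot above 1.
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(min(1.0, a)))


def total_miles(checkins):
    """Sum the rounded-up distance of every leg between consecutive checkins."""
    unique = {c["id"]: c for c in checkins}.values()  # one checkin per id
    ordered = sorted(unique, key=lambda c: c["createdAt"], reverse=True)
    total = 0
    for newer, older in zip(ordered, ordered[1:]):
        a, b = older["venue"]["location"], newer["venue"]["location"]
        miles = haversine_miles(a["lat"], a["lng"], b["lat"], b["lng"])
        if miles > 0:  # same filter as "where distance_miles > 0"
            total += math.ceil(miles)
    return total


def make_checkin(checkin_id, created_at, lat, lng):
    """Build a made-up checkin in the nested shape used by the searches."""
    return {"id": checkin_id, "createdAt": created_at,
            "venue": {"location": {"lat": lat, "lng": lng}}}


sample_checkins = [
    make_checkin("a", 1, 0.0, 0.0),
    make_checkin("b", 2, 0.0, 1.0),
    make_checkin("b", 2, 0.0, 1.0),  # duplicate from overlapping polls
    make_checkin("c", 3, 0.0, 1.0),  # same spot as "b": zero-distance leg
]
print(total_miles(sample_checkins))
```

Note that rounding each leg up before summing, as the stats search does, slightly overstates the total compared with rounding once at the end.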

And I can also output a “_geo” field that allows me to plot my checkins on a map from the popular Google Maps App on Splunkbase.

index=main sourcetype="4sq_checkins" | dedup id | sort - createdAt | strcat venue.location.lat "," venue.location.lng _geo

But don’t stop there

There is a world of data available out there that the Splunk REST API Modular Input can help you tap into.

Check out this recent blog.

Go and tap into this data, search over it, correlate it, turn it into powerful analytics and visualizations.

As we say, “Your Data, No Limits”.

Damien Dallimore
