
In late September, 4,000 attendees gathered in Las Vegas for .conf, our annual user conference. Among a host of other features, we introduced Choropleth maps, a new visualization type in Splunk 6.3. We’re very excited to see the many use cases where Choropleth maps will come in handy. If you already have ideas about how to use these maps in your dashboards but don’t know how to get started, this article is for you. Also, don’t forget to check out the Choropleth maps documentation for an overview of the different configuration options.
Under the covers, Choropleth maps make use of Geospatial Lookups, another new feature in Splunk 6.3. Geospatial lookups are what make Choropleth maps work on large data sets: not only do they resolve coordinates into named feature IDs, they also provide the polygons to be drawn on the map.
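To make the idea of resolving coordinates into named feature IDs more concrete, here is a minimal point-in-polygon sketch in Python using the even-odd (ray-casting) rule. It is a generic illustration, not a description of how Splunk implements geospatial lookups. The feature names and square polygons are hypothetical, and coordinates are treated as planar (longitude, latitude) pairs, which is a simplification.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (longitude, latitude), treated as planar x/y
Ring = List[Point]           # outer boundary of one polygon; closing edge is implied


def point_in_ring(pt: Point, ring: Ring) -> bool:
    """Even-odd rule: count how many ring edges a ray cast east from pt crosses."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def lookup_feature(pt: Point, features: Dict[str, List[Ring]]) -> Optional[str]:
    """Return the ID of the first feature whose polygons contain pt, else None."""
    for feature_id, polygons in features.items():
        # A single feature may consist of several polygons (think islands).
        if any(point_in_ring(pt, ring) for ring in polygons):
            return feature_id
    return None


# Hypothetical feature collection: "West" has one polygon, "Islands" has two.
FEATURES: Dict[str, List[Ring]] = {
    "West": [[(0, 0), (10, 0), (10, 10), (0, 10)]],
    "Islands": [
        [(20, 0), (22, 0), (22, 2), (20, 2)],
        [(25, 5), (27, 5), (27, 7), (25, 7)],
    ],
}

if __name__ == "__main__":
    print(lookup_feature((5, 5), FEATURES))    # West
    print(lookup_feature((26, 6), FEATURES))   # Islands
    print(lookup_feature((15, 15), FEATURES))  # None
```

Note how the hypothetical "Islands" feature owns two polygons but still resolves to a single ID, which mirrors the distinction between features and polygons discussed in the next section.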
Lookups, Features, Polygons… Oh my!
Let’s get the terminology out of the way first. In Splunk, you will interact most with Geospatial Lookups, a new lookup type that works like the existing lookups but is optimized for geospatial analysis. Splunk 6.3 ships with two built-in geospatial lookups: the countries of the world, and the 50 US states plus D.C.
Each geospatial lookup defines a set of features, also called a feature collection. For example, the US states lookup contains 51 features, one for each state and one for Washington D.C. However, it contains more polygons than that: the state of Hawaii consists of eight polygons, one for each island, all sharing the same feature ID, Hawaii.
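In a KML file (the format introduced below), each feature is a <Placemark>, and a feature made of several polygons typically groups them in a <MultiGeometry>. The Python sketch below uses the standard library’s xml.etree.ElementTree to count polygons per feature. The two-feature document and its coordinates are made up and drastically simplified.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple, Union

KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Hypothetical, heavily simplified KML: "Hawaii" groups two polygons in a
# MultiGeometry, "Utah" has a single polygon. Coordinates are made up.
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>Hawaii</name>
      <MultiGeometry>
        <Polygon><outerBoundaryIs><LinearRing>
          <coordinates>-155.0,19.0 -155.5,19.0 -155.5,19.5 -155.0,19.0</coordinates>
        </LinearRing></outerBoundaryIs></Polygon>
        <Polygon><outerBoundaryIs><LinearRing>
          <coordinates>-156.0,20.5 -156.5,20.5 -156.5,21.0 -156.0,20.5</coordinates>
        </LinearRing></outerBoundaryIs></Polygon>
      </MultiGeometry>
    </Placemark>
    <Placemark>
      <name>Utah</name>
      <Polygon><outerBoundaryIs><LinearRing>
        <coordinates>-114.0,37.0 -109.0,37.0 -109.0,42.0 -114.0,42.0 -114.0,37.0</coordinates>
      </LinearRing></outerBoundaryIs></Polygon>
    </Placemark>
  </Document>
</kml>"""


def polygons_per_feature(kml_source: Union[str, bytes]) -> List[Tuple[str, int]]:
    """Return (feature name, polygon count) for every Placemark in the document."""
    root = ET.fromstring(kml_source)
    result = []
    for placemark in root.findall(".//kml:Placemark", KML_NS):
        name = placemark.findtext("kml:name", default="", namespaces=KML_NS)
        # ".//" also finds polygons nested inside a MultiGeometry.
        result.append((name, len(placemark.findall(".//kml:Polygon", KML_NS))))
    return result


if __name__ == "__main__":
    features = polygons_per_feature(SAMPLE_KML)
    print(len(features), "features")                # 2 features
    print(sum(n for _, n in features), "polygons")  # 3 polygons
    print(features)                                 # [('Hawaii', 2), ('Utah', 1)]
```

The function returns a list rather than a dictionary so that files whose placemarks have no names (like the Census file discussed below) still yield one entry per feature.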
What if the two built-in lookups don’t fit your business context? For example, you might only be interested in the neighborhoods of a city. Or you might have custom regions that don’t follow administrative boundaries but instead outline physical features. For these and many other use cases, chances are someone has already created a matching feature collection. Thousands of feature collections are available for download online; you just have to find them.
Custom Geospatial Lookups
Splunk lets you define custom geospatial lookups simply by uploading KMZ files that contain a feature collection. If you’re not familiar with KMZ files: a KMZ file is a ZIP archive containing a KML (Keyhole Markup Language) file, a format popularized by Google and used by Google Maps, Google Earth, and other mapping tools.
Where do you find these KMZ files? For US administrative boundaries, I recommend the US Census website. Similar resources exist for other countries, such as the UK and Australia. Some of these websites provide files with a .kml extension instead of .kmz. These work too; you just need to package them first, as the steps below show.
Alright, you’ve made it this far, so I assume you’re serious about creating your own geospatial lookups. If that’s the case and you’re feeling adventurous, read on to learn how to actually (really, truly) create and use these lookups. In the process, you’re going to immerse yourself in ancient geospatial tools and file formats (some of them are from the past millennium!). Unless you’re an expert in GIS software, you might be intimidated by the arcane user experience and odd conventions. It’s not going to be easy, but you might resurface on the other side feeling wiser and armed with enough ammunition to create fantastic maps for your dashboards. Consider yourself warned!
Option 1: KMZ or KML file
Say you want to create a Choropleth map that shows the breakdown of a metric for each congressional district. Let’s walk through how you would go about finding a KML file and defining it as a geospatial lookup. The steps below assume that you have Google Earth installed on your machine.
- Search for “congressional district KML” on Google. The first result gets you straight to the page where you can download the KMLs. Alternatively, you can start with the list of KML files available for download on the US Census website and select “Congressional Districts”.
- Download cb_2014_us_cd114_500k.zip
- Extract the ZIP file, which gives you a similarly named KML file: cb_2014_us_cd114_500k.kml
- When you open the KML file in Google Earth, you will be able to verify that it contains a number of features. That’s what we want to see.
However, when you look at the list of features on the left side of the screen, you will also notice that none of the features have names. That’s bad, because without names Splunk’s geospatial lookups have no way of telling which feature is which. But don’t fret, there’s a way to fix that.
- * KML is XML at its core, so when you open the KML file in a text editor you can peek under the hood to see what information it contains. What we’re looking for is something Splunk can use to identify each feature instead of the feature’s name.
The file contains a number of <Placemark> tags. (To make the terminology even more convoluted, KML uses Placemark as a synonym for what we’ve previously defined as a feature. Behold the confusing world of geospatial technology!) Looking more closely, you can see that each placemark contains a number of <SimpleData> tags. One of them should have enough information to distinguish one feature from another, and AFFGEOID seems to fit the bill. How do we tell Splunk to use each placemark’s AFFGEOID attribute? We can define this with XPath. You don’t need to understand much about XPath: all you need to know is that an XPath expression describes the path through the XML, starting at the placemark, down to the element that contains the value you want to use. In our case, the correct XPath is /Placemark/ExtendedData/SchemaData/SimpleData[@name='AFFGEOID']
Before you proceed, jot this XPath down somewhere; we’ll need it when setting up the lookup.
- Next, let’s prepare the KML file to be used as a geospatial lookup. Simply create a .zip archive containing the KML file and rename the archive to my_lookup.kmz.
- Upload my_lookup.kmz to Splunk as a lookup table file.
- * Finally, create a lookup definition and set the type to “Geospatial”. This is also where you tell Splunk where to find the feature IDs in the KML file: use the XPath you noted earlier.
* The steps marked with an asterisk (inspecting the KML file in a text editor, and entering the XPath in the lookup definition) are optional if you use a different KML file whose features already show names in Google Earth.
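If you’d like to sanity-check the AFFGEOID XPath before uploading anything, a short Python sketch can pull the value out of every placemark. It relies on the standard library’s xml.etree.ElementTree, which supports only a subset of XPath. Because KML files declare the KML 2.2 namespace, the path needs a namespace prefix on each step and is evaluated relative to each placemark, so it is not written exactly as you would enter it in Splunk. The sample document and its ID values are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET
from typing import List, Union

NS = {"kml": "http://www.opengis.net/kml/2.2"}

# ElementTree's limited XPath needs namespace prefixes, so the Splunk-style path
#   /Placemark/ExtendedData/SchemaData/SimpleData[@name='AFFGEOID']
# is written here relative to each Placemark, with a "kml:" prefix per step.
ID_PATH = "kml:ExtendedData/kml:SchemaData/kml:SimpleData[@name='AFFGEOID']"

# Hypothetical stand-in for the Census file: unnamed placemarks whose
# identifiers live in <SimpleData> elements. The ID values are placeholders.
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <ExtendedData><SchemaData schemaUrl="#example">
        <SimpleData name="GEOID">geo-001</SimpleData>
        <SimpleData name="AFFGEOID">example-001</SimpleData>
      </SchemaData></ExtendedData>
    </Placemark>
    <Placemark>
      <ExtendedData><SchemaData schemaUrl="#example">
        <SimpleData name="GEOID">geo-002</SimpleData>
        <SimpleData name="AFFGEOID">example-002</SimpleData>
      </SchemaData></ExtendedData>
    </Placemark>
  </Document>
</kml>"""


def feature_ids(kml_source: Union[str, bytes], id_path: str = ID_PATH) -> List[str]:
    """Return the value found at id_path for every Placemark ('' if missing)."""
    root = ET.fromstring(kml_source)
    return [
        (placemark.findtext(id_path, default="", namespaces=NS) or "").strip()
        for placemark in root.findall(".//kml:Placemark", NS)
    ]


if __name__ == "__main__":
    ids = feature_ids(SAMPLE_KML)
    print(ids)                        # ['example-001', 'example-002']
    print(len(ids) == len(set(ids)))  # True: every ID is unique
```

To check the real file, pass in its contents, for example feature_ids(Path("cb_2014_us_cd114_500k.kml").read_bytes()) after importing Path from pathlib. Every entry should be non-empty, and the IDs should be unique.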
While the process above is fairly complicated, it allows you to make use of the many KML/KMZ resources out there. If you can’t find a suitable KML or KMZ file, there is yet another option to explore:
Option 2: Shapefiles
Before KMZ files, or even Google, had seen the light of day, another standard had emerged to define geospatial information. In the early 1990s, Esri introduced a new format specification with ArcView, one of the early GIS (Geographic Information System) applications, and called it the Shapefile.
The geospatial lookups in Splunk 6.3 use only a subset of the capabilities that Shapefiles and KML files offer. That subset is supported by both formats, which means we can use existing tools to convert between the two. One of these tools is QGIS, a free and open-source GIS application. The following steps assume that you have QGIS and all its dependencies installed on your machine.
- A Google search for “UK Counties Shapefile” and clicking through a few websites eventually puts you at the Downloads page of the UK Office for National Statistics.
- We’ll select a Shapefile from the list. For the next steps, let’s assume we download County_and_unitary_authorities_(E+W)_2014_Boundaries_(Full_Extent).zip. (Note: this 2014 dataset is no longer available. Please select a similar, more recent file.)
- Unpack the archive. This creates a number of files with the same name but different extensions. You can ignore most of them (keep them in the same folder, though, since the Shapefile depends on them); we’re interested in the Shapefile itself, CTYUA_DEC_2014_EW_BFE.shp.
- Open this Shapefile in QGIS, for example by double-clicking the file name. You’ll see a map of the counties and unitary authorities of England and Wales.
- Typically, each Shapefile comes with an attribute table. These tables are mini databases in which each feature has a row, with its attributes in the columns. For this Shapefile, we’re particularly interested in the CTYUA14NM column, which contains the name of each feature.
- Now that we know which attribute contains the county names, we can convert the Shapefile into a KML file. Select “Layer” => “Save As” from the menu bar to open the save dialog.
Select KML in the “Format” dropdown. For the KML file to know which attribute to use as the feature ID, we have to point out the attribute that contains a unique identifier. In our case, we select the CTYUA14NM attribute to be used as the feature ID.
- Save the KML file as GB_counties.kml.
- Next, let’s prepare the KML file to be used as a geospatial lookup. Simply create a .zip archive containing the KML file and rename the archive to my_lookup.kmz.
- Upload my_lookup.kmz to Splunk as a lookup table file.
- Finally, create a lookup definition and set the type to “Geospatial”. Since the exported features now carry names, you can skip the XPath this time.
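The packaging and lookup-definition steps, which are the same in both options, can also be scripted. The sketch below zips a KML file into a KMZ with Python’s zipfile module and renders a transforms.conf stanza roughly corresponding to the lookup definition you create in the UI. The stanza keys (external_type = geo, filename, feature_id_element) are assumptions based on Splunk’s transforms.conf specification for geospatial lookups; verify them against the documentation for your Splunk version. The demo uses a throwaway placeholder KML file rather than a real download.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import Optional

# XPath from the congressional districts example (Option 1).
AFFGEOID_XPATH = "/Placemark/ExtendedData/SchemaData/SimpleData[@name='AFFGEOID']"


def make_kmz(kml_path: Path, kmz_path: Path) -> Path:
    """Zip a single KML file into a KMZ archive (a KMZ is a ZIP containing KML)."""
    with zipfile.ZipFile(kmz_path, "w", compression=zipfile.ZIP_DEFLATED) as kmz:
        # Keep the KML's own file name inside the archive.
        kmz.write(kml_path, arcname=kml_path.name)
    return kmz_path


def lookup_stanza(name: str, kmz_file: str, feature_id_xpath: Optional[str] = None) -> str:
    """Render a transforms.conf-style stanza for a geospatial lookup.

    The key names are assumptions based on the transforms.conf spec; check
    them against the documentation for your Splunk version.
    """
    lines = [f"[{name}]", "external_type = geo", f"filename = {kmz_file}"]
    if feature_id_xpath:
        # Only needed when placemarks lack usable <name> elements (Option 1).
        lines.append(f"feature_id_element = {feature_id_xpath}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Demo with a placeholder KML file in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        kml = Path(tmp) / "sample.kml"
        kml.write_text('<kml xmlns="http://www.opengis.net/kml/2.2"/>', encoding="utf-8")
        kmz = make_kmz(kml, Path(tmp) / "my_lookup.kmz")
        with zipfile.ZipFile(kmz) as zf:
            print(zf.namelist())  # ['sample.kml']
    print(lookup_stanza("my_lookup", "my_lookup.kmz", AFFGEOID_XPATH))  # Option 1
    print(lookup_stanza("my_lookup", "my_lookup.kmz"))  # Option 2: names already set
```

For the congressional districts, you would call make_kmz(Path("cb_2014_us_cd114_500k.kml"), Path("my_lookup.kmz")) and then upload the result as described above. Uploading through Splunk Web remains the simplest route; the stanza is mainly useful for understanding what the lookup definition contains.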
If you’re comfortable on the command line, you can skip the QGIS part of the conversion (from opening the Shapefile through saving the KML file) and use ogr2ogr instead. ogr2ogr is a command-line utility that is part of GDAL, which QGIS requires anyway:
ogr2ogr -f "KML" GB_counties.kml CTYUA_DEC_2014_EW_BFE.shp -dsco NameField=CTYUA14NM
This command converts the CTYUA_DEC_2014_EW_BFE.shp Shapefile to GB_counties.kml, using the CTYUA14NM attribute as names for the features.
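Whichever route you take, it’s worth checking the converted file before uploading it. Because the features now carry their identifier in each placemark’s <name> element, a sketch like the one below (Python standard library only) can count placemarks and flag unnamed or duplicate names. The sample document and the names County A and County B are hypothetical; the duplicate and the unnamed placemark are deliberate, to show what the report looks like.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict, List, Union

NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Hypothetical converter output: "County B" is duplicated and the last
# placemark has no <name>, on purpose, to exercise the report.
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document><Folder>
    <Placemark><name>County A</name></Placemark>
    <Placemark><name>County B</name></Placemark>
    <Placemark><name>County B</name></Placemark>
    <Placemark><description>no name here</description></Placemark>
  </Folder></Document>
</kml>"""


def placemark_names(kml_source: Union[str, bytes]) -> List[str]:
    """Return each Placemark's <name> text, or '' when it has none."""
    root = ET.fromstring(kml_source)
    return [
        (pm.findtext("kml:name", default="", namespaces=NS) or "").strip()
        for pm in root.findall(".//kml:Placemark", NS)
    ]


def name_report(names: List[str]) -> Dict[str, object]:
    """Summarize feature names: total count, unnamed count, duplicated names."""
    counts = Counter(names)
    return {
        "features": len(names),
        "unnamed": counts.get("", 0),
        "duplicates": sorted(n for n, c in counts.items() if n and c > 1),
    }


if __name__ == "__main__":
    print(name_report(placemark_names(SAMPLE_KML)))
    # {'features': 4, 'unnamed': 1, 'duplicates': ['County B']}
```

For the real output, something like name_report(placemark_names(Path("GB_counties.kml").read_bytes())) should report zero unnamed features and an empty duplicates list.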
Two Options, Same Outcome
This gives you two different approaches to finding feature collections on the web and using them as geospatial lookups in Splunk 6.3. Look out for more blog posts that show you even more creative ways to use Choropleth maps.
----------------------------------------------------
Thanks!
Michael Porath