When I was a sysadmin in charge of servers, they all felt like my children. If they fell down I would pick them back up; if they got a virus I would tend to them. “Log, I am your father” was such a fitting Splunk t-shirt. When I found out I was going to be a real father, I asked my wife, “What should we name him?” Logically, I went straight to my two favorite places to answer questions, Google and Splunk. I googled for a dataset of names and found one covering births in the United States over the last ~100 years. I chose to use the state-specific data and put it into Splunk so I could ask questions of it.
We had certain requirements for a name. For example, since we were having a boy we wanted male names. We wanted something that wasn’t super common but also not too rare or trending upward, more than one syllable, etc. Like most searches I write in Splunk, I broke down the questions and iterated.
First, what does this data look like and how many names are we working with?
It looks like we are dealing with 30,274 names and the format is CSV data. Since this is CSV, we can do this search way faster with tstats. The search above took 93.214 seconds to run on my machine. Using tstats it took less than 2 seconds.
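To make the shape of the data concrete outside Splunk, here is a minimal Python sketch of the same distinct-name count. It assumes the state-specific files are headerless CSV rows of state, sex, year, name, and count; the sample rows and the helper names (`read_births`, `distinct_names`) are made up for illustration and are not real figures.

```python
import csv
import io

# Hypothetical sample in the assumed layout: state,sex,year,name,count (no header row).
# These rows are invented for illustration; they are not real figures.
SAMPLE = """\
AK,F,1910,Mary,14
AK,M,1910,John,10
WA,M,1911,John,25
WA,F,1911,Anna,12
"""

def read_births(text):
    """Parse the assumed five-column layout into dicts, converting year and count to int."""
    rows = []
    for state, sex, year, name, count in csv.reader(io.StringIO(text)):
        rows.append({"state": state, "sex": sex, "year": int(year),
                     "name": name, "count": int(count)})
    return rows

def distinct_names(rows):
    """Count unique values of the name field, like a distinct count in Splunk."""
    return len({r["name"] for r in rows})

rows = read_births(SAMPLE)
print(distinct_names(rows))  # 3 (Mary, John, Anna)
```

The real files are far larger, which is exactly where an indexed approach like tstats pays off over scanning raw rows.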
The best part is I didn’t have to build a data model or accelerate it. You can run tstats directly on data ingested with INDEXED_EXTRACTIONS, because the extracted fields are written to the index at ingest time.
Narrowing down just to the male names results in 13,139 names.
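As a rough illustration of the same filter, here is a small Python sketch over hypothetical (state, sex, year, name, count) tuples, assuming the sex column uses "M" and "F". The records are invented; note that a name given to both boys and girls lands in both sets.

```python
# Hypothetical toy records (state, sex, year, name, count); invented numbers.
RECORDS = [
    ("AK", "F", 1910, "Mary", 14),
    ("AK", "M", 1910, "John", 10),
    ("WA", "M", 1911, "Jordan", 7),
    ("WA", "F", 1911, "Jordan", 5),
]

def names_for_sex(records, sex):
    """Return the set of distinct names recorded for the given sex code."""
    return {name for _state, s, _year, name, _count in records if s == sex}

print(sorted(names_for_sex(RECORDS, "M")))  # ['John', 'Jordan']
```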
It surprised me that there were so many fewer female births than male. What surprised me even more is that it hasn’t always been this way. Before ~1936 there were more female births than male; now there are about 5% more males than females.
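The comparison above boils down to summing births per year by sex and taking the percentage difference. Here is a minimal Python sketch of that arithmetic, assuming the same hypothetical (state, sex, year, name, count) tuples with only "M" and "F" sex codes; the numbers are invented and only show how a negative value (more female births) or a positive one (more male births) comes out.

```python
from collections import defaultdict

# Hypothetical toy records (state, sex, year, name, count); invented numbers.
RECORDS = [
    ("AK", "F", 1920, "Mary", 120),
    ("AK", "M", 1920, "John", 100),
    ("WA", "F", 2000, "Anna", 100),
    ("WA", "M", 2000, "Liam", 105),
]

def births_by_year_and_sex(records):
    """Sum the count column per year, split into male ("M") and female ("F") totals.
    Assumes "M" and "F" are the only sex codes present."""
    totals = defaultdict(lambda: {"M": 0, "F": 0})
    for _state, sex, year, _name, count in records:
        totals[year][sex] += count
    return dict(totals)

def male_surplus_pct(totals):
    """Percent more male births than female births per year; negative means more
    female births. Years with no female births are skipped to avoid dividing by zero."""
    return {
        year: 100.0 * (t["M"] - t["F"]) / t["F"]
        for year, t in sorted(totals.items())
        if t["F"]
    }

surplus = male_surplus_pct(births_by_year_and_sex(RECORDS))
print(surplus[2000])  # 5.0
```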
Could this be an issue with my search, the data source, the method of acquiring the data, something to do with the Social Security Administration being founded in 1935, or is it simply the truth? When I ask a question of machine data, it nearly always leads to another question. Naturally I couldn’t ignore this rabbit hole. I’ve learned some things and will share them in another post soon. Let’s get back to finding my son’s name.
We narrowed down to male names but still had 13,139 names to choose from. We (my wife and I) agreed we didn’t want a super common name or a super rare one, since rare names tend to be hard to spell and pronounce. I classified the names into their “percentile of commonness” to find a range that was neither super common nor really rare.
The occurrences of names drop off quite quickly.
Based on this chart, it made sense to eliminate the top 90% of names used and the bottom 5% of names.
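Exactly how the “percentile of commonness” was computed isn’t spelled out, so this Python sketch assumes one plausible definition: sort names from most to least common and track each name’s cumulative share of all births. Under that reading, dropping the top 90% and the bottom 5% keeps names whose cumulative share lands between 90% and 95%. The per-name totals and the `commonness_band` helper are hypothetical.

```python
# Hypothetical per-name birth totals; the numbers are invented for illustration.
TOTALS = {"James": 500, "John": 320, "Lowell": 90, "Landen": 40, "Zeb": 30, "Qyn": 20}

def commonness_band(totals, low=90.0, high=95.0):
    """Return names whose cumulative share of births, counted from the most common
    name down, falls in (low, high]. Ties in count are broken alphabetically."""
    grand_total = sum(totals.values())
    running = 0
    kept = []
    for name, count in sorted(totals.items(), key=lambda kv: (-kv[1], kv[0])):
        running += count
        cumulative_pct = 100.0 * running / grand_total
        if low < cumulative_pct <= high:
            kept.append(name)
    return kept

print(commonness_band(TOTALS))  # ['Lowell', 'Landen']
```

A cumulative-share definition explains why a small slice of the distribution can still hold a few hundred names: the most common names soak up most births, while the long tail of rare names adds up to only a few percent.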
Now we were down to a much more manageable 309 names. For whatever reason, we also liked names that started with “L”, so adding “| search name=L*” narrowed us down to 13 names.
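For readers following along outside Splunk, a wildcard filter like name=L* can be approximated in Python with `fnmatch.fnmatchcase`, lowercasing both sides so the match ignores case (Splunk field-value matching is generally case-insensitive). The candidate list and the `matching` helper are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical candidates standing in for the names left after the commonness filter.
CANDIDATES = ["Landen", "Lowell", "Milo", "Lucian", "Reid"]

def matching(names, pattern):
    """Case-insensitive wildcard match, loosely mirroring `| search name=L*`."""
    pattern = pattern.lower()
    return [n for n in names if fnmatchcase(n.lower(), pattern)]

print(matching(CANDIDATES, "L*"))  # ['Landen', 'Lowell', 'Lucian']
```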
Shout out to Lowell, who was number two on the list of 13. The name we liked the most from the list was “Landen”. Our last check was to make sure it wasn’t a name that was trending upwards.
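One simple way to check whether a name is trending upward is to fit a straight line to its recent yearly counts and look at the sign of the slope. This Python sketch uses an ordinary least-squares slope; the yearly counts and the helper names are invented for illustration, and this is just one possible trend test.

```python
# Hypothetical yearly counts for one candidate name; invented for illustration.
RECENT = [(2010, 100), (2011, 98), (2012, 95), (2013, 97), (2014, 90)]

def least_squares_slope(points):
    """Ordinary least-squares slope of count versus year.
    Needs at least two distinct years, otherwise the denominator is zero."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in points)
    den = sum((x - mean_x) ** 2 for x, _ in points)
    return num / den

def is_trending_up(points):
    """A positive slope means the name is being used more each year, on average."""
    return least_squares_slope(points) > 0

print(least_squares_slope(RECENT))  # -2.1
print(is_trending_up(RECENT))  # False
```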
Success. We found the name we wanted!
“Log, I am your father” was the shirt that described my life; now that I am a father, it has certainly changed to admin/changeme. Here is little Landen in his admin/changeme shirt.
Happy Father’s Day!