Big Dating: Using Splunk to Fall in Love

Most people think of Splunk as software for IT, business, or security use cases. Last year at our annual user conference, .conf2017, we shared a use case that's a little more obscure. Keegan Dubbs and I used Splunk to analyze our dating profiles to find out whether big data could optimize our search for a soulmate.

We conducted a social experiment on one of the most popular online dating platforms, Bumble, to answer the age-old question: What is the best pickup-line/opener? Along the way, we found other insights that would optimize our ability to find the perfect match (and had a lot of fun, too!).

For our experiment, we used Bumble as a platform because it requires women to make the first move.

We narrowed our list of openers to the three below:

Image 1: The three openers we tested during our social experiment

Our hypothesis was that “Hey ___, how’s it going?” would get the most responses because it is personalized, engaging and simple. Of course, we set a variety of control variables, like what time of day we’d swipe, and most importantly, we made our profile tagline something Splunky: “I like big data and I cannot lie.”

Getting the data was a bit of an adventure, but once we got the text (.txt) files of our chat records from Bumble, we could bring the data into Splunk to do some analysis.

Image 2: Sample image of conversations from the .txt files

We uploaded the data as a .txt file, but as the screenshot in Image 2 shows, the messages appeared in a custom format, including multi-line events and a row of 40 dashes at the end of each conversation. It took several tries to extract the fields properly. Each time we made configuration changes, we had to wipe the index (using the clean command) and then reindex the files to see if our extractions worked. Splunk does not allow you to permanently remove individual events from an index, which brings us to one of our key learnings: when bringing new data into Splunk, bring it into a test index so that you can clean that index anytime you need to without worrying about wiping other data that may be important to you. Additionally, a best practice is to use a dev instance (Splunk offers a free 50GB/day Developer’s license for a renewable 6-month term).

For our .conf files, running a single regex to pull out all the fields we wanted proved challenging and inefficient, so we split the work into two extractions instead: getData and candidate_number. If the data were more complex, we might have chosen to do some preprocessing before bringing it into Splunk. See below.

# props.conf (in the stanza for the chat data's sourcetype)
LINE_BREAKER = ([\r\n]+-{40})
REPORT-guyInfo = getData
EXTRACT-candidate_number = ^[^:\n]*:\s+(?P<candidate_number>\d+)

# transforms.conf
[getData]
REGEX = (?:\t(?P<Sender>\w+)\s\((?P<message_time>[-\s\d\:]+)\)\:\s(?P<message>.*?))(?:[\r\n]|$)
FORMAT = Sender::$1 message_time::$2 message::$3
MV_ADD = true

Image 3: Splunk Dashboard of our Bumble Data
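If you want to try the two extractions outside Splunk before indexing anything, here is a minimal Python sketch that reuses the same regexes. The sample text is entirely made up: its layout (a header line ending in a number, tab-indented messages, 40 dashes between conversations) is an assumption inferred from the regexes, and the function name parse_conversations is hypothetical. It only approximates what Splunk does at index and search time.

```python
import re

# Same patterns as in the configuration above.
CONVERSATION_BREAK = re.compile(r"[\r\n]+-{40}")
CANDIDATE_NUMBER = re.compile(r"^[^:\n]*:\s+(?P<candidate_number>\d+)")
MESSAGE = re.compile(
    r"\t(?P<Sender>\w+)\s\((?P<message_time>[-\s\d\:]+)\)\:\s(?P<message>.*?)(?:[\r\n]|$)"
)

# Hypothetical sample in the layout the regexes imply; not real chat data.
DASHES = "-" * 40
SAMPLE = "\n".join([
    "Candidate Number: 1",
    "\tKelly (2017-08-01 19:32:10): Hey Matt, how's it going?",
    "\tMatt (2017-08-02 05:14:00): Pretty good, you?",
    DASHES,
    "Candidate Number: 2",
    "\tKelly (2017-08-01 19:40:00): Ur Hawt",
    DASHES,
]) + "\n"


def parse_conversations(text):
    """Split text into conversations and pull out the fields from each one."""
    conversations = []
    for chunk in CONVERSATION_BREAK.split(text):
        chunk = chunk.strip()  # drop leftover newlines around each conversation
        if not chunk:
            continue
        header = CANDIDATE_NUMBER.match(chunk)
        conversations.append({
            "candidate_number": int(header.group("candidate_number")) if header else None,
            # One dict per message, similar in spirit to MV_ADD = true.
            "messages": [m.groupdict() for m in MESSAGE.finditer(chunk)],
        })
    return conversations


for convo in parse_conversations(SAMPLE):
    print(convo["candidate_number"], [m["Sender"] for m in convo["messages"]])
```

A quick check like this makes it easier to catch a regex mistake before you have to clean and reindex.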

So after analyzing over 120 conversations, what did we find out? Well, you can check out the dashboard above, which shows that “Ur Hawt” was the top opener with a 68% response rate. SURPRISED?! We were. Turns out humans are more vain than we thought. Another thing we found out: don’t be afraid of rejection. Keegan’s and my rejection rate (i.e., no response) averaged ~40%. It may have looked like we were having success, but the truth of the matter is that everyone gets rejected at some point. Welcome to the wonderful art of dating. You miss 100% of the shots you don’t take, right?

Image 4: SPL used to calculate the Success Rate of the pickup-line “Ur Hawt”
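The SPL itself lives in the image, so as a rough illustration of the arithmetic only, here is a Python sketch of a response-rate calculation: conversations that got at least one reply, divided by conversations started, per opener. The record layout (opener, got_reply), the function name, and the toy data are all assumptions made for this example; they are not our actual search or results.

```python
from collections import defaultdict


def response_rates(conversations):
    """Return {opener: percent of conversations that got at least one reply}."""
    sent = defaultdict(int)
    replied = defaultdict(int)
    for convo in conversations:
        sent[convo["opener"]] += 1
        if convo["got_reply"]:
            replied[convo["opener"]] += 1
    return {opener: 100.0 * replied[opener] / sent[opener] for opener in sent}


# Made-up toy records, only to show the shape of the calculation.
toy_conversations = [
    {"opener": "opener A", "got_reply": True},
    {"opener": "opener A", "got_reply": False},
    {"opener": "opener B", "got_reply": True},
    {"opener": "opener B", "got_reply": True},
]

print(response_rates(toy_conversations))
```

The rejection rate mentioned above is simply the complement: 100 minus the response rate.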

The cool part about bringing our data into Splunk is that we were able to find other insights beyond our original test hypothesis. We found out the average time to respond was 581.74 minutes (~9.7 hours!), which is slightly horrifying. We were also able to use a ton of cool custom visualizations from Splunkbase, like a Heat Map and a Word Cloud of the most frequently appearing names. Another shocker: Matt and Michael made the top 10.
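For the curious, an average response time like this can be sketched in a few lines of Python. This is an illustration under assumptions, not the search behind our dashboard: it treats response time as the gap between the opening message and the first reply from the other person, assumes messages are in chronological order, assumes a timestamp format matching the message_time field, and uses made-up conversations.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed layout of message_time


def minutes_to_first_reply(messages, opener_sender):
    """Minutes from the opener's first message to the first reply, or None."""
    opened_at = None
    for msg in messages:
        sent_at = datetime.strptime(msg["message_time"], TIME_FORMAT)
        if opened_at is None:
            if msg["Sender"] == opener_sender:
                opened_at = sent_at
        elif msg["Sender"] != opener_sender:
            return (sent_at - opened_at).total_seconds() / 60
    return None  # nobody replied


def average_minutes(values):
    """Average over conversations that actually got a reply."""
    answered = [v for v in values if v is not None]
    return sum(answered) / len(answered) if answered else None


# Made-up conversations: two with replies, one without.
toy = [
    [{"Sender": "Kelly", "message_time": "2017-08-01 19:00:00"},
     {"Sender": "Matt", "message_time": "2017-08-01 21:30:00"}],
    [{"Sender": "Kelly", "message_time": "2017-08-02 08:00:00"},
     {"Sender": "Sam", "message_time": "2017-08-02 08:30:00"}],
    [{"Sender": "Kelly", "message_time": "2017-08-03 12:00:00"}],
]

delays = [minutes_to_first_reply(convo, "Kelly") for convo in toy]
print(average_minutes(delays))

# Converting the figure from the post: 581.74 minutes is about 9.7 hours.
print(round(581.74 / 60, 1))
```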

Image 5: Splunk Dashboard panel using Custom Visualizations of Bumble Data

At the end of the day, we found out that dating is hard work even with the help of software. We also learned to think beyond IT and security about how Splunk can help you solve any kind of problem. After all, Splunk’s mission statement is to make machine data accessible, valuable, and usable to everyone.

Finally, we encourage you to give Splunk a try with your own data sources. Check out the Splunk free trial, and get started on finding the soulmate of your dreams!

If you’re looking for more info on our session, check out the Big Data Beard Podcast we did at .conf2017 with Cory Minton. Or check out the replay and slides from our "Big Dating: Using Splunk to Fall in Love" presentation at .conf2017.

Happy Splunking!

Kelly and Keegan

A special thanks to Robert Christian from Splunk and the Bumble Support team for helping make this project come to life!

Posted by Kelly Kitagawa