I recently posted a blog entry about Splunking my golf swing, and afterwards a co-worker asked if I could Splunk the NBA Finals. He gave me some NBA data, and while on a flight today I decided to look into it a little with Splunk. I don’t know very much about basketball and you all probably have way better questions to ask of the data; nevertheless, I gave it a shot. Note: CLE = Cleveland Cavaliers and GSW = Golden State Warriors.
Each file’s name gave the date of the game and who played where.
Since the files were CSV, I imported them as such and set the timestamp based on the date and the “elapsed” value.
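For anyone curious what that timestamp step amounts to outside of Splunk, here is a minimal Python sketch. It is not how Splunk does it internally. It assumes the date can be read from the filename as YYYY-MM-DD and that “elapsed” is an H:MM:SS offset; both formats, the function name, and the sample values are my assumptions for illustration, not something taken from the actual files.

```python
from datetime import datetime, timedelta


def event_timestamp(game_date: str, elapsed: str) -> datetime:
    """Combine a game date with an elapsed offset.

    Assumed formats (hypothetical): game_date is YYYY-MM-DD,
    elapsed is H:MM:SS. With no tip-off time available, the
    offset is simply added to midnight of the game date.
    """
    hours, minutes, seconds = (int(part) for part in elapsed.split(":"))
    offset = timedelta(hours=hours, minutes=minutes, seconds=seconds)
    return datetime.strptime(game_date, "%Y-%m-%d") + offset


# Made-up example values, only to show the call shape.
print(event_timestamp("2015-06-04", "0:01:30"))
```

The only point is that every play-by-play row ends up with its own timestamp, so events sort correctly within a game.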
In less than a minute Splunk had indexed the entire 2015 NBA regular season plus the recent playoff games. It was time to ask questions of the data, which had a plethora of fields available since it was play-by-play data!
So here go my (probably dumb) questions. First: “Which team missed more shots this season, including playoffs?”
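The same question can be sketched as a simple count-by-team. Below is a rough Python version, assuming hypothetical column names (`team`, `event_type`, `result`) and a handful of made-up rows; the real files may well use different names and values.

```python
import csv
import io
from collections import Counter

# Made-up sample rows; column names are assumptions, not the real schema.
SAMPLE = """team,event_type,result
GSW,shot,made
GSW,shot,missed
CLE,shot,missed
CLE,shot,missed
CLE,rebound,
"""


def missed_shots_by_team(csv_text: str) -> Counter:
    """Count rows that are shots with a 'missed' result, grouped by team."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["event_type"] == "shot" and row["result"] == "missed":
            counts[row["team"]] += 1
    return counts


print(missed_shots_by_team(SAMPLE))
```

Non-shot events such as rebounds are skipped, so only missed shots are counted.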
GSW scored more points while taking fewer shots! Instead of writing searches, I decided to pivot on the data so I could just drag and drop to answer my questions, like “Where were the shots taken on the court (overlaid) by team?”
How about shots by player on the court?
Hmm…. what about missed vs made on the court?
Okay, how about EVERYTHING that occurred on the court and where?
The same co-worker who gave me the data was interested in what would happen if GSW changed the length of play. Below is a pivot showing shots missed by length of play. It was interesting to see that the longer plays favor GSW.
Obviously there are tons of questions that can quickly be answered with this data. I decided to create a dashboard for you all to look at and see if you can guess who will win the next few games. Okay, this probably isn’t enough to make an accurate prediction. If you want to make a better one, download the data yourself and try out Splunk commands such as predict or cofilter (which players play well or poorly together), among many others.
The first thing I noticed with these reports was that GSW makes the majority of their points within the first 5 feet and CLE is most dangerous from the 25-foot mark. CLE misses more shots between 15 and 20 feet.
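A distance breakdown like this comes down to putting each shot into a 5-foot bucket and counting per team. Here is a minimal Python sketch of that idea, assuming each shot is a hypothetical `(team, distance_ft, made)` tuple; the sample shots are made up and do not come from the dataset.

```python
from collections import defaultdict


def bucket_label(distance_ft: float, width: int = 5) -> str:
    """Return the label of the bucket a distance falls in, e.g. '15-20 ft'."""
    low = (int(distance_ft) // width) * width
    return f"{low}-{low + width} ft"


def misses_by_distance(shots, width: int = 5) -> dict:
    """Count missed shots per (team, distance bucket)."""
    table = defaultdict(int)
    for team, distance_ft, made in shots:
        if not made:
            table[(team, bucket_label(distance_ft, width))] += 1
    return dict(table)


# Made-up shots: (team, distance in feet, made?)
shots = [
    ("CLE", 17, False),
    ("CLE", 16, False),
    ("GSW", 3, True),
    ("GSW", 26, False),
]
print(misses_by_distance(shots))
```

Changing `width` gives coarser or finer buckets, much like adjusting a range split in a pivot.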
Writing this post reminded me of the three things necessary to get value from data:
- You have to get and store the data
- You need something to analyze the data
- You need the right questions to ask
Splunk makes the first two easy; the third is often the hardest. Although machines may one day take over the universe, I feel pretty confident we humans will have jobs for a long time because of number 3.