As an Englishman I’ve been waiting months – with very high expectations – for the World Cup to come around. Reading fellow Splunker Matt Davies’ blog post, “Splunking World Cup 2014. The winner will be…”, only heightened my excitement.
The tournament is now going into the second week and I’ve been starting to look at the teams, players, and tournament more closely. Which stadium holds the most people? Who’s the top scorer? Which referee hands out the most cards?
With these questions fresh in my mind I opened up Splunk and began to look at the huge amount of information being streamed from the tournament. For this post I’m going to explore real-time match updates, including teams, scores, and match locations.
Step 1: Choose the Data Sources You Want to Splunk
There are lots of potential sources to grab World Cup data – from match reports to fan Twitter feeds. Software for Good have created a bunch of endpoints offering both match and team information.
For this project we’ll use their live match endpoint.
Step 2: Install the REST API Modular Input in Splunk
Step 3: Configure your RESTful Input
In Splunk navigate to “Settings > Data Inputs > REST”, and select “Add new”.
REST API Input Name: WorldCupMatchData (optional)
Endpoint URL: http://worldcup.sfg.io/matches
Response Type: JSON
Set Sourcetype: Manual
Sourcetype: _json
Host: SFGFeed (optional)
We set the “Response Type” to JSON because the feed is returned in JSON format. It is also important to explicitly set the “Sourcetype” to “_json”. This ensures Splunk parses the JSON events correctly at search time. If your search returns grouped events, you’ve probably forgotten to set this.
Note, I have only included the fields that are essential to configure (unless stated). Everything else can be left blank or at its default (unless you need to enter a proxy to get out to the internet, etc.).
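If you want to sanity-check the feed outside Splunk before wiring up the input, here is a minimal Python sketch. It parses a response into one record per match, which is roughly the shape you want each Splunk event to have. The function names and the sample payload are made up for illustration, the "completed" status value is an assumption (this post only relies on "future"), and none of this is part of the REST modular input itself.

```python
import json
from urllib.request import urlopen

MATCHES_URL = "http://worldcup.sfg.io/matches"  # endpoint from Step 3


def fetch_payload(url=MATCHES_URL, timeout=10):
    """Download the raw JSON text from the endpoint (needs internet access)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def split_matches(payload):
    """Turn one JSON payload into a list of per-match dicts."""
    data = json.loads(payload)
    if isinstance(data, dict):
        # A single object is treated as a one-match feed.
        data = [data]
    return data


# Invented sample payload; the real feed carries more fields than this.
sample = (
    '[{"match_number": 1, "status": "completed"},'
    ' {"match_number": 2, "status": "future"}]'
)
matches = split_matches(sample)
print(len(matches), "matches parsed")
```

To try it against the live feed, swap the sample for `split_matches(fetch_payload())`.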
Step 4: Let’s Play
Note, this data source also contains future match data. If you’re not interested in this information just specify NOT status="future" in your search string.
Where have most matches been played so far? (Maracanã – Estádio Jornalista Mário Filho – 7 / Estadio Nacional – 7)
host="SFGFeed" NOT status="future" | top location
How many goals have been scored? (49)
host="SFGFeed" NOT status="future" | stats sum(away_team.goals) AS TotalAwayGoals sum(home_team.goals) AS TotalHomeGoals | eval TotalGoals = TotalAwayGoals + TotalHomeGoals | fields TotalGoals
Average goals per game? (~3)
host="SFGFeed" NOT status="future" | stats sum(away_team.goals) AS TotalAwayGoals sum(home_team.goals) AS TotalHomeGoals dc(match_number) AS TotalMatches | eval TotalGoals = TotalAwayGoals + TotalHomeGoals | eval GoalsPerGame = TotalGoals / TotalMatches
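To make the arithmetic in the two goal searches explicit, here is a rough Python equivalent run over a few invented records. The nested record shape is an assumption based on field names such as home_team.goals, and the scores are made up rather than real results.

```python
# Invented sample records; the nested shape is assumed from the dotted
# field names (home_team.goals, away_team.goals) used in the searches.
matches = [
    {"match_number": 1, "status": "completed",
     "home_team": {"goals": 3}, "away_team": {"goals": 1}},
    {"match_number": 2, "status": "completed",
     "home_team": {"goals": 1}, "away_team": {"goals": 0}},
    {"match_number": 3, "status": "future",
     "home_team": {"goals": 0}, "away_team": {"goals": 0}},
]

# Equivalent of: NOT status="future"
played = [m for m in matches if m.get("status") != "future"]

# Equivalent of: sum(away_team.goals) + sum(home_team.goals)
total_away = sum(m["away_team"]["goals"] for m in played)
total_home = sum(m["home_team"]["goals"] for m in played)
total_goals = total_away + total_home

# Equivalent of: dc(match_number), a distinct count of matches
total_matches = len({m["match_number"] for m in played})

goals_per_game = total_goals / total_matches if total_matches else 0.0
print(total_goals, goals_per_game)
```

Counting distinct match numbers, rather than counting events, keeps the average honest if the same match shows up more than once.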
Which stadium saw the most goals scored during the first round of matches? (Arena Fonte Nova)
host="SFGFeed" NOT status="future" match_number>=1 match_number<=16 | stats sum(away_team.goals) AS TotalAwayGoals sum(home_team.goals) AS TotalHomeGoals dc(match_number) AS TotalMatches by location | eval TotalGoals = TotalAwayGoals + TotalHomeGoals | sort - TotalGoals
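The "stats ... by location" search groups the goal totals per stadium and then sorts them. A small Python sketch of the same idea follows; the stadium names and scores are placeholders rather than real data, and the record shape is assumed as before.

```python
from collections import defaultdict

# Placeholder records: stadium names and scores are invented.
matches = [
    {"match_number": 1, "status": "completed", "location": "Stadium A",
     "home_team": {"goals": 2}, "away_team": {"goals": 1}},
    {"match_number": 2, "status": "completed", "location": "Stadium B",
     "home_team": {"goals": 1}, "away_team": {"goals": 0}},
    {"match_number": 3, "status": "completed", "location": "Stadium A",
     "home_team": {"goals": 0}, "away_team": {"goals": 2}},
    {"match_number": 17, "status": "completed", "location": "Stadium B",
     "home_team": {"goals": 4}, "away_team": {"goals": 3}},
]

goals_by_location = defaultdict(int)
for match in matches:
    if match.get("status") == "future":
        continue  # NOT status="future"
    if not 1 <= match["match_number"] <= 16:
        continue  # match_number>=1 match_number<=16
    goals_by_location[match["location"]] += (
        match["home_team"]["goals"] + match["away_team"]["goals"]
    )

# Equivalent of: sort - TotalGoals (highest total first)
ranking = sorted(goals_by_location.items(), key=lambda item: item[1], reverse=True)
for location, goals in ranking:
    print(location, goals)
```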
Which teams won their opening games? (USA, Switzerland, Netherlands, Mexico, Ivory Coast, Italy, Germany, France, Costa Rica, Colombia)
host="SFGFeed" NOT status="future" | where winner!="Draw" | top winner | fields - percent
(Note, the numbers will be out of date by the time you read this! Maybe England have won!)
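One thing worth knowing about the winners search: top returns at most ten rows unless you pass a limit. The Python sketch below mimics that behaviour with a counter. The team names are placeholders, and the shape of the winner field for unplayed matches is an assumption.

```python
from collections import Counter

# Placeholder records; "Draw" matches the value filtered out in the search.
matches = [
    {"match_number": 1, "status": "completed", "winner": "Team A"},
    {"match_number": 2, "status": "completed", "winner": "Draw"},
    {"match_number": 3, "status": "completed", "winner": "Team B"},
    {"match_number": 4, "status": "completed", "winner": "Team A"},
    {"match_number": 5, "status": "future", "winner": None},
]

wins = Counter(
    match["winner"]
    for match in matches
    if match.get("status") != "future"   # NOT status="future"
    and match.get("winner") not in (None, "Draw")  # where winner!="Draw"
)

# top shows 10 rows by default; most_common(10) applies the same cap.
top_winners = wins.most_common(10)
for team, count in top_winners:
    print(team, count)
```

If you want every winner rather than the ten most frequent, adding limit=0 to the top command removes the cap.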
Step 5: Extra Time
I’ve only started to scratch the surface here. Remember this data source is streaming information in real-time into Splunk as matches are being played. Why not get Splunk up on a second big screen whilst you’re watching the game to analyse the stats (too much)?
Correlating the data from the Software for Good endpoints with other sources may also prove interesting. Does the number of goals scored during the game have any correlation to the heat? Or how does the distance travelled by teams before the match affect the final score?
Now I do believe there’s a football match on…