This post is the first in a series analysing social data about the UK General Election 2015.
On Thursday 7th May 2015 the UK will hold a General Election, which will determine the next Prime Minister.
In the run-up to the vote there will be a series of pledges, appearances and debates. Over the coming weeks I will be collecting data from various media sources into Splunk to provide some insight into how each of the main parties is faring.
On Thursday 26th March the campaign kicked off properly with the first leaders' “debate” (more like interviews), with the Prime Minister, David Cameron, pitted against the Leader of the Opposition, Ed Miliband.
For this exercise I collected tweets containing the official debate hashtag, #battlefornumber10, using Twitter’s Streaming API. Note that, unlike the private Firehose API, the Streaming API delivers only a sample of the tweets being posted, anywhere from 1% to over 40% of them.
You can see how to configure this input in Splunk by reading one of my earlier blog posts here.
For the analysis we examined the debate on 26/03/15 between 21:00 and 22:30 GMT+0.
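If you are working from the CSV export rather than inside Splunk, restricting tweets to the debate window is a simple timestamp filter. The sketch below is a minimal illustration, not the search used for this post: it assumes each row has a `_time` field holding an ISO 8601 timestamp with an explicit UTC offset, and it hard-codes the window as 21:00 to 22:30 GMT on 26 March 2015 (the debate night given at the start of the post).

```python
from datetime import datetime, timezone

# Debate window: 21:00 to 22:30 GMT (UTC+0).
# Assumption: the debate took place on 26 March 2015, the kick-off date above.
WINDOW_START = datetime(2015, 3, 26, 21, 0, tzinfo=timezone.utc)
WINDOW_END = datetime(2015, 3, 26, 22, 30, tzinfo=timezone.utc)


def in_debate_window(timestamp: str) -> bool:
    """Return True if an ISO 8601 timestamp falls in [WINDOW_START, WINDOW_END).

    Assumption: timestamps look like "2015-03-26T21:15:00.000+00:00",
    i.e. they carry an explicit offset so comparison with UTC is valid.
    """
    ts = datetime.fromisoformat(timestamp)
    return WINDOW_START <= ts < WINDOW_END


def filter_window(rows):
    """Keep only rows whose (assumed) `_time` field lies inside the window."""
    return [row for row in rows if in_debate_window(row["_time"])]
```

The window is half-open, so a tweet stamped exactly 22:30:00 falls outside it; that avoids double counting if consecutive windows are ever chained together.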
I downloaded the data from my Splunk index into a CSV file using the Splunk API. You can grab the dataset here.
We collected almost 220,000 tweets from around 63,000 unique users, so about 3.5 tweets per user in the 6-hour period.
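The per-user figure is simply total tweets divided by distinct authors. As a rough sketch over the CSV export, assuming one tweet per row and an author column named `user.screen_name` (a guess at how Splunk flattens Twitter's JSON, not a confirmed field name in the dataset):

```python
import csv
from collections import Counter
from typing import Iterable, Tuple


def summarise(lines: Iterable[str], user_field: str = "user.screen_name") -> Tuple[int, int, float]:
    """Return (total tweets, unique users, mean tweets per user).

    `lines` is any iterable of CSV lines with a header row, e.g. an open file.
    Assumption: one tweet per row, with the author stored in `user_field`.
    """
    # Count how many rows (tweets) each author contributed.
    per_user = Counter(row[user_field] for row in csv.DictReader(lines))
    total = sum(per_user.values())
    users = len(per_user)
    return total, users, (total / users if users else 0.0)
```

For example, `with open("tweets.csv", newline="", encoding="utf-8") as f: print(summarise(f))`, where the filename is hypothetical. With the post's totals, 220,000 / 63,000 works out to roughly 3.5 tweets per user.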
There are a few key parts to the analysis, but they are all fairly simple. One of the main aspects was sentiment analysis. Luckily, David Carasso has already created an app with a sentiment search command, and it comes with a prebuilt sentiment model for Twitter. You can download the Sentiment Analysis app here.
You can then run sentiment searches using the sentiment command. Here’s an example I used to gauge the sentiment of tweets containing the word “Paxman” (the interviewer):
#battlefornumber10 paxman | sentiment twitter text | timechart avg(sentiment) by index
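To make concrete what `timechart avg(sentiment)` reports, here is a plain-Python approximation over exported tweets. It is only a sketch: it assumes each tweet already carries a numeric `sentiment` score (the app's model produces this inside Splunk; it is not reimplemented here), uses a simple case-insensitive substring match for “Paxman”, fixes the span at 10 minutes rather than letting timechart choose one, and produces a single series instead of splitting `by index`.

```python
from collections import defaultdict
from datetime import datetime


def bucket_start(ts: datetime, span_minutes: int = 10) -> datetime:
    """Floor a timestamp to the start of its span (span_minutes should divide 60)."""
    minute = ts.minute - ts.minute % span_minutes
    return ts.replace(minute=minute, second=0, microsecond=0)


def avg_sentiment_by_bucket(tweets, keyword="paxman", span_minutes=10):
    """Average a precomputed sentiment score per time bucket.

    Assumptions: each tweet is a dict with a datetime `time`, a `text`
    string and a numeric `sentiment` score from an existing model.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t in tweets:
        # Equivalent of the search terms: keep only tweets mentioning the keyword.
        if keyword.lower() not in t["text"].lower():
            continue
        b = bucket_start(t["time"], span_minutes)
        sums[b] += t["sentiment"]
        counts[b] += 1
    # One average per non-empty bucket, in time order.
    return {b: sums[b] / counts[b] for b in sorted(counts)}
```

Unlike timechart, this skips empty buckets rather than reporting them as null; for plotting you would fill those gaps explicitly.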
The Big Questions…
- The NHS – The biggest discussion on Twitter was around the NHS. At peak we collected almost 4,000 tweets between 21:40 and 21:50. The NHS also had the highest level of negative sentiment from users for both leaders (about 28% for Cameron and 25% for Miliband).
- Zero Hours Contracts – David Cameron generated the most buzz on the subject: 2,053 tweets for Cameron compared to just 245 for Miliband. Almost 17% of tweets were negative for Cameron vs. 15% for Miliband.
- The EU – Cameron fared much worse here. Tweet volume was fairly even; however, over 28% of tweets on the subject for Cameron were negative, compared to just 11% for Miliband!
- Jeremy Paxman – The conversation on Twitter was also very focused on the presenter: 30,000 tweets (just under 10% of total volume) mentioned his name. We saw big spikes at the start of David Cameron’s interview and at the end of Ed Miliband’s interview, perhaps reflecting the tough lines of questioning pursued at those points.
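The per-topic figures above boil down to two counts: the share of negative tweets among those mentioning both a topic and a leader, and the busiest 10-minute bucket for a topic. A minimal sketch, assuming each exported tweet has a `time` (datetime), `text`, and a hypothetical `label` field set to "positive" or "negative" by the sentiment model. Keyword matching is naive substring search, so a short keyword such as "EU" would need a stricter match in practice.

```python
from collections import Counter
from datetime import datetime
from typing import Optional, Tuple


def topic_breakdown(tweets, topic, leaders=("cameron", "miliband")):
    """Map each leader to (tweet count, share of negative tweets) for one topic.

    A tweet counts when its text contains both the topic and the leader
    keyword (case-insensitive); a tweet naming both leaders counts for both.
    """
    result = {}
    for leader in leaders:
        matched = [
            t for t in tweets
            if topic.lower() in t["text"].lower() and leader.lower() in t["text"].lower()
        ]
        negative = sum(1 for t in matched if t["label"] == "negative")
        result[leader] = (len(matched), negative / len(matched) if matched else 0.0)
    return result


def peak_bucket(tweets, topic, span_minutes=10) -> Optional[Tuple[datetime, int]]:
    """Return (bucket start, count) for the busiest span mentioning `topic`.

    span_minutes should divide 60; returns None if nothing matches.
    """
    counts = Counter(
        t["time"].replace(minute=t["time"].minute - t["time"].minute % span_minutes,
                          second=0, microsecond=0)
        for t in tweets
        if topic.lower() in t["text"].lower()
    )
    return counts.most_common(1)[0] if counts else None
```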
I’m going to stay impartial, but here are the overall numbers:
- David Cameron – 52,866 tweets – 79% positive
- Ed Miliband – 36,550 tweets – 86% positive