Issue 3, 2014
Analyzing Social Media Sentiment @ World Cup
Getting Apama and Terracotta Universal Messaging in the Game
by Dr. Gareth Smith, Vice President, Software AG
Dr. Kevin Palfreyman, Senior Director, Software AG
The World Cup had fans communicating in a big way over social media. See how we used Software AG’s Apama and Terracotta Universal Messaging to derive sentiment from social media channels. With this real-time data, we could infer the emotion of football fans globally and then present the results to any device anywhere in the world.
Social media and the World Cup
The 2014 FIFA World Cup promised to have the richest social media content of any World Cup. Half-time discussions on TV channels augmented their normal on-screen graphics with selected Tweets from professional players—who presumably commented on the game from the comfort of their own Jacuzzi® hot tubs.
Selecting a few Tweets from a small number of key individuals is a simple task. But what about following the opinions of the other 270 million active Twitter® users? The final World Cup match drew a peak of more than 600,000 Tweets per minute, each containing 140 characters of wisdom from armchair coaches and managers worldwide.
We wanted to do something interesting with the social media feeds we had access to in real-time. Deriving intelligence from this flood of updates is a “big data in motion” problem. Fortunately Software AG has the right tools for the job. Apama is well suited to analyzing large volumes of data streams in real-time. In fact, it’s very much at home running mission-critical applications in extreme circumstances, such as financial trading systems where intelligence is derived from millions of events per second and responses acted on in microseconds.
Apama has access to a social media framework that includes adapters to a range of social media sources (e.g., Twitter, Facebook and TripAdvisor®). It also has specific analytical capabilities, such as sentiment analysis and geo-incident analysis, which identifies similar messages sent by different users at nearly the same time and from nearby locations (useful for tracking things like disease outbreaks).
Building and deploying the system
With only a few days to get the full system running, we provisioned a small host (just two virtual cores would be enough) on the Software AG Cloud. Then we installed Apama and configured the social media framework to collect Tweets that concerned the World Cup. Once we had this stream of Tweets, we fed them into the sentiment analysis component. This component augmented each Tweet with a sentiment score in real time as it passed through Apama.
Sentiment analysis was performed in two broad stages: keyword scoring and grammar modifiers.
Keyword Scoring
For keyword scoring, you need a lexicon of the key words or strings that you think will carry some sentiment. This lexicon is augmented with a score for each word or string. For example, “excellent” and “great” will score high positive sentiment values whereas “awful” and “terrible” will score very high negative sentiment values.
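As a minimal sketch of the keyword-scoring idea, the Python below sums lexicon scores over the words of a Tweet. It is an illustration only, not the Apama sentiment component: the `LEXICON` entries, their numeric scores and the function names are all assumptions made for this example.

```python
import re

# Hypothetical lexicon: the words and scores are illustrative assumptions,
# not the lexicon used in the project.
LEXICON = {
    "excellent": 3,
    "great": 3,
    "awful": -4,
    "terrible": -4,
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def keyword_score(text):
    """Sum the lexicon score of every known word; unknown words score 0."""
    return sum(LEXICON.get(word, 0) for word in tokenize(text))


print(keyword_score("Great goal, excellent pass!"))  # 6
print(keyword_score("Awful defending"))              # -4
```

A dictionary lookup per token keeps the cost per Tweet proportional to its length, which matters when the input arrives at hundreds of thousands of Tweets per minute.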
Grammar Modifiers
To just take keywords and add up the scores is too crude in most cases. So, the second stage takes language grammar into consideration and refines the scores. As a simple example, we looked for negativity to turn the “good” in the phrase “not good” from a positive sentiment into a negative sentiment.
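The “not good” case can be sketched in a few lines of Python. This is a deliberately naive illustration under stated assumptions: the `LEXICON` and `NEGATORS` sets are hypothetical, and a negator here flips only the word that immediately follows it, so a phrase like “not very good” would not be handled. A real grammar stage would need richer rules.

```python
import re

# Hypothetical lexicon and negator list, chosen only for this example.
LEXICON = {"good": 2, "great": 3, "bad": -2, "awful": -4}
NEGATORS = {"not", "no", "never", "isn't", "wasn't"}


def score_with_negation(text):
    """Sum keyword scores, inverting the score of a word that directly follows a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0
    negate_next = False
    for token in tokens:
        if token in NEGATORS:
            negate_next = True
            continue
        score = LEXICON.get(token, 0)
        total += -score if negate_next else score
        negate_next = False  # a negator affects only the next token
    return total


print(score_with_negation("good"))      # 2
print(score_with_negation("not good"))  # -2
```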
These two stages combined deliver a scored Tweet that we processed further. (I’m explaining what we did using Tweets in English. Obviously you’d need a lexicon and set of grammar rules for each language you want to score. You can use as many as you like to cover all the languages you need.)
Once we had a stream of Tweets with sentiment scores applied to them, we passed them into a simple streaming analytic. We partitioned the scored Tweets twice in parallel:
- Each team. This was done to calculate sentiment values for each team independently. As each team name can be represented in many different ways (for example: USA, US, United States, etc.) we had to cater for multiple representations and amalgamate the results.
- Each of “positive,” “negative” and “neutral” values. This part of the analytic gave us the percentage of Tweets for each team that were positive, negative and neutral, though we only plotted the positive and negative values, making the neutral values implicit in the UI.
These analytics told us the real-time sentiment of each team based on the contents of the social media streams. The output was in the form:
Team Name; Positive%; Negative%; Neutral%
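The two partitions and the output line can be sketched as follows. This is a simplified Python illustration, not the Apama analytic: it keeps running counts rather than a streaming window (the article does not say which window was used), and the alias table, the function names and the rule that a score of zero means “neutral” are assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical alias table mapping the many spellings of a team to one name.
TEAM_ALIASES = {
    "usa": "USA",
    "us": "USA",
    "united states": "USA",
    "germany": "Germany",
    "deutschland": "Germany",
}
LABELS = ("positive", "negative", "neutral")


def classify(score):
    """Assumed rule: the sign of the score decides the sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def tally(scored_tweets):
    """Count labels per canonical team from (team_mention, score) pairs."""
    counts = defaultdict(Counter)
    for mention, score in scored_tweets:
        team = TEAM_ALIASES.get(mention.strip().lower())
        if team is not None:  # ignore mentions we cannot map to a team
            counts[team][classify(score)] += 1
    return counts


def format_line(team, counter):
    """Render one line as: Team Name; Positive%; Negative%; Neutral%"""
    total = sum(counter.values()) or 1  # avoid dividing by zero
    parts = [f"{100.0 * counter[label] / total:.1f}%" for label in LABELS]
    return "; ".join([team] + parts)


tweets = [("USA", 3), ("US", -2), ("United States", 0), ("usa", 4)]
for team, counter in tally(tweets).items():
    print(format_line(team, counter))  # USA; 50.0%; 25.0%; 25.0%
```

Normalizing the team name before counting means all spellings land in the same partition, which is the amalgamation step described above.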
After addressing all the integration and analytics, we needed to make the results publicly accessible. For this, we turned to another product in the Software AG portfolio: Terracotta Universal Messaging. Like Apama, it’s also commonly found supporting mission-critical applications in Tier-1 financial institutions.
In this case, we used Universal Messaging to take the feed from Apama and make it available to the Web browser, mobile device or tablet of any connected user. We also used it to serve up the HTML5 pages themselves. This is a very powerful capability and supports all sorts of devices, including smart TVs. In our case, the user interface was a simple, single window. A full-fledged Web-based interface would be much better served with Software AG’s Presto (which uses Universal Messaging for data distribution).
The simple Web page had a tile for each team that showed the country name and flag, with two bar charts that displayed the live positive and negative percentages both graphically and textually.
When we received a change in any of the sentiment values from Apama, Universal Messaging immediately pushed that small update out to each connected device, and each device’s display updated in turn. This all happened on an event-by-event basis and in real time. To signify to the users that a value had changed, the positive or negative bar was highlighted for a few seconds.
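The idea of pushing only the values that changed can be sketched in plain Python. This is not the Universal Messaging API: the `DeltaPublisher` class, its methods and the callback signature are hypothetical names invented for this illustration of change-only delivery.

```python
class DeltaPublisher:
    """Forward a value to subscribers only when it differs from the last one sent."""

    def __init__(self):
        self._last = {}          # (team, field) -> last value forwarded
        self._subscribers = []   # callbacks standing in for connected devices

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, team, field, value):
        """Return True if the update was forwarded, False if it was unchanged."""
        key = (team, field)
        if key in self._last and self._last[key] == value:
            return False
        self._last[key] = value
        for callback in self._subscribers:
            callback(team, field, value)
        return True


received = []
publisher = DeltaPublisher()
publisher.subscribe(lambda team, field, value: received.append((team, field, value)))

publisher.publish("Germany", "positive", 62.0)
publisher.publish("Germany", "positive", 62.0)  # unchanged, so not forwarded
publisher.publish("Germany", "positive", 65.5)
print(len(received))  # 2
```

Sending only the changed field keeps each message small, which helps when the same update fans out to thousands of connected browsers and devices.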
Here we could see social media’s view of all teams on a single page:
Combined, the entire system is summarized in the diagram below: a simple architecture of “off-the-shelf” components quickly deployed for live use.
Experiences with the live system
Once deployed, the service remained fully responsive and functioned flawlessly during the entire period. We supported thousands of connections and held everything in only 2 GB of RAM. At all times we were left with plenty of headroom on the two 2.66 GHz virtual cores.
The results were as you might expect. When Portugal’s Cristiano Ronaldo left training early one day before the team’s June 22 match against the USA team, sentiment plummeted because fans worried about his old knee injury. When the team’s officials said he was fit to play, sentiment around Portugal’s team rebalanced to positive.
We also recorded the sentiment of Argentina and Germany during the World Cup final. As the chart below shows, when the game ended we saw a surge in positive German sentiment and a drop in positive Argentine sentiment. This should not be surprising given the result of the game!
Despite the interesting insights into the global emotion around football teams, especially in real time during the matches, the most interesting aspect of the project was the ease and speed of deploying a publicly available, robust solution from scratch in just a few days. Its legacy is a ready-to-go framework for other global events. Next stop, politics maybe?