TIME: 13:00-14:00, 27th June 2017
VENUE: Committee Room 10, House of Commons, Westminster, London, SW1A 2JR
SPEAKER: Professor Kenneth Benoit
MP Chair: Kevin Hollinrake
HJS Chair: Timothy Stafford
Kevin Hollinrake: Good afternoon, ladies and gentlemen, and I am very sorry I’m late. I’m Kevin Hollinrake, Member of Parliament for Thirsk and Malton, which is in North Yorkshire. I’m breathless because I’ve been in Business Questions and was trying to get across to you.
Timothy Stafford: (?) Did you get any good questions? (?)
Kevin Hollinrake: I sat there for an hour and a half, but that’s what happens, isn’t it? [Inaudible] past 11 school children, so I’m very sorry I’m late. I think we’re going to talk about the Brexit campaign. One of its features, which we have also noticed in all the big election campaigns since the referendum, is the hugely important role that social media played. I don’t know if you will discuss that in your remarks, but I had my children, one of whom is at school doing Cinderella, coming back in the evening telling me all the stuff we were going to do to the NHS, like sell it off, and prioritise the education service, and what we were going to do in terms of policies I had never heard of, which my daughter was learning about on social media. So clearly, from our side, I think there is a wide recognition that we need to up our game in terms of social media and how it impacts upon politics generally, particularly in a world where people are not getting as much local news as they used to. Local newspapers and local journalism are how we used to disseminate a lot of our information, and as their impact declines, people are finding their information elsewhere. So I will be fascinated to hear the conclusions from your studies and to have them form part of our future debate as parliamentarians, both about the last election and the referendum, and perhaps about similar things we have ahead of us on our agenda. So without further ado, I’ll hand over to you; we’re keen to hear your thoughts.
Professor Kenneth Benoit: Thank you very much. I don’t know if it would be better for me to stand, which is my normal inclination, or to speak into the microphone. Can you all hear me, everywhere? OK, I’ll stand then. Right, thank you very much for that welcome and thanks for the invitation; it’s an honour to be here in these very impressive buildings. Just to correct one thing: I’m not really talking about the campaign, except in the sense that the referendum campaign is in the background. It’s not about the actual campaigning through social media, but rather about what my team and I did: we analysed Twitter data from the referendum, covering the referendum itself and up to about six months before it, capturing any mention related to the referendum on EU membership. So anything related to Brexit, we captured. We captured about 35 million tweets since January 2016 (that is 2016, not 2017). If I go up to the current period, we have actually captured something close to 80 million tweets. The company that was assisting in the capture of our tweets went bankrupt two weeks ago and closed their operation, so we’re now trying to recover a couple of tens of millions of tweets from the months before we managed to get them from their servers to ours. It’s a big project, data-wise. The analysis I’m going to show you today is based on about 26 million tweets, so it’s just a subset. We captured these based on hashtags (hashtags are the shortcut hyperlinks used in Twitter data, and I should perhaps have mentioned that this is all from the social media platform known as Twitter), on usernames, which are preceded by the @ symbol, and on search terms, one search term being simply the word ‘Brexit’.
I want to mention my research team, because I couldn’t have done this project alone. First of all, this research is funded by a Horizon 2020 grant from the European Commission. One of the ironies of this grant is possibly that we are being funded by the European Union to study a phenomenon that could end future European Union funding for similar projects. I hope that is not the case, because this is a really good example of the excellent funding schemes available through the European Commission. In this case it is about 2 million euros of funding for a multi-country, multi-site study, and the principal investigators are in Siena, Italy. The quanteda package is a tool that I developed: it is an open-source software tool for the analysis of text, which has been downloaded by about 60,000 people so far, and growing, and it is funded by a separate European Research Council project, what I have called an Investigator Grant. This research is also under the umbrella of the Social and Economic Data Science unit at the London School of Economics, of which I am the director. It is a unit through which we are trying to get actively involved in the data science space at the LSE, and we think what we have to add is really the analysis of social, economic and political data, rather than, say, analysing astrophysics data, as the data science people do at Imperial. They are very good in the sciences; we want to be very, very good in the analysis of big data for the social sciences, and this project is an example of that sort of analysis. We have colleagues working on media generally, on ethics, on high-volume finance, on economic transactions and on a lot of machine learning, and we have statistics and mathematics departments very interested in machine learning. Anyway, that’s my plug for them.
I also have colleagues at the Data Science Institute at Imperial, and I have worked with them on some of this; they have a wonderful visualization studio where we put up some very large-scale mathematical displays and network displays. So these are just some of the people I wanted to mention.
So, social media and political communications. Twitter data is vast: somewhere between 400 and 450 million tweets are sent per day, which is an absolutely enormous quantity. About 20% of the UK population has a Twitter account; it’s higher in the United States, where about 30% of the population are active users of Twitter. One of the interesting things, for example, is the penetration into politics. When I made this slide, which was a few weeks ago, 563 Members of Parliament in the United Kingdom had Twitter accounts; there has been an election since then, so this may have changed a little. I know that Kevin here has a Twitter account, for example, and we’re actually going to use machine learning to predict ‘leave’ versus ‘remain’ based on the tweets of MPs; I’ll show you that shortly. So 87%, close to 90%, of UK MPs have Twitter accounts. There are web pages, and this is just one of them, that track all this information; I’m sure you’re very aware of this, but you can track MPs and look at analytics for their tweets. It’s a very public and very political activity. I’m not looking exclusively at Members of Parliament’s tweets, however; I’m looking at anyone, anywhere, who tweeted. I tweeted some things to my local MP, Crispin Blunt in Surrey, for example, and I mentioned Brexit, so my tweets are in this database. I have colleagues at the London School of Economics who tweet about Brexit; their tweets are in this database too. We’ll actually look at ourselves shortly. These are the hashtags and search terms that were part of the main seed of searches. You can see that Brexit is the term at the bottom, but there are also campaign hashtags very clearly associated with pushing for the ‘remain’ side or the ‘leave’ side, such as ‘EUreferendum’, ‘@nottoEU’, ‘@UKinEU’, ‘YesforEurope’, ‘StrongerIn’, etcetera.
And then there were these hashtags that were very clearly related to Brexit, and usually from the hashtags you can infer what it was talking about.
So we had about 3.6 million unique users in the data set we analysed, from the 26 million tweets. The median number of tweets per user was only one, which means that at least half of all the people in the database tweeted only a single time. But the average was 7.2, so this is a skewed data set: judged by the average, the typical person tweeted about 7 times, but the maximum, from some of the accounts we call bots, ran up to about 81 thousand tweets. So the first thing we did was use a bit of machine learning, supervised machine learning, to try to figure out which side a user was on. This was important because what interested us was the language, topics and arguments used in the discourse and dialogue on social media, so we wanted to be able to derive some broad partition of which users were ‘leave’ and which were ‘remain’ from the data. We used a fairly venerable, very robust machine learning technique known as naive Bayes. With this technique you start from a known set: we took some accounts that were very clearly on the ‘leave’ side and some that were very clearly on the ‘remain’ side, and we trained the machine on the patterns found in those data, estimating probabilities based on the terms found in them and their frequencies. We did this for 15 thousand users using hand supervision, which is already quite a lot, and we were then able to predict the side of the remaining 3 million or so users. So we get a large amount of leverage over the data from this training set.
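As a rough illustration of the training step described above, here is a minimal from-scratch sketch of multinomial naive Bayes with Laplace (add-one) smoothing. It is not the team’s code: the toy tokens, the `train_multinomial_nb` and `predict_proba` names and the smoothing constant are all hypothetical, standing in for the 15 thousand hand-labelled accounts.

```python
import math
from collections import Counter


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log priors and Laplace-smoothed log term probabilities per class.

    docs: list of token lists (one per hand-labelled user); labels: parallel class names.
    """
    vocab = {tok for doc in docs for tok in doc}
    log_prior, log_cond = {}, {}
    for c in sorted(set(labels)):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(tok for d in class_docs for tok in d)
        denom = sum(counts.values()) + alpha * len(vocab)
        log_cond[c] = {t: math.log((counts[t] + alpha) / denom) for t in vocab}
    return log_prior, log_cond, vocab


def predict_proba(model, doc):
    """Return P(class | doc); every occurrence of a known term contributes (multinomial)."""
    log_prior, log_cond, vocab = model
    scores = {
        c: log_prior[c] + sum(log_cond[c][t] for t in doc if t in vocab)
        for c in log_prior
    }
    top = max(scores.values())  # subtract the max before exponentiating, for stability
    weights = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


# Hypothetical toy training set standing in for the hand-labelled accounts.
train_docs = [
    ["leaveeu", "takecontrol", "sovereignty"],
    ["takecontrol", "leaveeu"],
    ["strongerin", "economy", "nhs"],
    ["strongerin", "youth"],
]
train_labels = ["leave", "leave", "remain", "remain"]
model = train_multinomial_nb(train_docs, train_labels)
probs = predict_proba(model, ["takecontrol", "sovereignty"])
```

Here `probs["leave"]` comes out at 6/7, about 0.857: ‘takecontrol’ and ‘sovereignty’ are far more frequent in the ‘leave’ training documents. Because the model counts every occurrence of a term, it is the multinomial variant rather than the Bernoulli one.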
So this is the sort of predictive accuracy we have. If this were a computer science paper I would be showing why our method is better than all the other machine learning methods in terms of classifier performance, but what you can see here is basically that the highest performance was over 90%: about 93% correctly predicted when we used all of the features. In other words, we used all of the terms that were mentioned: we split the text into individual terms and used their probabilities in the training. ‘Multinomial’ means we’re counting the number of times each term occurred, rather than the binary occurrence of each term, which would be the Bernoulli classifier. We achieved 93% accuracy on our data set, which is very high. In fact, if you got my best PhD students together and told them to classify this, which would take something like 85 years, and then compared the accuracy, I would be astounded if they could reach 93%, so this was really, really good. In terms of the classification of the tweets themselves, it is a roughly balanced data set, which is also encouraging; it wasn’t highly skewed towards one side. We had about 37% pro-remain tweets and about 34% pro-leave (there should be a margin of error here), and the rest fell in a middle category of ‘neutral’. What we did is predict the probability that each user was pro-remain, and we classified all the tweets according to that. If your probability of being pro-remain was .8 or above, in other words 4 out of 5, we called you ‘remain’. If your probability of being ‘remain’ was 1 in 5 or less, .2 and below, we called you ‘leave’, and anything in the middle we considered ‘neutral’. So for the ones we classified as pro-leave and pro-remain, we were very confident that they were in fact that.
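The cut-offs just described translate directly into a small labelling rule. The sketch below assumes only the .8 and .2 thresholds stated in the talk; the function names are hypothetical, and the plain 0.5 rule is included for comparison with a simple binary classifier.

```python
def three_way_label(p_remain, upper=0.8, lower=0.2):
    """Three-way rule from the talk: >= .8 'remain', <= .2 'leave', otherwise 'neutral'."""
    if p_remain >= upper:
        return "remain"
    if p_remain <= lower:
        return "leave"
    return "neutral"


def binary_label(p_remain, cutoff=0.5):
    """Plain binary rule: anything above the cut-off counts as 'remain'."""
    return "remain" if p_remain > cutoff else "leave"


for p in (0.9, 0.67, 0.15):
    print(p, three_way_label(p), binary_label(p))
```

This prints `0.9 remain remain`, `0.67 neutral remain` and `0.15 leave leave`. A user scored at .67 is ‘neutral’ under the conservative three-way rule but ‘remain’ under the binary rule. The wide neutral band trades coverage for confidence in the two labelled groups.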
And you can see the pattern of posts over time. The pro-leave tweets are higher, except at this point here, where the red pro-remain line is slightly higher, and on the actual referendum date itself there was a surge of pro-remain tweets. This is not meant to predict an outcome; it is simply what we measured in terms of classification. One of the things you don’t necessarily get from looking at this chart is the incredible exponential explosion of tweets that occurred at the time of the referendum. That’s because the y-axis is on what’s known as a logarithmic scale, similar to the Richter scale for earthquakes: each of those points is an order of magnitude larger than the previous unit on the y-axis. So there’s an absolute exponential explosion; if I didn’t plot it on a logarithmic scale, we would need something as large as the painting over there. When we tried to predict the side of MPs by party (you probably can’t see this, even if you are sitting close to the front), it gave us very plausible results. The Scottish National Party was almost entirely pro-remain; the single UKIP member at the time was obviously pro-leave. Among the interesting ones, the Liberal Democrats were largely pro-remain; the Labour MPs who were tweeting were predicted to be over 90% pro-remain; and the Conservative Party was actually majority pro-remain, but had the biggest group of pro-leave and neutral MPs of any party we looked at. Now, I texted my colleague Akkey (?) and asked him if he could look up Mr. Hollinrake, so let’s see if he’s done that, because we should be able to pinpoint individuals. .67: we predicted you as having a .67 probability of being pro-remain.
Kevin Hollinrake: Is that a 67% chance?
Professor Kenneth Benoit: Yes, so you would be officially coded as ‘neutral’, but on the more pro-remain side.
Kevin Hollinrake: I was a ‘remain’
Professor Kenneth Benoit: If we were to flip a coin and make a binary prediction, as in machine learning where anything greater than 0.5 counts, we would call that pro-remain, so we’ve actually correctly predicted you. And if we look at some of my colleagues here (this is the probability of being ‘remain’), we highlighted in red any account whose name started with LSE, and we can see that pretty much every LSE account was pro-remain, except for some ‘neutral’ ones that were mostly related to informational events. I also singled out two of my colleagues who are very well known for talking about the European Union, Sara Binzer Hobolt and Simon Hix in the Department of Government; they were about .9 pro-remain. I was about .85 pro-remain. But there was one account, this LSE account here in blue, which was very strongly pro-leave. We couldn’t figure out what it was, and it turns out it was a kind of radical pro-Trump supporter who was tweeting things like ‘John McCain is snake’, ‘go away you old goat’ and ‘mass impeachment for the trader democratics’, which is a misspelling. The US presidential election was happening at the same time as this debate. Why can we do this? Is this unethical? Well, this person might deserve to be picked on, but when you tweet you agree to terms of service which say that anything you tweet is public, and you agree to that before you use the free platform, so I have not violated anything legal or ethical in doing this.
We can also do this sort of analysis, where we show a cloud of terms. In this case it is a cloud of hashtags, which scales the size of each word to the relative frequency of its usage, and we partitioned it by our predicted side: ‘remain’, ‘leave’ and ‘neutral’. It’s too dense for you to read here, but I can leave you some links later if you want to explore it; we have it up as interactive visualizations, for example, and it makes a lot of sense. The main ‘leave’ hashtag was #leaveEU, for example. Another interesting thing you can do is use Bayesian statistical methods to look at networks of followership, and scale them to predict the orientation of both the followed and the followers. This is a fairly sophisticated statistical technique developed by a colleague of mine, who in fact will be joining us at the LSE in September, and we were able to place users on a ‘remain’ score. On the next slide, I can’t show you the details of those accounts, but what it showed is a bifurcation in the overall orientation of followers on Twitter, with a high ‘leave’ score and a high ‘remain’ score. In most cases, if you were to ask people their view on something fairly innocuous (I’m not sure what that is in politics today), you would see a normal distribution with a single peak. What you see here is a bimodal distribution, where there is a camp for ‘leave’ and a camp for ‘remain’, and the middle valley is fairly empty. This is not from the language, and not from the naive Bayes classification; it is from a completely separate method using followership analysis. Why am I showing you a completely separate analysis? Because a lot of this really is exploratory at this stage.
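The word cloud described above sizes each hashtag by its relative frequency within a predicted side. A minimal sketch of that computation follows; the input format, the `hashtag_shares` name and the example tweets are hypothetical, and lower-casing hashtags is an assumed design choice rather than something stated in the talk.

```python
from collections import Counter


def hashtag_shares(tweets):
    """tweets: iterable of (predicted_side, hashtags). Returns {side: {hashtag: share}}.

    Hashtags are lower-cased so '#LeaveEU' and '#leaveeu' count together.
    """
    counts = {}
    for side, tags in tweets:
        counts.setdefault(side, Counter()).update(tag.lower() for tag in tags)
    shares = {}
    for side, c in counts.items():
        total = sum(c.values())
        shares[side] = {tag: n / total for tag, n in c.items()}
    return shares


# Hypothetical example tweets.
example = [
    ("leave", ["#LeaveEU", "#TakeControl"]),
    ("leave", ["#leaveeu"]),
    ("remain", ["#StrongerIn"]),
]
shares = hashtag_shares(example)
```

A plotting library would then scale each word in the cloud in proportion to its share within its side.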
I’ll show you two more types of analysis we did. The first of the two is something we call sentiment analysis, which is an area of my own specialisation: we look at the language and the words that are used, and we analyse their orientation in terms of an affinity to a particular side, which could be positive or negative. So if we were looking at Amazon customer reviews, or movie reviews, we would be looking at whether the balance of reviews was positive or negative. What we did here was use a system to characterise the psychological states of mind expressed in the language, and compare the two sides on their predicted values, according to a set of well-known psychological dictionaries. We used a dictionary developed in psychology that has categories for positive and negative emotion, politics, orientation towards power, quantitative language, language about being tentative versus being confident, sadness (whether you’re happy or sad), and future versus past orientation. These are the sorts of entries in the dictionary: basically you have a category, in this case (you can’t see it) reward language, where you’re using the language of reward as opposed to the language of punishment. And these are words that are triggers for reward: for example, the word ‘bonus’, and any morphological version of it, such as ‘bonuses’ or any other plural or suffixed form, would be a trigger to count reward. This dictionary was developed not by me but by people in psychology.
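To show how a dictionary with wildcard entries like ‘bonus*’ turns text into category scores, here is a minimal sketch. The two-category dictionary and its entries are hypothetical, far smaller than the psychology dictionary used in the talk, and the simple regular-expression tokeniser is an assumption.

```python
import re

# Hypothetical two-category dictionary; a trailing '*' matches any suffix,
# so 'bonus*' counts 'bonus', 'bonuses', and so on.
DICTIONARY = {
    "reward": ["bonus*", "prize*", "benefit*"],
    "tentative": ["maybe", "perhaps", "possib*", "might"],
}


def matches(token, entry):
    """True if token equals entry, or starts with entry's stem when entry ends in '*'."""
    if entry.endswith("*"):
        return token.startswith(entry[:-1])
    return token == entry


def category_proportions(text, dictionary=DICTIONARY):
    """Share of tokens in `text` that trigger each dictionary category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {cat: 0.0 for cat in dictionary}
    return {
        cat: sum(any(matches(t, e) for e in entries) for t in tokens) / len(tokens)
        for cat, entries in dictionary.items()
    }


print(category_proportions("Bonuses and benefits, maybe"))
```

This prints `{'reward': 0.5, 'tentative': 0.25}`: two of the four tokens trigger reward and one triggers tentative. Proportions rather than raw counts keep long and short documents comparable.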
Comparing the results, for reward language, for example, we can see a significantly higher proportion of reward language among the ‘leave’ campaign; I shouldn’t say campaign, but among people whose accounts were predicted pro-leave, which is more accurate. They used more language about reward than did the people predicted to be ‘remainers’. For positive versus negative emotion, we use a type of ratio of the proportion of positive language to the proportion of negative language, where a higher score means you’re more positive relative to negative in your language use. The ‘leave’ side was more positive relative to negative than the ‘remain’ side was.
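The talk does not give the exact formula behind this ‘type of ratio’. One common choice, assumed here, is a smoothed log ratio: it is 0 when the two kinds of language are balanced, positive when the first dominates and negative when the second does. The `log_ratio` name and the 0.5 smoothing constant are likewise assumptions.

```python
import math


def log_ratio(pos_count, neg_count, smoothing=0.5):
    """Smoothed log ratio of positive to negative counts.

    0 means balance; > 0 leans positive; < 0 leans negative. The smoothing
    constant keeps the ratio finite when one count is zero.
    """
    return math.log((pos_count + smoothing) / (neg_count + smoothing))
```

The same function can score future versus past language by passing future-tense and past-tense counts; a log ratio is symmetric, so swapping the arguments flips only the sign.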
The ‘remain’ side and the ‘neutral’ side are very similar in their profile, but our interpretation is that the discourse on the ‘leave’ side was about the benefits of leaving, whereas the discourse on the ‘remain’ side was about the scary, negative consequences that would occur as a result of leaving. Not what the EU has done for us, but how bad things would be if we leave. Sad language is a similar story: people predicted as pro-remain used a higher proportion of sad words than the ‘leave’ side, although the ‘neutral’ side seemed to be the saddest of all. Future versus past language is a ratio, and 0 is the dividing line: it shows more use of the past than the future in general, which is just typical of language; more or less everyone speaks that way. But there is more future relative to past orientation among the pro-leave tweeters than among the pro-remain tweeters. Tentative language is almost 25% higher here: the ‘remain’ side used much more tentative language than the ‘leave’ side. The ‘leave’ side was using words that were more certain and less tentative, whereas the ‘remain’ side was full of tentative references. In terms of power, we see a huge difference as well, similar to what we saw with tentative language: there was more language about power, assertiveness and confidence in the ‘leave’ tweets than in the ‘remain’ tweets. And there are slightly more references to quantitative concepts in the ‘remain’ tweets, references to economic consequences in our interpretation: more references to numbers and numerical concepts on the ‘remain’ side than on the ‘leave’ side, which emphasised less tangible, less quantitative benefits.
So, our conclusions were that the ‘leave’ side was more reward-oriented, more positive, more assertive of power, less quantitative and less tentative, and showed less sadness and more future relative to past orientation. And if you look at this over time, you can see that what we observed as a whole held true across time. This, for example, is positive versus negative, and we can see that the ‘leave’ side was on the whole higher across all time periods, plotted from January to July. We see the same thing for tentative language, where the ‘remain’ side is consistently much higher. This is a daily plot, smoothed using an algorithm that takes the noisy series and gives you its central tendency. We also tested whether these numbers were robust, and they were very robust.
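The talk does not name the smoothing algorithm behind the daily plots; local regression (loess) is a common choice for this kind of chart. As a simpler stand-in, not the method actually used, here is a centred moving average. The `moving_average` name and the 7-day default window are assumptions.

```python
def moving_average(values, window=7):
    """Centred moving average; the window shrinks at the edges of the series."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        segment = values[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


print(moving_average([1, 2, 3, 4, 5], window=3))
```

This prints `[1.5, 2.0, 3.0, 4.0, 4.5]`. Shrinking the window at the ends avoids dropping the first and last days, at the cost of noisier estimates there.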
This is the last thing I’ll show you, because of time: we analysed this using topic models. Topic models are a method of unsupervised learning that clusters words and associates them with a number of topics. It’s a type of clustering method developed in computer science about 15 years ago, and it allows us to take a very large quantity of text, in this case tens of millions of tweets. We combined them, I think, by user and by week, figured out which topics were being discussed using this technique, and then compared the proportions of topics according to whether we were talking about the ‘leave’ side or the ‘remain’ side. This method also allows us to include a covariate, or explanatory variable, for being on the ‘leave’ side or the ‘remain’ side as part of the model itself. I know you can’t see these over there, so I’ll walk you through them. The x-axis here is the effect on the prevalence of a topic among these documents, which we defined as users per week. A larger value means a larger effect on the probability of observing this topic as a result of being classified as ‘remain’ versus ‘leave’.
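Before fitting a topic model, individual tweets have to be combined into longer documents; the talk describes one document per user per week. A minimal sketch of that aggregation follows, assuming each tweet arrives as a hypothetical `(user, date, text)` tuple and that ISO calendar weeks define a week (the talk does not say which week convention was used).

```python
from collections import defaultdict
from datetime import date


def user_week_documents(tweets):
    """tweets: iterable of (user, date, text). Returns {(user, iso_year, iso_week): text}."""
    docs = defaultdict(list)
    for user, day, text in tweets:
        iso_year, iso_week, _ = day.isocalendar()
        docs[(user, iso_year, iso_week)].append(text)
    return {key: " ".join(texts) for key, texts in docs.items()}


# Hypothetical tweets around the referendum week (23 June 2016 was a Thursday).
tweets = [
    ("alice", date(2016, 6, 20), "vote remain"),
    ("alice", date(2016, 6, 22), "economy"),
    ("alice", date(2016, 6, 27), "result"),
    ("bob", date(2016, 6, 20), "take control"),
]
docs = user_week_documents(tweets)
```

Pooling very short tweets this way gives the topic model enough words per document to estimate topic proportions. The side label for each user-week can then be passed to the model as a document-level covariate.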
Down here, and I’ll read through these topics for you, the topic you were least likely to use as a result of being pro-remain was a topic about Donald Trump. If you look at the cluster of words in here, these were expressions of support for Donald Trump. It was odd how the Brexit debate and the US presidential campaign mixed together, but there’s a very clear association. Moving out from the 0 line, here are the ones that really show the effects of being pro-leave: topics with words about Islam; a set of things related to the standard ‘leave’ arguments and the debate about leaving; taking control was a topic; here’s one about Jo Cox and xenophobia, which you were more likely to mention if you were classified as pro-leave; and things about immigration and identity, sovereignty and immigration control. On the pro-remain side, the most striking topics were about youth and the future of young people, about market access, and about the economy, which was a very strong topic. Scotland and Northern Ireland was a very strong pro-remain topic, as was labour and the National Health Service, labour not as a party but as people who provide labour, in terms of labour unions and that sort of thing. Anger against the ‘leave’ campaign was a very, very highly prevalent topic: general expressions of anger about how the ‘leave’ campaign was portraying things were a strong topic in and of themselves. And there were things about mobilising the vote, which, among the topics we considered, was the one you were most likely to use as a result of being pro-remain versus pro-leave. If we browse through some of these, here’s a topic about immigration and Islam, which is the second most likely to be used if you were pro-leave, and the biggest hashtag we observed in it was MAGA, Make America Great Again.
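The effects on the chart come from a topic model that includes the side as a covariate. A much cruder way to see the same idea, offered here only as a simplified stand-in and not as the model used, is the difference in a topic’s mean proportion between ‘remain’ and ‘leave’ documents. The topic names and proportions below are hypothetical.

```python
from statistics import mean


def topic_prevalence_difference(doc_topics, sides, topic):
    """Mean share of `topic` in 'remain' documents minus its mean share in 'leave' documents.

    doc_topics: list of {topic: proportion} dicts, one per document (e.g. user-week).
    sides: parallel list of 'remain' / 'leave' / 'neutral' labels.
    Raises statistics.StatisticsError if either side has no documents.
    """
    remain = [d.get(topic, 0.0) for d, s in zip(doc_topics, sides) if s == "remain"]
    leave = [d.get(topic, 0.0) for d, s in zip(doc_topics, sides) if s == "leave"]
    return mean(remain) - mean(leave)


# Hypothetical topic proportions for four user-week documents.
doc_topics = [
    {"economy": 0.4, "trump": 0.0},
    {"economy": 0.2, "trump": 0.1},
    {"economy": 0.1, "trump": 0.5},
    {"economy": 0.0, "trump": 0.3},
]
sides = ["remain", "remain", "leave", "leave"]
```

With these numbers the ‘economy’ difference is about +0.25 (more prevalent among ‘remain’ documents) and the ‘trump’ difference about -0.35 (less prevalent), which mirrors the sign convention of the chart. A covariate model additionally accounts for uncertainty and estimates topics and effects jointly.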
I have colleagues who have studied this in the US presidential campaign, and you only use this hashtag if you are a Trump supporter. It’s not a hashtag you use facetiously or sarcastically, or in order to refute it; you use it only as an expression of pro-Trump support. And it shows up as huge in one of the pro-leave topics, which is kind of weird: why would people think there was an association between voting in the UK referendum and making America great again? But it’s there.
This is the topic we labelled Jo Cox and xenophobia. There is one, for example, about labour and the National Health Service: we can see references to ‘worker’, and references to ‘nh’, from National Health Service, which actually should be ‘NHS’. This is very common in data science: you use fairly blunt methods to get a handle on tens of millions of data units. One of the things we use is called a stemmer, which converts words like ‘votes’ to their stem, ‘vote’, by removing the ‘s’, which is how we make words plural in English. It took the ‘s’ off NHS; we should have protected that. The Scotland and Northern Ireland topic is very clear: the biggest words are Irish, Ireland, Scottish and Scotland, and the rest of the words are mostly about territorial units. This is the sort of analysis we do in order to give these topics a label, because the labels are not chosen by the algorithm; they come from our ex-post interpretation of the results. And then we have these topics, if you want to explore further, about Brexit and the market, Brexit and the economy, etcetera.
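The ‘NHS’ problem above is easy to reproduce with the plural-stripping rules of the Porter stemmer’s first step (the full Porter algorithm has several more steps, omitted here). This sketch adds a hypothetical set of protected tokens that bypass stemming, which is the fix the speaker mentions; the `strip_plural` name and the contents of `PROTECTED` are assumptions.

```python
# Hypothetical list of tokens the stemmer must leave alone.
PROTECTED = {"nhs"}


def strip_plural(token, protected=PROTECTED):
    """Lower-case and apply the plural rules of Porter's step 1a, skipping protected tokens."""
    word = token.lower()
    if word in protected:
        return word
    if word.endswith("sses"):  # glasses -> glass
        return word[:-2]
    if word.endswith("ies"):  # ponies -> poni
        return word[:-2]
    if word.endswith("ss"):  # glass stays glass
        return word
    if word.endswith("s"):  # votes -> vote
        return word[:-1]
    return word
```

With protection, `strip_plural("NHS")` returns `"nhs"`; with an empty protected set it returns `"nh"`, the truncated token that appeared in the topic.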
I lied to you: I told you there would be two more things. I’ll give you two more slides on the last thing, and I’ll do it really quickly. This is something we’ve just started doing: looking at the types of messages used during the campaign on Twitter. There are categorisations in the field of political communication where people have said that there are different types of messages you can use in tweets: for example, you can request action, thank someone, make an announcement, request information or demand a response. We are trying to use a combination of human coding and a similar machine learning method to categorise the different tweets, and this is the very last slide, I promise. We came up with this frequency distribution across the different categories of tweets, which is not implausible, but we have a lot of work to do before we think we have really nailed this down. The middle category here is directing people to information. On the left are the ‘leave’ and ‘remain’ sides, and the four major columns at the top are campaign accounts, journalist accounts, Members of the European Parliament and Members of the UK Parliament. You would expect the journalist accounts to be directing people to information the most, for example, and that is true, but there were a lot more references to information among the pro-remain journalist accounts than among the pro-leave journalist accounts. This is analysis that I wouldn’t say is ‘quote-worthy’ yet, because we’re still working on it; this is stuff we only did last week. [Inaudible] Thank you.
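Once each tweet carries a message-type code, the frequency distribution on the slide is a cross-tabulation by account type and side. A minimal sketch follows, assuming hypothetical `(account_type, side, message_type)` tuples; it only tabulates codes and does not perform the human or machine coding itself.

```python
from collections import Counter


def message_type_shares(tweets):
    """tweets: iterable of (account_type, side, message_type).

    Returns {(account_type, side): {message_type: share of that group's tweets}}.
    """
    counts = {}
    for account_type, side, msg_type in tweets:
        counts.setdefault((account_type, side), Counter())[msg_type] += 1
    shares = {}
    for group, c in counts.items():
        total = sum(c.values())
        shares[group] = {m: n / total for m, n in c.items()}
    return shares


# Hypothetical coded tweets.
coded = [
    ("journalist", "remain", "directing to information"),
    ("journalist", "remain", "directing to information"),
    ("journalist", "remain", "announcement"),
    ("journalist", "leave", "directing to information"),
    ("journalist", "leave", "request action"),
]
shares = message_type_shares(coded)
```

Shares within each group, rather than raw counts, allow a fair comparison between ‘remain’ and ‘leave’ journalist accounts even when one group tweets far more.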
Kevin Hollinrake: Ok, fascinating and you’re happy to take some questions?
Professor Kenneth Benoit: Absolutely!
Kevin Hollinrake: Can I ask the first one? My chair’s privilege. I was a member of the ‘remain’ camp, so in your considered opinion, did we get ‘out-social media’ed’ by the ‘leave’ camp?
Professor Kenneth Benoit: That’s a really tricky question, because we didn’t analyse it as a campaign. But what was very clear is that there were more accounts on the pro-leave side making bold claims than on the ‘remain’ side, and our interpretation of that is that it’s easier to make extravagant claims about something that has not happened than about the status quo. How do you exaggerate the status quo? What the ‘remain’ side tried to do was exaggerate the negative consequences of leaving, but the debate had already been focused on the positive consequences of a change versus its negative consequences. So the ‘remain’ side was already basically talking about negative things and the ‘leave’ side about positive things. This may just be inherent in the nature of the situation, but it was a very clear result in our analysis.
Kevin Hollinrake: I have lots of other questions I could ask, but I will hand over to the audience. This lady over here. Could you just say who you are and where you’re from? That might be helpful.
Question 1: My name is Isabel Erris. Thank you so much for the talk; I am at Durham University. I just wondered about the ‘MAGA’ hashtag: your data set was worldwide tweets, I assume. My research is on Trump at the moment, and his tweets, and obviously Trump supporters and Trump himself are quite pro-Brexit and pro-leave. So couldn’t one of the reasons that the word clustering was so close between Trump and Brexit simply be American input for [inaudible], and that’s why MAGA [inaudible] quite seriously, because it was related to that?
Professor Kenneth Benoit: That is really the only plausible explanation, and we haven’t looked at the geography of those tweets, but we could. On Twitter, if your device has GPS (and most people who use Twitter are using a phone, including President Trump), you can enable geo-location, but typically only about 3% of people have GPS coordinates turned on in their Twitter application. We wish everyone would turn it on, because it makes for much better analysis. Our provider who captured the tweets for us recorded the geographic coordinates of the internet service provider that people use, which is usually a pretty good, if rougher, proxy for their location. So we actually have maps, which I didn’t show you, where we plotted the location of the pro-remain and the pro-leave tweets, for example. Yes, you do see a lot fewer pro-leave tweets in the London area, but it’s fairly rough, because an urban area will show a gigantic concentration of tweets at a single location because of the internet servers; Virgin Media’s hub is in Bristol, for example, so it looks like they’re all from exactly the same location. But what is clear is that if you plot the pro-remain tweets, you see a lot of pro-remain throughout the UK and all over Europe, and when you plot the pro-leave tweets, they’re only in the UK. We didn’t actually plot North America, and we realised afterwards, exactly as you suggested, that we should have extended our analysis beyond Europe to the United States, because it probably would have shown that. I was at Durham yesterday, actually, taking my son to an open day, by the way. It’s a lovely place.
Question 2: [Inaudible] John Butcher with the society. In your introduction, Mr. Chairman, you mentioned how social media tends to circulate completely false statements of one sort or another to whip up fears amongst various groups, especially the young I imagine, and this was clearly evident during the recent general election campaign. I wonder, Professor, whether your indices went as far as examining the extent to which such falsities were being tweeted, whatever they were about, the extent to which they were being rebutted by others in an effective way, and therefore the extent to which they might have influenced the outcome of the referendum?
Professor Kenneth Benoit: So the short answer is that we have not analysed that yet, but the longer answer is that it is something we have done a little bit of informal investigation into. There is a lot of speculation, and it is something that is on our list to examine. For example, there were claims about the National Health Service and the size of the rebate that would come back as a result of leaving. We know that was one of the more contentious figures: the Brexit bus with 350 million on it. We looked for evidence of that particular claim in the data. Very cleverly, the people who were behind that claim didn’t overtly make it on social media, but there were cases where we would see an image of, say, Boris Johnson standing in front of a bus with 350 million on it, yet it wasn’t in the text of the tweet. One of the things about social media and Twitter is that you can put a link in a tweet and it shows up as an image in a feed, so it’s a very clear, very powerful statement suggesting something, and we lose that in our analysis.
We’re only looking at text here; unless we had a way to analyse those images, that is a nuance that is powerful for humans and completely lost on the machine. That’s one of the limitations of what we analyse, but we are looking at some of those claims. That is a really interesting question. In the academic work that we do (one of my roles is that I’m actually a journal editor), we do this thing called peer review, where hopeful young people who are trying to get promoted in academia send us their papers, unsolicited and unpaid, and we send them to anonymous peers who reject them, much to the chagrin of the younger scholar. One of the reasons that these papers get rejected is having more than one essential idea in an article, so we recognise that we need to focus, and our current focus is really on the discourse being used by the two sides. What is the type of argumentation? What are the topics of argumentation, and what is the style of argumentation? But that issue about truth is a really, really important one, and it is one of our topics to look at.
Question 3: My name is Liz Carter and I’m truthfully (?) a human rights activist. [Inaudible] At the very beginning, when you were talking about numbers and unique tweeters, you said that the average was 7.1 and the median was 1.
Professor Kenneth Benoit: The median was 1 and the average was 7.1, I believe.
Question 3: But the top number was 81,000?
Professor Kenneth Benoit: Yes.
Question 3: Now, that’s really interesting and especially in light of the general election just gone.
Professor Kenneth Benoit: Yes.
Question 3: Because there is concern, including from the geopolitical side, that various organisations with a very large influence on social media are acting, and probably acted even more actively in the general election, to influence young people in particular, as the speaker before me said, and there’s some really, really interesting data. So I’m really interested in whether you can divulge [inaudible] who that entity was that was voting 81,000 times. I can’t picture any person sitting there going ‘dun dun dun dun’.
Professor Kenneth Benoit: There are two types of users that we observe a very high rate of tweeting for. Most of them are what are called bots. They’re aggregators; they’re basically computer programmes that search on key terms and then retweet them, and the very top tweeters on each partisan side were accounts like ‘EU’, ‘IN’, or ‘LeaveNow’, that sort of thing. These were basically not human beings. There was one human being, or apparent human being, located in London, who tweeted the 81,000 times, and who has since moved on to tweet about something about babies being murdered in Palestine. It is a very unusual individual who has a highly partisan agenda, but it shifts from one issue to another, and it really does look like this person spends all of his or her time tweeting. We have that user account, and we could have shown you some of the tweets, but it’s very bizarre. We didn’t want to give the sense that this was a typical user; it’s a very atypical user.
Question 3: It’s the powerful users who are getting on this bandwagon and potentially making this not a British thing. It’s being turned into something else, which is very concerning.
Kevin Hollinrake: For you to say, I guess it’s not just the frequency or volume of the tweets; it’s the reach of that individual in terms of followers, increasing the kind of effect [inaudible]
Question 3: An organisation claimed in the general election that they sent one and a half million tweets [inaudible] to all the young people they identified on Facebook, for example. They made 50,000 phone calls on the day of the election, they got 600,000 people registering on the last day to register, and they’re very proud of it. An organisation that makes 50,000 phone calls on the last day is not a little organisation; it is an organisation that has influence all over the world. So I’m really interested in the work we’re doing here, because there are very serious implications for external influence on our local politics.
Professor Kenneth Benoit: That’s true.
Question 4: I’m Josh Bond (?). I’m actually doing a master’s in political strategy down in Exeter; well, I start in September, so not quite yet. My question is quite short: now that you’ve got this research, what are you actually going to do next? Obviously, because it’s now just over a year since the referendum result, are you going to look at how people have changed? I’m a bit more interested in future research.
Professor Kenneth Benoit: This kind of topic analysis lets us track what we call the prevalence of a topic at a given time point, and by segmenting the documents by user at different times, we can track the evolution of these arguments over time. We’re very interested in change points at a few key events, such as the recent general election, the start of negotiations, or the formation of the May government post-election. So that’s the sort of thing we’re working on now, but we ran into some actual physical challenges with the data. The data is already too large to fit on a desktop computer. We have this high-performance computer at the LSE, and we have to get the data as we extract it, then aggregate it, and then do the analysis. And, as I mentioned, in the computing world it’s not uncommon for start-ups to provide a lot of this. Start-ups come and go, and ours went out of business two weeks ago. We’re looking for alternatives now.
Question 5: I’m Kelly and I’m from Australia. I just wondered what the context was of the word ‘Islam’ that was connected to the ‘remain’, sorry, the ‘leave’ users?
Professor Kenneth Benoit: That’s a really interesting one because it appears that (and this is probably something you have read other reports of) there was an anti-immigration sentiment; this idea of taking control of the borders was extended in the discussion to more than just taking control of the people who were legally resident as a result of being citizens of an EU country. It was directed at immigrants generally, and a lot of these immigrants were identified with Islam. That’s our interpretation.
Question 6: I think one of the graphs you showed was the positive and negative sentiment of the different campaigns. Was there one point where ‘leave’ fell below ‘remain’, and was that about migration, and maybe Germany at that time? Or Turks, or Turkey, or whatever; what was all that about?
Professor Kenneth Benoit: That was actually during the referendum itself: there seemed to be more of a surge on the ‘remain’ side than on the ‘leave’ side. A lot of that, not all of it, came during the day that people were voting, but a lot of it came from the initial expectations up until midnight. Nigel Farage basically thought that his side had lost. People thought that the ‘remain’ side had won. I went to bed that night thinking that the ‘remain’ side had won. That was when there were a lot of victorious tweets on social media about having won when they hadn’t. Let’s just say that wasn’t the case the next morning.
Question 7: I’m [inaudible] from the University of Sheffield. My research is into Luxembourgish language policy, and the language [inaudible] there at the moment is quite similar to this, in the sense that the comments might be more pro one side, but the retweeting, liking and sharing of things show a completely different story. Have you looked into that?
Professor Kenneth Benoit: So yes, we separated our tweets. There are three types of tweet you can send. The first is what the lady over here was talking about: targeting people. If you know someone’s user account, you can send them a specific message, so if you had a list of names of people you thought were somehow on the fence, you could target them with specific messages. That’s actually quite similar to what people have been doing in political campaigns since the advent of mail or ringing doorbells; you can just do it in much greater volume now, basically using robots, because of Twitter. The second type is the retweet, where someone sends a tweet and you decide to broadcast it to your followers, and the third is original content that you create yourself. Most of what we looked at was original content. We did look at retweets; we excluded the, no, we included everything, we actually included them all. We didn’t see any systematic differences between the retweets and the original content, because I think that when you retweeted something, you were basically endorsing it. I hope that answered the question.
Kevin Hollinrake: Any more questions? And a final one, if I could. You said that it’s probably easier, through social media, to campaign away from the status quo. Does that mean that the status quo is less certain, less likely to be maintained in the future in any kind of election campaign? I’m not saying it can’t be maintained, but is that going to lead to more instability, I guess?
Professor Kenneth Benoit: In the theories of political science (my background is in political science; my PhD is in political science), we have all these theories about the stability of the status quo and why the status quo can defeat, or paralyse, alternatives unless they fall into a rare convergence that builds a coalition to defeat it. It does seem to be somewhat the opposite recently, and I think the reason has a lot to do with this representation of public opinion, or what we could cynically call manipulation of public opinion. When you’re angry at the status quo, a change from the status quo can be represented more positively relative to the negative perception of the generic status quo, even when that’s not part of the alternative. And this appears to be what happened with a lot of the immigration sentiment and the promise of a brighter economic future. If you’re out of a job, or if you’ve had a negative experience of what you perceive to be caused by immigration, then someone telling you that you can take back control of the borders, and you reading this through social media where it’s amplified, is a reason to want to reject the status quo, even when you are not clearly voting for a specific alternative to it. So it’s easier, I think, to manipulate misunderstanding against the status quo in the modern age. I think that also explains why a lot of the people who were frustrated with the status quo voted for Trump: he didn’t have a well-defined alternative to the status quo, but he promised to get rid of what was broken with it, claiming competence to do that on the basis that he wasn’t part of the establishment. That’s essentially what Trump’s policy platform consisted of: changing the status quo. Getting rid of Obamacare is still something he is trying to pass this week.
It’s not really clear that the alternative is better, but the main accomplishment is getting rid of something that you’re trying to convince people is deeply flawed. It’s just that every time the Congressional Budget Office shows what the consequences of this particular plan will be relative to Obamacare, it turns out to be maybe more flawed than Obamacare, because it’s such a complicated issue.
Question 8: [Inaudible] is more dangerous because it is open to greater manipulation and probably greater inflation (?); does that mean fascism could strike more quickly?
Professor Kenneth Benoit: Well, it’s one of the great ironies of the information age. It should be the most democratising, anti-fascist influence we have, because everyone has access to the opinion channels and everyone has the right to present alternative versions. Fascism, you would think, requires control of centralised channels, where one party, one state, can control propaganda. In a fascist regime, control of propaganda is a central component, and that’s impossible in the information age. Yet what we see is not so much fascism but populism. The appeal of populism lies in tapping into pockets of discontent, where you can have what we call in political communication an echo chamber effect. It basically means that you end up partitioning whose information you pay attention to based on what you want to hear, and the echo chamber means that you’re getting more and more of, in effect, what you are predisposed to believe and what you want to believe. There’s been a lot of analysis of recent elections, as you can imagine, confirming that this takes place. And unfortunately a lot of the algorithms from social media companies reinforce this. If you’ve ever logged into Facebook, or Twitter, or LinkedIn, or any of the other social media accounts, the companies will suggest people that you would probably like to follow. They do this in the same way that Netflix tells you which movies you would like to see. So if you like action films, Netflix is going to suggest action films, and pretty soon your feed will be full of action films. That’s what happens in social media, so you end up with an echo chamber where the tendencies you already have are exaggerated, and I think that makes, maybe not fascism, but populism definitely more of a danger.
Kevin Hollinrake: Thank you. One of the frustrations I had with the referendum, with the EU campaign, was that when you talked to people just before or just after the referendum, they would say, well, I didn’t know how to vote because I just couldn’t find any information on the EU, and yet there was a wealth of information, but people were relying on these little snippets from, as you call it, an echo chamber. But the status quo point is interesting; I’ve been an MP for two years and there’s not been a lot of status quo in those two years. It’s been an incredible, tumultuous, turbulent time, and it sounds from what you’re saying that we might be in for more of that [inaudible] me a little bit, but at least I can understand the reasons why I’m depressed, I suppose. But no, thank you, Professor Benoit, for a fantastic and fascinating insight into social media. It is something that we’re going to know an awful lot more about, particularly in the world of politics, I think. Thank you very much.