
DEGREE PROJECT IN COMPUTER ENGINEERING, FIRST CYCLE, 15 CREDITS

STOCKHOLM, SWEDEN 2016

Incredible Tweets

Automated credibility analysis in Twitter feeds using an alternating decision tree algorithm

SARA FEYCHTING

KTH ROYAL INSTITUTE OF TECHNOLOGY

SCHOOL OF COMPUTER SCIENCE AND COMMUNICATION


Incredible tweets

Automated credibility analysis in Twitter feeds using an alternating decision tree algorithm

Sara Feychting

Course: Degree Project in Computer Science
Course number: DD143X

Supervisor: Kevin Smith
Examiner: Örjan Ekeberg

Institution: Computer Science and Communication
University: Royal Institute of Technology

10 May 2016


Abstract

This project investigates how to determine the credibility of a tweet without using human perception. Information about the user and the tweet is studied in search of correlations between their properties and the credibility of the tweet. An alternating decision tree is created to automatically determine the credibility of tweets.

Some features are found to correlate to the credibility of the tweets, amongst which the number of previous tweets by a user and the use of uppercase characters are the most prominent.


Referat

This project investigates the credibility of individual tweets without using human perception of them. Information about the user and the message is examined to find correlations between these properties and the credibility of the tweet. An alternating decision tree is created to automatically determine the credibility of tweets.

Some correlations between properties and credibility were found, most notably the number of tweets the user has previously created and the use of uppercase characters in the text.


Contents

1 Introduction
2 Background
2.1 Twitter
2.2 Related work
2.3 Machine learning software
3 Method
3.1 Machine learning algorithm
3.2 Dataset
3.3 Twitter REST API
3.4 Implementation of the algorithm
4 Results
5 Discussion
5.1 Dataset
5.2 Algorithms
5.3 Future research
6 Conclusion
7 References
Appendix


1 Introduction

In today's society it is becoming increasingly common to use social media as a source for news stories and information. During major catastrophes, when confusion is widespread, people have started looking to social media for reports from people on the scene. It is also a way for lesser-known newspapers to get their articles spread. Some people use social media to share random interesting facts.

One common factor in all these cases is that there is no guarantee that the information shared has been confirmed true.

Using information spread through social media is an efficient way to learn what is going on in one's surroundings, but some people use the unregulated services to spread misinformation or propaganda. For a person not used to sorting through information from social media it can be very difficult to tell what is real and what is not; it is therefore of interest to provide a tool for checking the credibility of postings on social media.

Problem statement

This project will focus on the social media outlet Twitter. The intention of this project is to investigate factors that can be used to determine the credibility of a tweet. Many studies have tried to determine the credibility of tweets by focusing on tweets regarding specific events or with special features. However, the users of Twitter come into contact with tweets from different sources with different subjects, and therefore need a tool that can determine the credibility regardless of these features. To provide this service is the ambition of this project.

Machine learning software will be used to analyse a large set of tweets and create an algorithm for determining the credibility of tweets. The algorithm will only require the tweet ID of a single tweet to determine its credibility and will not require human perception of the credibility of the tweet or the user. The result will then be tested and compared to the findings of projects that have studied Twitter credibility under different circumstances.


2 Background

This section will cover the basics of what Twitter is and how it is used. It will also account for previous research regarding credibility within social media.

The machine learning software used to analyse large sets of tweets will also be introduced.

2.1 Twitter

Twitter is a microblog that was created in 2006 and was introduced as a compact version of Facebook. What made Twitter unique was the limit of only 140 characters per post (or “tweet”), which is the regular length of a cell phone text message. Since 2011 it has been possible to add URL links to the tweets without infringing on the 140 characters. A tweet can contain one or more hashtags to indicate topics of the tweet. It is also possible to retweet someone else’s original tweet.

Accounts on Twitter require the user to state a name, email, and password; all other information is optional. The email address has to be confirmed and the name has to have at least one character. Users are provided with a news feed which consists of tweets from the accounts that they follow. Twitter does not allow a user to follow more than 5000 accounts, unless the user has more than 5000 followers, in which case the user is allowed to follow as many accounts as they have followers.

2.2 Related work

Determining the perceived credibility of a tweet can be done by studying several different factors. Some factors that are commonly investigated are user credibility, tweet content, retweet behaviour, and attachments [1, 2].

While there have been many previous studies within this field, all have had restrictions regarding what kind of tweets they are applicable to. This project has no restriction as to the kind of tweets it can be applied to, and will therefore resemble a fusion of previous research.

User credibility

The credibility of a user can be determined by studying the number of accounts the user is following and the number of followers it has, the number of posted tweets, the age of the account, and the number of favourited tweets [2, 3, 4].

Following many accounts but having few followers indicates an unreliable user according to [3]. Another aspect indicating unreliability is following unreasonably many accounts, since this implies that automated agents have been used. [3] and [4] state that a limit can be drawn at following approximately 5000 accounts. Small numbers of both followers and accounts followed indicate a new account or an irregular user, which results in uncertain credibility, while having many followers and not following many accounts indicates a celebrity account. Celebrity accounts do not necessarily indicate credibility, but neither do they indicate unreliability [3].

A high number of previously posted tweets indicates high user credibility when posting news-related tweets [2] and also indicates credibility when tweeting about other subjects [3].

Tweet credibility

Regarding the content of a tweet, there are several aspects that provide measurements to rate its credibility [2]. URL references can indicate unreliability if absent in tweets regarding news [2] and credibility if present in tweets regardless of topic [1, 3]. Neither [1] nor [3] mentions any implication for the credibility of non-news-related tweets when URL references are absent. Hashtags and mentions of other users also indicate credible tweets [1]. [2] found the presence of negative sentiment phrases indicative of credibility, while the presence of positive sentiment words was indicative of lacking credibility. [1] stated that high numbers of sad emoticons indicated a credible tweet.

Retweet quantity is another aspect with implications for the credibility of a tweet [2, 3]. A high retweet count indicates a credible tweet, especially when retweeted by a reliable user [2]. A deep retweet tree also indicates credibility, while a smaller number of initial tweets on the subject indicates higher credibility [2].

2.3 Machine learning software

To analyse a large set of tweets and find factors correlating to the credibility of the tweets, the software WEKA (Waikato Environment for Knowledge Analysis) will be used. WEKA is a computer program that has been developed by machine learning scientists and industrial scientists at the University of Waikato, New Zealand, with the purpose of making machine learning techniques more available to specialists within other fields [5].

The program can analyse databases that are too big to be analysed manually, without the users needing to be machine learning experts themselves. Many common machine learning algorithms are available to choose from when analysing data in WEKA, and graphs and trees can be generated to illustrate the results [5].


3 Method

This section will present the algorithm used and give a short motivation for the decisions that have been made. The dataset will be described together with the collection process of additional data. Code examples will be provided to illustrate the implementation of the algorithm.

3.1 Machine learning algorithm

When analysing data in WEKA there are different algorithms available. To ensure the possibility of implementing the result in Ruby, a decision tree was deemed the most appropriate type of algorithm. The Alternating Decision Tree (ADTree) algorithm did not yield the most accurate result of the different algorithms tested, but was amongst the best of the decision tree algorithms.

It also makes it possible to combine the user-evaluating and tweet-evaluating algorithms.

The ADTree algorithm is a generalisation of decision trees, voted decision trees, and voted decision stumps, meant to yield smaller rules and therefore be easier to interpret. [6] found that the ADTree algorithm can compete with boosted decision tree algorithms regarding accuracy.

ADTrees are composed of decision nodes with two prediction nodes as their children. Unlike decision stumps, the leaves are associated with real numbers instead of +1 or -1, which adds a measure of confidence to the result.

The decision nodes are written in the following form [6]:

    if (precondition) then
        if (condition) then output p1
        else output p2
    else output 0
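Since the final algorithm is implemented in Ruby, a single decision node and the summation of its outputs can be sketched as follows. This is a minimal illustration with hypothetical scores, not the exact rules produced by WEKA:

```ruby
# Minimal sketch of one ADTree decision node: it contributes p1 or p2
# when its precondition holds, and 0 otherwise.
def decision_node(precondition, condition, p1, p2)
  return 0.0 unless precondition
  condition ? p1 : p2
end

# The tree's final score is the base score plus the sum of the outputs
# of every decision node.
def adtree_score(base_score, node_outputs)
  base_score + node_outputs.sum
end
```

For example, a node rewarding a high previous-tweet count could contribute `decision_node(true, tweet_count > 5000, 0.5, -0.2)` to the sum; the real numbers in the leaves act as the confidence measure described above.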

3.2 Dataset

Finding a dataset to use for testing and evaluating the algorithm was difficult.

Twitter does not allow datasets containing identifiable tweets or users, graded as credible or not, to be made available for download on the internet.

Some researchers agree to share their datasets with others investigating similar topics, but they can be hard to reach. Creating a new dataset for this project was not possible due to the need for the tweets to be annotated as credible or not. The dataset ultimately used for this project is from [7] and consists of 3781 tweets with pictures attached, where the picture is graded as real or fake. This makes it possible to assume that a tweet lacks credibility if the picture is fake and that it is credible when the picture is real, thereby establishing ground truths. 2564 of the tweets in the dataset contained fake pictures and 1217 tweets contained real pictures.


The dataset was collected by searching for tweets using keywords designed to match specific events of interest [7]. Pictures previously confirmed to be real or fake were gathered, and only tweets containing one or more of those pictures were included in the dataset. Due to the use of an “optimized visual near-duplicate search strategy” [7], the pictures in the tweets did not need to match the exact URLs of the pictures collected; instead the images themselves could be matched.

The tweets were also annotated with the following content features. Predefined lists supporting English, German and Spanish were used to compute the sentiment words [7].

• Length of tweet

• Number of words

• Contains question mark

• Contains exclamation mark

• Number of question marks

• Number of exclamation marks

• Contains happy emoticon

• Contains sad emoticon

• Contains 1st order pronoun

• Contains 2nd order pronoun

• Contains 3rd order pronoun

• Number of uppercase characters

• Number of negative sentiment words

• Number of positive sentiment words

• Number of mentions

• Number of hashtags

• Number of URLs

• Number of retweets
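Several of these surface features can be derived directly from the raw tweet text. As an illustration, a Ruby sketch using simple regular expressions — an assumption about how such counts could be computed, not the extraction code used in [7]:

```ruby
# Hedged sketch: compute a few of the listed content features from raw
# tweet text with simple pattern matching.
def content_features(text)
  {
    length: text.length,                       # length of tweet
    words: text.split.size,                    # number of words
    question_marks: text.count("?"),
    exclamation_marks: text.count("!"),
    uppercase_chars: text.count("A-Z"),
    mentions: text.scan(/@\w+/).size,
    hashtags: text.scan(/#\w+/).size,
    urls: text.scan(%r{https?://\S+}).size
  }
end
```

Sentiment-word counts would additionally require the predefined word lists mentioned above and are omitted here.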

The dataset also contained user features, but due to some important features not being included, user information was instead collected directly from Twitter using their REST API.


3.3 Twitter REST API

To simplify connecting to the Twitter REST API, the application is written in Ruby and uses the gem “omniauth-twitter” to connect to the Twitter API and the gem “twitter” to process the data received from the API. Tweets can be found using their ID, and users can be found using either ID or username. An application registered with Twitter is allowed to request user information 180 times per 15 minutes. The user database could therefore not be populated instantly; instead, requests were run with a five-second delay per request to ensure the limit was not reached. This was achieved by creating a standalone script that queued a list of all tweet IDs and called a Twitter connection feature built into the main application. When the application received a tweet ID, it connected to the Twitter REST API and requested user information for the author of that tweet. Due to the small size of the database, the delay was not an issue. The following user features were collected:

• Number of followers

• Number of following

• Number of favourited tweets

• Time and date created

• Number of published tweets

• If the user has a profile picture
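The five-second delay follows directly from the rate limit: 180 requests per 15-minute window allows at most one request every 900 / 180 = 5 seconds. A sketch of such a throttled loop, where `fetch_user_for` is a hypothetical stand-in for the application's Twitter connection feature:

```ruby
# Rate limit: 180 user-lookup requests per 15-minute window.
WINDOW_SECONDS = 15 * 60
MAX_REQUESTS   = 180
DELAY = WINDOW_SECONDS / MAX_REQUESTS  # 5 seconds between requests

def throttled_fetch(tweet_ids)
  tweet_ids.each do |id|
    fetch_user_for(id)  # hypothetical call into the main application
    sleep DELAY         # stay safely under the rate limit
  end
end
```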

To enable the use of the account creation date in WEKA, the time and date were reduced to only the year the account was created. To gain credibility annotations for the user database, users who published fake pictures in the dataset were annotated as fake and users who published real pictures as real.
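Reducing the timestamp to a year is a one-line transformation; a sketch, assuming the creation date is available as a Ruby Time value:

```ruby
require "time"

# Hedged sketch: reduce a full account-creation timestamp to its year,
# as used for the WEKA attribute.
def creation_year(created_at)
  created_at.year
end

creation_year(Time.parse("2009-03-21 14:02:00 UTC"))  # => 2009
```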

3.4 Implementation of the algorithm

To create an algorithm for determining the credibility of users and tweets, the data collected from the dataset and from Twitter was run through WEKA.

In order to find independent credibility features for users and tweets, the two databases were processed separately. The algorithms created by WEKA were then combined when implemented in the Ruby application.

The alternating decision tree is implemented as a series of if-statements, along which two separate credibility scores are determined, one for the user and one for the tweet. The total score is calculated as the sum of the two scores. A positive score indicates a reliable tweet; a negative score indicates an unreliable one.

After implementing the algorithms in the Ruby application, edge cases need to be handled. Due to the age of the tweets in the dataset, some users have been removed from Twitter. In those cases the user is given a -0.7 credibility rating, which is slightly higher than the worst credibility rating that can be achieved in the algorithm. This is motivated by the fact that, when investigating some random cases with missing users, the users had been removed by the Twitter organisation, thus indicating low credibility.

For the cases where the total score is close to zero (thereby indicating neither credibility nor the lack thereof), a buffer is created with the range [-0.15, +0.25] in which tweets are marked as uncertain. This helps decrease the number of false positives and false negatives. When the buffer is decreased to the range [-0.1, +0.25], the newly categorized tweets have an error margin of 53%, as illustrated in the figure below, so the former range is considered more appropriate. Decreasing the upper limit of the range to 0.2 gives an error margin of 48%, so the upper limit also remains at the previous value.

Figure 1: The results when changing the buffer range. Calculated with the range [-0.15, +0.25] as base value.
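Putting the score summation and the uncertainty buffer together, the final decision rule can be sketched as follows (the thresholds are those stated above; the method name is hypothetical):

```ruby
# Hedged sketch of the final decision rule: sum the user and tweet
# scores, then apply the uncertainty buffer [-0.15, +0.25].
UNCERTAIN_RANGE = (-0.15..0.25)

def classify(user_score, tweet_score)
  total = user_score + tweet_score
  return :uncertain if UNCERTAIN_RANGE.cover?(total)
  total.positive? ? :credible : :unreliable
end
```

A tweet whose combined score lands inside the buffer is reported as uncertain rather than forced into either class.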


4 Results

This section will first present the specifics of the user and tweet credibility algorithms calculated in WEKA. Graphic representations of the decision trees can be found in appendix 1 and 2. Calculations of the accuracy of the algorithm will also be shown.

User credibility

When classifying the credibility of the user, the ADTree starts with the negative score -0.373. The smallest change in score to appear in the tree is 0.07 and the largest is 1.051, as can be seen in appendix 1. The number of tweets written by the user appears in several places in the decision tree, always with a positive score for higher numbers of tweets. The number of tweets the user has favourited also appears in several places, but with lower numbers indicating higher reliability. The year the account was created appears twice in the tree, once with a positive score for older accounts and once with a positive score for newer accounts. When reaching a node based on the ratio between followers and accounts followed, a lower ratio indicates higher credibility, although it only brings the user up to a score slightly below 0. See appendix 1 for a graphic representation of the ADTree.

Tweet credibility

The ADTree for classifying tweet credibility also starts with a negative score, -0.372. The smallest change in score to appear in the tree is 0.089 and the largest is 1.981, as can be seen in appendix 2. The number of uppercase characters in the tweet has a large impact on the credibility rating further down in the tree: for a higher credibility score, the number should be between three and eleven. Higher credibility is also implied if the number of URL links included in the tweet is more than one. Since the dataset only contains tweets with at least one image, the lowest number of URL links that can be found for tweets in the dataset is one. A longer tweet text indicated higher reliability. See appendix 2 for a graphic representation of the ADTree.

Accuracy of the results

When testing the combination of the credibility algorithms for users and tweets on the dataset, 85.7% of the tweets were correctly classified, 8.8% were incorrectly classified, and 5.5% fell within the zone of uncertainty. More detailed accuracy calculations can be studied in the figure below.


Figure 2: Accuracy of the final version of the algorithm.

5 Discussion

In this section the results of the final algorithm will be discussed and compared to the results of previous studies. The implications of the dataset's limited size and variability for the quality of the algorithm will be considered. Alternative methods that could have been implemented will be compared to those used, and recommendations for future research will be discussed.

5.1 Dataset

The dataset contained approximately twice as many fake tweets as real ones, which accounts for the negative starting score when grading the tweets.
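This starting score is consistent with the standard ADTree initialisation, which sets the root prediction to half the natural log of the ratio between positive and negative training weights [6]. A quick check against the class counts from section 3.2, assuming WEKA follows this rule:

```ruby
# Sanity check: the ADTree root rule a = 0.5 * ln(W+ / W-) applied to
# the 1217 real and 2564 fake tweets in the dataset.
real_tweets = 1217.0
fake_tweets = 2564.0
root_score = 0.5 * Math.log(real_tweets / fake_tweets)
# root_score is approximately -0.372, matching the starting scores
# reported in the results section.
```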

Due to issues finding a comprehensive and varied dataset of tweets with cred- ibility annotations, I had to use a dataset only containing tweets with pictures.

Having URL links to images or other sources is usually an indication of higher credibility, as mentioned by other studies in the related works section of this paper. The fact that all tweets in the dataset contained pictures therefore reduced the possibility of comparing the frequency of credible tweets amongst tweets with URL links. However, the algorithm still found that containing more than the minimum of one URL link indicated higher credibility.

The use of the credibility annotation of the tweets as ground truth results in a large error margin. This is partly because tweets questioning the validity of the fake pictures are annotated as fake, when they are in fact credible. The same applies to tweets questioning the validity of real pictures, which should therefore be considered less credible but are annotated as real.

5.2 Algorithms

Even though the results of the algorithm were fairly accurate, I do not believe it is a good algorithm for determining the credibility of randomly selected tweets, due to the weak data it is based on. With such a small dataset, the algorithm has adapted more to edge cases than is appropriate for a general algorithm.

As noted earlier, the negative starting score in the algorithm is a side effect of the fact that two thirds of the dataset consisted of fake tweets. One way to avoid this could have been to include only as many fake tweets as there are real ones in the dataset. I did not do this, however, since it would have limited the size of an already far too small dataset.

An issue that arose from using current user information to determine the credibility of old tweets is that a user's behaviour could have changed considerably since the tweet was written. Users that were new when they wrote a tweet included in the dataset could now have years of experience on Twitter, which makes it difficult for the algorithm to take the creation date into account.

This is obvious when looking at the classification of users depending on the creation year of the account. Users with accounts from 2015 are considered much more reliable than those with older accounts, and users with accounts created before or during 2009 are slightly more reliable than those with newer accounts. The lack of a linearly changing classification when studying the age of accounts indicates that there is no general correlation between credibility and the age of a user account.

A lot of factors were weighted similarly in this project as in others. One example of such a coinciding factor is that a high number of previously created tweets by a user was found to indicate higher user credibility, in accordance with the findings of both [2] and [3].

In accordance with the findings of [2], the WEKA algorithm also found that the presence of positive sentiment words indicated lacking credibility in a tweet. As seen in appendix 2, the score was decreased by 0.852 if positive sentiment words were present, and by 1.981 if more than 15 uppercase characters were included as well. However, the WEKA algorithm did not consider negative sentiment words when measuring credibility.

The use of uppercase characters is not a factor I have encountered in any of the research studied in the course of this project. However, when analysing the dataset it was found that a small but non-zero number of uppercase characters indicated higher credibility for the tweet. It appears plausible that such a correlation could be common in tweets, since the text appears more polished and carefully prepared with some uppercase characters, while too many indicate an extreme sentiment. In the ADTree the use of uppercase characters is often compared together with the text length or number of words in the tweet, since the length of the text needs to be taken into consideration to determine what is a large or regular number of uppercase characters.

It was not possible to take into account the depth of the retweet trees for tweets that had been retweeted multiple times, due to the inability to properly backtrace retweets in the Twitter API. This made the retweet numbers very unreliable, and they were therefore not taken into account by the WEKA algorithm.

When implementing the algorithms I could have weighted the impact of the user and tweet credibility differently. This was not done because I did not have sufficient information to base any changes on, and I do not feel that it would have made a big difference to the outcome, due to the weak data the algorithms were based on.

5.3 Future research

To be able to create a better algorithm for determining the credibility of tweets, I believe that a good algorithm for interpreting the text of the tweet is needed: not only to get a more accurate read on positive and negative sentiment words, which appear to correlate with credibility, but also to determine whether the user posting the tweet actually believes it or not. This is also needed to gain more information about possible retweets or other external influences. Developing a program that interprets the text of tweets is what I believe the next step in research on this topic should be.

The ability to better follow the retweet trees is also needed to further the automatic classification of tweets. This is a feature that I would like to see Twitter make available to researchers.


6 Conclusion

To create a good algorithm for determining the credibility of tweets, the test data needs to be large and varied. More statistics and retweet traces would be needed from Twitter. With a large database and extensive information on each tweet, I believe it would be possible to create fully automated software to determine the credibility of single tweets.


7 References

[1] Gupta, A., & Kumaraguru, P. (2012, April). Credibility ranking of tweets during high impact events. In Proceedings of the 1st Workshop on Privacy and Security in Online Social Media (p. 2). ACM.

[2] Castillo, C., Mendoza, M., & Poblete, B. (2011, March). Information credibility on Twitter. In Proceedings of the 20th International Conference on World Wide Web (pp. 675-684). ACM.

[3] Kang, B., O'Donovan, J., & Höllerer, T. (2012, February). Modeling topic specific credibility on Twitter. In Proceedings of the 2012 ACM International Conference on Intelligent User Interfaces (pp. 179-188). ACM.

[4] Sikdar, S. K., Kang, B., O'Donovan, J., Höllerer, T., & Adali, S. (2013). Cutting through the noise: Defining ground truth in information credibility on Twitter. HUMAN, 2(3), 151-167.

[5] Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., & Witten, I. H. (2009). The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter, 11(1), 10-18.

[6] Freund, Y., & Mason, L. (1999, June). The alternating decision tree learning algorithm. In ICML (Vol. 99, pp. 124-133).

[7] Boididou, C., Papadopoulos, S., Kompatsiaris, Y., Schifferes, S., & Newman, N. (2014, April). Challenges of computational verification in social multimedia. In Proceedings of the Companion Publication of the 23rd International Conference on World Wide Web (pp. 743-748). International World Wide Web Conferences Steering Committee.
