
Longitudinal study of links, linkshorteners, and Bitly usage on Twitter


Academic year: 2021


Linköpings universitet SE–581 83 Linköping

Linköping University | Department of Computer and Information Science

Bachelor’s thesis, 16 ECTS | Link Usage

2020 | LIU-IDA/LITH-EX-G--20/001--SE

Longitudinal study of links, linkshorteners, and Bitly usage on Twitter

Longitudinella mätningar av länkar, länkförkortare och Bitly användning på Twitter

Mathilda Moström

Alexander Edberg

Supervisor: Niklas Carlsson
Examiner: Marcus Bendtsen



Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a period of 25 years starting from the date of publication barring exceptional circumstances.

The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purposes. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.


Students in the 5-year Information Technology program complete a semester-long software development project during their sixth semester (third year). The project is completed in mid-sized groups, and the students implement a mobile application intended to be used in a multi-actor setting, currently a search and rescue scenario. In parallel they study several topics relevant to the technical and ethical considerations in the project. The project culminates by demonstrating a working product and a written report documenting the results of the practical development process including requirements elicitation. During the final stage of the semester, students create small groups and specialise in one topic, resulting in a bachelor thesis. The current report represents the results obtained during this specialisation work. Hence, the thesis should be viewed as part of a larger body of work required to pass the semester, including the conditions and requirements for a bachelor thesis.


Abstract

Social networks attract millions of users who want to share information and connect with people. One of those platforms is Twitter, which has the power to greatly shape people's opinions and thoughts. It is therefore important to understand how information is shared among users. In this thesis, we characterize link sharing on Twitter, with particular focus on third-party link shortener services, which hide the actual URL from users until they click on a generic, shortened URL, and mainly on the link management platform Bitly. The purpose of this thesis is to analyze link usage among users over a specific time period and the domains that different users and link shorteners direct their users to, and to compare the click rates of such links with the corresponding retweet rates to see how these vary over time. We use a measurement framework developed by two other students from Linköping University to collect datasets over different time periods. First, we compare a one-week-long dataset from the spring of 2019 to one gathered one year later, in the spring of 2020. Two additional one-week-long datasets were also collected during the spring of 2020. We use the two main datasets, separated by a year, to evaluate long-term differences, and the three datasets from the spring of 2020 to analyze shorter-term variations in link usage. With this approach, the study aims to highlight significant patterns over time, including with regard to which domains are tweeted. We have found that the usage of URL link shorteners has not decreased over the last year, although the usage of Bitly specifically has. The domains with the highest occurrences in 2019 did not keep their high rankings in 2020; this is especially true for facebook.com, whose occurrence dropped by 2.7 percentage points in 2020. Our conclusion is that the difference between the years is not large, but that there are some interesting trends and patterns. Given the prevailing Covid-19 pandemic, we have also done a minor analysis of how many Twitter users link to domains related to it. It turned out that the sharing of Covid-19 related links decreased quite sharply during our analysis period.


Acknowledgments

We would like to thank our supervisor Niklas Carlsson for his support and guidance during the project. We would also like to give special thanks to Oscar Järpehult and Martin Lindblom for giving us the opportunity to use their framework for our research and for being so helpful in answering questions.


Contents

Abstract
Acknowledgments
Contents
List of Figures
List of Tables
1 Introduction
1.1 Motivation
1.2 Aim
1.3 Approach
1.4 Contribution
1.5 Delimitations
1.6 Thesis outline
2 Background
2.1 Twitter
2.2 Shortened URL
2.3 Top domain ranking sites
2.4 Related work
3 Method
3.1 Dataset
3.2 Collection approach
3.3 Limitations
4 Results
4.1 High-level link shortener usage
4.2 Domain statistics
4.3 User statistics
4.4 Bitly link interaction
4.5 Verified vs non-verified users
4.6 Covid-19 analysis
5 Discussion
5.1 Results
5.2 Method
5.3 The work in a wider context
6 Conclusion
6.1 Future work
Bibliography
A Appendix
A.1 URL shorteners
A.2 Collections from 18/3-25/3 and 1/4-8/4


List of Figures

4.1 Top 20 most frequent domains overall (2019).
4.2 Top 20 most frequent domains overall (2020).
4.3 Top 20 most frequent domains for shortener domains (2019).
4.4 Top 20 most frequent domains for shortener domains (2020).
4.5 Link popularity distribution to domains of different popularity classes, as defined using the Alexa top-1M lists.
4.6 Link popularity distribution to domains of different popularity classes, as defined using the Majestic top-1M lists.
4.7 Distribution of domain rank (2019).
4.8 Distribution of domain rank (2020).
4.9 The results from 2019 is found below the vertical divider in pink and 2020 above in blue.
4.10 Distribution of the age for users account at the time of posting their tweet (2019).
4.11 Distribution of the age for users account at the time of posting their tweet (2020).
4.12 Distribution of the number of tweets favourited by users at the time of posting their tweet (2019).
4.13 Distribution of the number of tweets favourited by users at the time of posting their tweet (2020).
4.14 Distribution of the number of tweets posted by users at the time of posting their tweet (2019).
4.15 Distribution of the number of tweets posted by users at the time of posting their tweet (2020).
4.16 Ratio between tweets favourited and tweeted by users at the time of posting their tweet (2019).
4.17 Ratio between tweets favourited and tweeted by users at the time of posting their tweet (2020).
4.18 Distribution of the number of followers for users at the time of posting their tweet (2019).
4.19 Distribution of the number of followers for users at the time of posting their tweet (2020).
4.20 Distribution of the number of friends for users at the time of posting their tweet (2019).
4.21 Distribution of the number of friends for users at the time of posting their tweet (2020).
4.22 Followers-to-friends ratio for users at the time of posting their tweet.
4.23 Two scatter plots of Bitly clicks-to-retweets-ratio.
4.24 Logarithmic average of Bitly clicks per retweet.
4.25 Clicks-to-followers ratio for Bitly links for verified users.
4.26 Clicks-to-followers ratio for Bitly links for non-verified users.
4.27 Heat-map of retweets vs followers tweeted (2019).
4.28 Heat-map of retweets vs followers tweeted (2020).


4.30 Heat-map of retweets vs number of tweets tweeted (2020).
4.31 Heat-map of followers vs number of tweets tweeted (2019).
4.32 Heat-map of followers vs number of tweets tweeted (2020).
4.33 Scatter plots for all 3 collections 2020 of Covid-19 clicks-to-retweets-ratio.
4.34 Scatter plots for all 3 collections 2020 of the overall clicks-to-retweets-ratio.
4.35 CDFs of the ratio between clicks and retweets for tweets containing Covid-19 related links or hashtags and non Covid-19 related links and hashtags.
A.1 Link popularity distribution to domains of different popularity classes, as defined using the Alexa and Majestic top-1M lists (18/3-25/3).
A.2 Link popularity distribution to domains of different popularity classes, as defined using the Alexa and Majestic top-1M lists (1/4-8/4).
A.3 Distribution of domain rank (18/3-25/3).
A.4 Distribution of domain rank (1/4-8/4).
A.5 Distribution of domain rank (18/3-25/3).
A.6 Distribution of domain rank for (1/4-8/4).
A.7 Distribution of the age for users account at the time of posting their tweet (18/3-25/3).
A.8 Distribution of the age for users account at the time of posting their tweet (1/4-8/4).
A.9 Distribution of the number of tweets favourited by users at the time of posting their tweet (18/3-25/3).
A.10 Distribution of the number of tweets favourited by users at the time of posting their tweet (1/4-8/4).
A.11 Distribution of the number of tweets posted by users at the time of posting their tweet (18/3-25/3).
A.12 Distribution of the number of tweets posted by users at the time of posting their tweet (1/4-8/4).
A.13 Ratio between tweets favourited and tweeted by users at the time of posting their tweet (18/3-25/3).
A.14 Ratio between tweets favourited and tweeted by users at the time of posting their tweet (1/4-8/4).
A.15 Distribution of the number of followers for users at the time of posting their tweet (18/3-25/3).
A.16 Distribution of the number of followers for users at the time of posting their tweet (1/4-8/4).
A.17 Distribution of the number of friends for users at the time of posting their tweet (18/3-25/3).
A.18 Distribution of the number of friends for users at the time of posting their tweet (1/4-8/4).
A.19 Followers-to-friends ratio for users at the time of posting their tweet.
A.20 Clicks-to-followers ratio for Bitly links for verified users.
A.21 Clicks-to-followers ratio for Bitly links for non-verified users.
A.22 Heat-map of retweets vs number of tweets tweeted (18/3-25/3).
A.23 Heat-map of retweets vs followers tweeted (18/3-25/3).
A.24 Heat-map of followers vs number of tweets tweeted (18/3-25/3).
A.25 Heat-map of retweets vs number of tweets tweeted (1/4-8/4).
A.26 Heat-map of retweets vs followers tweeted (1/4-8/4).
A.27 Heat-map of followers vs number of tweets tweeted (1/4-8/4).
A.28 Two scatter plots of Bitly clicks-to-retweets-ratio.


List of Tables

4.1 Amount of tweets collected at the various time occasions divided into categories.
4.2 Top 20 most frequent domains for all links.
4.3 Top 20 most frequent domains for shortened links.
4.4 Top 20 most frequent domains for Bitly links.
4.5 Amount of unique users for each category and how many of those users that are verified.
4.6 Amount of links tweets related to Covid-19.
4.7 Amount of Bitly links tweets related to Covid-19.
A.1 All collected shorteners sorted on domains and how many we were able to get the full domain from.
A.2 Amount of tweets collected at the various time occasions divided into categories.
A.3 Most frequent domains (18/3-25/3).
A.4 Most frequent domains (1/4-8/4).
A.5 Top 20 most frequent domains (18/3-25/3).
A.6 Top 20 most frequent domains (1/4-8/4).
A.7 Amount of unique users for each category and how many of those users that are verified.


1 Introduction

Twitter, a social media networking site, launched in 2006. It is today one of the most popular social media platforms, with 100 million daily active users and 500 million tweets sent daily [35]. Twitter can be used for many purposes: to follow friends and family, to receive news, and to follow high-profile celebrities and world leaders. Understanding and analysing what kind of content is shared on Twitter, and how this varies over time, is important for a better understanding of how users behave on social media. We investigate how this behavior varies and analyse the resulting patterns.

1.1 Motivation

The use of short URLs on Twitter has increased in popularity in the past few years [9]. This is mostly because Twitter, one of the most popular social media networks, has a 280-character limit for tweets [7]. URL shorteners let users fit more content in less space, and the URL can even be customised to make it more attractive for followers to click on. As social media networks such as Twitter become more popular, the need to make sharing web content easier will increase, and shorter URLs are becoming more and more integral to that cause. URL shorteners, in their own way, work as aggregators of information. This can lead to useful mashups and innovations in how people share and digest content. This thesis analyzes the use of short URLs by Twitter users to get a better understanding of how users behave on social media. We also study how users behave in special times such as the Covid-19 pandemic that currently dominates the world.

1.2 Aim

In this thesis, we analyse the use of short URLs by investigating their click traffic, through a longitudinal measurement over a span of one year. The aim is to get a better understanding of patterns over time in link, link shortener and Bitly usage on Twitter. Bitly is a commonly used link management platform [2]. From this aim follow the questions that form the foundation of the thesis and that are analyzed over a specific time period to see how the patterns vary. With links being posted on Twitter, we want to investigate whether it is possible to discover significant patterns over time with regard to which domains are tweeted. We also want to see, over different time spans, whether any correlations can be found between users and how they interact with tweets containing links. This is important because a social network such as Twitter typically contains a massive number of short URLs, and an efficient mechanism is needed to collect their traffic. Another horrible, yet interesting, aspect this year is the prevailing Covid-19 pandemic, which at the time of writing is affecting the whole world. Today, social media account for 30 % of overall visits to websites [11]. In other words, this is a time when knowledge of misleading information sharing and fake news is more important than ever. The highlights of our work can be summarized as follows: we generate and collect a tweet traffic dataset, looking for tweets containing URL shorteners, and especially for Bitly links. The script also looks for links that concern Covid-19, to see how large a proportion of the shared links relate to this topic and whether the need to access information about the pandemic increases or decreases.

Research questions

• Over a one-year span, is it possible to see any significant patterns with regard to link, link shortener and Bitly usage on Twitter? What has changed or not changed over the last year?

• Over a shorter time span of just over a month, how do Twitter users tend to write about and link to things related to Covid-19? How does this behavior change over time?

1.3 Approach

Last year, two IT students from Linköping University, Martin Lindblom and Oscar Järpehult, developed a framework to see how tweets are retweeted and clicked. This thesis uses their framework in the same way but collects datasets in different time intervals. We compare the results they obtained last year (2019) to the datasets collected this year (2020). In both years the data was collected in early May. In the Appendix we include the results from two other datasets collected in April 2020, in order to determine whether there are any weekly patterns in the behaviour of users on Twitter.

1.4 Contribution

With a longitudinal analysis, we add a time aspect to the comparison of link usage and user behavior on Twitter, including aspects such as Covid-19. The temporal analysis shows how behaviour varies over time, both year-to-year and across multiple weeks. In the future, this methodology can be used to compare behaviour over longer time periods, looking at different aspects.

1.5 Delimitations

We limit our analysis to data collected over a period of seven days at each collection, to make the data manageable to process and to make sure the datasets do not get too large. The collections were done in three rounds within a rather narrow period of time during the last half of our spring term, as we wrote our Bachelor's thesis. In the future, it would be interesting to look at data collected over a longer period of time. Every tweet contains a lot of data, and since we focus on link shorteners, we have adapted which parameters we retrieve and limited which fields are saved to the dataset. Media fields and user-created text fields are ignored to avoid complications when saving to file. The Twitter and Bitly APIs that we used are the free versions, which only give a sample of all tweets; in the future it would be of interest to compare with a paid tier of their APIs.


1.6 Thesis outline

This thesis is structured as follows. Chapter 2 presents background on the areas needed to understand the following work, and ends with a section on related work that shows what others have done on the subject. Chapter 3 presents our method and analysis. Chapter 4 presents our results, highlighting comparisons between the 2019 and 2020 datasets. In Chapter 5 we discuss similarities and differences between the collections, and provide suggestions for improvements and ideas for future work. Finally, conclusions are presented in the last chapter. Additional results and complementary information (e.g., a list of all the domains that we considered shorteners) are provided in the Appendix.


2 Background

This chapter presents background information relevant to the work. The areas explained are Twitter, shortened URLs and top domain ranking sites. Lastly, related work is presented: papers on URL shorteners and spam, Twitter behavior, and what kind of content is being shared are summarised to show what others have researched on this topic.

2.1 Twitter

Twitter is a 'microblogging' system that allows its users to write posts of up to 280 characters; these posts are called tweets. Tweets can include text, photos, videos and links to relevant websites and resources. To cope with the limit on tweet length, link shortening is commonly used. Twitter today has 326 million monthly active users; the country with the most users is the USA, and there are more men than women among the users [35]. You can create your own tweets, or you can retweet information that has been tweeted by others. When you retweet, you forward the original tweet to your followers [32]. Retweeting means that information can be shared quickly and efficiently with a large number of people. A user on Twitter can follow other users and have their own followers. Friends are the accounts that a user follows (also referred to as "following"). When a user follows another user, they subscribe to that user's tweets, which will appear in their home timeline [34]. A user can respond to another user's tweet by replying to it. It is also possible to mention another user in a tweet by putting the "@" symbol in front of that user's username; the mentioned user then receives a notification about the tweet [30]. A Twitter account may be verified if it is determined to be an account of public interest. This typically includes accounts owned by users in politics, religion, music, acting, business and other key interest areas [27].

The Internet, and especially social media, has a great influence on the world; it holds a huge ocean of opinions, and Twitter is no exception [4]. In turbulent times such as the spring of 2020 with the Covid-19 pandemic, but also in less dramatic contexts such as election campaigns, many turn to social media to make their voices heard. Almost all of the world's leaders are today diligent users of Twitter [21]. The American president Donald Trump's Twitter account @realDonaldTrump has over 75 million followers [31]. In addition to ordinary individuals, many companies, websites, artists and organizations use Twitter as a way to reach out to their users, customers and fans. The service received a lot of attention during the 2008 US presidential election, when presidential candidate Barack Obama's campaign used Twitter and other social media to reach out to voters. Twitter has also been a tool for regime critics in totalitarian states to communicate with the outside world, and during natural disasters, conflicts and the like, as well as the current pandemic, private individuals on the spot have been able to report directly on events. The service is considered to have played an important role during the Arab Spring, where activists were able to communicate and spread their messages globally [21].

Compared to many other social media platforms, for example Facebook, Twitter has several rules and policies that must be followed by all its users. Newspapers often report on tweets posted by, among others, politicians and world leaders, and it makes big headlines when Twitter decides to remove some of their tweets for rule violations [33]. Thus, it is not difficult to imagine the real power that Twitter has today.

Twitter API

To share information as widely as possible, Twitter provides programmatic access to Twitter data through its APIs (application programming interfaces). APIs are the way computer programs "talk" to each other, so that they can request and deliver information through HTTP requests. To get access to the free version of Twitter's API, you have to apply for a developer account by filling in an application describing your intentions with the API; the application then needs to be granted by Twitter. For this paper we have used one developer account [1]. With the free version, tweets from the last seven days are returned as JSON objects [28].
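As an illustration of what such a JSON object looks like to a program, the sketch below parses a single tweet and pulls out the links it contains. It assumes the layout of Twitter's v1.1 tweet objects, where links are listed under `entities.urls` with a `url` field (the t.co form) and an `expanded_url` field. The sample tweet is invented and the helper name `extract_links` is hypothetical; it is not taken from the framework used in this thesis.

```python
import json

# Invented sample in the shape of a Twitter v1.1 tweet object (assumption).
SAMPLE_TWEET = """
{
  "id_str": "1",
  "retweet_count": 3,
  "user": {"followers_count": 120, "verified": false},
  "entities": {
    "urls": [
      {"url": "https://t.co/abc", "expanded_url": "https://bit.ly/xyz"},
      {"url": "https://t.co/def", "expanded_url": "https://example.com/news"}
    ]
  }
}
"""


def extract_links(tweet: dict) -> list:
    """Return the expanded form of every link in a tweet object."""
    urls = tweet.get("entities", {}).get("urls", [])
    # Fall back to the t.co form when no expanded URL is present.
    return [u.get("expanded_url") or u["url"] for u in urls]


tweet = json.loads(SAMPLE_TWEET)
links = extract_links(tweet)
```

Working on the parsed dictionary rather than on the raw text keeps the link extraction independent of how the tweets were fetched.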

2.2 Shortened URL

URL shortening is used to turn a long web address into a short one; the short URL is often created by an external service, for example Bitly or ow.ly. One purpose is to be able to send links when a service has a character limit; for example, both SMS and Twitter limit how many characters are allowed in a single message or tweet. A URL shortener can also be used to reduce the risk of a URL containing special characters being distorted, and to make a URL easier to memorize. Some URL shortener services also let you customise your URL, making it even easier to remember [13]. Web services that generate shortened URLs are referred to as URL shortening services, for example Bitly and TinyURL. They also provide the dereference function, i.e. redirection from the shortened URL to the original one. The shortened URLs generated from the same URL differ from service to service. A shortened URL consists of the domain name of the shortening service and a unique key associated with the original URL. Today there are many services that offer link shortening. The problem is that many of them are not serious and have been used to send out spam and to get people to visit pages they would normally never visit. The short URL can easily be customised so that it reveals nothing about the website it leads to. The security aspect is important, and several major sites have therefore chosen to run their own shorteners or to clearly state which third-party link shortener services they recommend their users to use. This paper focuses on third-party URL shorteners and the clicks that the most popular such service (Bitly) generates.

Twitter has its own link shortener, t.co; links shared on Twitter are automatically processed and shortened to an http://t.co link. Its link service measures information such as how many times a link has been clicked, which is an important quality signal in determining how relevant and interesting each tweet is compared to similar tweets. Having a link shortener protects users from malicious sites that engage in spreading malware, phishing attacks, and other harmful activity. A link converted by Twitter's link service is checked against a list of potentially dangerous sites, and users are shown a warning when clicking on potentially harmful URLs. The link service at http://t.co is only used on links posted on Twitter and is not available as a general shortening service on other apps or sites [29].
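The structure of a shortened URL, a service domain followed by a unique key, can be made concrete with a small sketch. The set of shortener domains below is a short illustrative assumption (the thesis lists the domains it actually considered in the Appendix), and the function names are hypothetical.

```python
from urllib.parse import urlparse

# Small illustrative set; the thesis uses a much longer list (see Appendix).
KNOWN_SHORTENERS = {"bit.ly", "ow.ly", "tinyurl.com", "t.co"}


def split_short_url(url: str) -> tuple:
    """Split a URL into (domain, key); the key is the path without slashes."""
    parsed = urlparse(url)
    domain = parsed.netloc.lower()
    if domain.startswith("www."):
        domain = domain[4:]
    key = parsed.path.strip("/")
    return domain, key


def is_shortened(url: str) -> bool:
    """True if the URL has a key and its domain is a known shortening service."""
    domain, key = split_short_url(url)
    return domain in KNOWN_SHORTENERS and key != ""
```

For example, `split_short_url("https://bit.ly/2abcXYZ")` yields the domain `bit.ly` and the key `2abcXYZ`. The destination itself cannot be read from the short URL; it is only revealed when the service redirects the request.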

Bitly API

Bitly is a platform where you can shorten, share, manage and analyze links to your content. Billions of links are created every year by millions of users, from individuals to small businesses to Fortune 500 companies. Through the Bitly API you can track real-time click data, learn your top referrers and locations, and see when and where your links are clicked. Info, clicks, countries and referrers are the four major types of metadata that Bitly's API provides. Info contains the properties of the short URL, which refers to the actual long URL behind it. Clicks contains the total number of clicks for the short URL. The number of clicks and referrers from various countries is also provided; referrers are the applications or web pages that contain the short URLs [5]. This thesis focuses mainly on Bitly as a third-party link shortening service.

2.3 Top domain ranking sites

Top domain ranking sites rank the most popular websites worldwide. Websites are ranked on a combined measure of page views and unique site users, and a list of the most popular websites is created from this ranking, time-averaged over a specific period; often only the highest-level domain is recorded. In this paper we compare our results to two of the most commonly used top domain ranking sites, Alexa and Majestic. Which one to choose depends on what you are trying to accomplish. Alexa determines the rank of a website from a combined measure of unique page views and visitors. Page views are the total number of URL requests by Alexa users for a site, and unique visitors are determined by the number of unique Alexa users who visit a site on a given day. However, multiple requests for the same URL on the same day by the same user are counted as a single page view. The site with the highest combination of unique visitors and page views is ranked number one [17]. The most popular and widely used top list is the Alexa Global Top 1M list [25]. The "Majestic Million" ranks the top 1 million websites by how many other websites link to them. The list is constructed by crawling the web and counting the number of referring subnets for each individual domain, so unlike Alexa, Majestic does not take into account how often a link is clicked [3]. For the main collection in this paper, collected 18-25/4, the Alexa and Majestic lists were downloaded from their respective websites on 25-04-2020.
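To show how such a top list can be used in practice, the following sketch looks up a domain's rank and assigns it to a popularity class. It assumes a plain two-column `rank,domain` CSV without a header; a file with a header row or extra columns would need a slightly different loader. The sample rows and the class boundaries are invented for illustration and are not the classes used in the thesis.

```python
import csv
import io

# Invented excerpt in an assumed "rank,domain" layout of a top-1M CSV.
TOP_LIST_CSV = """1,google.com
2,youtube.com
150,example.org
20000,example.net
"""

# Hypothetical class boundaries: upper rank limit and label.
CLASSES = [(100, "top 100"), (10_000, "top 10K"), (1_000_000, "top 1M")]


def load_ranks(csv_text: str) -> dict:
    """Map each domain in the list to its integer rank."""
    reader = csv.reader(io.StringIO(csv_text))
    return {domain: int(rank) for rank, domain in reader}


def popularity_class(domain: str, ranks: dict) -> str:
    """Return the first class whose limit covers the domain's rank."""
    rank = ranks.get(domain)
    if rank is None:
        return "unranked"
    for limit, label in CLASSES:
        if rank <= limit:
            return label
    return "unranked"


ranks = load_ranks(TOP_LIST_CSV)
```

Because Alexa and Majestic rank by different signals (visits versus referring subnets), the same domain can end up in different classes depending on which list is loaded.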

2.4 Related work

Related work has been divided into smaller sections covering the following areas: URL shorteners and spam, Twitter behaviour, and what kind of content is being shared with URL shorteners.

URL shorteners and spam

Shortened URLs can serve many legitimate purposes, such as the click tracking analyzed in this paper, but they can also serve illicit behavior such as fraud, deceit and spam. One study on the topic concludes that more than half of the URLs shared today are spam [8]. Florian Klien and Markus Strohmaier have studied spam further, using the logs of a URL shortener service. They expose the extent of spamming taking place in their logs and provide an interesting insight into the danger of spamming via URL shortener services. Services such as bit.ly play a critical role on the web today, and spam is a problem both for users of link shorteners and for operators. The paper found that around 80% of shortened URLs contained spam-related content. The lack of spam blocking features can be a major reason for the high numbers. Their geographical analysis reveals that this problem has an international scale: they state that URL shorteners play a role in spam attacks that cross different countries, and that the use of URL shorteners varies a lot between countries. Many countries resolve more links than they create, but even more create more links than they resolve. This parallels their observation that a high outdegree seems to be indicative of creating nations, which the authors state can be linked to spamming, while a high indegree seems to be indicative of spam-receiving countries (targets of spam). The ratio between resolves and creates tells us whether a particular country visited more links than it created. Their research found that Northern America, Asia, Australia and some parts of Europe are mostly resolvers with only a small number of creates, whereas South America and Africa are mostly creators with a small number of resolves [19]. Another study of URL shortener spam [9] highlights spammers who adopt URL shorteners to camouflage their spam URLs and improve user click-through. The authors measure the misuse of short URLs and analyze the characteristics of spam and non-spam short URLs. Their results showed that the majority of the clicks come from direct sources and that spammers use popular websites to attract more attention by cross-posting the links.
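The resolve-to-create ratio described above can be written out as a small calculation. This is only a sketch of the idea as summarised here, not the method of [19]; the function names, the labels and the counts are made up for illustration.

```python
def resolve_create_ratio(resolves: int, creates: int) -> float:
    """Ratio above 1 means more links were visited than created."""
    if creates == 0:
        # A country that created no links is treated as a pure resolver.
        return float("inf")
    return resolves / creates


def classify(resolves: int, creates: int) -> str:
    """Label a country by comparing its ratio with 1."""
    ratio = resolve_create_ratio(resolves, creates)
    if ratio > 1:
        return "mostly resolver"
    if ratio < 1:
        return "mostly creator"
    return "balanced"


# Made-up counts for two fictional countries, for illustration only.
counts = {"Country A": (900, 300), "Country B": (200, 800)}
labels = {name: classify(r, c) for name, (r, c) in counts.items()}
```

With these invented numbers, Country A has a ratio of 3 and is labelled a resolver, while Country B has a ratio of 0.25 and is labelled a creator.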

Twitter behavior

Antoniades et al. [2] studied the usage of shortened URLs on Twitter. In their study they collected data from Twitter, owly and bitly and looked at what content is being shared, how popular the URLs are, the life span of the URLs and how shortened URLs can affect web performance. They found that news, info/edu and "various" were the most popular types of content and that a small number of the URLs get most of the clicks. Furthermore, 50% of shortened URLs appear to be live for more than three months, and they also found that shortened URLs can affect the user experience because the redirection adds time, resulting in slower access for the user. Another study [18] examined a different aspect of Twitter related to URL shorteners: the value of the shortened URLs referenced in tweets. Its results indicated that, unlike frequently bookmarked URLs, which are generally of high quality, frequently tweeted URLs tend to fall into two conflicting types: either they come from sites of high quality or they are spam. Garimella et al. [12] studied Twitter behaviour in the US between 2009 and 2016, using Twitter data to study political polarization. The article did not state in which direction the polarization went, but their analysis showed that the polarization did increase; depending on the measure, the relative change is 10% to 20%. Another interesting Twitter behaviour was analysed in [14], which focuses on temporal click dynamics for links to the news articles of a selected set of news websites. By combining the Twitter streaming API and the Bitly API, the authors examine how many users actually read the articles linked in the tweets they share and retweet. Their analysis highlights significant differences in the clicks-per-retweet ratios of individual links and also big differences in the number of links for which there are more retweets than clicks.

Another interesting study was made after the terrorist attack in Christchurch, New Zealand, in 2019 [10]. Two days after the attack, at least 7,22,295 tweets had been created by Twitter users, who tweeted their thoughts and prayers about the attack. This again shows how important social media is for spreading information. The paper examines the use of Twitter in this specific crisis. Their finding is that an individual might have more information-spreading power than authorities or government institutions: non-authority individuals on social media platforms like Twitter might spread information more widely than authorities, without the correctness of the information being known. Another project on this subject was carried out after the great earthquake and tsunami that hit eastern Japan in 2011 [15]. Right after the disaster, several web sites, especially those providing helpful disaster-related information, were overloaded due to flash crowds caused by Twitter users (a flash crowd is a sudden, large surge in traffic to a particular web site). Because flash crowds can be a serious problem in an emergency, the authors developed a new URL shortener that redirects Twitter users to a CDN (Content Delivery Network) instead of the original sites. Their dataset was launched just days after the earthquake and is now publicly available online for further collaboration.

What kind of content is being shared with URL shorteners

Nikiforakis et al. [23] discuss how ad-based URL shortening services can pose a threat to those clicking on the shortened link. Ad-based URL shortening services work by displaying an ad to the user before redirecting them to the actual site, so that the person publishing the shortened URL can earn some income from the clicks on the URL. A shortened URL is a good way for malicious sites to evade blacklists and filtering systems, but an ad-based URL shortening service also makes it possible for a malicious ad to reach the user, making it harder for the user to stay safe on the internet. These malicious ads appear to be able to escape their container and redirect the user to a different, harmful site. Nikiforakis et al. also found that these advertisements were able to perform drive-by downloads and attempt to trick the user into downloading malware through the ad.

3

Method

Our analysis was performed with a framework that combines the Twitter and Bitly APIs. For this longitudinal study we collected multiple week-long datasets and compared them to each other, and also to the dataset that was collected roughly one year earlier by Martin Lindblom and Oscar Järpehult (2019). The aim is to compare the different datasets to obtain a longitudinal study of link usage on Twitter. In total we collected three seven-day datasets: the first between 18-25/3 2020, the second between 1-8/4 2020 and the last between 18-25/4 2020. The last dataset is closest to the same period last year (26/4-3/5 2019) and is the primary collection we focus on in our results.

3.1

Dataset

The datasets were stored locally, and for every tweet we extracted the data needed for the set. To obtain the number of retweets, we also collected extra data for each tweet 24 hours after it was first collected. If a Bitly link was found in the tweet, information about the Bitly link was added as well.

Collection

The collected data was divided between three different .csv files: "collection_date.csv", "bitly_date.csv" and "retweet_date.csv". The collection file holds all of the Twitter data, the Bitly file all of the Bitly-related data, and the retweet file all of the retweet information. Each file contains the information from one four-hour interval. Since we did week-long collections, each file type is split into 42 different files. When working with this data we merged all of these files into one big .csv file called dataset.csv. This dataset.csv contains 28 different headers. Below we list the different headers and what information was collected with each one.

• tweet_id - The unique id of the tweet
• tweet_created_at - When the tweet was created
• tweet_place_id - The place id from where the tweet was posted
• tweet_place_country_code - The country code of where the tweet was posted from
• tweet_geo_coordinates - The geographical coordinates from where the tweet was posted
• tweet_language - The language of the tweet
• tweet_hashtags - The hashtags in the tweet
• tweet_urls - URLs in the tweet
• tweet_contains_retweet - If the tweet is a retweet or not
• tweet_in_reply_to_status_id - If the tweet is a reply to another tweet
• user_id - The id of the user that created the tweet
• user_created_at - When the user account was created
• user_followers_count - How many followers the user has
• user_friends_count - How many friends the user has
• user_statuses_count - How many tweets the user has posted
• user_favourites_count - The number of tweets the user has favourited
• user_verified - If the user is verified or not
• user_language - The user's language
• retweet_count - The number of retweets the tweet has received
• retweets_retrieved_at - The date when the number of retweets was retrieved
• bitly_all_clicks - The number of clicks the Bitly link has received
• bitly_twitter_clicks - The number of clicks the Bitly link has received from Twitter
• bitly_all_clicks_since_posting - The number of clicks the Bitly link has received since it was tweeted
• bitly_twitter_clicks_since_posting - The number of clicks the Bitly link has received from Twitter since the tweet was posted
• bitly_end_url_string - The URL that the Bitly link redirects to
• bitly_created_at - When the Bitly link was created
• bitly_data_retrieved_at - When the Bitly data was collected
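
The step of combining the four-hour part files into one file can be sketched as follows. This is only a minimal illustration and not the framework's actual code: the function name merge_csv_parts and the assumption that all part files of one kind share the same header row are hypothetical, and joining the collection, Bitly and retweet data with each other is left out.

```python
import csv


def merge_csv_parts(part_paths, out_path):
    """Concatenate CSV part files that share one header into a single file.

    Returns the number of data rows written (the header is not counted).
    """
    rows_written = 0
    header = None
    with open(out_path, "w", newline="", encoding="utf-8") as out_file:
        writer = csv.writer(out_file)
        # Sorting by file name keeps the parts in a stable order.
        for part in sorted(part_paths):
            with open(part, newline="", encoding="utf-8") as in_file:
                reader = csv.reader(in_file)
                part_header = next(reader, None)
                if part_header is None:
                    continue  # skip empty part files
                if header is None:
                    header = part_header
                    writer.writerow(header)
                elif part_header != header:
                    raise ValueError(f"unexpected header in {part}")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written


# A week split into four-hour intervals gives 7 * 24 / 4 = 42 parts.
PARTS_PER_WEEK = 7 * 24 // 4
```

Writing the header only once and rejecting parts with a different header is a design choice of this sketch; it makes a silently misaligned merge less likely.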

The data collection is split into two phases. In the first phase Twitter’s streaming API (Application Programming Interface) is used to collect as many tweets as possible, together with information about each tweet such as when it was tweeted and who posted it. After 24 hours the second phase took place, in which information about retweets of these particular tweets was collected. The second phase also collected information about the URLs that the link shorteners in the tweets redirected to. For every link, four values were important to collect: all clicks from all sources, all clicks from Twitter, clicks from all sources since the posting of the tweet and clicks from Twitter since the posting of the tweet. If the link was a Bitly link, specific information about the embedded Bitly link was collected, including various click statistics. Filtering out the Bitly links was the easier part, using the Bitly API. It was harder to identify all link shorteners and look up their full URLs, since some URLs redirected to an invalid page. We decided not to include these invalid pages in our analysis of shorteners. In total we collected over 11 million link shorteners. With this framework a longitudinal measurement study of tweets posted on Twitter could be done. Last year, a dataset of a bit more than 25 million tweets was collected over the span of seven days, between 26/4-3/5 2019. This report uses the framework developed last year, as well as last year's results, to collect new datasets and analyze the differences between the time periods. For more details on how the framework was implemented, we refer the interested reader to the report Longitudinal measurements of link usage on Twitter written by Oscar Järpehult and Martin Lindblom [16].

3.2

Collection approach

The main aim is to analyse how link shorteners, and mostly Bitly links, are used on Twitter, to understand how tweets with links are retweeted and clicked, and to see who uses URL shorteners and what their Twitter behavior looks like. The other aim is to examine whether some of these links are connected to Covid-19, how large that share is, and what kind of information and web pages about the subject these links point to. We want to see how this compares to other tweets as well as how the behavior has changed since last year.

3.3

Limitations

Due to the limited time available, we restricted ourselves to three data collections. In addition, because of the huge amount of data, some problems with the network connections and the dependence on external APIs, we ran into some limitations in our workflow that affected the output; these are stated below. As the Covid-19 pandemic was ongoing during all our collections, it was inevitable to include it in our calculations, but it is the only special event during our collection period that we have taken into account. We have not considered whether other special events during this time might have had an impact on the data.

Dataset collection

Our data collection is limited to a narrow time period, which means that there may be patterns or other user behavior specific to our collection period that do not fully reflect reality. We also chose to use the free version of the Twitter API, which gives only limited access to all tweets: streaming real-time tweets returns around 1% of all tweets posted at any time, with the ability to add custom filters to the stream [24]. Other analyses that have used the paid version are more high-level than ours, so we consider the free version sufficient for our analysis.

Covid-19 relations

To find tweets with URL shorteners regarding the Corona virus we set up markers and searched for hashtags or URL names containing: COVID19, covid19, Covid-19, covid-19, Coronavirus, coronavirus, pandemic and Pandemic. Of course, there are several links and hashtags that surely relate to the virus that we missed. Please note that we have only used English spellings of the word pandemic, but the virus names Covid-19 and Corona are used worldwide. English is generally the most widely used language on Twitter (34% of all tweets are written in English [22]). To get a fairer world-wide picture, we would have had to look at all the spellings of the word pandemic.
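
The marker search described above can be illustrated with a short sketch. The marker list is taken from the text, but the function name is_covid_related and the choice of a case-sensitive substring test are assumptions made for illustration; the exact matching rule used by the framework is not stated.

```python
# Marker spellings as listed in the text.
COVID_MARKERS = (
    "COVID19", "covid19", "Covid-19", "covid-19",
    "Coronavirus", "coronavirus", "pandemic", "Pandemic",
)


def is_covid_related(hashtags, urls):
    """Return True if any hashtag or URL contains one of the markers.

    Matching is a case-sensitive substring test (an assumption), which
    mirrors the explicit spelling variants in the marker list.
    """
    candidates = list(hashtags) + list(urls)
    return any(marker in text for text in candidates for marker in COVID_MARKERS)
```

Under this assumed rule, a spelling that is not in the list, such as an all-caps "COVID-19", would not be matched, which illustrates how related tweets can be missed.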

4

Results

We first present the results of the longitudinal measurement from the collections from 2019 and 2020, where the results from 2019 are taken from the report Longitudinal measurements of link usage on Twitter by Oscar Järpehult and Martin Lindblom. Table 4.1 below summarizes the number of tweets collected in each category and their fraction of the total. To decide which domains are considered shorteners, we used the same list as Oscar and Martin did (see the list in the Appendix). Results from the other two collections from 2020 are given in the Appendix. Table 4.1 shows that we gathered roughly 8 million more tweets in 2020 than in 2019, although significantly more Bitly tweets were collected in 2019. The final part of the results concerns the Covid-19 relation analysis.

Category            2019                 2020
All Tweets          25,482,108 (100%)    33,281,088 (100%)
Link Tweets         4,026,101 (15.8%)    3,803,233 (11.4%)
Shortener Tweets    322,954 (1.27%)      310,915 (0.93%)
Bitly Tweets        159,143 (0.625%)     52,517 (0.158%)

Table 4.1: Amount of tweets collected at the various time occasions divided into categories.
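
As a small worked example, the percentages in Table 4.1 can be recomputed from the counts. The counts are copied from the table; the names TWEET_COUNTS and share_of_all are hypothetical and only used for this illustration.

```python
# Tweet counts per category, copied from Table 4.1.
TWEET_COUNTS = {
    "2019": {"All Tweets": 25_482_108, "Link Tweets": 4_026_101,
             "Shortener Tweets": 322_954, "Bitly Tweets": 159_143},
    "2020": {"All Tweets": 33_281_088, "Link Tweets": 3_803_233,
             "Shortener Tweets": 310_915, "Bitly Tweets": 52_517},
}


def share_of_all(counts):
    """Return each category's share of 'All Tweets' in percent."""
    total = counts["All Tweets"]
    return {category: 100 * n / total for category, n in counts.items()}


# Difference in collected tweets between the two years.
EXTRA_TWEETS_2020 = (TWEET_COUNTS["2020"]["All Tweets"]
                     - TWEET_COUNTS["2019"]["All Tweets"])
```

The difference between the years is 7,798,980 tweets, which is the "roughly 8 million" mentioned in the text.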

4.1

High-level link shortener usage

The first section shows the high-level link shortener usage. For all collections twitter.com is naturally the most frequent domain, because each retweet contains the URL of the original tweet. The figures in this section list the 20 most common domains and shorteners. Figures 4.1 and 4.2 show that youtu.be and bit.ly were common shorteners both years. youtu.be is YouTube's own link shortener, which only points to YouTube videos. Two differences can be distinguished. The first is that du3a.org (a website which automatically posts Islamic prayers to Twitter), which had 187580 occurrences in 2019, is not in the top 20 at all in 2020, nor in the first two collections from 2020 (see Appendix Table A.4). The second is that facebook.com goes from fourth place to seventeenth place between the years. The top shortener domains are almost the same for both years, as can be seen in Figures 4.3 and 4.4.

Figure 4.1: Top 20 most frequent domains overall (2019).

Figure 4.3: Top 20 most frequent domains for shortener domains (2019).

4.2

Domain statistics

This section concerns the domains that people tend to link to on Twitter. To study this, we extracted the long URL that each link shortener redirected to and analyzed the frequencies and popularity of these domains.
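
Counting domain frequencies from a list of resolved long URLs can be sketched as below. This is an illustration under stated assumptions, not the framework's code: the function names domain_of and top_domains are hypothetical, and stripping a leading "www." is an assumed normalisation.

```python
from collections import Counter
from urllib.parse import urlsplit


def domain_of(url):
    """Return the lower-cased host of a URL, without a leading 'www.'."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def top_domains(urls, n=20):
    """Count domain occurrences and return the n most common (domain, count) pairs."""
    counts = Counter(domain_of(u) for u in urls)
    counts.pop("", None)  # ignore URLs without a parsable host
    return counts.most_common(n)
```

Subdomains are kept as they are in this sketch, so a host such as open.spotify.com is counted separately from its parent domain.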

Top domains

From the framework three different sets were constructed: all links, link shorteners and Bitly links. The tables show the number of occurrences of the top 20 domains for each of the three sets. The Alexa and Majestic top 1 million domain rankings are also included; when a ranking is not available we list "-".

(a) Top domains for all links (2019).

 #  Domain                Occur.    Alexa    Maj.
 1  twitter.com           2167059   12       4
 2  du3a.org              187580    -        -
 3  youtube.com           147359    2        3
 4  facebook.com          123883    3        2
 5  instagram.com         57117     15       7
 6  showroom-live.com     42356     4156     77018
 7  curiouscat.me         41691     4915     168225
 8  peing.net             25007     6472     228312
 9  twittascope.com       23174     301905   -
10  dlvr.it               19700     -        11127
11  fllwrs.com            17613     59565    831014
12  open.spotify.com      14160     -        219
13  lawson.co.jp          13264     35589    17836
14  twcm.co               12265     -        -
15  naver.me              11310     177121   23425
16  pscp.tv               10895     2836     1428
17  blbrd.cm              9964      210800   -
18  swarmapp.com          9752      73711    29610
19  cas.st                8642      -        -
20  shindanmaker.com      8326      8309     32267

(b) Top domains for all links (2020).

 #  Domain                Occur.    Alexa    Maj.
 1  twitter.com           2134010   55       4
 2  youtube.com           213002    2        3
 3  instagram.com         50572     32       5
 4  peing.net             46834     17513    185762
 5  onlyfans.com          34244     1060     25602
 6  open.spotify.com      23339     -        155
 7  twittascope.com       21163     300510   -
 8  fllwrs.com            19676     321649   -
 9  naver.me              18346     82756    19751
10  family.co.jp          18327     665828   24624
11  dlvr.tv               17812     -        14409
12  twitch.tv             16106     33       331
13  twitcasting.tv        14322     5932     46809
14  ift.tt                11693     -        10073
15  facebook.com          11185     7        1
16  news.livedoor.com     11081     -        8934
17  twtcom.co             10076     -        -
18  pscp.tv               9828      16945    1785
19  curiouscat.me         9534      16634    150182
20  headlines.yahoo.com   8839      -        -

Table 4.2: Top 20 most frequent domains for all links.

In Table 4.2 above we can see that du3a.org, which was the second most frequent domain in 2019, is not even in the top 20 in 2020. Another difference is facebook.com, which had 123883 occurrences in 2019 and dropped to 11185 occurrences in 2020.

(a) Top domains for shortened links (2019).

 #  Domain                Occur.    Alexa    Maj.
 1  youtube.com           117912    2        3
 2  twittascope.com       23173     301905   -
 3  lawson.co.jp          13226     35589    17836
 4  k.kakaocdn.net        5457      -        -
 5  img1.daumcdn.net      5168      -        -
 6  linkedin.com          2327      43       6
 7  instagram.com         2137      15       7
 8  t1.daumcdn.net        1846      -        -
 9  reddit.com            1521      16       43
10  youtu.be              1510      31627    14
11  google.com            1343      1        1
12  cards.twitter.com     1195      -        -
13  extratv.com           1106      118225   12330
14  el-nacional.com       980       1253     8829
15  mayla.jp              927       -        -
16  facebook.com          884       3        2
17  54.202.34.80          861       -        -
18  careerarc.com         822       70517    91327
19  uls.her.jp            805       -        -
20  drive.google.com      792       -        39

(b) Top domains for shortened links (2020).

 #  Domain                Occur.    Alexa    Maj.
 1  youtube.com           177203    2        3
 2  twittascope.com       21160     300510   -
 3  k.kakaocdn.net        3565      -        -
 4  goo.gl                1846      3320     6
 5  linkedin.com          1661      75       6
 6  img1.daumcdn.net      1535      -        -
 7  dolk.jp               1322      -        -
 8  t1.daumcdn.net        1264      -        -
 9  akindo-sushiro.co.jp  1256      17307    191994
10  easyriders.jp         1247      -        -
11  shop.funko.com        1215      -        -
12  drive.google.com      1132      -        36
13  rbeiv.com             801       -        -
14  go.onelink.me         582       -        -
15  music.bugs.co.kr      576       -        -
16  rbeja.com             527       -        -
17  duratexintl.com       518       -        -
18  youtu.be              512       10612    13
19  str2b.openstream.co   484       -        -
20  rbejc.com             482       -        -

Table 4.3: Top 20 most frequent domains for shortened links.

In Table 4.3 above there is not much difference between the results. lawson.co.jp had a high number of occurrences (third place) in 2019 but is not in the top 20 in 2020. The trend we saw before can also be seen here: facebook.com does not occur in the results from 2020.

(a) Top domains for Bitly links (2019).

 #  Domain                Occur.    Alexa    Maj.
 1  twittascope.com       23173     301905   -
 2  lawson.co.jp          13226     35589    17836
 3  k.kakaocdn.net        5457      -        -
 4  img1.daumcdn.net      5164      -        -
 5  instagram.com         2133      15       7
 6  t1.daumcdn.net        1843      -        -
 7  reddit.com            1518      16       43
 8  google.com            1333      1        1
 9  youtu.be              1320      31627    14
10  cards.twitter.com     1194      -        -
11  extratv.com           1106      118225   12330
12  youtube.com           1040      2        3
13  el-nacional.com       980       1253     8829
14  mayla.jp              927       -        -
15  54.202.34.80          861       -        -
16  facebook.com          824       3        2
17  careerarc.com         822       70517    91327
18  uls.her.jp            805       -        -
19  drive.google.com      789       -        39
20  cdiscount.com         781       780      11783

(b) Top domains for Bitly links (2020).

 #  Domain                Occur.    Alexa    Maj.
 1  twittascope.com       9374      300510   -
 2  k.kakaocdn.net        2449      -        -
 3  img1.daumcdn.net      1085      -        -
 4  t1.daumcdn.net        777       -        -
 5  rbeiv.com             697       2        3
 6  drive.google.com      684       -        36
 7  youtube.com           665       2        3
 8  shop.funko.com        629       -        -
 9  easyriders.jp         593       -        -
10  akindo-sushiro.co.jp  591       17307    191994
11  dlvr.it               385       -        14409
12  youtu.be              366       10612    13
13  go.onelink.me         319       -        -
14  rbapc.top             311       -        -
15  rbtoe.com             289       -        -
16  dscygl.xyz            282       -        -
17  facebook.com          280       7        1
18  rbeja.com             278       -        -
19  lin.ee                275       -        8042
20  str2b.openstream.co   330       -        -

Table 4.4: Top 20 most frequent domains for Bitly links.

The top 20 most frequent domains for Bitly links in 2019 and 2020 are listed above in Table 4.4. Here, too, we see no big difference apart from a few domains: in 2019 instagram.com got several hits through Bitly links, which cannot be seen in 2020.

Popularity Distribution

We also want to present how links are distributed over domains of similar global rank according to Alexa and Majestic. To do this, the domain of every link is assigned to one of the following classes: Alexa[1-10]; Alexa[11-100]; Alexa[101-1K]; Alexa[1001-10K]; Alexa[10001-100K]; Alexa[100001-1M]; other [nonranked]. Figure 4.5 displays the distribution for the Alexa top 1 million and Figure 4.6 for the Majestic top 1 million, using the same classes.
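
Assigning a domain rank to one of these popularity classes can be sketched as follows. The function name popularity_class and the use of None for a domain without a rank are assumptions made for this illustration.

```python
def popularity_class(rank, list_name="Alexa"):
    """Map a top-1M rank (or None if unranked) to its popularity class label."""
    if rank is None or rank < 1 or rank > 1_000_000:
        return "other [nonranked]"
    # Upper bound of each class, in increasing order.
    bounds = [
        (10, "[1-10]"),
        (100, "[11-100]"),
        (1_000, "[101-1K]"),
        (10_000, "[1001-10K]"),
        (100_000, "[10001-100K]"),
        (1_000_000, "[100001-1M]"),
    ]
    for upper, label in bounds:
        if rank <= upper:
            return list_name + label
```

For example, a domain with Alexa rank 12 falls in Alexa[11-100], while a domain listed with "-" in the tables would be passed as None and end up in the nonranked class.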

(a) 2019. (b) 2020.

Figure 4.5: Link popularity distribution to domains of different popularity classes, as defined using the Alexa top-1M lists.

(a) 2019. (b) 2020.

Figure 4.6: Link popularity distribution to domains of different popularity classes, as defined using the Majestic top-1M lists.

Domain frequencies

The figures show the Cumulative Distribution Function (CDF) and Complementary CDF (CCDF), respectively, of the fraction of links that domains of different rank are responsible for. The results from both years are almost identical, and we can see in Figures 4.7a and 4.8a that a small number of domains make up a large part of all links and that all classes show the same pattern. The straight-line shape in Figures 4.7b and 4.8b suggests that the distributions follow a power law [20].
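
One plausible reading of these curves is a cumulative fraction of links over domain rank, which can be sketched as below. The function name rank_cdf and this exact definition are assumptions; the thesis does not spell out how the plotted values were computed.

```python
def rank_cdf(occurrences):
    """Return (cdf, ccdf) lists over domain rank.

    cdf[k-1] is the fraction of all links that point to the k most
    frequent domains; ccdf[k-1] is the remaining fraction.
    """
    ordered = sorted(occurrences, reverse=True)
    total = sum(ordered)
    if total == 0:
        return [], []
    cdf, running = [], 0
    for count in ordered:
        running += count
        cdf.append(running / total)
    ccdf = [1 - value for value in cdf]
    return cdf, ccdf
```

With occurrence counts 6, 3 and 1, the top-ranked domain alone accounts for 60% of the links, which is the kind of concentration the CDF figures describe.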

(a) CDF (2019). (b) CCDF (2019).

Figure 4.7: Distribution of domain rank (2019).

(a) CDF (2020). (b) CCDF (2020).

Figure 4.8: Distribution of domain rank (2020).

Relative ranks and frequencies

Figure 4.9 displays pairwise scatter plots showing the frequencies and ranks of the top-25 domain sets based on All Links, Shortened Links, Bitly Links, Alexa and Majestic for 2019 and 2020. Blue circles are used for domains with known ranks, and red crosses at rank 10^6 are used to illustrate domains with unknown rank.

Figure 4.9: The results from 2019 are found below the vertical divider in pink and those from 2020 above in blue.

Phishing domains

We ran all our collections through the database of Phishtank (a service that enables users to report and review suspicious phishing sites) but did not find any matches with our links.

4.3

User statistics

In this section we present how users use link shorteners. We take a closer look at the age of the account, the number of tweets favourited by users, how many tweets the user has posted and how many followers the user has. Last, we look at verified users.

Age

Figures 4.10 and 4.11 show the distribution of the age of user accounts at the time of posting their tweet. A conclusion that can be drawn from both results is that tweets containing links are more likely to belong to a rather old account; this is especially true for Bitly links.

(a) CDF (2019). (b) CCDF (2019).

Figure 4.10: Distribution of the age for users account at the time of posting their tweet (2019).

(a) CDF (2020). (b) CCDF (2020).

Figure 4.11: Distribution of the age for users account at the time of posting their tweet (2020).

Favourites

In this section we look at how users of link shorteners favourite other tweets, that is, the probability that a user of link shorteners favourites other tweets. For both years, Figures 4.12a and 4.13a show that Bitly tweets are more frequently posted by users that have favourited fewer tweets. The CCDFs plotted to the right in the same figures (Figures 4.12b and 4.13b) show that "All Tweets" and "Link Tweets" are more back-heavy (the former more than the latter) when it comes to favouriting other tweets.

(a) CDF (2019). (b) CCDF (2019).

Figure 4.12: Distribution of the number of tweets favourited by users at the time of posting their tweet (2019).

(a) CDF (2020). (b) CCDF (2020).

Figure 4.13: Distribution of the number of tweets favourited by users at the time of posting their tweet (2020).

Number of tweets

Figures 4.14 and 4.15 give an overview of how many tweets the user had tweeted at the time of posting the tweet we collected. Figures 4.14a and 4.15a show that no specific type of tweet from our categories has been tweeted more in the past, but in 2019 we can see a clear difference at the end of the spectrum, before the last jump. Figures 4.14b and 4.15b tell the same story, but here there is a clear difference between the years: in 2019 there is a huge jump in probability, a trend that cannot be seen in 2020.

(a) CDF (2019). (b) CCDF (2019).

Figure 4.14: Distribution of the number of tweets posted by users at the time of posting their tweet (2019).

(a) CDF (2020). (b) CCDF (2020).

Figure 4.15: Distribution of the number of tweets posted by users at the time of posting their tweet (2020).

Favourites to tweets

In this section we look closer at the relation between tweeting and interacting with tweets, using the ratio between tweets favourited and tweets posted by users at the time of posting their tweet. In Figures 4.16a and 4.17a we can see that users who posted a Bitly link in general tweet more than they favourite other tweets, although the trend is stronger in 2019. Figures 4.16b and 4.17b show the same thing.

(a) CDF (2019). (b) CCDF (2019).

Figure 4.16: Ratio between tweets favourited and tweeted by users at the time of posting their tweet (2019).

(a) CDF (2020). (b) CCDF (2020).

Figure 4.17: Ratio between tweets favourited and tweeted by users at the time of posting their tweet (2020).

Followers

Another interesting user statistic is how many followers the user had at the time of posting their tweet; the results are shown in the graphs below. A follower is someone that follows the user. In Figure 4.19a we can see that in the 2020 results all categories have almost the same probability of having many followers, whereas in the 2019 results (Figure 4.18a) users that tweeted Bitly links have a higher probability of having more followers than users in the other categories. This conclusion can also be seen in the plots to the right (Figures 4.18b and 4.19b): for both years, users posting tweets containing Bitly links have a higher probability of having more followers. Note, though, that the users with the most followers did not post tweets containing links.

(a) CDF (2019). (b) CCDF (2019).

Figure 4.18: Distribution of the number of followers for users at the time of posting their tweet (2019).

(a) CDF (2020). (b) CCDF (2020).

Figure 4.19: Distribution of the number of followers for users at the time of posting their tweet (2020).

Friends

A friend is an account that a user follows on Twitter. The figures below show the distribution of the number of friends for users at the time of posting their tweet. The results resemble those for followers above: for 2019 there is a slightly more noticeable difference between the categories than for 2020, and for both years the accounts that have the most friends are those that tweet without links.

(a) CDF (2019). (b) CCDF (2019).

Figure 4.20: Distribution of the number of friends for users at the time of posting their tweet (2019).

(a) CDF (2020). (b) CCDF (2020).

Figure 4.21: Distribution of the number of friends for users at the time of posting their tweet (2020).

Followers vs friends

The scatter plots below in Figure 4.22 show the followers-to-friends ratio, the so-called golden ratio, for users at the time of posting their tweet. Every Twitter account can follow up to 5,000 accounts. Once that number is reached, the account may need to wait until it has more followers before it can follow additional accounts [26]. This friend gate is illustrated in the figures below (red line) together with the equality line (green line). The results from both years are similar, with a trend around an equal number of followers and friends but with somewhat more users having more followers than friends. However, as shown earlier, we can also see a downward trend in the use of Bitly links.

(a) 2019. (b) 2020.

Figure 4.22: Followers-to-friends ratio for users at the time of posting their tweet.

Unique and verified users

The table below shows how many unique and verified users there were in the datasets from 2019 and 2020.

Category            Unique Users 2019    Unique Users 2020    Verified 2019     Verified 2020
All Tweets          12,253,599           15,247,424           53,326 (0.44%)    61,614 (0.4%)
Link Tweets         2,905,502            2,828,482            28,736 (0.99%)    32,576 (1.15%)
Shortener Tweets    245,984              267,963              2,859 (1.16%)     2,373 (4,261%)
Bitly Tweets        112,682              46,307               1,856 (1.65%)     775 (1,303%)

Table 4.5: Amount of unique users for each category and how many of those users that are verified.
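
The verified share in Table 4.5 is the number of verified users divided by the number of unique users. The sketch below recomputes it from the counts in the table; the names USERS and verified_share are hypothetical. Recomputed this way, the 2020 shares for Shortener Tweets and Bitly Tweets come out at about 0.89% and 1.67%, which does not match the percentages printed in the table, so either the counts or the percentages in those two cells should be treated with caution.

```python
USERS = {
    # category: (unique 2019, verified 2019, unique 2020, verified 2020)
    "All Tweets": (12_253_599, 53_326, 15_247_424, 61_614),
    "Link Tweets": (2_905_502, 28_736, 2_828_482, 32_576),
    "Shortener Tweets": (245_984, 2_859, 267_963, 2_373),
    "Bitly Tweets": (112_682, 1_856, 46_307, 775),
}


def verified_share(unique_users, verified_users):
    """Return the percentage of unique users that are verified."""
    return 100 * verified_users / unique_users


# Verified share per category for 2019 and 2020, in percent.
SHARES = {
    category: (verified_share(u19, v19), verified_share(u20, v20))
    for category, (u19, v19, u20, v20) in USERS.items()
}
```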

From the percentages of verified users in Table 4.5 we can see that for All Tweets and Link Tweets there is just a small difference between the years. When it comes to Shortener Tweets and Bitly Tweets we can clearly see the decrease in Bitly link usage over the past year.

4.4

Bitly link interaction

This section relates the number of clicks on a Bitly link found in a tweet to the number of retweets of the same tweet. Figure 4.23 shows two scatter plots of the Bitly clicks-to-retweets ratio and Figure 4.24 the logarithmic average of Bitly clicks per retweet count for all Bitly links. In both Figures 4.23 and 4.24 we can see a rather compact cluster right below the black "Equal ratio" line. For Figure 4.23 this tells us that a lot of tweets have more retweets than clicks on the embedded Bitly link, and for Figure 4.24 that tweets with fewer than 30 retweets tend to have a higher clicks-to-retweets ratio. Comparing the years in both figures, we again see the same pattern as before: a big reduction in Bitly tweets overall.
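
The clicks-to-retweets comparison can be illustrated with a small sketch. The function names and the handling of tweets without retweets are assumptions made for this example; they are not taken from the framework.

```python
def clicks_to_retweets(clicks, retweets):
    """Return clicks per retweet, or None when the tweet has no retweets."""
    if retweets == 0:
        return None
    return clicks / retweets


def share_below_equal_ratio(pairs):
    """Fraction of (clicks, retweets) pairs with more retweets than clicks.

    These are the points that fall below the equal-ratio line in a
    clicks-versus-retweets scatter plot.
    """
    pairs = list(pairs)
    if not pairs:
        return 0.0
    below = sum(1 for clicks, retweets in pairs if retweets > clicks)
    return below / len(pairs)
```

A tweet with 30 clicks and 10 retweets has a ratio of 3.0 and lies above the line, while a tweet with 1 click and 5 retweets lies below it.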

(a) 2019. (b) 2020.

Figure 4.23: Two scatter plots of Bitly clicks-to-retweets-ratio.

(a) 2019. (b) 2020.

Figure 4.24: Logarithmic average of Bitly clicks per retweet.

4.5

Verified vs non-verified users

This section examines whether there are any noticeable differences between verified and non-verified users.

Bitly Clicks

The figures below show the number of clicks on posts made by users with different numbers of followers, in order to relate how many followers a user has to how many clicks the user gets. For verified users, shown in Figure 4.25, the clicks-to-followers ratios for Bitly links are very similar between the years, though with perhaps slightly more clicks for users with a smaller number of followers in 2019.

(a) 2019. (b) 2020.

Figure 4.25: Clicks-to-followers ratio for Bitly links for verified users.

Figure 4.26 below tells the same story, but for non-verified users. In 2019 we can see that non-verified users can get up to 10,000 clicks even if they have no more than 100 followers. The results are a bit different in 2020: here, too, users with few followers can get many clicks, even more clicks than the year before. But the scale is only a fraction of that in 2019, so although the clicks are higher, this occurs to a much lesser extent.

(a) 2019. (b) 2020.

Figure 4.26: Clicks-to-followers ratio for Bitly links for non-verified users.

Followers, number of tweets and retweets

To compare different user statistics between non-verified and verified users we show three pairs of heat-maps. The first is retweets versus tweets tweeted, the second is retweets versus followers and the last is followers versus number of tweets tweeted.

Retweet versus Followers Tweeted

The heat-maps in Figures 4.27 and 4.28 show a difference between non-verified and verified users for both years: there is a vertical streak of higher density for non-verified users, while there is more of a blob formation for verified users. For non-verified users this indicates that many accounts with around the same number of followers have received very different numbers of retweets. For verified users it indicates that many accounts with the same number of followers also received the same number of retweets. This pattern can be seen for both 2019 and 2020.

(a) Non-verified users (2019). (b) Verified users (2019).

Figure 4.27: Heat-map of retweets vs followers tweeted (2019).

(a) Non-verified users (2020). (b) Verified users (2020).

Figure 4.28: Heat-map of retweets vs followers tweeted (2020).
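The density patterns described for Figures 4.27 and 4.28 can be illustrated with a small binning sketch. It assumes logarithmic bins on both axes and a hypothetical `bins_per_decade` parameter; the thesis does not state how its heat-maps were binned, so this is only one plausible way to count tweets per cell.

```python
import math
from collections import Counter


def log_bin(value, bins_per_decade=4):
    """Map a non-negative count to a logarithmic bin index (0 falls in bin 0)."""
    return int(math.floor(math.log10(value + 1) * bins_per_decade))


def heatmap_counts(pairs, bins_per_decade=4):
    """Count (followers, retweets) pairs per two-dimensional logarithmic bin."""
    counts = Counter()
    for followers, retweets in pairs:
        cell = (log_bin(followers, bins_per_decade),
                log_bin(retweets, bins_per_decade))
        counts[cell] += 1
    return counts


# Made-up data: same follower count, very different retweet counts.
streak = heatmap_counts([(1000, 0), (1000, 10), (1000, 1000)])
# Made-up data: similar follower counts and similar retweet counts.
blob = heatmap_counts([(1000, 100), (1010, 105)])
```

With the made-up data above, `streak` spreads over several cells that share one follower bin (a vertical streak), while `blob` concentrates in a single cell.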

Retweet versus tweets tweeted

Figures 4.29 and 4.30 show retweets versus the number of tweets for non-verified and verified users. In both years, the relation between retweets and number of tweets looks much like the relation between retweets and followers.

(a) Non-verified users (2019). (b) Verified users (2019).

Figure 4.29: Heat-map of retweets vs number of tweets tweeted (2019).

(a) Non-verified users (2020). (b) Verified users (2020).

Figure 4.30: Heat-map of retweets vs number of tweets tweeted (2020).

Followers versus number of tweets tweeted

Figures 4.31 and 4.32 display the number of observed tweets as a function of the posting user's total number of tweets (over the lifetime of the account) and the number of followers that user had (at the time the tweet was made). Verified users tend to have more followers than tweets, whereas for non-verified users the relation is almost linear, with somewhat more tweets than followers.

(a) Non-verified users (2019). (b) Verified users (2019).

Figure 4.31: Heat-map of followers vs number of tweets tweeted (2019).

(a) Non-verified users (2020). (b) Verified users (2020).

Figure 4.32: Heat-map of followers vs number of tweets tweeted (2020).

4.6 Covid-19 analysis

This section presents the results of searching for tweets whose shortened URLs and/or hashtags contain any of the specific Covid-19 related words (listed in the method). We present data from all three collections made in 2020 (18-25/3, 1-8/4 and 18-25/4). Considerably more of the tweets collected on the first occasion were Covid-19 related: for 18-25/3, 4.7% of all link tweets were Covid-19 related, while for 1-8/4 and 18-25/4 the proportions were 3.5% and 2.5%, respectively.
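The keyword search described above could look roughly like the following sketch. The keyword list, the regular expressions, and the function name `is_covid_related` are all assumptions made for illustration; the actual words are the ones listed in the method chapter, and it is not stated here whether matching was done on the short or the expanded URL.

```python
import re

# Hypothetical keywords, NOT the list used in the thesis.
COVID_KEYWORDS = ("covid", "corona")

HASHTAG_RE = re.compile(r"#(\w+)")
URL_RE = re.compile(r"https?://\S+")


def is_covid_related(tweet_text, expanded_urls=()):
    """Return True if any hashtag or URL contains one of the keywords."""
    hashtags = HASHTAG_RE.findall(tweet_text)
    urls = list(expanded_urls) + URL_RE.findall(tweet_text)
    candidates = [s.lower() for s in hashtags + urls]
    return any(kw in c for c in candidates for kw in COVID_KEYWORDS)
```

Matching is case-insensitive and substring-based, so a hashtag such as `#COVID19` would match the keyword `covid` under these assumptions.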

Category                        18-25/3             1-8/4               18-25/4
Link Tweets                     3,786,543 (100%)    3,788,332 (100%)    3,803,233 (100%)
Link Tweets with Covid-19       179,279 (4.7%)      132,056 (3.5%)      96,663 (2.5%)
Link Tweets without Covid-19    3,607,264 (95.3%)   3,656,276 (96.5%)   3,706,570 (97.5%)

Table 4.6: Number of link tweets related to Covid-19.

Bitly links make up approximately 1.6% of all link tweets in the respective collections. The proportion of Bitly links with Covid-19 related URLs and/or hashtags was 12.4% for 18-25/3, 8.5% for 1-8/4 and 5.7% for 18-25/4. Both Table 4.6 and Table 4.7 show that the first period contained the most Covid-19 related links collected with our method.

Category                        18-25/3           1-8/4             18-25/4
Bitly Links                     65,973 (100%)     61,153 (100%)     52,517 (100%)
Bitly Links with Covid-19       8,192 (12.4%)     5,224 (8.5%)      2,970 (5.7%)
Bitly Links without Covid-19    57,781 (87.6%)    55,929 (91.5%)    49,547 (94.3%)

Table 4.7: Number of Bitly link tweets related to Covid-19.
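The percentages in Tables 4.6 and 4.7 follow directly from the counts. As a quick check, the short sketch below recomputes the Covid-19 shares from the reported numbers; the helper name `share` and the variable names are chosen here for illustration only.

```python
def share(part, total):
    """Percentage of `total` made up by `part`, rounded to one decimal."""
    return round(100 * part / total, 1)


periods = ["18-25/3", "1-8/4", "18-25/4"]

# Counts copied from Table 4.6.
link_tweets = [3_786_543, 3_788_332, 3_803_233]
covid_link_tweets = [179_279, 132_056, 96_663]

# Counts copied from Table 4.7.
bitly_links = [65_973, 61_153, 52_517]
covid_bitly_links = [8_192, 5_224, 2_970]

covid_share_all = [share(c, t) for c, t in zip(covid_link_tweets, link_tweets)]
covid_share_bitly = [share(c, t) for c, t in zip(covid_bitly_links, bitly_links)]

for period, all_pct, bitly_pct in zip(periods, covid_share_all, covid_share_bitly):
    print(f"{period}: {all_pct}% of link tweets, {bitly_pct}% of Bitly links")
```

The recomputed shares agree with the tables: 4.7%, 3.5% and 2.5% for all link tweets, and 12.4%, 8.5% and 5.7% for Bitly links.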

The scatter plots below show the ratio between clicks and retweets for all three collections made during 2020. Figure 4.33 shows the clicks-to-retweets ratio for all the Covid-19 related tweets, and Figure 4.34 shows the same data zoomed out, to show that a few individual tweets received several thousand more retweets than the majority of tweets. Interestingly, this recurs in all three collection periods. We can also tell that users tend to click on and retweet Covid-19 related tweets to a similar extent. In Figure 4.35 we show CDFs of the ratio between clicks and retweets for every collection made in 2020. Here we can see that tweets with Covid-19 related links and/or hashtags have a higher probability of a higher ratio than tweets without Covid-19 related links and hashtags.

(a) 18-25/3. (b) 1-8/4.

(c) 18-25/4.

Figure 4.33: Scatter plots of the Covid-19 clicks-to-retweets ratio for all three collections in 2020.

(a) 18-25/3. (b) 1-8/4.

(c) 18-25/4.

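Figure 4.34: Scatter plots of the Covid-19 clicks-to-retweets ratio for all three collections in 2020 (zoomed out).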

(a) Non Covid-19 related links or hashtags. (b) Covid-19 related links or hashtags.

Figure 4.35: CDFs of the ratio between clicks and retweets for tweets containing Covid-19 related links or hashtags and non Covid-19 related links and hashtags.
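A CDF such as the one in Figure 4.35 can be computed empirically from per-tweet click and retweet counts. The sketch below is one way to do it, under the assumption that tweets with zero retweets are skipped to avoid division by zero; how the thesis handled such tweets is not stated, and the function names are hypothetical.

```python
from bisect import bisect_right


def click_retweet_ratios(tweets):
    """Sorted clicks/retweets ratios, skipping tweets without retweets (assumption)."""
    return sorted(clicks / retweets for clicks, retweets in tweets if retweets > 0)


def ecdf(sorted_values, x):
    """Empirical CDF: fraction of values less than or equal to x."""
    if not sorted_values:
        raise ValueError("cannot evaluate an ECDF without values")
    return bisect_right(sorted_values, x) / len(sorted_values)


# Made-up (clicks, retweets) pairs for illustration only.
ratios = click_retweet_ratios([(10, 5), (3, 3), (8, 0), (100, 4)])
```

When two such curves are compared, the group whose CDF lies lower at a given ratio has the larger probability of exceeding that ratio, which is the sense in which one group has "a higher probability of a higher ratio".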
