
Linköpings universitet SE–581 83 Linköping

Linköping University | Department of Computer Science

Master thesis, 30 ECTS | Information Technology

2017 | LIU-IDA/LITH-EX-A--17/030--SE

The development of a sports statistics web application

Sports Analytics and Data Models for a sports data web application

Swedish title: Utvecklande av webbapplikation för att hantera sportstatistik

Andreas Alvarsson

Supervisor: Valentina Ivanova
Examiner: Patrick Lambrix



Copyright

The publishers will keep this document online on the Internet – or its possible replacement – for a period of 25 years starting from the date of publication barring exceptional circumstances. The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purposes. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.

© Andreas Alvarsson


Abstract

Sports and technology have always co-operated to bring better and more specific sports statistics. The collection of sports game data, as well as the ability to generate valuable sports statistics from it, is growing. This thesis investigates the development of a sports statistics application that should be able to collect sports game data, structure the data according to suitable data models and show statistics in a proper way.

The application was to be a web application developed using modern web technologies. This led to a comparison of different software stack solutions and web frameworks. A theoretical study of sports analytics was also conducted, which gave a foundation for how sports data could be stored and how valuable sports statistics could be generated.

The resulting design of the prototype for the sports statistics application was evaluated. In interviews, people working in sports contexts evaluated the prototype as user-friendly, functional and fulfilling its purpose of generating valuable statistics during sports games.


Acknowledgments

To Magdalena and my family. Always.


Contents

Abstract
Acknowledgments
Contents
List of Figures

1 Introduction
  1.1 Motivation
  1.2 Aim
  1.3 Research Questions
  1.4 Delimitations

2 Theory
  2.1 Sports Analytics
  2.2 Data Modeling
  2.3 Web Development
  2.4 Prototype Designing

3 Method
  3.1 Theoretical Study
  3.2 Implementation
  3.3 Evaluation

4 Results
  4.1 Data Models
  4.2 Comparison of Web Technologies
  4.3 Design & Architecture
  4.4 Design of Prototype
  4.5 Prototype Design Evaluation

5 Discussion
  5.1 Results
  5.2 Method
  5.3 Further Work

6 Conclusion

Bibliography


List of Figures

2.1 Definitions of different Data Concepts.
2.2 The Data Mining Process.
2.3 The Knowledge Discovery from Databases Process.
2.4 A Clustering of data via a Data Mining method.
2.5 A Linear Regression via a Data Mining method.
2.6 Example of an ER-diagram.
2.7 Example of a class according to UML.
2.8 Example of calculating a Mean value.
2.9 Example of Median. Median is 5 in this example.
2.10 Example of intervals over Ice Hockey Players’ minutes on ice during a game.
2.11 Example of a Box plot.
2.12 Layered Architecture.
2.13 Event-Driven Architecture with a Central Organizer.
2.14 Micro-kernel Architecture with plug-ins.
2.15 Micro-Service Architecture.
2.16 Space-Based Architecture.
2.17 Event Handler according to Node.js.
2.18 Example of Authors stored as tables in MySQL.
2.19 Example of Books stored as tables in MySQL.
2.20 Example of an Ice Hockey Team stored in MongoDB.
2.21 Example of Vue code with the output ’This is a testing text’.
3.1 Methodology for finding relevant literature.
4.1 Data Model for the Statistics System.
4.2 Ice Hockey Rink divided into position zones for certain events.
4.3 Comparison between Backend Frameworks.
4.4 Comparison between Frontend Frameworks.
4.5 Design & Architecture for the Statistics System.
4.6 Flow chart for Match Reporting application.
4.7 Prototype: Concept Design.


1 Introduction

Sports and technology are related to each other in many ways. Throughout history, technology has helped sports equipment constantly evolve, letting athletes achieve better results, increasing functionality and pushing the competition even further [1]. It has not only improved the actual exercise with better and more advanced tools, but it has also created the possibility to take better measurements and give more precise feedback on performances.

The development of information technology has made progress, and the collection of sports statistics has seen some major benefits. Sports statistics are a big asset, and have been helpful for a long time [2], not only to get a summary of a sports game or a season in terms of points or goals. They can also help sports clubs or individual athletes make decisions at many different levels, e.g. whether to draft a fast soccer player to the club, or to identify the area on the basketball court where a team scores the majority of their points. With modern information technology a comprehensive amount of statistics can be collected. Big amounts of data can be the foundation for analyzing large organizational issues, or go down to the level of detail of certain match situations to see what needs to be changed to achieve the best result. They can also reveal aspects of the performance that were earlier unknown, or enable more complex analysis of the opponents. Better and more detailed sports statistics are, in other words, something that could be of use in many sports-oriented organizations and situations.

Collecting data during sports events is crucial for providing valuable data. When tackling the problem of collecting sports game data, many techniques can be used, and reliable methods are important. One approach is to attach GPS devices to players to gather their positions and other information. Many big sports leagues use this technique, e.g. the NBA (National Basketball Association), where GPS trackers have been attached to basketball players to track their movement and shooting [3]. The Premier League, the highest division of soccer in England, also uses GPS techniques [4], and even lower divisions in rugby, such as the Rugby Union in England, have tried using GPS [5].

Another common approach is to use an application where data is put into the system manually. In the SHL (Swedish Hockey League), an application provided by Statnet [6] is used during matches, where statisticians report the data to the system during the match [7]. The SSL (Svenska Superligan), the top floorball league in Sweden, does not use any application at all other than pen and paper during matches, after which statisticians manually enter the data. So depending on the approach to data collection, different kinds of techniques are used.


When the data is collected, different applications are used by the coaches and players to get an in-depth analysis of all events that occurred in the game. In the NBA example, the application connected to the GPS devices provides a heat map of each player’s shooting efficiency depending on their position on the court [3]. The presentation of the data is of great importance to make it valuable for the coaches, but the data also needs to be treated in ways that e.g. help the coaches make valuable decisions. Different kinds of data analytics and data models are of use and will assist in these situations.

This kind of data collecting needs to be correctly managed by the business, so that the statistics are treated professionally and secured. Sports Editing Sweden AB [8] is a company that recognizes the importance of sports statistics. The company focuses on providing services and technical solutions for sports leagues and their fans. It collaborates with some of the biggest sports leagues in Sweden, such as the SHL and SSL. Sports Editing is a company that embraces technology, web development and sports, with the goal of helping sports leagues with their communication and marketing. It also collaborates with the leagues’ sports statistics providers to present sports statistics on websites and other platforms. When it comes to collecting data, Sports Editing collaborates with external actors, and its focus is on data treatment and representation. Sports Editing also provides web editors and sports journalists for news coverage on the sites it offers, to give sports organizations a complete product.

1.1 Motivation

Many of the biggest sports leagues have invested in advanced and expensive statistics report systems. Some sports leagues do not have, or cannot afford, a dedicated sports statistics provider, which is why Sports Editing wants to enhance its range of services by developing its own statistics program that could fill the basic needs of sports game reporting. This would make it possible for a smaller league to collect statistics, and make it easier for them to later switch to a dedicated statistics provider, such as Statnet.

This master thesis treats the development of this application, which will function as a complement to the highly advanced systems offered by dedicated providers. Sports Editing has an interest in developing a statistics system able to handle inputs from sports game reporters in an intuitive and efficient way, for several different sports.

1.2 Aim

The application is to be a web application, but the exact format is not fixed. The new statistics system should enable export of many kinds of advanced data and statistics. Different kinds of data analytics will be needed to provide relevant statistics, and a good graphical representation is sought, to offer a more user-friendly way to draw conclusions from the data.

The new statistics program requires development on both the back-end and the front-end. On the server side, smart ways of handling the statistics input to the system are needed, as well as processing and calculating the data to provide other relevant statistics. Since this is a web application, many different back-end frameworks and database technologies could be used, and a comparison and evaluation of suitable technology choices needs to be performed. A general theoretical evaluation of sports analytics will be conducted, along with some research about data visualization.

Regarding the front-end, the goal is to develop a web application prototype that communicates with the statistics system. It should be used both as a match report tool and to show statistics from past matches, visualized in a proper way. The match report tool is intended to be used by journalists and statisticians to report game events through interaction with the application in a graphical and intuitive way. The match report tool will be evaluated to some extent and will hopefully lead to easier and more effective collection of sports statistics.

1.3 Research Questions

• How can a web application be developed according to modern web development frameworks to ease the collection of sports data during matches?

• How can sports game data be treated, through data models and data analytics, to produce valuable sports statistics and present them in a proper way?

1.4 Delimitations

There will be some delimitations regarding the web application prototype to be developed. Sports Editing is to a large extent using a Linux environment and open-source applications and frameworks for its development. This excludes some quite popular frameworks, such as Microsoft’s .NET Framework, that do not fit into this software stack. Sports Editing has already established a specific design and architecture, and according to the choice of technology, the development of the statistics back-end will be conducted and synced with the system architecture, deployment and applications of the company.

Some database technologies will also be reviewed to fit the needs of the application, though not as in-depth as a full coverage report, and the review will focus on more established and frequently used database technologies. The same standard applies to the look and graphical design of the prototype, since the main purpose is not to produce a final version, though some design thoughts will be discussed. The prototype should function as a way to test the statistics system and help to evaluate whether the system eases the collection of data during matches.

The sports that the system will treat are delimited to team sports, focusing on the major sports that are of interest to Sports Editing. Their main domain knowledge is within ice hockey and floorball, since they have established business contacts with such leagues.

There will be an implementation of the intended statistical program according to one software stack and architectural design. The theory and pre-evaluation of the frameworks will lead to a choice of techniques, and one implementation of the program will be realized. Multiple versions will not be developed according to different software stacks, and the comparison will not treat implemented systems. It will simply be a theoretical comparison, not a practical one.

Another delimitation of the statistics system is that the statistical models and mathematical analytics will be performed at a basic statistical level. The purpose of the program is not to draw big conclusions from advanced mathematical models, e.g. to calculate the next purchase of a player or foresee events in the next game. The program takes the approach of collecting, calculating and presenting sports statistics in a proper way that helps the involved stakeholders gain knowledge or draw conclusions from the game or season. Analytics will be implemented to some extent, but at a basic level.

The system will generate some amount of data, but not at a rate that meets the definition of Big data. The subject will still be treated in the theory, since many of the bigger sports leagues generate data at a rate that could be considered Big data. It is therefore of great relevance to bring it up when conducting a literature overview in the area of sports analytics. This could also be relevant for future work on the statistics program.


2 Theory

The development of a sports statistics program involves different aspects, both theoretical and technical. The following sections provide the theoretical foundation for the thesis, in the form of sections on sports analytics and web development.

In order to gather data, model it according to different data models and visualize it in an understandable way, the literature study needs to be diverse and extensive. Sports analytics is treated along with an introduction to what general data analytics means according to definitions of data science. The data handling, in combination with development technologies, needs to be theoretically attested, along with a brief study of data modeling and mathematical methods.

Since web development is constantly evolving, some modern frameworks are properly introduced. Different components of a system are also described at a more detailed level, such as database, front-end and back-end technologies. There is also a section of a more architectural character, giving an overview of a system’s structure according to a set of architectures and designs. Sections introducing different up-to-date software stacks and web technologies are also included in the theory. The visualization of data will also be briefly covered theoretically, with guidelines and paradigms for showing data in a proper way.

2.1 Sports Analytics

The need for statistics to make a correct analysis of a sports game is crucial [2]. Since sports analytics is about using sports data to create valuable statistics for analysis, the need for proper data models is present. There are different approaches to treating sports data in combination with data analytics to create statistics and other beneficial information. Sports analytics also depends on getting correct data from matches to analyze. To collect match data, different kinds of technologies are applied to provide large amounts of data, from which a lot of detailed statistics can be constructed. The big collection of sports data then benefits the analysis and decision making from the sports games [9].

Sports Analytics is the concept of treating sports data through analytic methodologies to help draw valuable conclusions [10]. Analytics is applied to gain advantages in the practice of sports. The conclusions need to originate from established data analytics, according to mathematical models that the sports industry has evaluated and uses in some manner. To be able to perform such an analysis, the definition of data analytics needs to be clarified.

Data in the context of sports analytics should not be confused with statistics. Data is the untreated raw element with information that should be the foundation for later analytics, not a direct representation of sports statistics. Since there are a lot of data terms, some definitions need to be presented. Some terms often mentioned in the context of analytics, depending on each other, are Data Mining, Data Analytics and Data Analysis, which all are part of the research field of Data Science (see Figure 2.1).

Figure 2.1: Definitions of different Data Concepts.

onthe.io - What is the difference between Data Science, Data Analysis [...] [11]

2.1.1 Data Mining

Data Mining should not be mistaken for the simple process of collecting raw data. The concept should rather be seen as collecting data and building knowledge through the technique of finding patterns in the collected data [12]. The whole data mining phase can also be described as the stage where you try to find relationships in the data to outline it in a more understandable way [13]. A process called The Data Mining Process describes data mining in a more detailed way and divides it into three steps: Data Collecting; Feature extraction and data cleaning; and Analytical processing and algorithms [14] (see Figure 2.2).

The first step is basically to collect data in some form. This could be anything from logging the user interactions of a system to sending data from hardware-based sensors. Since the methods of collecting data depend on the context and the kind of data collected, they vary and are applied at the application level. In the second step, where the extraction and data cleaning happens, the data is marked with metadata through different techniques in order to be classified to some extent [14]. There are no industrial or theoretical standards for how this is executed, but the aim is to supply an understandable structure for the data set. The cleaning part is mainly to correct missteps in the data, according to what is relevant, and the data is transformed into the wanted format. The third and last phase is where the analytical part starts and some kind of method is applied to the data set. The algorithms used should supply understandable and readable output to enable analysts to do their job.
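The three steps can be made concrete with a minimal sketch. The event records, field names and cleaning rule below are invented for illustration; only the step ordering comes from The Data Mining Process:

```python
# Step 1: data collecting (here hard-coded; in practice logged from an
# application or streamed from sensors).
raw_events = [
    {"player": "A. Larsson", "type": "shot", "result": "goal"},
    {"player": "A. Larsson", "type": "shot", "result": "save"},
    {"player": "B. Berg", "type": "shot", "result": None},      # incomplete record
    {"player": "B. Berg", "type": "pass", "result": "complete"},
]

# Step 2: feature extraction and data cleaning -- drop non-shots and
# incomplete records, and tag each remaining event with a derived feature.
shots = [
    {**e, "scored": e["result"] == "goal"}
    for e in raw_events
    if e["type"] == "shot" and e["result"] is not None
]

# Step 3: analytical processing -- aggregate a readable statistic per player.
def shooting_summary(events):
    summary = {}
    for e in events:
        row = summary.setdefault(e["player"], {"shots": 0, "goals": 0})
        row["shots"] += 1
        row["goals"] += int(e["scored"])
    return summary

print(shooting_summary(shots))  # B. Berg's events were cleaned away entirely
```

In a real system each step would of course be far richer, but the shape — collect, clean and tag, then aggregate into readable output — is the same.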


This process corresponds to a set of methods of a similar character. A similar, but somewhat more basic, approach to the mentioned data mining methods is the one according to Blom (2005), who states that a statistical investigation consists of four steps: Planning, Data Collecting, Data Processing and Presentation [15]. The steps are described in a wider sense but are relatable to data mining. Since this is a main factor when conducting statistical research, the need for good processing is present. This corresponds to statistical theories, which are of great use for data modeling.

Figure 2.2: The Data Mining Process. Data Mining: The Textbook (2015) [14]

Data mining, as stated through The Data Mining Process, could be enough for many contexts where data needs to be analyzed, but when the data amount increases some kind of Data Analytics could be of use. Worth mentioning is that there are many different definitions of where the process of Data Mining stops and Data Analytics begins, since the third step of The Data Mining Process could be compared to Data Analytics. Both steps have the purpose of applying an algorithm that should provide facts. The concepts overlap each other, since some kind of analytics is being done in both. Data Mining is also defined as one of the steps in the so-called KDD Process [16].

Another approach to getting analytical output is the KDD Process. The abbreviation stands for Knowledge Discovery from Databases, and it is mainly a method founded in the nineties to move away from more manual approaches to creating knowledge from a set of data [17] (see Figure 2.3). The earlier steps of the process are there to refine the data so that patterns can be found in a broad way.

Figure 2.3: The Knowledge Discovery from Databases Process. Fayyad et al. (1996) [17]

The process starts by defining the data that should be analyzed, and a subset of the data is selected. Some kind of pre-processing follows, cleaning the data and identifying missing properties. Later, some of the data is reduced to be more efficient for the purpose at hand. The data is also transformed into a desired format, e.g. into different frames or classes. When the data mining methods are executed, some kinds of patterns are distinguished. A pattern, in the context of data mining, is a classification of the cleaned data according to some specific method, mapping the data for specific purposes. One of the commonly used classification methods is to cluster the data where points correlate to each other (see Figure 2.4). Different clusters can be of use for different questions in need of answers, and they give a structure of overlapping conclusions to extract from the data. There are more data mining methods accepted for the KDD Process, such as regression and summarization. Both linear and non-linear versions of regression can be used, where the linear one simply fits a linear function that represents the correlations (see Figure 2.5). A summarization is instead a function that tries to mark subsets of data with a certain characterization [17] [18].
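Two of the methods named above, linear regression and clustering, can be sketched from scratch. This is a toy illustration with invented data points, not the thesis's implementation:

```python
def linear_regression(points):
    """Fit y = a*x + b to (x, y) pairs by ordinary least squares."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

def two_means(values, iterations=10):
    """Cluster 1-D values around two moving centroids (k-means with k = 2)."""
    c1, c2 = min(values), max(values)  # assumes at least two distinct values
    for _ in range(iterations):
        g1 = [v for v in values if abs(v - c1) <= abs(v - c2)]
        g2 = [v for v in values if abs(v - c1) > abs(v - c2)]
        c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)
    return g1, g2

slope, intercept = linear_regression([(1, 2), (2, 4), (3, 6)])
low_group, high_group = two_means([1, 2, 3, 10, 11, 12])
print(slope, intercept)       # the sample points lie exactly on y = 2x
print(low_group, high_group)  # two clearly separated clusters
```

Real analyses would use library implementations (e.g. from scikit-learn or R) with proper handling of degenerate inputs, but the underlying pattern-finding idea is the one the KDD Process describes.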

The KDD Process is reminiscent of later definitions of The Data Mining Process. However, there is also research stating that Data Mining can be seen as an individual method, not necessarily related to the KDD Process anymore [16].

Figure 2.4: A Clustering of data via a Data Mining method. Fayyad et al. (1996) [17]

Figure 2.5: A Linear Regression via a Data Mining method. Fayyad et al. (1996) [17]


2.1.2 Data Analytics

The definition of Data Analytics is fairly uniform, but varies to some extent within data science. According to most of data science, the term Data Analytics can be seen as some kind of software application used to automate the desired inquiries into a data set, to find facts in the data according to specific questions [11] [12]. It derives from research fields such as statistics, machine learning and artificial intelligence, and has the purpose of discovering dependencies between what the system gives and its various input parameters [12] [14]. Techniques from other concepts, such as data mining models, could be used, but algorithms for real-time data observations could also eventually lead to successful analytics. Another basic definition is that Data Analytics is an automated algorithmic process to find certain specific correlations in data [11]. It can also be seen as a tool for finding valuable trends hidden from the overall analysis, not dependent on earlier hypotheses. The last step of The Data Mining Process can be compared to Data Analytics, since both should find patterns in data. Many techniques can be used to find patterns, such as the so-called Exploratory Data Analytics, which is used to capture relationships in the data set that were unknown or only vaguely formulated [19].
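A tiny sketch of the exploratory idea: scan every pair of numeric columns for a strong linear correlation, surfacing a relationship without a prior hypothesis. The player metrics below are invented for illustration:

```python
from itertools import combinations
from math import sqrt

metrics = {
    "minutes_on_ice": [12, 18, 25, 9, 21],
    "shots":          [2, 4, 6, 1, 5],
    "penalties":      [3, 0, 1, 2, 0],
}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Keep only pairs whose correlation magnitude exceeds a threshold.
strong_pairs = {
    (a, b): round(pearson(metrics[a], metrics[b]), 2)
    for a, b in combinations(metrics, 2)
    if abs(pearson(metrics[a], metrics[b])) > 0.8
}
print(strong_pairs)  # minutes on ice and shots move together in this toy data
```

No question was posed in advance; the scan itself surfaces which metrics relate, which is the defining trait of exploratory analytics.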

In shorter terms, a distinction between Data Analytics and Data Mining could be: if you have a question you want answered, or a logic that is supposed to be tested, you should perform Data Analytics. When you do not know much about the data and have no specific question in need of answering, you should start off with Data Mining.

Data Analytics and Data Mining commonly occur in many research areas, such as social media and business. In social media a lot of data needs to be structured, and users are related to each other in various ways [20]. All relationships in social media are complex, and interactions are constantly being made. Data Analytics could help answer e.g. how networks evolve over time. It is also common in business contexts, to ease the investments of a company; there it is often referred to as Business Analytics and has the purpose of detecting changes in the industry that a company should respond to in the best ways possible. The Data Mining Process is also used in business contexts, for example as the foundation for measuring the outcome of a specific action [21].

Thanks to the power of computer science, Data Analytics can combine all of these research areas to reveal knowledge that is hidden in the big mass of data. Some programming languages are more suitable than others for performing relevant and efficient analytics. The statistical language R is especially suitable for heavy statistical calculations [22]. MATLAB is also an advanced software, dedicated to mathematical and statistical calculation among other areas [23]. Among languages that can make use of these dedicated languages, Python is mentioned as a powerful language for creating valuable data analysis, able to import R libraries for calculation [24].
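As a small illustration of statistical work in plain Python, the standard library's `statistics` module already covers basic descriptive measures such as mean and median; heavier workloads would reach for dedicated libraries or, via bridges to R, the statistical language itself. The minutes-on-ice values below are invented:

```python
import statistics

# Hypothetical minutes on ice for seven players in one game.
minutes_on_ice = [21.5, 18.2, 25.0, 9.4, 14.8, 17.3, 11.6]

mean_minutes = statistics.mean(minutes_on_ice)      # arithmetic mean
median_minutes = statistics.median(minutes_on_ice)  # middle value when sorted
spread = statistics.stdev(minutes_on_ice)           # sample standard deviation

print(f"mean={mean_minutes:.1f} median={median_minutes:.1f} stdev={spread:.1f}")
```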

2.1.3 Data Analysis

Data Analysis, not to be confused with Data Analytics, is where the human interaction with the structured data starts. The prior steps should lead to a readable basis which supports decision-making. In short, the human interaction with the analytics is made to draw conclusions from the data being studied. This helps to constantly evolve not only the context it is supposed to support, but also the bigger picture of what data science can help achieve.

The data being analyzed can be divided into qualitative and quantitative data analysis. Quantitative data analysis is made on more numeric or graded data, set against specific models. Even if this is the human interaction with the data, some computer tools used for the analysis can be efficient [25]. A more qualitative set of information should instead be analyzed to generate new hypotheses that can be verified further [26].


2.1.4 Big Data

One buzzword in the area of data science is Big Data. Since the data rate has gone up in recent years, the need for efficient big data analytics has become more and more important. The increasing number of smart devices being carried has made the data rate explode, and with the increasing number of sensors and interactions in society, smart solutions are needed [27]. Not only the increasing number of devices has made an impact, but also the behavior of the users. A technology such as positioning generates a boosted amount of data, and there are many areas such as business data, image data and industrial process data [12]. User behaviour generates so much more data that, according to some forecasts, it is expected to exceed 240 exabytes per day by 2020 [28]. Clearly there is a need for approaches that can handle these big amounts of data, since ordinary data treatment would not be sufficient.

When conducting sports analytics there are many different perspectives to use as starting points. One important thing is to identify the data to be treated. Many applications related to sports generate large amounts of data, bringing a need for structured design. Big data has the potential to make any system smarter, and the term Smart data may better describe the potential for how a business could be improved [9]. To structure Big data into Smart data, the so-called SMART Model is introduced. SMART stands for:

• S: Strategy
• M: Measure Metrics and Data
• A: Apply Analytics
• R: Report Results
• T: Transform Models

First, a clearly defined strategy needs to be stated, to make clear what the intention with the data is. Time should be spent on being accurate while defining the strategy. The second phase is the validation of data, where the collection and identification of the data sources takes place. It is of great importance to know how the data was gathered, since it says a lot about the context of the data. With that in mind, the analytics is applied to extract insights and knowledge from the data, which are forwarded to the stakeholders of the system in the next phase. Lastly, the human analytical phase identifies which parts need to be transformed and what changes should be made to improve the business [9].
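The five SMART phases can be read as a pipeline in which each stage consumes the previous stage's output. A schematic sketch; the event records, field names and the aggregate are hypothetical placeholders, and only the phase ordering follows the SMART Model:

```python
def apply_smart(raw_events, strategy):
    # S: Strategy -- a stated question narrows which metric matters at all.
    relevant = [e for e in raw_events if e["metric"] == strategy["metric"]]
    # M: Measure Metrics and Data -- validate the data; require a known source.
    measured = [e for e in relevant if e.get("source") is not None]
    # A: Apply Analytics -- here just a total per team.
    totals = {}
    for e in measured:
        totals[e["team"]] = totals.get(e["team"], 0) + e["value"]
    # R: Report Results -- package the insight for the stakeholders.
    report = {"question": strategy["text"], "totals": totals}
    # T: Transform Models -- acting on the report happens outside the system,
    # by the humans who decide what to change in the business.
    return report

smart_report = apply_smart(
    [{"metric": "shots", "team": "Home", "value": 3, "source": "scout"},
     {"metric": "shots", "team": "Away", "value": 5, "source": "scout"},
     {"metric": "shots", "team": "Home", "value": 2, "source": None}],
    {"metric": "shots", "text": "Which team shoots more?"},
)
print(smart_report)
```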

With this big amount of data it may be impossible to save everything and analyze it at a slow pace, which could cause storage problems. The problems of saving data are however negligible compared to the many possibilities the technique enables, not least in sports analytics. More detailed conclusions can be made since the amount of data has increased, and consequently data science and pattern recognition are becoming an even more important research field [29].

2.1.5 Data Science and Sports Analytics

As already mentioned there is a correlation between sports analytics and data science. Data analytics has already been conducted on many levels in sports, e.g. in the NBA, NFL and MLS [30]. One example is the NBA, which provides heat maps for shooting accuracy [3]. It is easy to look for the player who has scored the most points, but harder to measure who really is the best shooter in the NBA according to position and situation. It is stated, according to some definitions of what a good shooter is considered to be, that Steve Nash at a certain position is the most effective scorer, rather than the one at the top of an ordinary score table. This kind of statistics can be provided with modern data collecting tools.


2.1. Sports Analytics

The ice hockey league of the NHL, which according to Barnes (2016) [31] is one of the least developed sports regarding analytics, has already tried analytics on both a player and team level to analyze what needs to be improved. This could affect how teams should be coached and what players to look for in the roster. The sports analytics of the NHL is evolving and Barnes thinks that it is about time that the league shapes up.

Data mining is already established in sports and some frameworks have been developed to structure the data. One example is the framework presented in Data Mining in Elite Sports: A Review and a Framework by Ofoghi et al. (2016) [32]. It has the purpose of giving support for decisions as well as finding correlations showing which events in a game had the biggest influence, where attributes of the data are stated as performance measures. They define the data mining process according to something called the Wisdom Hierarchy, where knowledge is achieved by turning raw data into information through logical and mathematical patterns in the data. When this knowledge is analyzed by a coach, player or any other sports related person it turns into wisdom of the sport.

In the case of sports the analysis should treat crucial attributes that could lead to success for the team, e.g. certain match events or tactics. The framework divides the range of results into Straight Forward (simple variables) and Sophisticated (hidden conclusions) and the data processing into five categories:

• Filtering: Categorizes the data.

• Format Conversion: Converts the data to wished format.

• Extractions: Creates new alternative data from the formatted data (e.g. ranking of times etc).

• Structural Conversion: Converts to measurable data.

• Descriptive Conversion: Converts the results so that they describe and act more appropriately.

When these steps are made, some kind of correlation can be identified between which events lead to success according to the winning criteria, e.g. you win a game of ice hockey at a rate of 57% if you take the lead. Earlier matches are weighted into the calculations and the purpose is to find which criteria are the most important and which players are most suitable to meet them [32].

Some sports are harder than others to measure, since the impact of different stages of a game can be hard to value (e.g. in Triathlon). Variables such as duration and winning criteria are important for making a good conclusion. In Triathlon the actual time could be of some value, while factors such as how far ahead the athlete was of the other opponents might be more important [32].

The actual performance could also be improved with the use of analytics. Sports Performance Measurements and Analytics by Lorena Martin (2016) [33] defines analytics as models based on data and particular research questions, since the definition of what a good performance is differs between individuals and sports. The NFL is a good example, where different positions in the team activate different kinds of muscle groups. Variables can be physical, psychological and behavioral, and everything is measured to identify the individual's potential performance improvements.

The revolution of big data has also affected the area of sports analytics. The already mentioned examples from the NHL and NBA depend on the amount of data being so big that it gives the knowledge an empirical foundation. Many big companies have started to see the benefits of combining sports analytics and big data to make a profit. At the yearly MIT Sloan Sports Analytics Conference [34] (MIT, Boston) both IBM and Intel have shown their interest, with the former focusing on tennis and the latter on big data in soccer [35]. At SSAC16 a panel debate was held about how big data has changed the way we see sports and how the rise of big data is something that the fans just want more and more of.


2.2 Data Modeling

When creating a database-driven web application where data needs to be inserted, updated and extracted with ease, some kind of structure describing what the database elements look like can be helpful. The Entity-Relationship (ER) Model is a model that has the purpose of giving an overview of a database system through a high-level conceptual model of both entities and the relationships between them [36].

This conceptual modeling starts off with a definition phase, where all of the entities that should exist in the database are defined. An entity is similar to a thing or an object, and represents something that we want to store, e.g. a player or a sports game. To each entity so called Attributes are added, which can be seen as properties of the entity. They are added to set what elements the entity has and to represent what the entity is. Some key attributes will be present for each entity and function as the identification of a unique instance of the entity; these are underlined in the ER-diagram. After the entities are defined, the relationships between them need to be set, represented as diamond shapes connected with lines between entities. It is of great importance to define how many relations a certain unique instance of an entity can have to another entity, e.g. a game can only have one season while a season has many games (see Figure 2.6). When all is defined the realization of the database can begin [36].

Figure 2.6: Example of an ER-diagram.
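A one-to-many relation such as the season/game example can be realized directly when the database is created. The sketch below is a minimal, hypothetical illustration using Python's built-in sqlite3 module; the table and column names (season, game, seasonID) are assumptions mirroring the figure, not taken from the thesis application.

```python
import sqlite3

# In-memory database; entity and relationship names mirror the
# season/game example of Figure 2.6.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE season (
        seasonID INTEGER PRIMARY KEY,  -- key attribute (underlined in the diagram)
        name     TEXT
    );
    CREATE TABLE game (
        gameID   INTEGER PRIMARY KEY,
        seasonID INTEGER REFERENCES season(seasonID)  -- each game has ONE season
    );
""")
conn.execute("INSERT INTO season VALUES (1, '2016/2017')")
conn.executemany("INSERT INTO game VALUES (?, 1)", [(10,), (11,), (12,)])

# ...while one season relates to many games:
games = conn.execute(
    "SELECT COUNT(*) FROM game WHERE seasonID = 1").fetchone()[0]
print(games)  # 3
```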

The Enhanced Entity-Relationship (EER) Model also includes some additional features that are useful. Sub-classes are one example of the enhanced capabilities. Object-oriented programming later implemented similar functionality, where an object inherits properties of a superclass. This could, for instance, let both the entities Floorball and Ice hockey inherit properties from a Sport superclass [36].

Another tool that can be used to create a conceptual data model is UML (Unified Modeling Language). UML is traditionally a model for object-oriented structural languages, such as Java or C++, to define classes or objects and how they interact with each other in a system. Some modern database technologies have object-oriented-like structures, e.g. OODBS [37], or relatable structures like document-based databases, e.g. MongoDB, which means UML can be used for database modeling as well. The version called UML/P models the different classes, interfaces and other elements existing in a system according to a set formula (see Figure 2.7). UML can treat different modeling approaches and functions more as a declaration of what methods and variables a class contains. Worth mentioning is that there are many variants of how to create a UML model, but this is a quite traditional and common one [38].



2.2.1 Statistical model

When turning data into statistics, certain models are needed to manipulate and calculate the data and turn it into the desired answers. Both statistical and probability models are available when data is at hand. Probability models predict what could happen, while statistics simply organizes what already has happened. Statistics is made by organizing the data, summarizing it and finally analyzing it [39].

Regarding the sports context, there are both basic and advanced statistics that can be of use. Simple sums of games played, points gathered and goals scored often state how a team is performing over a season. Two types of results are important when summing up sports data: means and medians. The mean is calculated by summing all of the values and dividing by the number of entries. This gives an average value (over that specific number of games) that is easy to compare across teammates and opposing players. The median is instead the middle value in the sorted stack of values, with half of the values below it and the other half above. Sometimes these simple values tell more about a situation than the frequency of an event (see Figures 2.8 and 2.9) [2].

Figure 2.8: Example of calculating a Mean value.

Figure 2.9: Example of Median. Median is 5 in this example.
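As a small illustration, the mean and median described above can be computed with Python's standard statistics module; the goal figures below are invented sample data.

```python
from statistics import mean, median

# Goals scored by a hypothetical player over seven games.
goals = [0, 2, 1, 3, 0, 1, 5]

avg = mean(goals)    # sum of the values divided by the number of entries
mid = median(goals)  # middle value: half the entries lie below, half above

print(avg)  # 1.714...
print(mid)  # 1
```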

Data can also be divided into certain intervals that are of interest for sports statistics. An example could be the average time on ice for an ice hockey player during a season, where the average time is divided into intervals of e.g. five minutes each. This could give a sense of how many players in a team are expected to play between 20 and 25 minutes per game, indicating that the quota of twenty-minute players in a team is filled, or the contrary, which could lead to alternative drafting tactics (see Figure 2.10). Another example could be to show shooting percentages in the NBA as histograms, where different kinds of forward types get different percentages. These are the kinds of conclusions that can be supported through sports statistics.
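The interval grouping described above can be sketched as a simple binning step; the time-on-ice values below are invented sample data, and the five-minute bin width follows the example in the text.

```python
from collections import Counter

# Hypothetical average time on ice (minutes) for a roster of players.
toi = [22, 8, 17, 24, 12, 19, 21, 6, 15, 23]

# Group into five-minute intervals, as in the Figure 2.10 example:
# each player is counted in the bin that starts at 5 * (minutes // 5).
bins = Counter(5 * (minutes // 5) for minutes in toi)
for start in sorted(bins):
    print(f"{start}-{start + 4} min: {bins[start]} players")
```

Running the sketch groups, for instance, the four players averaging 20 minutes or more into the 20-24 minute bin.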


More advanced statistics, such as how many of the games end in a win for the team that scores first, or which position on the field is best for a certain player, are harder to obtain and demand more complicated statistical models. Variance is of use, since sports vary in result and accomplishment even if statistical models have been fitted to previous records. The so called standard deviation describes how much values differ from the mean value in a set of values [2]. For example, the average number of goals scored in a game may be a certain value, but the standard deviation tells in what interval around that mean the team tends to score. This enables more advanced conclusions when comparing different aspects of a game within or against other teams. This, along with other more advanced statistical conclusions, could be of use, but is not in the scope of this thesis.
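The standard deviation's role as a complement to the mean can be illustrated with Python's statistics module: the two invented teams below share the same mean but differ widely in spread.

```python
from statistics import mean, pstdev

# Goals per game for two hypothetical teams with the same mean.
team_a = [2, 2, 3, 3]  # scores consistently
team_b = [0, 0, 5, 5]  # swings between extremes

# pstdev is the population standard deviation: the typical
# distance of a value from the mean.
print(mean(team_a), pstdev(team_a))  # 2.5 0.5
print(mean(team_b), pstdev(team_b))  # 2.5 2.5
```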

2.2.2 Data Visualization

When showing sports data in a graphical and visually understandable way, there is a need for established graphs and diagrams that ease the spectator's information gathering. Nowadays it is not just a scoring table that should be presented. Real-time data updates occur more and more often in the biggest sports leagues [40]. The media platform can vary from mobile to TV, but there are some established tactics for making spectators understand sports data no matter the context.

Histograms and box plots are two approaches for showing sports data in proper ways. A histogram describes intervals in a set of data, creating bars for the number of entries of a certain type between two values. The histogram thereby approximates the distribution of the data between the values, helping data sets to be understood (see e.g. Figure 2.10). It has been established with great success in the database world [41].

A box plot is instead a way of summarizing data, using the median to provide information about the data. The middle line of the box is the median and the ends of the box are the values at 25% and 75% of the set of data, placing 50% of the values inside the box. The two lines at the ends are the minimum and maximum values. These components constitute a box plot, which helps to extract knowledge from data [42].
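The five components of a box plot (minimum, lower quartile, median, upper quartile, maximum) can be computed as a five-number summary. The sketch below uses the median-of-halves method; note that several quartile conventions exist, so other tools may give slightly different quartile values.

```python
from statistics import median

def five_number_summary(values):
    """Min, lower quartile, median, upper quartile, max -- the
    components a box plot is drawn from (median-of-halves method)."""
    s = sorted(values)
    mid = len(s) // 2
    lower = s[:mid]                                 # half below the median
    upper = s[mid + 1:] if len(s) % 2 else s[mid:]  # half above the median
    return s[0], median(lower), median(s), median(upper), s[-1]

# Hypothetical points per game over nine games.
print(five_number_summary([1, 9, 2, 8, 3, 7, 4, 6, 5]))
# (1, 2.5, 5, 7.5, 9)
```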



2.3 Web Development

To develop web applications that collect and calculate data for sports analytics, a good architecture and design is preferable, and so is a modern and efficient approach. The design of any data-collecting application should not be too complicated but still provide the detailed data in a user-friendly way, since the concept of interaction design is to make functional applications with the user in focus [43]. An application that extracts and shows sports data in a proper way should hold detailed information, since there is diversity in the situations in which the data will be presented: everything from showing sports data in media, which arguably satisfies the fans of the sport in the first hand with informal statistics, to more business-oriented data for organizations [40].

2.3.1 Architecture & Design

There are some patterns recommended when creating the structure of an application. These should be considered paradigms developed over time by programmers and software architects. Some architectures focus on being lightweight with big flexibility, while other architecture designs focus on providing the best possible performance. This section about the different architecture approaches is mainly based on Software Architecture Patterns by Mark Richards (2015) [44], which describes some modern architectural patterns. Worth mentioning is that some of these architectural patterns can be combined or used on different levels of an application.

2.3.1.1 Layered Architecture

One architecture that possesses a level structure is the layered architecture. It divides the application into an arbitrary number of layers holding different functionality [44]. One of the most common paradigms is to structure it in four layers with the names Presentation Layer, Business Layer, Persistence Layer & Database Layer (see Figure 2.12).

Figure 2.12: Layered Architecture. Mark Richards (2015) [44]

The Presentation layer is the layer where the user sees and interacts with the application, and its primary function is to show data. The Business layer communicates with the Persistence layer, which directly interacts with the database. The last Database layer has the purpose of simply storing the data. The lifetime of an HTTP request thus goes down through the different layers to the data and then returns to the presentation layer. In the context of a web application, three layers are a common structure, with the frontend as the presentation layer and a middle layer in the form of frameworks that ease the interactions with the backend layer (with the database models) [44].
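As a rough sketch of the pattern, the four layers can be modeled as objects where each layer only calls the layer directly below it; all class names and data values here are illustrative, not part of any framework.

```python
# Minimal layered-architecture sketch: a request enters at the top,
# travels down through each layer, and the result travels back up.

class DatabaseLayer:
    def __init__(self):
        # Hypothetical stored data.
        self._rows = {1: {"team": "Linköping HC", "points": 42}}

    def fetch(self, team_id):
        return self._rows.get(team_id)

class PersistenceLayer:
    def __init__(self, db):
        self.db = db

    def get_team(self, team_id):
        return self.db.fetch(team_id)  # direct database interaction

class BusinessLayer:
    def __init__(self, persistence):
        self.persistence = persistence

    def team_summary(self, team_id):
        row = self.persistence.get_team(team_id)
        return f"{row['team']}: {row['points']} points"  # domain logic

class PresentationLayer:
    def __init__(self, business):
        self.business = business

    def render(self, team_id):
        return self.business.team_summary(team_id)  # what the user sees

app = PresentationLayer(BusinessLayer(PersistenceLayer(DatabaseLayer())))
print(app.render(1))  # Linköping HC: 42 points
```

The point of the design is that each layer can be swapped (e.g. a different Database layer) without touching the layers above it.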

2.3.1.2 Event-Driven Architecture

The Event-Driven Architecture takes the approach of handling requests depending on which events they are supposed to activate. There are mainly two types of event handling: one happens in an organized way with a central organizer, and the other is more of a primitive chain-like event cycle with no coordination. In the first one, the system queues a created event to some kind of organizer. The organizer delegates the events to the appropriate event channels (see Figure 2.13). These channels activate the processes needed to fulfill the task or react to what the event started. With this approach high performance can be achieved, since many processes can execute at the same time.

Figure 2.13: Event-Driven Architecture with a Central Organizer. Mark Richards (2015) [44]

The second alternative is more of a linear approach where the events go directly to an event channel, which activates the processes needed for the specific task. This can be coordinated to some extent, since many processes can be activated at the same time by the channel [44].
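The organizer variant with event channels can be sketched as a small mediator; the event names and handlers below are invented for illustration.

```python
# Sketch of the central-organizer variant: events are queued, and the
# organizer delegates each one to the channels subscribed to its type.
from collections import defaultdict, deque

class EventOrganizer:
    def __init__(self):
        self.channels = defaultdict(list)  # event type -> processing callbacks
        self.queue = deque()

    def subscribe(self, event_type, processor):
        self.channels[event_type].append(processor)

    def publish(self, event_type, payload):
        self.queue.append((event_type, payload))

    def run(self):
        while self.queue:  # the loop exits when no events are left
            event_type, payload = self.queue.popleft()
            for processor in self.channels[event_type]:
                processor(payload)

log = []
organizer = EventOrganizer()
organizer.subscribe("goal", lambda p: log.append(f"update score: {p}"))
organizer.subscribe("goal", lambda p: log.append(f"notify fans: {p}"))
organizer.publish("goal", "1-0")
organizer.run()
print(log)  # ['update score: 1-0', 'notify fans: 1-0']
```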

2.3.1.3 Microkernel Architecture (Plug-in Architecture Pattern)

The microkernel architecture separates the core functionality from additional functionality in the form of added plug-in components (see Figure 2.14). Some similarities to an operating system can be noted, where you have the basic functionality to interact with the computer and added programs to fulfill certain tasks [44].

Figure 2.14: Micro-kernel Architecture with plug-ins. Mark Richards (2015) [44]
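A minimal, hypothetical sketch of the plug-in idea: a core that only knows how to register and dispatch plug-ins, with all extra functionality added as components.

```python
# Microkernel sketch: the Core stays small and generic, while each
# plug-in contributes one piece of added functionality.

class Core:
    def __init__(self):
        self.plugins = {}

    def register(self, name, plugin):
        self.plugins[name] = plugin  # plug the component in

    def handle(self, name, data):
        if name not in self.plugins:
            return "core: unsupported task"
        return self.plugins[name](data)

core = Core()
core.register("uppercase", str.upper)    # added functionality
print(core.handle("uppercase", "goal"))  # GOAL
print(core.handle("translate", "goal"))  # core: unsupported task
```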



2.3.1.4 Microservices Architecture Pattern

Building services separately, without having to build an entire system, is the foundation of the Microservices Architecture (see Figure 2.15). First, some sort of definition of what each service is intended to handle needs to be stated. The work is then divided among teams that build or update the independent services handling a certain functionality, which are connected altogether in the end. This creates good flexibility in the development, where components can be developed and tested independently, while the performance can suffer if some service components are slow in their functionality. This provides a lightweight approach to programming and enables functionality that can work on its own and be used in other contexts.

The requests go to some kind of user interface layer that treats the requests and sends them to the right service. One request can activate many services, which are coordinated at the user interface layer; REST (Representational State Transfer) requests are common in web development. If the services are strictly internal to a system, some kind of API (Application Programming Interface) can be developed to function as an internal user interface layer. This lets requests from external users get the proper response without really knowing how they were treated internally. Through the API many services can be activated, while a proper response is returned to the user that made the request [44].

Figure 2.15: Micro-Service Architecture. Mark Richards (2015) [44]

2.3.1.5 Space-Based Architecture

The Space-Based Architecture differs from the other approaches. It separates the different processes and database models of the application and treats them with a middleware component instead (see Figure 2.16). So called Processing Units contain independent functionality and are kept in sync using the middleware. Modern examples of this type of architecture are the many cloud-based solutions for applications that exist on the market, where access and changes can be made from various devices.

There is a separation of processing units treating the different requests, which are later synced through the middleware that organizes all the instances. The middleware functions as the communication center, while the work is processed in many different places. The performance can be improved by distributing the database layer to many different places, since the database layer often is the bottleneck for high performance. All requests want to get, add or modify content from the same place, but with the space-based architecture this can be solved in a smarter way. The middleware can be divided into a Messaging Grid, Data Grid, Processing Grid and Deployment Manager.


Figure 2.16: Space-Based Architecture. Mark Richards (2015) [44]

The Messaging grid handles all delegation of requests and wakes the available Processing units to fulfill the task. The Data grid handles all of the database related functions, where replication and syncing can be considered the main functionality; this lets the space-based solution integrate the processing units included in the process. The Processing grid is instead a coordinator used when many processing units are about to work on the same task, so that the multiple processing is executed. Deployment is handled by the Deployment manager, which activates and shuts down processing units according to need [44].

2.3.2 Software Stacks

In web development there are plenty of combinations of techniques regarding programming language, frontend frameworks and backend frameworks. So called Software Stacks are common combinations of techniques that are clustered together and co-operate in a good way. Often the operating system, web server and database techniques are mentioned, but some preferred programming languages and frameworks can also occur. Some of the suggested stacks can be altered with different programming languages or database techniques, enabling other differing sets of techniques.

Different stacks come and go in popularity among developers. Some of the stacks listed in this thesis have been used at Sports Editing, while some are newer and more modern than when the company started up their business.

2.3.2.1 LAMP

The LAMP stack stands for Linux as the operating system, Apache as the web server, MySQL as the database technology and PHP as the main language for developing the server backend. This has been a popular choice during the 2000s and is well established among developers [45]. For web development PHP is a robust choice for programming the backend, since it interacts well with MySQL. MySQL is a relational database where data is stored as tables. Certain values can be withdrawn from the tables through different queries, and relations link tables together, making a database with the wished-for dependencies. Other functions, such as data replication, are also available.



2.3.2.2 LEMP

LEMP is a variant of LAMP where many of the components are the same, but the web server Apache is replaced with Nginx. Nginx is in some studies considered to be a fast technique [46]. LEMP has many similarities with LAMP and some would consider the choice of web server technique more of a question of taste. Sports Editing is using a modification of LEMP with some additional micro-services added and deployed by Webpack, a module bundler. Webpack is not bound to be a core part of the LEMP solution stack but is applicable to the stack.

2.3.2.3 MEAN

The MEAN stack represents a newer approach compared to LAMP. MEAN stands for MongoDB as the database technology, Express.js as the web application framework, Angular as the frontend framework and Node.js for building the application. Node.js and the web framework Express.js are tightly glued together. MEAN has become a popular stack, replacing many of the LAMP stacks in the industry [47]. The main advantage of using MEAN is that all of the components in the stack use JavaScript as programming language, which lets developers focus on learning one language. It is built on a regular MVC (Model-View-Controller) logic [48]. MEAN also provides included support for Angular.js, a popular choice among frontend developers with over 55,000 stars on GitHub [49]. The framework is described more thoroughly in the section on frontend frameworks.

2.3.2.4 MERN

The MERN stack is a version of the popular MEAN stack where the Angular frontend framework is replaced by the open-source frontend framework React. Both Angular and React are well-established (JavaScript-based) frameworks competing for the attention of web developers. React is developed by Facebook and will be explained in the section on frontend frameworks.

2.3.3 Backend Development

The backend is the underlying platform that an application communicates with, where the treatment of requests and the data handling take place. Since the data is of a sports character and some backend functions are needed, e.g. to calculate statistics, some common backend frameworks need to be evaluated.

Three frameworks will be introduced, all using different languages and operating at different scales. Express (with Node.js) is a popular JavaScript-based backend working asynchronously. Flask, on the other hand, is a lightweight option for enabling Python. Finally, Laravel is a PHP framework that comes with a lot of presets and functions out of the box.

2.3.3.1 Express

Node.js is a web server library for JavaScript which enables asynchronous requests, letting developers execute tasks in a parallel manner in a more event-oriented style. The ordinary approach of letting so called threads execute the flow of code in chronological order has been scrapped in favor of asynchronous programming. This enables many tasks to be performed at the same time, avoiding the locking problems that threading could imply.

Node.js embraces the event-driven style of executing certain code when a particular event is triggered (see Figure 2.17). This enables many events to be handled simultaneously, and the loop exits when no more events are left. Ordinary JavaScript is executed in the same way [50].


Figure 2.17: Event Handler according to Node.js.

Express is a common backend framework for using the Node.js technique and provides a template for the structure of the web server. It generates code and folder structure for a minimalistic web application, suitable for getting started with development. It also enables some useful features, e.g. extending response objects to fit different output criteria, or a fully functional routing system where you e.g. manage the different URLs of the server [48]. This is still a minimalistic approach where you add the necessities for your specific context.

2.3.3.2 Flask

Flask is a lightweight Python framework for enabling web server functionality. Since it is lightweight you have to create most of the functions from scratch, as few features come for free in native Flask; an application can be kept in one specific .py file [51]. This provides control over the data treatment, but could result in more working hours to develop something useful. Performance, concurrency and flexibility are properties considered to be good in the Flask framework [52].

2.3.3.3 Laravel

Laravel is a detailed and (in many aspects) complete backend framework written in PHP. Laravel is built on Symfony, a PHP framework enabling web functionality [53]. Laravel has the philosophy of providing a tool with a lot of functionality out of the box. The latest stable version (Laravel 5.4) comes with integrated support for the frontend framework Vue.js and Bootstrap, and other suitable tools such as Sass and Webpack (see the section on frontend frameworks later in the thesis) [54]. Development can be started directly after installation, and the specific Laravel commands facilitate the development, e.g. creating models or controllers with a simple command. An authentication service, with login screen and database migrations, can be enabled with another helpful command. This infrastructure saves time for the developer, leaving time for creating the desired application.

Since PHP is an established way of setting up web projects, and Laravel has the acclaimed Model-View-Controller pattern as role model for its structure, there are not that many surprises for a web developer. Routing settings, where you decide the routes and functions for each URL, are set with ease, and you can enable wanted middleware to access them, e.g. authentication tokens. Other PHP functions such as namespacing, as well as Laravel's own query builder, which makes interaction with the database easier, are also included. Laravel has the ability to choose which database technique should be activated, since MySQL, SQLite and other database techniques are included from installation. Later versions also come with Vue.js as the primary frontend tool included in the installation, where Vue components include code that can access and update data from the database. This approach of enabling many options and including third-party support is suitable for prototype development [55].

2.3.4 Database Technologies

When treating statistics we need a proper database technology to handle everything correctly. There are numerous approaches to storing data in databases, where SQL has become a leading platform for relational databases [36]. SQL comes in many versions, but the focus in this thesis will be on open-source options, including MySQL and SQLite. A counterpart to the established SQL is NoSQL, which delivers more freedom and fewer rules to follow, e.g. no referring to other key values. MongoDB is the one treated in this thesis because of its popularity and because Sports Editing is using this version of NoSQL.

2.3.4.1 MySQL

MySQL (My Structured Query Language) is a so called relational database, with data stored in tables. Different columns represent different attributes and one row in the table represents an entry of data. So called RDBMS (Relational Database Management Systems) involve the concept of letting different tables connect with each other, creating relations between them. This provides a paradigm in database treatment where data is stored as tables and accessed through so called queries. A query is like a question to the database, where you receive a resulting table depending on what is being asked for. An example could be that a table for an author can have relations with tables that represent books, as long as the books are marked with an authorID in the table for books (see Figures 2.18 & 2.19) [36].

Figure 2.18: Example of Authors stored as tables in MySQL.

Figure 2.19: Example of Books stored as tables in MySQL.

Primary key is a concept where one of the attributes assigned to a table is set as the identifier for a specific entry. In the example from Figure 2.18 the authors' primary key is the authorID, which defines just that specific author. The concept of foreign key is instead how tables connect and relate to each other: you can refer to an entry in another table by listing its primary key. In the example of Figure 2.19 we list authorID as the reference (foreign key) to the table of authors, linking the books with the correct authors. Queries can be used to ask the database for e.g. all of the books by a certain author via its authorID.

This way of structuring data enables other features in MySQL, such as JOINs, where you can ask the database for answers depending on multiple conditions, or more primitive features such as COUNT, where you simply get the number of entries matching a specific query. This, along with other features, makes MySQL an accessible database structure for withdrawing desired data, as long as the relational rules are being followed.
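The primary key, foreign key, JOIN and COUNT concepts from the author/book example can be tried out as below. The sketch uses Python's built-in sqlite3 module for convenience; the basic SQL shown works the same way in MySQL, and the sample rows are invented.

```python
import sqlite3

# Author and book tables in the style of Figures 2.18-2.19.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (authorID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books   (bookID   INTEGER PRIMARY KEY, title TEXT,
                          authorID INTEGER REFERENCES authors(authorID));
""")
conn.execute("INSERT INTO authors VALUES (1, 'Astrid Lindgren')")
conn.executemany("INSERT INTO books VALUES (?, ?, 1)",
                 [(1, "Pippi Longstocking"), (2, "Emil of Lonneberga")])

# JOIN: an answer depending on conditions in both tables.
titles = conn.execute("""
    SELECT b.title FROM books b
    JOIN authors a ON a.authorID = b.authorID
    WHERE a.name = 'Astrid Lindgren'
""").fetchall()

# COUNT: the number of entries matching the query.
n = conn.execute("SELECT COUNT(*) FROM books WHERE authorID = 1").fetchone()[0]
print(titles, n)
```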


2.3.4.2 SQLite3

SQLite3 is a really lightweight version of SQL that treats data handling in a similar way. Everything is stored in a single file, keeping the storage low and the setup easy. This is not recommended for upscaling, since every user would interact with the same database file at the same time, even though it becomes faster if given more disk space. All of the functionality of ordinary SQL is present and can be enabled with such a compact file, making it good for prototyping and smaller projects [56].

2.3.4.3 MongoDB

NoSQL databases do not imply the table structure of relational databases. There are various NoSQL technologies, of which MongoDB is a commonly used one. MongoDB takes another approach compared to the well established relational databases: it stores collections rather than tables [48]. The collections can differ in structure and do not need to follow the same formula, which enables a dynamic approach to storing data [47]. The actual data is more of a document than a table. This approach may need more dynamically allocated space, since documents can be created with more freedom compared to relational databases, where rules regarding foreign keys and relations need to be followed.

Documents in MongoDB include the essential data for a specific context, and everything is stored as so-called JSON (JavaScript Object Notation) objects. JSON stores data in a key-value structure where a string is first stored as a variable name, followed by its value, e.g. {"variablename": "value"}. JSON objects can be stored inside other objects, which is the foundation of the document, and arrays of JSON objects can be stored as well. This can be useful, e.g., for storing many ice hockey players in an ice hockey team (see Figure 2.20). Worth mentioning is that objects such as ice hockey players can contain more information elsewhere, while a team can store a selection of values, such as name, position and shirt number, filling the team object with only the information useful for that specific context.
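A sketch of what a team document like the one in Figure 2.20 might look like is shown below. The field names (teamName, players, shirtNumber, etc.) are assumptions for illustration; the actual document layout in the thesis may differ:

```javascript
// A team document with a nested array of player sub-documents.
// Each player stores only the values useful in the team context.
const team = {
  teamName: "Example HC",
  players: [
    { name: "Player One", position: "Goalkeeper", shirtNumber: 1 },
    { name: "Player Two", position: "Defenseman", shirtNumber: 4 },
  ],
};

// MongoDB stores documents in a binary JSON format (BSON),
// but they round-trip naturally as plain JSON:
const serialized = JSON.stringify(team);
const restored = JSON.parse(serialized);
console.log(restored.players.length); // 2
```

Note how the nested players array removes the need for a separate table and foreign keys: everything the team context needs lives in one document.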


2.3. Web Development

2.3.5 Frontend Development

Web development is constantly evolving, and new approaches to solving the various problems in the development chain are constantly being tried. The number of front-end JavaScript frameworks has exploded in recent years, since JavaScript is the most popular language for the browser [57]. Evaluating the range of available frameworks is essential in order to make motivated choices based on the available research, and the same goes for back-end development.

Frontend refers to what the user sees and interacts with, and is mostly built with the so-called Model-View-Controller (MVC) paradigm in mind. By separating the core data, stored in models, from what the user sees and interacts with, the views, development is facilitated. The controller is the intermediary that transfers demands from the view, updating or creating models, which in turn produce a response for the view and what the user sees. This is an ordinary paradigm in frontend development, and at this point it is the standard way of creating frontend tools [58].
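The model–view–controller flow described above can be sketched in a few lines of framework-free JavaScript. All class and method names here are illustrative, not taken from any particular framework:

```javascript
// Model: holds the core data.
class Model {
  constructor() { this.items = []; }
  add(item) { this.items.push(item); }
}

// View: turns model data into something presentable.
class View {
  render(items) { return `Items: ${items.join(", ")}`; }
}

// Controller: relays a user action from the view to the model,
// then hands the updated model data back to the view.
class Controller {
  constructor(model, view) { this.model = model; this.view = view; }
  userAdds(item) {
    this.model.add(item);
    return this.view.render(this.model.items);
  }
}

const controller = new Controller(new Model(), new View());
console.log(controller.userAdds("first")); // Items: first
```

The point of the separation is that the view never touches the stored data directly; every change passes through the controller, which keeps model and presentation independent of each other.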

All of the frontend frameworks introduced in this thesis are not only built on MVC, but also enable so-called Reactive Programming. This lets user responses serve as direct input to the application in real time. The frameworks treat the reactive demands in different ways, but all three have the feature of making direct responses to interactions [59].

2.3.5.1 Angular

As already mentioned, many developers have been using Angular.js as their frontend framework in recent years [49]. Angular, developed by Google, is a JavaScript framework for frontend development, built on the classical Model-View-Controller approach, where the presentation of the data and the interactions are separated. It is built to enable a more dynamic and intuitive approach to creating web applications (with single-page applications mostly in mind). Worth mentioning is that Angular 2 uses so-called TypeScript instead of JavaScript, which transcompiles to the latter. TypeScript is a language developed by Microsoft whose code compiles into JavaScript, making it available across platforms [60].

Angular enables two-way data binding, which means that you can update and display data in an easier way. Another feature of Angular is that you can query specific DOM (Document Object Model) objects directly in the code. The DOM represents the different elements of the HTML code, such as body, p or ul elements. These are structured in a tree, enabling Angular to read or update them in different ways, e.g. looping through all entries of a list directly in the code. Angular also has a way of feeding mock data into an application, i.e. test objects that are inputted to test the behaviour of the system [61].

2.3.5.2 React

React is a JavaScript library used for frontend development. It was released by Facebook and Instagram in 2013 to challenge the paradigms of the ordinary HTML, JavaScript and CSS formula. Displaying dynamic data sets, with a constant flow of new updates and images, had caused some problems for that formula. React answers this problem by being a library that introduces States. When data is changed or updated, the state is replaced with a new one, making for a seamless experience when showing updated data to the user. This enables data to be changed directly, avoiding the update problems an ordinary frontend stack would have. It is a more effective approach to enabling two-way data binding, and React is known to be both scalable and fast [61].
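The state-replacement idea can be illustrated in plain JavaScript: instead of mutating data in place, each update produces a new state object, and the view is re-rendered from it. This is a conceptual toy, not React's actual API:

```javascript
// A minimal store: every setState replaces the state object
// and triggers a re-render with the new state.
function createStore(initialState, render) {
  let state = initialState;
  return {
    getState: () => state,
    setState(partial) {
      state = { ...state, ...partial }; // replace, do not mutate
      render(state);
    },
  };
}

const rendered = [];
const store = createStore({ likes: 0 }, s => rendered.push(`Likes: ${s.likes}`));
store.setState({ likes: 1 });
store.setState({ likes: 2 });
console.log(rendered); // [ 'Likes: 1', 'Likes: 2' ]
```

Because each state is a fresh object, the old and new states can be compared cheaply, which is what lets a library like React decide what actually needs to be redrawn.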

It follows a Model-View-Controller approach and has the ability to handle large scale, since both of the mentioned applications have a large number of registered users and many daily users. The so-called React Native extends this approach with the goal of making code that works everywhere, enabling the possibility of creating mobile applications for the web, Android and iOS at the same time [62].

2.3.5.3 Vue

Vue is a modern JavaScript framework and the newest of the three treated in the literature study. Vue was originally developed by its author Evan You in 2014, and since no big company owns Vue it is fully open source. The community has grown in recent years, with comprehensive documentation, and the project has 53,700 stars on GitHub [63].

Vue does not employ a two-way binding approach, but has developed robust one-way binding with good performance. One criticism of two-way binding is that it can become hard to follow the flow of data changes, and Vue has solved this with a different approach to updating data. This simplifies debugging, since it is easier to follow the changes in data [61]. Even though no two-way binding approach is present, the performance of Vue is competitive. Stefan Krause lists, in a table on his blog, the performance of JavaScript frameworks compared to one another when assigning, deleting and updating data in DOM elements, where Vue performs well among all the frameworks [64].
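A toy model of how a framework can observe data changes and push them in one direction (data to view) is shown below. Vue's real reactivity system is far more elaborate; this only illustrates the principle of intercepting writes so that every change is traceable:

```javascript
// Wrap each property in a getter/setter so that writes
// trigger a change callback (the "view update").
function makeReactive(data, onChange) {
  const observed = {};
  for (const key of Object.keys(data)) {
    let value = data[key];
    Object.defineProperty(observed, key, {
      get: () => value,
      set(newValue) {
        value = newValue;
        onChange(key, newValue); // one-way: the data change drives the view
      },
    });
  }
  return observed;
}

const updates = [];
const state = makeReactive({ message: "hello" }, (k, v) => updates.push(`${k}=${v}`));
state.message = "updated";
console.log(state.message, updates); // updated [ 'message=updated' ]
```

Because every mutation funnels through one setter, the framework always knows exactly what changed and when, which is what makes the data flow easy to follow and debug.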

Vue follows a structure where you collect the data (in the form of variables, arrays, etc.) in a Vue instance, which can be separated into components. The following example shows how data is declared directly in the instance and how easy it is to access the data with double curly brackets in the Vue component (see Figure 2.21). This is done with far less code than in, for example, React [65].

Figure 2.21: Example of Vue code with the output ’This is a testing text’.

Not only data can be stored with ease; methods and other settings can also be set within the Vue structure. Vue is also easy to get started with, since you only need to import a Vue script directly into the HTML code to enable Vue. For a larger-scale application this is not a valid option, since you want to store as much as possible locally for performance, but for getting started and learning, this approach is really simple. You import it like you would import any other .js or .css file, but from an online source, to get just the syntax needed to start creating.

2.3.5.4 JavaScript Compilers

JavaScript compilers and bundlers have risen in frontend development, enabling bundling and performance improvements for frontend frameworks. Webpack is a commonly used tool for stacking modules into a single bundle. For example, you may have many .js files while developing a functional and responsive frontend experience, but they are spread out and depend on each other in ways that are hard to structure. What Webpack does is transform all of the dependencies into static assets that the application can reach much more easily and with greater performance. After Webpack has built the assets from the different modules, the application's loading times should be reduced and the assets structured in an easier way. This can be really helpful, since the building only needs to be done once: by the time the application is running, Webpack has already prepared the bundled assets.
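A minimal webpack configuration sketch is shown below. The entry and output paths are placeholders, and a real project's file names and options will differ; this only illustrates the idea of resolving a dependency graph from one entry module into a single static asset:

```javascript
// webpack.config.js (illustrative sketch)
const path = require("path");

module.exports = {
  entry: "./src/index.js",           // the module where dependency resolution starts
  output: {
    path: path.resolve(__dirname, "dist"),
    filename: "bundle.js",           // all dependencies bundled into one static asset
  },
  mode: "production",                // enables minification and other optimizations
};
```

Running the build once produces dist/bundle.js, so the running application loads a single pre-built file instead of resolving many interdependent .js files at run time.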
