
Linköpings universitet SE–581 83 Linköping

Linköping University | Department of Computer and Information Science

Master’s thesis, 30 ECTS | Computer Science and Engineering

2020 | LIU-IDA/LITH-EX-A--20/076--SE

Creating and Evaluating a Useful Web Application for Introduction to Programming

Utveckling och utvärdering av en användbar webbapplikation för introduktion till programmering

Daniel Johnsson

Supervisor: Anders Fröberg
Examiner: Erik Berglund


Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a period of 25 years starting from the date of publication barring exceptional circumstances.

The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purposes. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law, the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.


Abstract

The aim of this thesis is to build a web application that teaches students programming in Python through code puzzles that do not require them to write any code, in order to answer the research question: How should a quiz application for introduction to Python programming be developed to be useful? The web application's utility and usability are evaluated through the learnability metric relative user efficiency. Data was collected and analyzed using Google Analytics and BigQuery. The study found that users were successfully aided by theoretical sections pertaining to the puzzles, and that even though programming is mainly a desktop activity, there is still an interest in mobile access. Although the evaluation of relative user efficiency did not serve as a sufficient learnability measure for this type of application, conclusions from the data analysis still gave insights into the utility of the web application.


Acknowledgments

I would like to thank my examiner Erik Berglund, and my friends and family, who have been with me through this thesis and supported me all the way to the end.


Contents

Abstract
Acknowledgments
Contents
List of Figures
List of Tables
1 Introduction
  1.1 Motivation
  1.2 Aim
  1.3 Research Question
  1.4 Delimitations
2 Theory
  2.1 Introduction to Programming
  2.2 Learnability
  2.3 Related Work
3 Method
  3.1 Literature Study
  3.2 Implementation
  3.3 Data Analysis
4 Results
  4.1 Implementation
  4.2 Data Analysis
5 Discussion
  5.1 Results
  5.2 Method
  5.3 The Work in a Wider Context
6 Conclusion
  6.1 Future Work


List of Figures

2.1 Learnability curves, adapted from Usability engineering
4.1 The complete user interface on desktop
4.2 The complete user interface on mobile
4.3 Snippet with executable code in a theory section or description section
4.4 Feedback on correct solution
4.5 Feedback on incorrect solution
4.6 Python error message
4.7 Example of an Indentation puzzle
4.8 Example of a Line Order puzzle
4.9 Example of a Comment Out puzzle
4.10 Example of a Missing Word puzzle
4.11 New users over the period
4.12 Sessions per user
4.13 Unique user success of solving puzzles
4.14 Unique users that open the theory or the description sections


List of Tables

3.1 Standard puzzle control buttons and corresponding hotkeys
3.2 Puzzle type specific controls
3.3 Event types
3.4 Event parameters
4.1 Mobile sessions time per user
4.2 Number of interactions per type
4.3 Solution status before end of session
4.4 Learnability data


1 Introduction

This chapter introduces the aim of and the motivation behind this study.

1.1 Motivation

At the university level, programming is taking a larger part of studies and is becoming a subject of great importance. Even outside the computer science field, several programs have a programming course in the curriculum. The introduction to programming in these courses can be challenging for a student who has not worked with software development before. Writing a program for the first time can be overwhelming, and just setting up the environment needed to run the program can be daunting and error-prone. This can take away from the learning experience, as the focus is shifted away from learning the actual programming language and the programming concepts, which has created a need for programming tools and platforms for learning to code. Can a better learning experience be achieved by learning programming concepts through a web application with interactive quizzes that do not require any written code?

The internet is filled with applications, and the number of applications hosted on the web instead of installed on your computer grows every year. Hosting an application on the web opens up several possibilities for the developer, such as instant updates and user analytics. Using a web application is uncomplicated from a user perspective, since it requires no download or installation. This can, however, be a disadvantage, since the user has not invested any time: the user has little incentive to understand how the application works and can easily give up and find a substitute application. The challenge has thus shifted from convincing the user to download and install an application towards convincing the user to stay in the application.

Most web applications use some form of web analytics tool to track everything from bare page landings to user interactions. The data is often used to give stakeholders insights and to measure drop-off rates. Can the same data be used to identify whether a web application is useful?


1.2 Aim

The aim of this thesis is to build a useful web application for an introductory course in Python programming. Instead of letting the student write their own programs from the start, the user is introduced to programming concepts such as statements and loops and can test these concepts in puzzle-based examples. With a theory section containing code examples and a puzzle description in plain text, combined with the corresponding problem to be solved in code, the user learns to code by solving practical puzzles. With the use of automatically collected user interaction data, the application's utility and usability can be evaluated. The thesis hopes to contribute to the research field of computer science education, in how an e-learning experience can help shape the future of computer science education, and to the field of usability evaluation with the use of Google Analytics.

1.3 Research Question

The main research question for this thesis is:

How should a quiz application for introduction to Python programming be developed to be useful?

For a web application to be useful, it has to score high on usability and utility metrics. In this context, usability is defined as how easy and satisfying the web application is to use, and utility as the degree to which the web application has the features the user needs [1]. Usability is a broad field of different characteristics; Nielsen's survey showed that learning-related usability characteristics are among the most sought-after quality metrics from users [2]. Therefore, this thesis will evaluate the web application's learnability as a substitute for overall usability.

For a single developer, thorough user testing and usability evaluation can be costly, since it requires much time and often a usability expert. The goal of this thesis is to create something that can be improved over time; therefore, the method for evaluating usability and utility needs to be efficient and replicable. Relying only on quantitative performance metrics has been rejected by experts, as Constantine and Lockwood note:

“No simple number can completely represent anything as subtle and complex as the usability of a software system, but numbers can sometimes create the illu-sion of understanding.” [3]

However, analyzing the extensive amount of data collected from all users with formulas and visualizations could give the insight needed to compare learnability across new versions and to evaluate the usage of implemented features.

To answer the research question, a quiz-based web application for introduction to Python programming will be developed, based on studies of what students find difficult when learning programming and on new ideas to mitigate the issues students have when learning to program. Data collection will be implemented to track how users interact with the application, which will be used to assess the utility and usability of the web application and to evaluate whether it is useful. To evaluate utility, this thesis investigates the usage of the reading sections, the mobile implementation, and the alternative control schemas. Relative user efficiency as a performance metric for learnability will be used to evaluate the web application's usability.

1.4 Delimitations

This thesis is conducted during a limited time, and the scope of the study is set accordingly, since the whole application cannot be evaluated. The focus will be on the evaluation of the quiz section of the application. Additional pages and services will be developed, such as a landing page, a quiz menu, sign-up functionality, and a quiz creator. These parts will be developed to support the application's quiz section, and no events will be collected from interactions on these pages. When evaluating usability, only puzzle solving will be considered. To evaluate utility, only the usage of the reading sections, the mobile implementation, and the alternative control schemas will be evaluated.

The test group is not forced to use the web application, since it is only offered as a voluntary supplement to a course on introduction to Python programming, which may affect the web application's overall usage.


2 Theory

This chapter introduces the reader to the theoretical basis of the thesis.

2.1 Introduction to Programming

There are many studies on introduction to programming and software development. Bosse and Gerosa, in their study "Why is Programming so Difficult to Learn? Patterns of Difficulties Related to Programming Learning Mid-Stage", emphasize that the most frequently reported issue in their study was syntax errors. The students had to resolve these issues before they could validate that their logic was correct. They also noticed that students tend to drop out of an exercise faster if they encounter semantic errors rather than syntax errors, since they know that these errors take longer to resolve. [4]

In the study "A study of the difficulties of novice programmers", students rated working alone on programming coursework and studying alone highest when asked in which situations they feel they learn about issues in programming. Both students and teachers rated example programs highest when it comes to the most useful material. [5] Andersson and Kroisandt made similar findings: students considered lecture notes, online exercises, and their solutions the most valuable components of an e-learning platform. [6]

Winslow states that the difficulty of learning a new language lies in learning how to use statements and combine them to achieve the desired result, not in the actual syntax of the language. To reach a level of competence in programming, practice is needed. It has to start with simple problems that require a simple combination of statements and work towards more complicated programming problems needing more complicated solutions. [7]

Gick and Holyoak suggest that knowledge that is difficult to verbalize can instead be presented with the help of examples. More than one instance of the same example is needed to be able to see the analogy and convey the knowledge. [8]


2.2 Learnability

Learnability, or ease of learning as it is also referred to, can simply be defined as how easy it is for a user to learn a system [9], or how easy the software is to explore and master [3]. A more advanced definition of learnability is presented in ISO/IEC 25010:2011, where learnability is categorized as a sub-characteristic of usability and defined as:

“Degree to which a product or system can be used by specified users to achieve specified goals of learning to use the product or system with effectiveness, effi-ciency, freedom from risk and satisfaction in a specified context of use” [10].

Learnability can be further decomposed into sub-characteristics, as suggested by Fernandez, Insfran, and Abrahão, who divide it into help facilities, predictability, informative feedback, and memorability [11].

Measuring Learnability

To measure learnability, some form of usability evaluation has to be conducted, which identifies usability problems. A usability problem can be described as the possibility of a design change improving one or more usability measures [9]. Fernandez, Insfran, and Abrahão emphasize the importance of evaluating usability during each stage of web development. This ensures both that the product can be used and that it will be effective for the product's intended purpose [11]. To evaluate the usability of an application, different methods can be used. Mack and Nielsen define four evaluation methods: automatic, where software evaluates the interface; empirical, testing the interface with real users; formal, using formulas and models; and informal, relying on the evaluators' judgment [9]. Automated evaluation misses the emotional information from the user and should not be used as a substitute for traditional usability evaluation [12] [13].

Measuring learnability can be as simple as measuring the time it takes for a new user to reach a specified level of proficiency, which Nielsen refers to as relative user efficiency [14]. To calculate the relative user efficiency, some other performance metrics need to be calculated first. Completeness is the percentage of work completed within a set time, and correctness is the percentage of work completed correctly [3]; they can be calculated with equations 2.1 and 2.2, respectively. With completeness and correctness, effectiveness can be calculated using equation 2.3, which describes how effectively a task has been completed [15]. Efficiency, calculated with equation 2.4, is the measure of how many resources have been used in relation to the level of effectiveness; in this case, time, human effort, and economic cost can be viewed as resources [15]. Proficiency can be described as an expert level of efficiency [3]. For a user to reach a level of proficiency, they have to complete a specified task successfully or within a specified time frame. A user can be considered to have learned the system when they have reached the specified proficiency level [14]. Worth noting is that time on task can be affected by reading time and other overhead when doing automated data collection [13].

$$\text{Completeness} = \frac{\text{Completed Tasks}}{\text{Time}} \tag{2.1}$$

$$\text{Correctness} = \frac{\text{Correct Tasks}}{\text{Completed Tasks}} \tag{2.2}$$

$$\text{Effectiveness} = \text{Completeness} \times \text{Correctness} \tag{2.3}$$


$$\text{Efficiency} = \frac{\text{Effectiveness}}{\text{Task Time}} \tag{2.4}$$

$$\text{Relative User Efficiency} = \frac{\text{User Efficiency}}{\text{Expert Efficiency}} \times 100\% \tag{2.5}$$
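To make the chain of definitions concrete, the sketch below computes the metrics in Python for a hypothetical user; the function and variable names are ours, not part of the cited works.

```python
def completeness(completed_tasks: int, minutes: float) -> float:
    """Equation 2.1: completed tasks per unit of time."""
    return completed_tasks / minutes

def correctness(correct_tasks: int, completed_tasks: int) -> float:
    """Equation 2.2: share of completed tasks that were correct."""
    return correct_tasks / completed_tasks

def effectiveness(completeness_value: float, correctness_value: float) -> float:
    """Equation 2.3: completeness weighted by correctness."""
    return completeness_value * correctness_value

def efficiency(effectiveness_value: float, avg_task_time: float) -> float:
    """Equation 2.4: effectiveness per resource spent (here: average task time)."""
    return effectiveness_value / avg_task_time

def relative_user_efficiency(user_eff: float, expert_eff: float) -> float:
    """Equation 2.5: the user's efficiency as a percentage of the expert's."""
    return user_eff / expert_eff * 100.0

# Hypothetical example: 8 tasks in 20 minutes, 6 of them correct,
# with an average task time of 2.5 minutes.
comp = completeness(8, 20)    # 0.4 tasks per minute
corr = correctness(6, 8)      # 0.75
eff = efficiency(effectiveness(comp, corr), 2.5)
print(relative_user_efficiency(eff, expert_eff=0.3))  # percent of expert level
```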

High Learnability Drawbacks

If the main focus is to improve learnability, some unwanted side effects can be introduced. Software that is easy to learn might not be as efficient to use; there is a trade-off between making the program efficient and making it easy to learn [3]. Figure 2.1 illustrates this hypothetical efficiency trade-off when focusing on learnability. An important aspect is that an improvement in learnability does not always lead to a decrease in efficiency. If possible, a design should always try to improve both learnability and efficiency; if not, the usability requirements of the software should decide which usability metric to promote [14].

Figure 2.1: Learnability curves, adapted from Usability engineering [14]. The curves plot proficiency and efficiency against time of usage, contrasting a focus on expert users with a focus on novice users.

2.3 Related Work

Creating alternative learning methods for novice programmers is nothing new. Guibert, Girard, and Guittet created a framework to visualize programming problems to provide students with a better understanding of how the code operates [16]. Parsons and Haden made a website with code puzzles where the user has to place the code lines in the correct order to complete the puzzle. These types of puzzles have become known as Parsons problems after their creator. In the study "Parson's programming puzzles: a fun and effective learning tool for first programming courses" they found that users wanted more puzzle types and better aesthetics in the form of color and animations [17]. Several studies used Parsons problems after this; Pritchard and Vasiga created a learning tool for learning Python using Parsons problems; however, the tool could not execute Python [18]. Ihantola, Helminen, and Karavirta made a mobile application using Skulpt to execute Python code, with Parsons problems with extended functionality such as indentation and blanks in the lines of code [19].

Tilden created an online development environment for Python to bridge the gap of starting programming for novice programmers, utilizing Skulpt to execute Python code in the browser [20]. The environment was then improved in the study "Pythy: improving the introductory Python programming experience" [21].


Most of these tools are publicly available online. There are similar publicly available free tools from companies or non-profit organizations, such as Codecademy (www.codecademy.com) or Khan Academy (www.khanacademy.org), which both offer online Python courses. Sharp used Codecademy for his studies and noticed the positive aspect that students could access the coursework at their own convenience and pace. In this study, the academic results of the students were evaluated [22].

Although Google Analytics and web analytics are more of a standard today, event data analytics was researched much earlier. In 2000, Hilbert and Redmiles performed experiments on how user interface events could improve usability evaluation in the study "Extracting usability information from user interface events", concluding that automatic data collection would complement traditional usability evaluation in the future [23]. With the release of Google Analytics, several of the issues mentioned in "Extracting usability information from user interface events" were solved, and research on the subject has continued. Fang and Crawford used Google Analytics to track user navigation; together with transaction logs, they were able to draw conclusions regarding usability [24]. Hasan, Morris, and Probets evaluated the usability of e-commerce sites using Google Analytics data [25], and Vecchione, Brown, Allen, and Baschnagel were able to use event tracking in Google Analytics to support better decision-making when it comes to usability improvements [26].



3 Method

This chapter will present the method of the thesis.

3.1 Literature Study

A literature study will be conducted as the foundation of the thesis to determine what makes it challenging to learn to program and how to measure usability using Google Analytics data. The literature study will be conducted to aid the application development and to define how to measure learnability in the web application. It will comprise a comprehensive search of relevant articles and publications using key phrases such as learnability, usability, computer science education, difficulty learning programming, and difficulty teaching programming. If no relevant articles or publications can be found, other sources will be used, such as official documentation of frameworks. The findings of the literature study form the foundation of the theory chapter.

3.2 Implementation

A web-based quiz application will be developed to make it possible for students to take programming quizzes. The application's front end will be built in React, and Google Firestore will be used as the database solution. No backend will be developed, since all the computations can be done on the front end. Since the primary test group is the students taking an introductory course, the quiz puzzles will be based on Python. Only the Python language will be supported on the platform, but considerations should be made to make it feasible to add more languages in the future.

Firebase

The application will be hosted on Firebase, and Firebase will handle the authentication of users. This makes it possible to track user progress and to distinguish events from individual users. Other support structures of the application will be developed as straightforwardly as possible, such as a landing page from which the user can find the available quizzes, registration modals, and a simple quiz menu to get an overview of a quiz. These parts of the application are only made to create a fully functional application. They consist of mostly static parts and will not be evaluated, as mentioned in section 1.4.

Skulpt

To be able to run Python code in the puzzles, the web application will use Skulpt. Skulpt is a JavaScript project that transpiles Python code into JavaScript, making it possible to run the code in the browser [27]. Web workers will be used to handle the Skulpt execution to mitigate the potential issue of executing time-consuming code snippets on the UI thread. This approach ensures that users can still interact with the puzzle while code executes on a web worker. The web worker will be invoked from within a React hook that takes a string of Python code as input and returns rows of output, or errors, after Skulpt has transpiled the code and it has been executed.

Google Analytics

Google Analytics is by far one of the most commonly used web analytics tools [28]. It measures user interactions on a website or application and provides an out-of-the-box solution for collecting data about how users come to the website or application and how they navigate. This data can be analyzed to understand user patterns and gives an overview of how the site or application is used. Event tracking can be added to trigger on user events such as onClick or onSubmit events [29]. Adding additional event tracking requires extra implementation, since code needs to be added for each tracked event or action; however, this leads to a greater understanding of how users interact with the site or application [30]. Google Analytics supports up to 25 custom parameters to be sent with each event [31].

BigQuery

Google's BigQuery is mainly a query engine for querying large amounts of data, also known as big data, and uses an SQL dialect as the query language. BigQuery is more than just a query engine, since it also provides its own cloud-based storage for the data. Data can be imported into BigQuery, and Google Analytics can be integrated so that data is automatically copied from Google Analytics daily and can then be analyzed in BigQuery. [32]

Quiz

The central part of the application will be the page where the user conducts the actual quiz. A quiz consists of a set of chapters. The primary purpose of a chapter is to group several puzzles around a theoretical topic. A chapter consists of a set of puzzles and a theory section, which covers the theoretical foundation needed to complete the puzzles of the chapter. The quiz page shows the current chapter and the current puzzle, and the user should be able to move freely to the next or previous chapter or puzzle with navigation buttons in the user interface. If a user uses the chapter navigation, the puzzle also changes to the new chapter's first puzzle. If a user navigates with the puzzle navigation, only the puzzle changes, except when there are no more puzzles in the desired direction, in which case the chapter changes; the rule is sketched below.
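As an illustration of this navigation rule (our own sketch, not code from the application), representing a quiz as a list of per-chapter puzzle counts:

```python
def next_position(puzzle_counts, chapter, puzzle, direction):
    """Hypothetical sketch of the puzzle-navigation rule described above.

    puzzle_counts: number of puzzles per chapter, e.g. [4, 3, 2].
    direction: +1 for next, -1 for previous.
    Returns the new (chapter, puzzle) pair, or None at the ends of the quiz.
    """
    puzzle += direction
    if 0 <= puzzle < puzzle_counts[chapter]:
        return chapter, puzzle                # stay within the current chapter
    chapter += direction
    if not 0 <= chapter < len(puzzle_counts):
        return None                           # nothing beyond the quiz bounds
    # crossing a chapter boundary: first puzzle going forward, last going back
    return chapter, 0 if direction > 0 else puzzle_counts[chapter] - 1
```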

A quiz for the test group's course will be created in the application. However, for the web application's validity, it should be possible to make new quizzes and edit existing quizzes directly in the application. Hence, a quiz creator tool will be developed to make it possible to create and edit quizzes directly in the application. The quiz creation tool will aid in the creation of the example puzzles; however, it will not be evaluated, as mentioned in section 1.4.


Puzzle

Each puzzle will also have a corresponding puzzle description, in which the puzzle can be explained in free text. The puzzle description will be added to guide the users with puzzle-specific information that is not covered by the theory section. Each puzzle will consist of code that has been altered in different ways depending on the puzzle type, and it is up to the user to manipulate the code in a way that produces the expected output. The user's progress will be saved in the database upon submission of a correct solution, and if a puzzle has been completed before, this will be shown next to the puzzle title. However, the user will still be able to retake the puzzle. To add variation and flexibility to puzzle creation, four different puzzle types will be developed.

Indentation

The indentation puzzle type will present code with all indentation of lines removed at the start of the puzzle. To solve this puzzle, the user will have to move between the code lines and add indentation to make sure that the code has the correct indentation.

Line Order

The line order puzzle type will present code with all lines shuffled and therefore executed in incorrect order. To solve this puzzle, the user will have to move the code lines up or down to make sure that the code lines are executed in correct order.

Comment Out

The comment out puzzle type consists of code that has lines of faulty code commented out. At the start of the puzzle, the comment notations will be removed from each line of code. To solve this type of puzzle, the user has to move between code lines and add a line comment notation to comment out the rows of code that are faulty.
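The three puzzle types above are all line-based transformations of correct code. The following is a minimal Python sketch of how such puzzles could be generated; the helper names are hypothetical, and the real application may prepare puzzles differently:

```python
import random

def make_indentation_puzzle(code: str) -> list[str]:
    # strip all leading whitespace; the user must re-indent each line
    return [line.lstrip() for line in code.splitlines()]

def make_line_order_puzzle(code: str) -> list[str]:
    # shuffle the lines; the user must restore the execution order
    lines = code.splitlines()
    random.shuffle(lines)
    return lines

def make_comment_out_puzzle(code: str) -> list[str]:
    # drop a leading comment marker from each line, keeping indentation;
    # the user must re-comment the faulty lines
    def uncomment(line: str) -> str:
        stripped = line.lstrip()
        if stripped.startswith("#"):
            indent = line[: len(line) - len(stripped)]
            return indent + stripped[1:].lstrip()
        return line

    return [uncomment(line) for line in code.splitlines()]
```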

Missing Words

In addition to the Python code, a missing word puzzle will have two additional fields of data: a set of words or characters that will be replaced with a blank gap in the code, and a set of words or characters that are red herrings, adding complexity to the puzzle. To solve this puzzle type, the user has to fill in the blanks in the code by choosing the correct word or character from the combined set of removed words or characters and red herrings.

Execution of Code

As mentioned above, Skulpt is used to emulate execution of Python code in the web application. Executing code with Skulpt returns both the output from the code and Python error messages if the code could not be successfully executed. The output is used both as feedback to the user, in the form of an output section in the user interface next to the puzzle code, and to validate the correctness of a submitted solution to a puzzle. At the start of a puzzle, the correct Python code for the puzzle executes in the background, and the output is saved. When a user submits a solution, the web application executes the code and compares the output with the saved correct output. The system categorizes the solution as correct if the output matches the correct output; otherwise, it categorizes it as incorrect.
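The validation rule can be sketched in plain Python, with exec standing in for the browser-side Skulpt execution (a simplified illustration under that assumption; the names are ours):

```python
import contextlib
import io

def run_python(code: str) -> list[str]:
    # stand-in for the Skulpt step: capture the program's printed output lines
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})  # the application transpiles and runs this via Skulpt
    return buffer.getvalue().splitlines()

# at the start of a puzzle, the reference code runs once and its output is saved
expected_output = run_python("for i in range(3):\n    print(i)")

def is_correct(user_code: str) -> bool:
    # a solution is correct when its output matches the saved expected output
    try:
        return run_python(user_code) == expected_output
    except Exception:
        return False  # code that raises is treated as an incorrect solution
```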

Puzzle Interactions

All puzzle types will have the same control buttons: executing the code (referred to as run in the user interface), resetting the puzzle, and undoing the last interaction. Each of these will have a corresponding keyboard interaction to enable the use of a keyboard. These keyboard hotkeys are shown in table 3.1.

Table 3.1: Standard puzzle control buttons and corresponding hotkeys

Button  | Hotkey
Restart | None
Undo    | Backspace
Run     | Enter

The web application will provide visual feedback to the user on all interactions by altering the displayed code according to the user's interaction. However, the web application will also provide conditional feedback upon executing the code. If the solution is correct, the user is shown an animation and an alert informing them that the solution is correct. The alert also prompts the user to move to the next puzzle via a button that navigates to the next puzzle, and upon a correct solution the run button likewise changes into a next-puzzle button. If the solution is incorrect, the user is shown an alert informing them that the solution was incorrect, with a button that resets the puzzle as a prompt to try again.

To manipulate the code in a puzzle, each puzzle type will have its own interaction controls, consisting of buttons in the user interface. Some puzzle control interactions will have alternate controls besides the interaction button: either keyboard interactions or cursor interactions in the form of mouse clicks or touch interactions. These alternate controls will be added to puzzle types whose code manipulations can be substituted with either keyboard or cursor interactions. The alternate interaction controls are shown in table 3.2.

Table 3.2: Puzzle type specific controls

Puzzle type  | Button    | Hotkey             | Cursor
Indentation  | Tab       | Tab                | None
             | Shift+Tab | Shift + Tab        | None
             | Up        | Up arrow           | None
             | Down      | Down arrow         | None
Line Order   | Move up   | Shift + Up arrow   | Drag and drop line
             | Move down | Shift + Down arrow | Drag and drop line
             | Up        | Up arrow           | None
             | Down      | Down arrow         | None
Comment Out  | #         | Shift + 3          | Click on line
             | Up        | Up arrow           | None
             | Down      | Down arrow         | None
Missing Word | Word      | None               | None

Data Collection

An assortment of Google Analytics events will be set up to trigger on user interactions and on certain conditions to evaluate the web application. The code that triggers the tracking of events will be added on top of the developed application. The events that will be tracked are shown in table 3.3. The data collection period will be three weeks, between 2020-09-11 and 2020-10-01.


Table 3.3: Event types

Event       | Description
Interaction | Triggers on any puzzle interaction mentioned in tables 3.1 and 3.2
Open        | Triggers on expanding the theory or description section
Correct     | Triggers if the solution matches the expected output after an execution
Fail        | Triggers if the solution does not match the expected output after an execution
Idle        | Triggers if the user is idle for more than five minutes

In addition to the event name, each event will have a payload of parameters. Google Analytics provides some parameters, of which session number, user id, and device type will be used. Beyond these, additional parameters will be added to each event, as shown in table 3.4.

Table 3.4: Event parameters

Parameter                   | Description
Timestamp                   | Timestamp of the event
Start time                  | The time at the start of a puzzle
Quiz id                     | The id of the quiz
Chapter id                  | The id of the chapter
Chapter index               | The index of the chapter in the quiz
Puzzle id                   | The id of the puzzle
Puzzle index                | The index of the puzzle in the chapter
Puzzle type                 | The type of puzzle
Type (on Interaction event) | The type of interaction: button, keyboard, or cursor
Type (on Open event)        | Whether the user opened the theory or the description section

3.3 Data Analysis

The collected event data will be exported daily to Google BigQuery (see section 3.2), where it can be queried with SQL. The exported data is in raw format, and all added parameters are grouped in a key-value array. Thus, before the data can be queried, the event parameters will be unnested into columns. Conditions will be added to each query to only include events from the data collection period, 2020-09-11 to 2020-10-01. Events from users who were involved during the development, and events from quizzes other than the example quiz, will be filtered out.

Users

The number of users will be queried from the data by selecting all unique user ids and the date of the first interaction event connected to each user. The user id is connected to the logged-in user. This will show the influx of new users over time, which will be of interest in the discussion regarding puzzle solving and sessions per user. Each user will be anonymized and given a number that remains the same throughout the study, to enable cross-referencing of results.

User Sessions

In this study, a session is defined as one consecutive period in which the user is actively interacting with the quiz page. The number of sessions and the session length in minutes will be calculated for each user with a query that groups all events per user and session number. Google Analytics provides its own session parameter, but it only changes when the web application is closed, leading to unrealistic session times in the evaluation. Instead, an internal session calculation will be used, with the idle event as a trigger to break a session: a session counts from its start until an idle event occurs, and the next interaction after an idle event counts as the start of a new session.


Session time will be calculated from the first event on a puzzle to the last event, where the last event is never an idle event; the rule is sketched below. Other than the session times, the session query will also return the device type used during that particular session. Google Analytics provides this information, and the device type will be either desktop or mobile.
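A sketch of this session rule, assuming a time-ordered list of (timestamp, event name) pairs per user (a hypothetical shape; the study extracts the same information with BigQuery SQL):

```python
def split_sessions(events):
    """Split one user's time-ordered events into sessions, breaking on idle.

    events: (timestamp_seconds, event_name) pairs.
    """
    sessions, current = [], []
    for ts, name in events:
        if name == "idle":
            if current:
                sessions.append(current)  # idle ends the session and is excluded
            current = []
        else:
            current.append((ts, name))    # next interaction starts/extends a session
    if current:
        sessions.append(current)
    return sessions

def session_minutes(session):
    # session length: first event to last (non-idle) event, in minutes
    return (session[-1][0] - session[0][0]) / 60
```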

Interaction Type

To analyze the usage of the alternative controls mentioned in tables 3.1 and 3.2, a query will be made to summarize the total number of interactions using button, keyboard, or cursor interaction. This will be calculated by grouping the interaction events by type and summarizing the number of events per group.

Completed Puzzles

To get an overview of user progress, the number of unique users who solve a puzzle correctly, solve it incorrectly, or interact with the puzzle without submitting a solution will be calculated. This will be calculated by grouping the events by user id, chapter index, and puzzle index, and summarizing the number of correct and fail events within each group. If the number of correct events is more than zero, the puzzle is counted as completed by the user. If the number of correct events is zero and the number of fail events is more than zero, the puzzle is counted as incorrect for the user; and if both counts are zero, the user is counted as having interacted with the puzzle without submitting a solution (see the sketch below). This shows each unique user's progress on each puzzle within the quiz. These groups will then be grouped by chapter index and puzzle index only, to summarize the number of unique users who completed each puzzle, tried but could not solve it correctly, or just interacted with it without submitting a solution.
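The classification rule can be summarized as follows (our own sketch; in the study it is implemented as a BigQuery query):

```python
def puzzle_status(event_names) -> str:
    """Classify one user's attempt on one puzzle from its grouped events.

    event_names: names of all events for that (user, chapter, puzzle) group.
    """
    correct = sum(1 for name in event_names if name == "correct")
    fail = sum(1 for name in event_names if name == "fail")
    if correct > 0:
        return "correct"    # at least one accepted solution
    if fail > 0:
        return "incorrect"  # executed solutions, but none matched the output
    return "none"           # interacted without executing a solution
```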

Theory and Description Usage

The number of unique users who use the theory or description section per puzzle will be calculated to evaluate the usage of the chapter theory sections and puzzle description sections. This will be calculated by grouping the events by user id, chapter index, and puzzle index, and summarizing the number of open events of type theory or description within each group. If there are open events of both type theory and type description, the user is counted as having opened both reading sections on that puzzle. If there are open events of type description but none of type theory, the user is counted as having opened only the description section, and vice versa for the theory section. If there are no open events, the user did not open any of the reading sections on that puzzle during any session. These numbers will then be grouped by chapter index and puzzle index to summarize the number of users who opened both sections, only the theory section, only the description section, or neither section on each puzzle across all sessions.

To evaluate at which state of the puzzle a user opens the reading sections, a query will be used to calculate whether the theory or description sections are opened after an incorrect solution has been submitted. The query will group all events by user id, chapter index, puzzle index, and start time. Each of these groups represents one attempt on a puzzle; the start time is included in the grouping since a user can attempt the same puzzle several times. All groups of events that do not include an event of type open are filtered out. For the remaining groups, the query summarizes the number of open events triggered after an event of type fail and the number of open events not preceded by an event of type fail, both for the theory and the description type. These groups are then grouped by chapter index and puzzle index to summarize the open events of all users.

Usability

To evaluate the platform's usability, the metrics mentioned in section 2.2 will be calculated. To calculate these metrics, a task needs to be defined. For this evaluation, a task is defined as the user submitting a solution to a puzzle. The task is counted as correct if the solution output matches the puzzle's expected output, and incorrect if it does not. If a user has an incorrect solution and continues to work on the same puzzle, this is regarded as a new task.

The metrics that need to be extracted from the data to calculate the usability metrics are total time, number of correct tasks, and number of incorrect tasks. These will be extracted by a query that groups every event within a task. To group all events within a task, the query groups the events by user id, chapter index, puzzle index, and task start time. The task start time is either the start time of a puzzle or the timestamp of a preceding event of type fail during the same puzzle attempt. From these groups of task events, the task time is calculated as the timestamp of the correct or fail event minus the timestamp of the first interaction event of the task. The time spent on reading is also subtracted from the task time to reduce noise in the data; it is estimated as the time difference between an open event and the next interaction event. A task that does not have a correct or fail event is redacted; if an idle event triggers during a task, the task is removed, and a new task starts with the next interaction after the idle event. A task is flagged as either correct or incorrect depending on whether a correct or fail event triggered during the task. The total time, the number of correct tasks, and the number of incorrect tasks are summed up for each user; the segmentation rule is sketched below.
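The task segmentation can be sketched as follows, again assuming time-ordered (timestamp, event name) pairs for one puzzle attempt; the subtraction of estimated reading time is omitted for brevity, and the real analysis runs as BigQuery SQL:

```python
def split_tasks(events):
    """Segment one puzzle attempt's time-ordered events into tasks.

    A task runs from its first interaction to the next correct or fail
    event; an idle event discards the task in progress, and tasks that
    never reach a correct or fail event are dropped.
    Returns (task_seconds, outcome) pairs.
    """
    tasks, start = [], None
    for ts, name in events:
        if name == "idle":
            start = None                      # discard the interrupted task
        elif name in ("correct", "fail"):
            if start is not None:
                tasks.append((ts - start, name))
            start = None                      # a new task begins at the next interaction
        elif start is None:
            start = ts                        # first interaction of the task
    return tasks
```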

Completeness will be calculated with equation 2.1, where completed tasks is the number of correct and incorrect tasks, and time is the total time of all completed tasks. This provides the number of tasks the user could complete per minute. Correctness will be calculated using equation 2.2, providing the percentage of correctly completed tasks out of all completed tasks. Effectiveness will be calculated using equation 2.3, and efficiency using equation 2.4, where task time is the average task time for that user. Both effectiveness and efficiency are numbers that can be compared between users, where higher is better; these numbers have to be compared within the same context to mean anything.

For the sake of this experiment, the author of the system and puzzles is considered an expert. To calculate the proficiency, the author will solve all puzzles, and the metrics will be calculated in the same way as described earlier; proficiency is equal to the author's efficiency. From these calculated metrics, the relative user efficiency can be calculated using equation 2.5, by dividing the user's efficiency by the proficiency, in this case the author's efficiency. This provides the percentage of the desired efficiency that each user is at.


4 Results

This chapter will present the result of the implementation and data analysis.

4.1 Implementation

A significant part of this thesis consisted of creating a web application for introducing programming in Python. This thesis concludes the third iteration of Coder Quiz and a complete rebuild of the web application. Hence, support structures had to be developed; as mentioned in the delimitations (section 1.4), these support structures are not covered in this thesis.

Responsive Design

Even though programming is usually done on a computer using a keyboard, the fundamental idea of Coder Quiz is that the user should not write any code themselves. This means that a keyboard is not needed to solve the puzzles. Coder Quiz was instead designed with a user interface of buttons to accommodate all interactions. A side effect of this design is that Coder Quiz works on any device, since it does not require any external inputs. One could say that Coder Quiz is designed mobile-first; this is not entirely true, since some alterations of the UI had to be made to fit a smaller screen. Breakpoints were added to move the output below the code, as seen in figure 4.1 compared to figure 4.2. In addition to these changes, alternative inputs were added, such as hotkeys for keyboard and click or touch controls where possible, as described in section 3.2 and tables 3.1 and 3.2. The rest of the images of the web application user interface are shown with the mobile layout due to better readability.


Figure 4.1: The complete user interface on desktop

Figure 4.2: The complete user interface on mobile

Executable Code

One of the significant changes in this iteration of Coder Quiz is executing Python code directly in the web application. This was achieved with Skulpt; the Skulpt package was modified in this thesis to be used as a React module.

Figure 4.3: Snippet with executable code in a theory section or description section

Quiz

The quizzes are created in the web application by users with administrator rights. After a quiz is created, it is stored in the database. The quiz object consists of a quiz name, a boolean that controls whether the quiz is published, and an array of chapters. If the published boolean is set to true, the quiz is visible to all users; if it is set to false, the quiz is only visible to users with administrator rights.

Users choose which quiz they want to take and select where to start. A decision was made not to lock any chapters or puzzles based on progress. This means that the user can start with whatever chapter and puzzle they want and does not have to complete a puzzle before starting a new one. There is still an order in which the chapters are presented to a user progressing through a quiz. This order is based on the chapters' index in the chapter array, which the author of a quiz decides; this gives the author the possibility to move puzzles and chapters without the user losing their progress. The progress of a user is saved to the database using the id of the chapter and puzzle.

An example quiz was created for this thesis with a total of 41 puzzles across nine chapters. This quiz was created to introduce the basics of Python.

Chapter

A chapter consists of an id, a name, a theory section, and an array of puzzles. The id makes sure that a chapter's progress can be tracked, since a quiz can be edited after a user starts it. The name of a chapter is displayed in a toolbar at the top of the page, between two arrows that can be used to navigate to the next or previous chapter, as seen in figure 4.2.

Theory Sections

To provide flexibility to the quiz author in the design of the chapter theory, the Markdown language is used. This gives the author full control over how the chapter theory is formatted. A React package is used to convert the Markdown text to the corresponding HTML code, and support for inline code blocks and code snippets was added. If a code snippet is executable, a play button is shown inside the snippet, as seen in figure 4.3. The chapter theory section is collapsed by default, and the user can expand it by clicking on the bar or the arrow.


Puzzle

A puzzle consists of an id, a name, a description, a puzzle type, and code. It can also have other puzzle-type-specific data. As for a chapter, the id is used to show whether a user has completed the puzzle before. As seen in figure 4.2, the puzzle's name is placed between two arrows at the top of the puzzle. These arrows make it possible to navigate to the next or previous puzzle. If there is no next or previous puzzle in the chapter, the web application navigates to the next or previous chapter instead, based on the input. All puzzles show information about the puzzle type, together with a short text about that puzzle type's primary objective.

The expected output of a puzzle is not stored in the database; instead, the correct puzzle code is executed at the start of a puzzle, and its output is stored. After the user has altered the code and pressed the play button to execute it, the output is compared to the stored expected output. If the two outputs match, the solution is correct; if they do not match, the solution is incorrect. With this approach, the code itself does not need to be compared, which enables puzzles to have several solutions. However, it can also lead to an inefficient solution being counted as correct, depending on how an author has designed the puzzle.

Puzzle Description

Each puzzle has a puzzle description, the purpose of which is to provide more information about the puzzle that is not covered in the theory section. This could be information about what algorithm is used in the code or about the expected output. Like the chapter theory section mentioned in section 4.1, the puzzle description is written in Markdown with the added support for code snippets, as seen in figure 4.3. The puzzle description section is also collapsed by default, and the user can expand it by clicking on the bar or the arrow.

Puzzle Controls

All puzzles have the same base controls, which can be seen at the bottom of figure 4.2. These controls have hotkeys, as listed in table 3.1. The base controls consist of executing the code in the puzzle with the play button, undoing the last interaction with the undo button, and resetting the puzzle with the reset button. When a user executes a puzzle solution, the web application extracts the code from the puzzle with the user's alterations and passes it on to be transpiled and executed by Skulpt, as mentioned in section 4.1. When a user resets the puzzle, the state is set to the initial state. To undo an interaction, the previous states need to be tracked. A React hook was created to save the previous state when an interaction is made. Saving all previous states gives the user the possibility to undo their actions until they reach the desired state. The saved states are removed if the puzzle is reset to its initial state; the idea is sketched below.
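The state-history idea behind the undo mechanism can be illustrated with a small sketch (shown here in Python for consistency with the other examples; the application implements it as a React hook, and the class below is our own illustration):

```python
class UndoHistory:
    """Keep every state a puzzle has passed through, so the latest
    interaction can always be undone back to the initial state."""

    def __init__(self, initial_state):
        self.initial = initial_state
        self.states = [initial_state]

    def push(self, state):
        # called after every interaction that changes the puzzle
        self.states.append(state)

    def undo(self):
        # drop the latest state and return the previous one;
        # the initial state is never popped
        if len(self.states) > 1:
            self.states.pop()
        return self.states[-1]

    def reset(self):
        # resetting restores the initial state and clears the saved history
        self.states = [self.initial]
        return self.initial
```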

Feedback

The visual feedback of the application guides the user through the puzzles. If a submitted solution is correct, a green alert is shown together with a confetti animation, as shown in figure 4.4. The alert also shows a next-puzzle button, and the play button in the base puzzle controls is changed to a next-puzzle button, as a call to action for the user to continue to the next puzzle. If the solution is incorrect, a red alert is shown to inform the user that the solution does not produce the expected output, together with a reset button as a call to action to reset the puzzle and try again, as seen in figure 4.5.


Figure 4.4: Feedback on correct solution

Figure 4.5: Feedback on incorrect solution

Code Output

With the implementation of Skulpt, as mentioned in section 4.1, output is produced when code is executed, in the form of output from print statements or errors. As shown in figure 4.6, Python error messages are displayed if errors occur during execution. This can aid the user in understanding what is incorrect in the current solution of a puzzle. Even if there are no errors, the output can still differ from the puzzle's expected output; showing the output can still guide the user towards understanding what the code does and how to achieve the expected output.

Figure 4.6: Python error message

Indentation

The Indentation puzzle type only uses the base fields of a puzzle mentioned in section 4.1. The aim of this puzzle type is to add indentation to lines of code until the code produces the expected output. The current row is highlighted, and the user can use the control buttons shown in figure 4.7 to change the current row and add or remove indentation. The user can also use the hotkeys arrow up and arrow down to move the selected row, and tab or shift+tab to add or remove indentation, as mentioned in table 3.2.


Figure 4.7: Example of an Indentation puzzle

The ability to match the output of a user's solution with the expected output made the implementation of this puzzle simpler. In Python, the amount of indentation does not affect the compilation as long as the indentation is on the same level within a code block. This means that there is an infinite number of solutions to a puzzle. Since the web application matches the code's output with the expected result, a user can add more indentation to a code block than needed, and the web application will treat the solution as correct, since it is correct Python code. The ability to show error messages to the user also contributes to this puzzle type, since indentation errors are shown to the user if the code is poorly formatted.

Line Order

The Line Order puzzle type only uses the base fields of a puzzle mentioned in section 4.1. The aim of this puzzle type is to move the lines into the correct order to produce the expected output. The current row is highlighted, and the user can use the control buttons shown in figure 4.8 to change the current row and move the selected row up or down. The user can also use the hotkeys arrow up and arrow down to change the selected row, and shift+up or shift+down to move a row. A row can also be moved by drag and drop, as mentioned in table 3.2.


Figure 4.8: Example of a Line Order puzzle

Comment Out

The Comment Out puzzle type uses the base fields of a puzzle mentioned in section 4.1. The code is processed by removing the symbol #, which is Python's comment notation, and is split up into rows. The objective of this puzzle type is to comment out faulty lines of code to produce the expected result. The user can use the provided controls shown in figure 4.9 to select a row and toggle whether the row should be commented out. The user can also use the hotkeys arrow up and arrow down to move between rows, and shift+3 (which produces a # symbol) to toggle whether a row is commented out. A user can also click on rows to toggle whether a row is commented out, as mentioned in table 3.2.

Figure 4.9: Example of a Comment Out puzzle

Missing Word

Other than the base fields mentioned in section 4.1, a Missing Word puzzle also has two extra fields: an array with removed words and an array with red herrings. A removed word consists of the word's text, where the word starts in the code, and where it ends. This is needed for the web application to be able to remove the word from the code, since a word can occur in several places in the code. The code is split up into segments, each of which is either a segment of code or a removed word. Removed words are shown as a highlighted blank space, as displayed in figure 4.10. The darker color indicates the active empty space that will be replaced with the user's provided input, and the lighter color indicates words that have been added or empty spaces that have not yet been filled. The buttons show the available words or characters to use, consisting of both the removed words and the added red herrings. It is up to the user to select the correct words or characters to complete the code. Compared to the other puzzle types, there are no alternative controls other than the user interface buttons.
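The segmentation step can be sketched as follows, assuming the removed words are given as sorted, non-overlapping (start, end) character offsets (a hypothetical shape; the real data model may differ):

```python
def split_segments(code: str, removed_spans):
    """Split puzzle code into alternating segments of visible code and blanks.

    removed_spans: (start, end) character offsets of the removed words.
    """
    segments, pos = [], 0
    for start, end in removed_spans:
        segments.append(("code", code[pos:start]))   # visible code before the gap
        segments.append(("blank", code[start:end]))  # rendered as an empty gap
        pos = end
    segments.append(("code", code[pos:]))            # trailing code after the last gap
    return segments

# e.g. blanking out "range" in a loop header:
line = "for i in range(3):"
print(split_segments(line, [(9, 14)]))
```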


4.2 Data Analysis

This section will cover insights from the data collection and analysis.

Users

During the data collection period, 20 unique users signed up on the platform; the distribution over time is shown in figure 4.11. In the data analysis, each user is assigned a number from 1 to 20. The users will be referred to as UX, where X stands for the user's unique identifier, e.g. U1. These numbers remain the same throughout the whole data analysis section.

Figure 4.11: New users over the period (daily number of new users, 2020-09-11 to 2020-10-01)

Sessions

As explained in the method chapter, a session ends when an idle event is registered, and the next interaction marks the start of a new session. One side effect of this calculation is that if only a single interaction occurs after an idle event, the session gets zero time; these sessions have been removed. Henceforth, this calculation is used to calculate sessions. In figure 4.12, each user's total time spent in the application is plotted. Each block represents a separate session, and hence shows how many times a user has returned to the application.


Figure 4.12: Sessions per user (total time in minutes per user, with each block representing a separate session)

Device

Forty-five sessions were conducted using a desktop device, and three sessions using a mobile device. These three sessions were from unique users. The total session time and the number of correctly solved puzzles during these sessions are presented in table 4.1.

Table 4.1: Mobile session time and correct puzzles per user

User   Correct puzzles   Session time (min)
U1     16                24.2
U4     31                49.2
U13    17                6.4

Three mobile sessions can be seen as a small number in this context. However, considering that programming is mainly performed on desktop computers rather than mobiles, it is notable that there is an interest in learning to code using this web application on a mobile device. Another remarkable aspect of these results is that user U4 has the longest first session of all the users. This could be an indication that the web application is easy to use on mobile devices.

Interaction Types

All puzzles could be solved entirely by only using the buttons in the UI. As mentioned in the implementation section, hotkeys were added where possible, and mouse or touch interaction was also added. In some puzzles, mouse or hotkey controls could not be added because of limitations in the interaction. The user was provided information about these interaction types through the theory in the first chapter and through tooltips.

Table 4.2: Number of interactions per type

Button Keyboard Mouse

6986 1981 777

The usage of the different types of interaction methods is displayed in table 4.2. Button interaction is the most commonly used one, which is expected since it is the primary interaction method implemented for all puzzle types and is available on all devices. It is worth noting that this does not serve as a comparison of the interaction types, since their availability varies across puzzles and devices. Keyboard interaction is not available on the missing word puzzle, and mouse interaction is only available on line order and comment out. Only button interaction is available on mobile devices.
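
For context, interaction events of this kind can be reported to Google Analytics roughly as in the following sketch. The gtag function is the standard Google Analytics global, but the event and parameter names here are illustrative, not necessarily those used in the study.

// Illustrative only: report which interaction method the user employed.
declare function gtag(...args: unknown[]): void;

function reportInteraction(method: "button" | "keyboard" | "mouse"): void {
  gtag("event", "puzzle_interaction", { interaction_method: method });
}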

Completed Puzzles

The number of completed puzzles is a good indication of a user's progress. In figure 4.13, the unique users' progress on each puzzle in the example quiz is shown. Correct indicates that the user solved the puzzle, Incorrect that the user executed solutions but never solved the puzzle, and None that the user interacted with the puzzle without executing a solution. The 41 puzzles in the example quiz are sorted in order of appearance in the quiz and are named PY.Z, with Y as the chapter's index and Z as the index of the puzzle within the chapter. As mentioned in section 4.1, this order is not enforced. Out of the 20 users that signed up, 18 completed puzzles, and 17 of these interacted with the first puzzle. This indicates that even if the order of puzzles is not enforced, it is still followed by most users. This can also be seen in figure 4.13, as the number of users decreases with puzzle order.

One insight that can be gained from this data is the drop-off rate of users. The number of drop-offs in relation to puzzle order is almost linear from the first puzzle until puzzle P5.2. No user was able to complete this puzzle, and half of the users who interacted with it did not even try to execute a solution.



Figure 4.13: Unique user success of solving puzzles

Based on the observations regarding puzzle P5.2, it appears that users stop using the web application when they are stuck on an incorrect solution. However, when looking at individual sessions, it is clear that most users end a session on a correct solution, as shown in table 4.3, which presents the number of sessions that ended after an incorrect execution, a correct execution, or no execution.

Table 4.3: Solution status before end of session

Incorrect Correct No execution

17 29 2

Theory and Description Sections

As mentioned in section 4.1, a theory section was added to each chapter, and a description section was added to each puzzle. The distribution of users' interactions with the theory and description sections is plotted in figure 4.14.

Figure 4.14 shows continuous usage of both the theory and description sections; however, at puzzle P5.2 one can see a peak in the usage of both sections. This indicates that users seek additional knowledge when they experience a challenging puzzle.



Figure 4.14: Unique users that open the theory or the description sections

Figure 4.15 shows all the individual open events triggered by a user opening the theory or description sections, and whether these were triggered before or after an incorrect solution was executed. An attempt is defined from the first interaction with a puzzle until a correct solution is submitted, an idle event is triggered, or the user navigates to another puzzle. In this figure, one can also see a peak at puzzle P5.2, as in figure 4.14. As seen in the figure, users seem to seek more knowledge before trying a solution. Since no user could complete this puzzle, it is possible that the information provided in the theory and description sections was not sufficient to solve it.



Figure 4.15: Open events before and after an incorrect solution

Usability

To calculate the usability metrics Completeness, Correctness, Effectiveness, Efficiency, and Relative Efficiency, some basic metrics had to be extracted from the event data. The total active time, average task time, number of correct tasks, and number of incorrect tasks are shown in table 4.4.

Table 4.4: Learnability data

User     Total time (s)   Average task time (s)   Correct tasks   Incorrect tasks
U1       1188             22                      16              37
U2       1016             44                      17              6
U3       417              42                      7               3
U4       2646             43                      31              30
U5       2019             58                      11              24
U6       103              26                      2               2
U7       686              33                      14              7
U8       2547             24                      25              82
U9       1000             22                      21              24
U10      313              21                      8               7
U12      11561            55                      67              144
U13      326              18                      17              1
U14      330              37                      6               3
U15      832              30                      15              13
U16      4621             48                      39              57
U17      569              47                      10              2
U19      766              77                      5               5
U20      1098             61                      16              2
Expert   789              16                      42              7



Table 4.5: Learnability metrics

User     Completeness   Correctness   Effectiveness   Efficiency   Relative Efficiency
U1       2.68           30.19%        80.80           3.60         18.17%
U2       1.36           73.91%        100.41          2.27         11.46%
U3       1.44           70.00%        100.73          2.42         12.18%
U4       1.38           50.82%        70.29           1.62         8.17%
U5       1.04           31.43%        32.70           0.57         2.86%
U6       2.34           50.00%        116.91          4.56         22.97%
U7       1.84           66.67%        122.38          3.74         18.87%
U8       2.52           23.36%        58.89           2.43         12.24%
U9       2.70           46.67%        126.02          5.67         28.59%
U10      2.87           53.33%        153.27          7.34         37.01%
U12      1.10           31.75%        34.77           0.63         3.20%
U13      3.31           94.44%        313.05          17.29        87.18%
U14      1.64           66.67%        109.11          2.98         15.00%
U15      2.02           53.57%        108.14          3.64         18.34%
U16      1.25           40.63%        50.64           1.05         5.30%
U17      1.27           83.33%        105.53          2.23         11.23%
U19      0.78           50.00%        39.16           0.51         2.58%
U20      0.98           88.89%        87.41           1.43         7.22%
Expert   3.73           85.71%        319.41          19.84        100.00%

The calculated usability metrics for each user are shown in table 4.5. As mentioned in the method, Effectiveness and Efficiency can be seen as scores in this context and need to be compared to Effectiveness and Efficiency in a similar context. All metrics are better the higher they are. Relative Efficiency ranges from 2.58% to 87.18%; the closer this metric is to 100%, the closer the user is to being an expert user of the web application.
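
For reference, the values in table 4.5 appear consistent with the following computation, here expressed as a TypeScript sketch: Completeness as attempted tasks per minute, Correctness as the share of correct tasks, Effectiveness as their product, Efficiency as Effectiveness divided by average task time, and Relative Efficiency as the user's Efficiency relative to the expert's. The exact definitions are given in the method chapter; treat this as an illustration inferred from the tables.

interface TaskData {
  totalTime: number;   // total active time in seconds (table 4.4)
  avgTaskTime: number; // average task time in seconds
  correct: number;     // number of correct tasks
  incorrect: number;   // number of incorrect tasks
}

function usabilityMetrics(user: TaskData, expertEfficiency: number) {
  const tasks = user.correct + user.incorrect;
  const completeness = tasks / (user.totalTime / 60); // attempted tasks per minute
  const correctness = (user.correct / tasks) * 100;   // percent correct
  const effectiveness = completeness * correctness;
  const efficiency = effectiveness / user.avgTaskTime;
  const relativeEfficiency = (efficiency / expertEfficiency) * 100;
  return { completeness, correctness, effectiveness, efficiency, relativeEfficiency };
}

// Example: U13 (326 s total, 18 s average, 17 correct, 1 incorrect) yields a
// relative efficiency of roughly 87% against the expert's efficiency of 19.84.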


5 Discussion

This chapter discusses the results, the method, and the work in a wider context.

5.1 Results

This section presents the discussion of the findings from the results.

Implementation

A large portion of the resources in this thesis was spent on creating a new version of the Coder Quiz web application and all of its features, most of which were excluded from the evaluation. The implementation showed that executing code is critical, since it means that several different solutions to a puzzle can be valid and that real error messages can be provided to users as feedback. It also gives the user a real-life perspective of how the code would work if they wrote it themselves.

Some parts that could be evaluated from the data collection are the decision to make Coder Quiz a progressive web application and the implementation of alternative control schemes for the puzzles. Even though most programming is done on computers, overall smartphone usage continues to rise. The decision to make Coder Quiz a progressive web application is supported by the mobile user sessions in the results: there were few mobile users, but enough to indicate an interest, and it is reasonable to make the web application mobile friendly when it does not require much extra work. The results also show that the added alternative control schemes with hotkeys and click or touch controls were used; not as much as the button controls, but enough to justify the decision to add them.

Data Analysis

With 20 users signed up, of which only 18 completed any puzzles, the user group was not nearly as big as anticipated. The low number of users means that the insights from the gathered data can only be viewed as guidance towards a conclusion, as the patterns in the data lack statistical significance. This thesis was conducted during the Covid-19 pandemic, which could explain the low number of users since the test group's course was changed to be conducted remotely. Furthermore, the course start was delayed, which affected the data collection period. Coder Quiz was added to the course as an additional learning tool for students who want to practice more Python independently, but it was not mandatory in order to complete the course. The combination of the course becoming remote and Coder Quiz being non-mandatory could be a reason for the low engagement with the web application. As mentioned in the theory, students prefer working in their own time on programming course work when learning new concepts. The shift to remote teaching might have removed the need for an additional tool for independent practice, since all the course work could now be conducted independently. The data analysis was solely based on usage data from the web application; an additional qualitative data collection on users' experience of the application might have been a good complement for understanding the true reason for this.

When it comes to completing puzzles, the most interesting finding is that no user could successfully complete puzzle P5.2, as seen in figure 4.13. An interesting aspect of this puzzle is that half of the users who interacted with it did not even try a solution. This shows that the data collection could be useful for purposes other than improving the web application. If this data had been available to the teacher during the course, it could have been a great resource for identifying what students find problematic, which could then be covered during the course's teaching sessions. This data could also be used to improve quizzes and aid in making new quizzes in the future.

Reading Sections

The results suggest that the theory and puzzle description sections were not superfluous. One interesting aspect of the results is that users tend to seek additional information for more challenging puzzles: the percentage of users that open the theory or description section increases as users progress through the puzzles, as seen in figure 4.14. Most of the users that open the theory or description sections seek additional information before rather than after they execute an incorrect solution, as seen in figure 4.15. This indicates that users do not want to submit a solution that they think is incorrect. Instead, they seek additional information to improve their solution, even though the code's output could guide them towards a correct solution.

Usability

As suggested in the theory, more frequent users of the web application should have a higher score, since efficiency should rise over time, as seen in figure 2.1. This is not the case in the results: user U12, who has the highest total time and the highest number of correct tasks, has one of the lowest efficiency scores. Conversely, user U13, who has the second-lowest total time on the web application with only 17 correct tasks, has the highest efficiency score of all the users. U13 achieves such high efficiency through a low average task time combined with high correctness. Similarly, U12 gets a low efficiency score due to a low percentage of correctness and a high average task time. U13 also has a higher score than users with similar total time, such as U14. This could instead be an indication that U13 has more prior knowledge of Python programming.

Figure 4.13 shows that puzzles further into the quiz tend to be more complicated, since a higher percentage of the users who try them cannot solve them correctly, leading to a higher failure rate and a higher average task time. Users who progress far through the quiz will therefore naturally have a higher average task time, which affects their efficiency score.
