
Examining the differences in code reading practices employed by junior and senior developers


DEGREE PROJECT IN COMPUTER ENGINEERING, FIRST CYCLE, 15 CREDITS

STOCKHOLM, SWEDEN 2018


JONATHAN BROSTRÖM MÅNS NILSSON

KTH ROYAL INSTITUTE OF TECHNOLOGY

SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE


Master in Computer Science
Date: June 5, 2018
Supervisor: Richard Glassey
Examiner: Örjan Ekeberg

Swedish title: Undersöker skillnader i kodläsningspraxis som används av juniora och seniora utvecklare


Abstract

Reading code is an essential skill for developers, as it is an effective way of finding bugs and improving the general quality of code. Despite this, it is a skill that usually only comes with experience. Eventually, developers will have read so much code that they will have found what works and what does not work in pieces of code. The aim of this report is to find out whether there are any differences in the code reading techniques employed by experienced and inexperienced code reviewers. This could hopefully enable the invention of better ways for aspiring developers to learn this skill efficiently. This was done by dividing participants into experienced and inexperienced code reviewers and having them read code while their eye movements were tracked using eye-tracking technology. The study was composed of four different tests. In two of these, the goal was to explain what the code does. In the other two, the goal was to find a number of bugs scattered throughout the code. The results showed that there were some slight differences in their focus: most notably, the more experienced group focused more on the key points of the program and used their time better, while the focus of the inexperienced group was more scattered throughout the code.


Sammanfattning (Swedish summary, translated)

Being able to read through code is an important skill for developers to master, since it is an effective way of finding bugs and improving the general quality of code. Despite this, it is a skill that usually only develops with experience. Eventually, developers have read so much code that they have learned what works and what does not work in code. The aim of this report is to find out whether there are any differences between experienced and inexperienced code reviewers in how they read through code. This can hopefully lead to better ways for future developers to learn this skill efficiently. This was done using eye-tracking technology, with participants divided into the groups experienced and inexperienced code reviewers and asked to read through different types of code. The results from these tests were then used as a basis for determining where the groups' focus differs. The study consisted of four different tests. In two of these, the participants were to describe what the code did. In the other two, the participants were to find a number of bugs scattered throughout the code. The results showed that there were some small differences in the groups' focus: most notably, the experienced group focused more on the key parts of the programs and used their time more efficiently, while the inexperienced group's focus was more spread out.


Contents

1 Introduction
1.1 Research question

2 Background
2.1 Origin and history of eye-tracking methods
2.2 Usage of eye-tracking in research
2.3 Previous relevant studies

3 Methods
3.1 Tools and software used
3.1.1 Tobii 4C eye-tracker
3.1.2 Unity
3.2 Data gathering
3.3 Participants
3.4 Code samples
3.5 Process
3.6 Analyzing the data
3.6.1 Duration times
3.6.2 Correct answers
3.6.3 Heat maps

4 Results
4.1 Programming background form
4.2 Results from the tests
4.2.1 Explain code test 1
4.2.2 Explain code test 2
4.2.3 Bugfinding test 1
4.2.4 Bugfinding test 2

5 Discussion
5.1 Explain code test 1
5.2 Explain code test 2
5.3 Bugfinding test 1
5.4 Bugfinding test 2
5.5 Conclusion
5.6 Limitations
5.6.1 Tobii Eye Tracker 4C
5.6.2 Number of applicants
5.6.3 Differences in preferred programming language

Bibliography


Chapter 1 Introduction

Reading code is an essential skill for every developer, as code inspection has been shown to be a very efficient way to discover defects in code.[2] Reading code is a common way of learning how to code, but in a professional setting the reasons we read code often differ from when we are learning. As professionals, developers often read code in order to find defects and to determine whether improvements can be made to the code.[7] Experienced programmers can make more assumptions about different parts of code, especially if these are covered by good variable and method naming, and are more trained in comprehending code quickly. Because of the varying levels of experience and the differing reasons for reading code, it is likely that the methods beginner developers use when reading code differ from the techniques employed by more senior developers. However, as code reviewing techniques usually are not taught, but rather learned through individual experience of debugging programs, it can be difficult for aspiring developers to learn this essential skill. It is therefore the aim of this report to find out whether such a difference exists, so that future generations of developers can learn good code reviewing techniques more easily. To accomplish this, eye-tracking technology is used to find out how the focus of the two groups differs when they are put into various code-reviewing situations.


1.1 Research question

This report aims to identify and describe the differences in the visual point of focus between experienced programmers and novice programmers when reading code. By using eye-tracking technology to gather data from test participants in various code-reviewing situations, the intent is to conduct an independent series of experiments that form a thorough basis for investigating our problem, arrive at a conclusion based on our findings, and provide a clear statement on whether there is a difference in code-reading behavior between novice and expert programmers. The main question that this report aims to address is therefore the following:

• How does the visual point of focus differ between experienced programmers and novice programmers when reading code?


Chapter 2 Background

2.1 Origin and history of eye-tracking methods

Eye-tracking technology has been employed in various forms for more than 100 years, and was at first used for reading research. Since then, the methods used for tracking eye movements have improved greatly. Historically there have been multiple variants of eye-tracking methods, such as electrodes mounted around the subjects’ eyes that would track eye movements through differences in electrical potential, or large contact lenses with a metal coil that would detect fluctuations in an electromagnetic field when the eyes moved.[4] Today, most eye trackers use a camera mounted close to the computer monitor and track the eyes through the recorded video.

2.2 Usage of eye-tracking in research

In human–computer interaction research it is believed that what is being looked at is the primary focus of the person’s mind.[4] This means that by tracking a person’s eye movements, one can find where their attention is being focused in relation to what is being shown on the screen. In addition to this, finding points where the eyes are stationary can show how much processing is required to take in the information being provided at the location where the eyes are fixated.

In eye-tracking research, the main measurements being used are fixations and saccades. Fixations are points where the eyes remain stationary for a longer period of time, while saccades are rapid eye movements between fixations.[4] When measuring the eyes’ fixations, a number of different things can be learned depending on the context. During an encoding task, for example reading a text, fixations in a certain area can mean that the information in that area is of high interest. In a search task, however, the interpretation is the opposite: a high number of fixations in a certain area is usually an indication that there is some degree of uncertainty in recognizing the thing being looked at. During saccades, the brain is not performing any encoding. By measuring so-called "regressive saccades", which are eye movements where the eye looks back at something it was previously looking at, difficulties in encoding can be measured.[4] Usually these saccades are very short, but longer regressive saccades can indicate confusion in processing the text.

The possibilities of gaining these types of knowledge from eye-tracking experiments have proven useful in code-reading research. Past studies have used eye-tracking to study, among other things, how code scanning affects error detection, the level of linearity in the individual’s reading, as well as how code styling affects the comprehension of code and the ability to detect errors in code.[1,5,3]

2.3 Previous relevant studies

In 2012, three students from two different universities in Ohio in the United States conducted a study on how the time taken for an individual to scan through code in its entirety affects the individual’s detection of errors in the code, and how the code-reviewing process is influenced by the individual’s experience. The study tested fifteen students studying computer science at an American university on four sets of C code with a couple of faulty lines, using a video-based eye-tracker to track the participants’ point of visual focus. The study concluded that the scanning time had a significant impact on the amount of focus on faulty code lines and the time taken to detect errors.[1]

In 2015, seven university students from four different countries and an individual from Microsoft Research teamed up to study the linearity of reading code compared to reading natural language text among two groups: expert and beginner programmers. The study tested twenty individuals: fourteen beginners and six experts. The test participants were given short sets of Java code and natural language text to read through while their visual focus was tracked using a video-based eye-tracker. The results showed that the beginners read code more linearly than the experts, and that the beginners read code less linearly than natural language text. From this they concluded that there are differences between reading natural language text and reading code, and that non-linear reading skills improve with experience.[5]

A 2017 study, conducted by two KTH students, examined the effects of code styling on the interpretation and readability of code. The study tested participants with varying amounts of programming experience. The participants were given a couple of sets of Java code, with or without styling techniques applied, to examine while a video-based eye-tracker tracked their point of visual focus. The study claims the experiments showed that the usage of code styling had an effect on how the participants addressed their assignments. The study concluded that code commenting and good naming of variables and methods improved the interpretation and readability of code. Code commenting was found to make the participants more thorough in examining the signature of a method, and subsequently better at discovering return-type errors inside the method. Good variable and method naming was found to help the participants understand the purpose of variables and functions through self-explanatory names, instead of having to figure that out for themselves and, perhaps, read through large amounts of code. Any improvements from syntax highlighting and code indentation could not be clearly established.[3]


Chapter 3 Methods

3.1 Tools and software used

3.1.1 Tobii 4C eye-tracker

The Tobii 4C is part of a line of eye trackers primarily intended for use in video games. It was used to plot the participants’ visual point of focus during the code-reviewing sessions. It is a corded peripheral for Windows operating systems and operates at a frequency of 90 Hz, meaning that it records pairs of gaze coordinates 90 times per second.[6]
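To make the 90 Hz figure concrete, the following minimal Java sketch converts between sample counts and time. Only the sampling rate comes from the text; the class and method names are hypothetical and are not part of the Tobii SDK or of the software used in this study.

```java
// Sketch only: converts eye-tracker sample counts to durations for a fixed sampling rate.
final class GazeSampling {
    // 90 Hz is the rate stated in the text; all names here are illustrative.
    static final double SAMPLE_RATE_HZ = 90.0;

    // Time between two consecutive samples, in milliseconds.
    static double sampleIntervalMillis() {
        return 1000.0 / SAMPLE_RATE_HZ;
    }

    // Approximate time represented by a number of consecutive samples, in seconds.
    static double secondsFromSamples(long sampleCount) {
        return sampleCount / SAMPLE_RATE_HZ;
    }

    public static void main(String[] args) {
        System.out.printf("Interval between samples: %.2f ms%n", sampleIntervalMillis());
        System.out.printf("900 samples cover %.1f s%n", secondsFromSamples(900));
    }
}
```

At this rate, consecutive samples are roughly 11 ms apart, so the number of samples that land in a region of the screen is a direct proxy for the time spent looking at it.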

3.1.2 Unity

The software tool Unity was used in order to get and record the gaze coordinates from the eye tracker for each individual and test, as well as to show the code samples on the screen. It was chosen since Tobii provides an SDK for their eye trackers for use in Unity.

3.2 Data gathering

The data used as the basis for investigating and answering our questions was gathered through a series of eye-tracking experiments. Participants read various samples of code and tried to understand what the code was doing and to identify problematic code, while their visual point of focus on the screen was tracked and plotted using an eye-tracker and the software tool Unity. We also timed the duration of each test for each individual, and counted the number of correct answers out of what the individual was supposed to address in each test.

3.3 Participants

Eight individuals participated in our experiments. The individuals had varying levels of programming experience prior to the code-reviewing sessions.

Before the experiments, each participant was asked to fill in a short form detailing their previous programming experience, their preferred languages, and whether they had any experience of performing code reviews in a more professional setting than simply looking through their own code for bugs. The results of this form were then used to divide the participants into two groups: one consisting of participants with such professional experience and one consisting of those without.

3.4 Code samples

The code suite consisted of two samples of code written in C++ and two samples of code written in Java. For each programming language, one sample contained code with good variable and method naming, and the other sample contained code with bad naming conventions.

We chose not to write any comments in the code samples for a number of reasons. We wanted to present cleanly written normal-sized code in its entirety to the participants on a limited-size screen. We wanted a fair amount of actual programming code to ensure the meaningfulness of our experiments. Additionally, every coherent part of the samples was fairly limited in scope and complexity. We start by explaining the Java code samples and then move on to the C++ samples.

Figure 3.1: The first Java code sample presented to the participants

The first Java sample has bad naming conventions. It first creates a new instance of the class Example to gain access to its member functions, since the main function, which is called on execution, is static and therefore cannot call the instance methods directly. It looks at the number of arguments passed to the program and, if at least one argument was passed, prints out the result of calling the Example method foo() with the first argument converted to an integer; otherwise it prints "No argument provided". foo(i) prints out the result of calling bar(j) for all non-negative integers j that are congruent to 1 modulo 3 and less than the integer i. bar(j) calculates the j:th Fibonacci number iteratively, where the case j == 0 returns the first Fibonacci number. Overall, the program prints out every Fibonacci number whose position in the sequence is congruent to 1 modulo 3 and less than the first argument passed to the program, if such an argument exists; otherwise it prints out a short message explaining why no sequence of Fibonacci numbers can be calculated.

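The behaviour described for the first Java sample can be sketched as follows. This is a hypothetical reconstruction from the prose only, not the code shown in Figure 3.1: the class name ExampleSketch, the choice to start the sequence at 1, 1, 2, ..., and the decision to let foo() return a list instead of printing are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reconstruction from the prose description; not the code in Figure 3.1.
final class ExampleSketch {
    // bar(j): the j-th Fibonacci number, computed iteratively.
    // Assumption: the sequence starts 1, 1, 2, 3, ... so bar(0) == 1.
    long bar(int j) {
        long current = 1;
        long next = 1;
        for (int k = 0; k < j; k++) {
            long sum = current + next;
            current = next;
            next = sum;
        }
        return current;
    }

    // foo(i): the results of bar(j) for every j with 0 <= j < i and j % 3 == 1.
    // The described method prints these values; a list is returned here so it can be checked.
    List<Long> foo(int i) {
        List<Long> results = new ArrayList<>();
        for (int j = 1; j < i; j += 3) {
            results.add(bar(j));
        }
        return results;
    }

    public static void main(String[] args) {
        if (args.length > 0) {
            // main is static, so an instance is needed to call the instance methods.
            ExampleSketch example = new ExampleSketch();
            System.out.println(example.foo(Integer.parseInt(args[0])));
        } else {
            System.out.println("No argument provided");
        }
    }
}
```

Under these assumptions, an argument of 8 selects the positions 1, 4 and 7 of the sequence.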

Figure 3.2: The second Java code sample presented to the participants


The other Java sample has good naming conventions. It calculates the average of the first 21 Fibonacci numbers, converts this to the highest integer not greater than the average, checks whether this is a prime number and then prints out the result. It does this by first calling fibonacci(20), which returns a list of the first 21 Fibonacci numbers, and storing this in the variable numbers. It then initializes the variable sum to zero, iterates through every element of numbers and adds its value to sum, calculates the average into the new variable avg by dividing sum by the number of elements in numbers, and finally prints out the result of calling isPrime(avg). isPrime(avg) converts avg into the highest integer not greater than avg and returns true if this integer is a prime number and false otherwise.
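A minimal sketch of the steps described for the second Java sample is given below. It is reconstructed from the prose and is not the code in Figure 3.2; the class name, the helper average(), the trial-division primality test, and the assumption that the sequence starts 1, 1, 2, ... are illustrative choices.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reconstruction from the prose description; not the code in Figure 3.2.
final class FibonacciAverageSketch {
    // Returns the Fibonacci numbers at positions 0..maxIndex (maxIndex + 1 values).
    // Assumption: the sequence starts 1, 1, 2, 3, ...
    static List<Long> fibonacci(int maxIndex) {
        List<Long> numbers = new ArrayList<>();
        long current = 1;
        long next = 1;
        for (int k = 0; k <= maxIndex; k++) {
            numbers.add(current);
            long sum = current + next;
            current = next;
            next = sum;
        }
        return numbers;
    }

    // Arithmetic mean of the list (assumes a non-empty list).
    static double average(List<Long> numbers) {
        long sum = 0;
        for (long number : numbers) {
            sum += number;
        }
        return (double) sum / numbers.size();
    }

    // Floors avg and tests the resulting integer for primality by trial division.
    static boolean isPrime(double avg) {
        long n = (long) Math.floor(avg);
        if (n < 2) {
            return false;
        }
        for (long d = 2; d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Long> numbers = fibonacci(20);
        double avg = average(numbers);
        System.out.println(isPrime(avg));
    }
}
```

With the sequence starting at 1, 1, the 21 values sum to 28656, so the floored average is 1364, which is even and therefore not prime.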

Figure 3.3: The first C++ code sample presented to the participants

The first C++ sample has bad naming conventions. It keeps generating a random number until it is between 2 and 8, at which point the number is printed out. The program then tries to calculate the remainder when 2 is divided by 8 through iterative subtraction, and prints it out. The program then tries to calculate and print out the floating-point value of dividing 2 by 8, before attempting to calculate and print out the sum of the first ten positive integers. Lastly, the program creates a map with the key "a" mapped to the value 18 and the key "b" mapped to the value 54, and then prints out the key of the map element with the value 18 by iterating through the map in the function g.

There are seven bugs in this program. First, the variable whose value is returned in the g() method is not initialized at declaration, and the only point at which this variable is assigned is when the iteration encounters an element in the passed map with the passed value. However, this may never happen, so the function could return the value of an uninitialized variable. This is solved by initializing the variable upon declaration. Second, the loop which tries to generate a random number between 2 and 8 may go on for a very long time, since it generates an integer within the entire range of possible int-type values. It is not necessary to generate more than one completely random number. Instead, by generating just one such random number, one can apply modulo (8 - 2) to this number and then add 2 to the result to get a random number between 2 and 8 (excluding 8 itself; modulo (8 - 2 + 1) would include it).

The three following bugs concern the remainder calculation. The third bug is the fact that the remainder variable is not initialized to 2 before the subtraction loop. The fourth bug is the fact that the loop condition is placed at the end of the loop instead of the beginning, meaning that even though the remainder is already less than 8 upon initialization and thus should be printed out, the program subtracts 8 from it. The fifth bug is the fact that the comparison should check whether the remainder is less than the divisor’s variable, b, instead of the dividend’s variable, a.

The sixth bug, concerning the iteration through the list of numbers from 1 to 10, is that it iterates through all element indexes from 1 to the number of elements in the list, even though list elements are indexed from 0 and upwards. This misses the very first element of the list and causes an out-of-bounds access. This is solved by changing the loop’s initialization value to 0 and removing the equals sign from the upper bound, to ensure that it iterates through all indexes strictly less than the number of elements in the list. The seventh and final bug is the fact that the attempt to add the value of each element in the list to the total sum in the same loop actually assigns the value of the current element to the sum in each iteration. Therefore, the sum that is printed out would be the value of the last element. This is fixed by adding a plus sign before the equals sign in this statement.
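The fixes described above can be sketched in Java, the language used for the other examples here, rather than in the C++ of the original sample. This is an illustration under stated assumptions, not the corrected sample itself: all class and method names are hypothetical, the empty string is an arbitrary default for the map lookup, and Math.floorMod is used because Java's Random.nextInt() can return negative values.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical Java illustration of the fixes; the original sample is C++.
final class FirstSampleFixesSketch {
    // Bug 1: the returned variable is initialized at declaration ("" is an arbitrary default).
    static String keyForValue(Map<String, Integer> map, int value) {
        String key = "";
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            if (entry.getValue() == value) {
                key = entry.getKey();
            }
        }
        return key;
    }

    // Bug 2: one random draw, reduced modulo the range width and shifted; result is in [low, high).
    static int randomInRange(Random random, int low, int high) {
        return low + Math.floorMod(random.nextInt(), high - low);
    }

    // Bugs 3-5: start from the dividend, test before subtracting, compare against the divisor.
    // Assumes dividend >= 0 and divisor > 0.
    static int remainder(int dividend, int divisor) {
        int remainder = dividend;
        while (remainder >= divisor) {
            remainder -= divisor;
        }
        return remainder;
    }

    // Bugs 6-7: iterate over indexes 0..size-1 and accumulate with +=.
    static int sum(List<Integer> numbers) {
        int total = 0;
        for (int i = 0; i < numbers.size(); i++) {
            total += numbers.get(i);
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 18);
        map.put("b", 54);
        System.out.println(randomInRange(new Random(), 2, 8));
        System.out.println(remainder(2, 8));
        System.out.println(sum(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)));
        System.out.println(keyForValue(map, 18));
    }
}
```

Note that the uninitialized-variable bug cannot occur in this form in Java, since the compiler rejects reading a local variable that is not definitely assigned; the explicit default only mirrors the C++ fix.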


Figure 3.4: The second C++ code sample presented to the participants


The other C++ sample has good naming conventions. It defines the variable enableoutputtext, which is meant to enable text printouts if and only if its value is set to true. The program then creates a list of 30 random numbers and tries to sort it with quicksort, before intending to print out the highest-valued number from the list if enableoutputtext indicates that this should happen.

This program has 4 bugs. The first bug is the fact that the lower and upper bounds for the element indexes in the list that are passed into the quicksort() method call are both off by one. The lower bound, 1, misses the first element, and the upper bound, the number of elements in the list, tries to include an element which does not exist in the list. They should both be decreased by one. The second bug is the fact that the list parameter in quicksort() takes the argument list by value rather than by reference, causing the original list to remain unsorted after quicksort() has returned. The same error is present in the partition() function, and since this is the function which actually modifies the argument list, it is not enough to correct this for quicksort(). The & sign needs to be added in front of the list parameter variable name for both quicksort() and partition() in order for quicksort() to modify the original list passed to it. The third bug is inside partition() and is caused by an erroneous swap of two different elements. The present implementation correctly sets the first element to equal the second, and then erroneously sets the second element to equal the first, which by then holds the value of the second element that we started with. This is solved by storing one of the swap elements in a new temporary variable at the beginning of the swap sequence, and then setting the other swap element to equal the temporary variable’s value at the end of the swap sequence. The fourth and final bug concerns the fact that the condition for printing out the highest-valued number has an erroneous exclamation point before the condition variable, thus inverting the semantic meaning of the variable and contradicting its name. This exclamation point should be removed to match the semantic meaning of the variable’s name.
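As an illustration of the corrected index bounds and swap, the following Java sketch sorts an array with quicksort. It is not the code in Figure 3.4: the Lomuto partition scheme, the use of an int array, and all names are assumptions, since the text does not specify how the original partition() was written.

```java
import java.util.Arrays;

// Hypothetical Java illustration; the original sample is C++ and may partition differently.
final class QuicksortSketch {
    // Sorts values[low..high] in place; both bounds are inclusive valid indexes.
    static void quicksort(int[] values, int low, int high) {
        if (low < high) {
            int pivotIndex = partition(values, low, high);
            quicksort(values, low, pivotIndex - 1);
            quicksort(values, pivotIndex + 1, high);
        }
    }

    // Lomuto partition (an assumed scheme): the last element is the pivot.
    static int partition(int[] values, int low, int high) {
        int pivot = values[high];
        int boundary = low;
        for (int i = low; i < high; i++) {
            if (values[i] < pivot) {
                swap(values, i, boundary);
                boundary++;
            }
        }
        swap(values, boundary, high);
        return boundary;
    }

    // Correct swap: keep one value in a temporary before overwriting it.
    static void swap(int[] values, int i, int j) {
        int temp = values[i];
        values[i] = values[j];
        values[j] = temp;
    }

    public static void main(String[] args) {
        int[] values = {5, 3, 8, 1, 9, 2};
        // Bounds run from 0 to length - 1, not from 1 to length.
        quicksort(values, 0, values.length - 1);
        System.out.println(Arrays.toString(values));
        System.out.println("Highest value: " + values[values.length - 1]);
    }
}
```

The pass-by-value bug has no direct Java counterpart, because a Java array parameter refers to the caller's array; in C++ the same effect requires the & on the parameter, as described above.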

3.5 Process

Each of the participants was tested individually, one at a time. All code-reviewing sessions were conducted in a quiet room with only the writers of this report and the current participant present. The eye-tracker was mounted at the top of the laptop screen’s frame. To prepare the eye-tracker for tracking the participant’s eyes, the participant first had to calibrate it to their eyes, which took between 3 and 5 minutes.

Each participant was first given the two Java code samples, before being given the two C++ samples. The samples were presented in their entirety to the participant on the laptop’s screen as normal-sized text in pre-rendered PNG screenshot images from a syntax-highlighting text editor. This was done to present a static and consistent visual environment with no chance of distraction from a visible blinking cursor or accidental selections or manipulations of parts of the code, and also because syntax-highlighting text editors are becoming increasingly popular among programmers, even novice ones. The participant reviewed one code sample at a time. For any code sample, the participant started the recording of eye-tracker data for that sample by pressing the space key on the keyboard. During any code review, the room remained quiet and no internal or external distractions occurred. The participant then pressed the space key again to end the current code review once they felt that they knew how to address the tasks described for that particular review, before then addressing them. For the first two samples, the participant was asked to describe what the code was doing. For the other two samples, the participant was asked to locate and describe the problems in the code.

3.6 Analyzing the data

This section goes over the methods that were employed to analyze the gathered data in order to answer the research question.

3.6.1 Duration times

The duration times were used to calculate the group-specific average, fastest and slowest times spent on each test.

3.6.2 Correct answers

Data about the number of correct answers was used to calculate the group-specific average percentage of correct answers out of what was supposed to be addressed in each test. For the bugfinding tests, this data was also used to calculate the group-specific maximum and minimum numbers of bugs found.
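The group-level measures described in the two subsections above amount to simple aggregates, which could be computed along the following lines. This is a sketch under assumptions: the class and method names are hypothetical, and the numbers in main are made-up illustration values, not measurements from this study.

```java
import java.util.Arrays;

// Sketch only: aggregates per-participant values into the group-level measures described above.
final class GroupStatsSketch {
    // Mean of the given values (0.0 for an empty array).
    static double average(int[] values) {
        return Arrays.stream(values).average().orElse(0.0);
    }

    // Shortest duration in the group (assumes a non-empty array).
    static int fastest(int[] seconds) {
        return Arrays.stream(seconds).min().getAsInt();
    }

    // Longest duration in the group (assumes a non-empty array).
    static int slowest(int[] seconds) {
        return Arrays.stream(seconds).max().getAsInt();
    }

    // Average percentage of correct answers, given each participant's count and the total per test.
    static double percentCorrect(int[] correctPerParticipant, int totalPerTest) {
        int correct = Arrays.stream(correctPerParticipant).sum();
        return 100.0 * correct / ((double) correctPerParticipant.length * totalPerTest);
    }

    public static void main(String[] args) {
        int[] seconds = {100, 120, 140, 160};   // hypothetical durations
        int[] bugsFound = {1, 2, 3, 2};         // hypothetical counts out of 4 bugs
        System.out.println(average(seconds));
        System.out.println(fastest(seconds) + " / " + slowest(seconds));
        System.out.println(percentCorrect(bugsFound, 4));
    }
}
```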


3.6.3 Heat maps

Figure 3.5: An example of a heat-map image produced by our own Java program.


For the purposes of analyzing the eye-tracking data, a Java program was created which plotted all points of fixation onto each code sample image and saved the result as two new images: one combined result for expert programmers, and one combined result for novice programmers. Each point was colored according to the number of times the participants focused on that particular point, with brighter colors representing points that the participants focused on more. These images, called heat-map images, provide an efficient overview of the overall amount of attention given to different parts of the code samples, and make it possible to analyze differences and consistencies in that attention between the groups.
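The counting and coloring step can be sketched as follows. This is not the program used in the study, whose source is not reproduced here: the linear grayscale mapping, the plain black background instead of the code sample image, and all names are assumptions made for illustration.

```java
import java.awt.image.BufferedImage;

// Sketch only: counts gaze samples per pixel and maps the counts to brightness.
final class HeatMapSketch {
    // counts[y][x] is the number of samples at pixel (x, y); each point is {x, y}.
    // Samples outside the image are ignored.
    static int[][] countSamples(int[][] points, int width, int height) {
        int[][] counts = new int[height][width];
        for (int[] point : points) {
            int x = point[0];
            int y = point[1];
            if (x >= 0 && x < width && y >= 0 && y < height) {
                counts[y][x]++;
            }
        }
        return counts;
    }

    // Largest count in the grid.
    static int maxCount(int[][] counts) {
        int max = 0;
        for (int[] row : counts) {
            for (int count : row) {
                max = Math.max(max, count);
            }
        }
        return max;
    }

    // Linear mapping from a count to a brightness in 0..255 (assumed; other scales are possible).
    static int brightness(int count, int max) {
        return max == 0 ? 0 : (int) Math.round(255.0 * count / max);
    }

    // Renders the counts as a grayscale image where brighter pixels received more samples.
    static BufferedImage render(int[][] counts) {
        int height = counts.length;
        int width = counts[0].length;
        int max = maxCount(counts);
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int b = brightness(counts[y][x], max);
                image.setRGB(x, y, (b << 16) | (b << 8) | b);
            }
        }
        return image;
    }

    public static void main(String[] args) {
        int[][] points = {{1, 1}, {1, 1}, {2, 0}};  // hypothetical gaze samples
        BufferedImage image = render(countSamples(points, 3, 2));
        System.out.println(Integer.toHexString(image.getRGB(1, 1) & 0xFFFFFF));
    }
}
```

A combined image per group, as described above, would follow from passing the samples of all participants in that group to countSamples before rendering.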


Chapter 4 Results

4.1 Programming background form

Four of the eight participants answered that they had prior experience of performing code reviews in a more professional setting than looking through their own code for bugs. Therefore, four participants were placed in the experts’ group and the remaining four were placed in the beginners’ group.

Figure 4.1: The participants were asked how many years of experience they possessed, with the most common answers being 4-5 and 5+


Figure 4.2: The participants were asked which languages they felt most comfortable in, with Python being the most common, followed by Java, C++ and JavaScript

4.2 Results from the tests

This section presents the results from the different tests: the average time, as well as the fastest and slowest times for the two groups, the percentage of correct answers out of what the participants were supposed to find, as well as the eye-tracking data collected, combined into one image per group and test. For the bugfinding tests, the maximum and minimum numbers of bugs correctly found were also calculated.


4.2.1 Explain code test 1

Group           Average time (s)   Fastest (s)   Slowest (s)   Correct
Experienced     169.75             143           195           50%
Inexperienced   145                72            287           50%

Table 4.1: Results for the first test. The two groups’ average time, as well as the fastest and slowest times and the percentage of correct answers

Figure 4.3: The combined eye tracking result for the experienced group for the first test


Figure 4.4: The combined eye tracking result for the inexperienced group for the first test

4.2.2 Explain code test 2

Group           Average time (s)   Fastest (s)   Slowest (s)   Correct
Experienced     111.25             95            135           100%
Inexperienced   111.50             61            217           100%

Table 4.2: Results for the second test. The two groups’ average time, as well as the fastest and slowest times and the percentage of correct answers


Figure 4.5: The combined eye tracking result for the experienced group for the second test


Figure 4.6: The combined eye tracking result for the inexperienced group for the second test

4.2.3 Bugfinding test 1

Group           Average time (s)   Fastest (s)   Slowest (s)   Correct   Most found   Least found
Experienced     307.25             237           354           33.33%    4            0
Inexperienced   317                162           455           16.67%    2            0

Table 4.3: Results for the third test. The average time, the fastest and slowest times as well as the average percentage of bugs found and the most and least found out of a total of 6


Figure 4.7: The combined eye tracking result for the experienced group for the third test


Figure 4.8: The combined eye tracking result for the inexperienced group for the third test

4.2.4 Bugfinding test 2

Group           Average time (s)   Fastest (s)   Slowest (s)   Correct   Most found   Least found
Experienced     325.25             267           386           37.5%     2            1
Inexperienced   328.25             159           525           31.25%    2            0

Table 4.4: Results for the fourth test. The average time, the fastest and slowest times as well as the average percentage of bugs found and the most and least found out of a total of 4
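The percentages in Tables 4.3 and 4.4 can be read as average bug counts per participant by multiplying them by the totals stated in the captions. The short Java sketch below does this conversion; the class and method names are hypothetical, and the inputs are the table values and caption totals as printed.

```java
// Sketch only: converts an average percentage of bugs found into an average bug count.
final class BugPercentSketch {
    static double averageBugsFound(double percentCorrect, int totalBugs) {
        return percentCorrect / 100.0 * totalBugs;
    }

    public static void main(String[] args) {
        // Table 4.3, total of 6 according to its caption.
        System.out.printf("%.2f%n", averageBugsFound(33.33, 6));
        System.out.printf("%.2f%n", averageBugsFound(16.67, 6));
        // Table 4.4, total of 4 according to its caption.
        System.out.printf("%.2f%n", averageBugsFound(37.5, 4));
        System.out.printf("%.2f%n", averageBugsFound(31.25, 4));
    }
}
```

Read this way, the experienced group found about 2 bugs per participant in the third test and 1.5 in the fourth, against about 1 and 1.25 for the inexperienced group.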


Figure 4.9: The combined eye tracking result for the experienced group for the fourth test


Figure 4.10: The combined eye tracking result for the inexperienced group for the fourth test


Chapter 5 Discussion

5.1 Explain code test 1

In the first test, only half of the participants managed to correctly explain what the code did. Most of those who got it wrong failed to recognize the Fibonacci algorithm. The eye-tracking data from the two groups clearly shows that not much time was spent looking at the main function. This is unsurprising, as it does not do very much: it simply calls the bar function, and checks that there was an argument input when the program was run. One thing that differed between the two groups was how the bar function was approached. The more experienced group spent most of their time looking at the for loop at the end of it. One possible explanation for this is that while the Fibonacci function is a common occurrence in school courses, it is not commonly encountered in professional work. This means that the more novice developers, who are still studying, recognized the function immediately, while the other group had to look through the function more carefully before spotting what it does. This would also explain why the inexperienced group had a faster average time to complete it. The time difference between the fastest participant in that group, who got the question right, and the slowest, who did not (72 versus 287 seconds), is much larger than that of the other group (143 versus 195 seconds).

5.2 Explain code test 2

The code for this test was in large part similar to that of the previous test, which would explain why the time to complete it went down for both test groups and why all participants managed to answer correctly this time. However, the average time for the experienced group went down by about 25 seconds more than that of the other group (58.5 seconds compared to 33.5 seconds). This is likely because of the introduction of descriptive function names, as this time they did not have to spend time recognizing the Fibonacci function. Neither group spent very much time looking at the isPrime function, as it is a very straightforward function that does almost exactly what its name says. The less experienced group took this even further than the other group. The reason for this is hard to determine; one possible explanation could be that they took for granted that the function would properly do what its name said, while the experienced group wanted to make sure it was correct.

5.3 Bugfinding test 1

In this test the experienced group had a much higher success rate, with the average number of bugs found being double that of the other group. Considering the difference in experience, however, this is hardly surprising.

The difference in the eye-tracking data in this test is not that large, as the test consisted mostly of a number of smaller code snippets rather than a larger and more cohesive program like the other three tests. Both groups largely ignored the parts of the print messages that were static strings, as well as most of the type declarations in variable initialization statements. The largest difference between the two groups lies in the g function, which the experienced group spent considerably more time analyzing.

5.4 Bugfinding test 2

In this test, the inexperienced group performed far better than in the previous test. However, this was partly because one of the bugs, namely the one at the very end, is more of a semantic error than a runtime bug, and almost all participants across both groups managed to spot it, which raised the average considerably. The largest difference between the two groups’ focus lies in the partition function, where the experienced group spent a lot more time looking than the inexperienced group, who spent more time looking at the main function.

This can likely be attributed to the difference in experience, as the partition function is the biggest part of the Quicksort algorithm, and it is therefore reasonable to assume that there could be an issue there.
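To illustrate why the partition function is a natural place to look for bugs, the following is a minimal Python sketch of Quicksort. It is not the code used in the test: the language, the Lomuto partition scheme and all names other than partition are assumptions. The point is only that the recursive driver is a few lines long, while the index bookkeeping that is easy to get wrong sits in partition.

```python
def partition(items, low, high):
    """Lomuto partition of items[low..high] around the pivot items[high].

    Returns the final index of the pivot. Everything to its left is
    less than or equal to it, everything to its right is greater.
    """
    pivot = items[high]
    i = low - 1  # last index of the "less than or equal to pivot" region
    for j in range(low, high):
        if items[j] <= pivot:
            i += 1
            items[i], items[j] = items[j], items[i]
    # Move the pivot in between the two regions.
    items[i + 1], items[high] = items[high], items[i + 1]
    return i + 1


def quicksort(items, low=0, high=None):
    """Sort items in place and return it."""
    if high is None:
        high = len(items) - 1
    if low < high:
        p = partition(items, low, high)
        quicksort(items, low, p - 1)
        quicksort(items, p + 1, high)
    return items
```

Typical slips in code like this, such as an off-by-one in the loop bounds or a wrong index in the final swap, all fall inside partition, whereas the driver has little room for error.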

5.5 Conclusion

Using the data collected, we can now answer the research question:

• How does the visual point of focus differ between experienced programmers and novice programmers when reading code?

While the difference between the two groups was not as large as initially suspected, there were some key differences. Most notably, the inexperienced group’s focus was much more spread out, whereas the more experienced developers focused more on key parts of the program, which led to them getting a better overall result despite some unfamiliar algorithms. These key parts often took the form of recursive functions or loops that were iterated frequently. Because the results showed a rather small difference between the two groups, it is hard to say how they could be used in further education on the subject, but they do give some pointers on where work could begin. More specifically, they can to some degree be used to teach beginners about the complexity and importance of different parts of code. More research would most likely be required before a definitive way of learning code reviewing could be developed.

5.6 Limitations

This study had a number of limitations, discussed here, that could be improved upon in future research.

5.6.1 Tobii Eye Tracker 4C

The 4C is part of Tobii’s gaming line of eye trackers, which means it is not as accurate as other, more research-focused products. It is hard to say to what degree this affected the study, as we did not have access to another eye tracker to compare it with. Another issue with this model is that we lacked the research license, which means we did not have access to the more detailed data that it would have provided. Because of this, a more thorough examination of the code reviewing process was not possible.

5.6.2 Number of applicants

As the tests took some time to complete and usually had to be carried out during work or school hours, finding applicants was difficult: only 8 applicants signed up across 4 different sessions. More participants could probably have been found given more time, but due to unforeseen circumstances regarding, among other things, the acquisition of the eye tracker, this was not feasible within the time constraints of this project.

5.6.3 Differences in preferred programming language

Because of the difficulties in finding willing participants, it was not possible to find applicants across the two groups with similar experience of the programming languages they had previously worked with. Having participants with similar experience might have given a better result, as unfamiliarity with the presented language may have affected the way they approached the task at hand.

Bibliography

[1] Michael Falcone Bonita Sharif and Jonathan I. Maletic. “An Eye- tracking Study on the Role of Scan Time in Finding Source Code Defects”. In: (2012), pp. 1–2.

[2] Alastair Dunsmore, Marc Roper, and Murray Wood. “Further in- vestigations into the development and evaluation of reading tech- niques for object-oriented code inspection”. In: Proceedings of the 24th international conference on Software engineering. ACM. 2002, pp. 47–57.

[3] Emil Persson and Leif Tysell Sundkvist. “Code Styling and its Ef- fects on Code Readability and Interpretation”. In: KTH’s publica- tion database DiVA (2017), pp. 1–12.

[4] Alex Poole and Linden J Ball. “Eye tracking in HCI and usability research”. In: Encyclopedia of human computer interaction 1 (2006), pp. 211–219.

[5] Roman Bednarik et al Teresa Busjahn. “Eye Movements in Code Reading: Relaxing the Linear Order”. In: 23rd International Confer- ence on Program Comprehension (2015), pp. 1–4.

[6] “Tobii Eye Tracker 4C – The game-changing eye tracking periph- eral”. In: (2017).

[7] Yang-ming Zhu. “Code Reading Techniques”. In: Software Reading Techniques. Springer, 2016, pp. 103–118.
