
DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS

STOCKHOLM, SWEDEN 2019

Intention-revealing function names and small functions to facilitate code comprehension

CHRISTINA SUNNEGÅRDH

KLARA ESERSTAM

KTH

SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE


Master in Computer Science

Date: June 7, 2019

Supervisor: Jeanette Hällgren

Examiner: Örjan Ekeberg

School of Electrical Engineering and Computer Science

Swedish title: Intentionsavslöjande funktionsnamn och korta funktioner för att underlätta kodförståelse


Abstract

Code comprehension is considered by many to be one of the most expensive and time-consuming phases of the software life cycle. There are multiple techniques for making code more comprehensible, one of which is alleged to be keeping functions small. However, the claim that small functions are superior to large functions with regard to code comprehension is often based on programming experience and stated without references to research. These claims also emphasize the importance of intention-revealing function names for improving code comprehension. It could therefore be questioned whether it is keeping functions small, or the fact that small functions enable intention-revealing function names, that underlies the claimed superiority of small functions. This thesis presents previous studies and relevant literature within the area and reports on tests carried out using eye tracking. The results of the tests indicate that intention-revealing function names have a significant effect on how fast developers comprehend code. They also indicate that simply splitting up a function, without using intention-revealing function names, increases the chance that programmers can tell the correct output. To draw reliable conclusions, further studies would be necessary, where the most important improvements would be to allow unlimited time for each test and to include a larger number of test persons.


Sammanfattning (Summary)

Code comprehension is considered by many to be one of the most time-consuming phases of the software life cycle. There are several techniques for making code more comprehensible, such as keeping functions short. The claim that short functions are superior to long functions with regard to code comprehension is often based on programming experience rather than grounded in research. These claims also emphasize the importance of using intention-revealing function names to promote code comprehension. One can therefore ask whether it is function length itself or the possibility of using intention-revealing names that underlies the claim that short functions are advantageous. This report covers previous studies and relevant literature in the area, and presents results from tests intended to investigate this question. The test results indicate that intention-revealing function names have a significant effect on how quickly developers comprehend code. There is also an indication that merely splitting up a function, without using intention-revealing function names, increases the likelihood that a programmer understands what the function does.

To draw reliable conclusions, further studies should be carried out in which the tests are performed without a time limit and on more test persons than in this study.


Chapter 1 Introduction

Program comprehension, also called code comprehension or software comprehension, is about understanding how a software system or part of it works, as well as the tools that can be used to facilitate comprehension. It is considered by many to be one of the most expensive and time-consuming phases of the software life cycle (Nedhal 2017). One estimate is that it takes up more than half of the time spent on software maintenance (Zelkowitz et al. 1980). There is therefore reason to investigate the tools and techniques that can be used when writing code to make things easier for those who read and use the code in the future.

Most programmers have poor code comprehension and find it easier to write code than to read it (Spolsky 2000). One reason is that when reading a program, a lot of information must be remembered to understand the context. When writing, the programmer only needs to keep track of the information related to the feature they are currently working on. One could reason that this causes programmers to reflect less on the code when writing it, and that they need to be motivated to spend time on applying code comprehension techniques in their code. Examples of techniques for making code comprehensible are syntax highlighting, indentation, and commenting. Another technique is to extract code into new methods, thereby creating smaller functions. If code needs a comment to be understood, one can probably extract a new method from it. An advantage of method extraction is that the new methods can explain themselves, as long as some thought is put into the naming. Method extraction should be done if it provides clarity (Fowler et al. 1999).


1.1 Problem statement

The superiority of small functions with regard to code comprehension is brought up in the book "Clean Code" by Martin (2008). He implies that functions should be so small that they only do one task. Martin advocates small functions based on his programming experience and adds that he cannot supply references from research to justify this assertion. However, arguments such as "shorter methods are easier to understand" do not seem convincing enough to keep methods short in practice (Göde 2014), and it is commonly discussed in what ways small functions can facilitate code comprehension and therefore maintenance.

In another chapter of "Clean Code", Martin also advocates the importance of "intention-revealing" names, especially regarding function names. He elaborates that a function name should answer why the function exists, what it does, and how it is used. Martin claims that this is easy to accomplish when functions are small and only do one thing (Martin 2008). It could therefore be questioned whether it is keeping functions small, or the fact that small functions enable intention-revealing names, that underlies the claim that small functions are superior to large functions with regard to code comprehension.

1.2 Research Question

To motivate programmers to consider how others understand their code, it is necessary to demonstrate the effects of code comprehension techniques. Therefore, the goal of this thesis is to present if and why one of these techniques, extracting code into smaller functions, affects code comprehension.

As dividing larger functions into smaller functions allows the programmer to give intention-revealing names to functions, our research question is formulated as follows: Does the claim that small functions are superior to larger functions, with regard to code comprehension, depend exclusively on the fact that small functions enable intention-revealing function names?

1.3 Limitations

Due to restricted resources and time, this thesis has some limitations regarding test persons and technology. One limitation is that it only includes test persons who have studied or are studying computer science at university level. This is to ensure that all test persons have basic programming skills and can thereby be regarded as developers. Apart from the techniques for making code more comprehensible mentioned in the introduction to this chapter, programmers have the option of using an Integrated Development Environment (IDE) to improve comprehension even further. Another option is to run one method at a time to find out the specific task of that function. However, due to technical limitations and to keep the tests as simple as possible, this thesis only considers comprehension while looking at code, without interacting with it.


Chapter 2 Background

This chapter will present some main concepts in this thesis as well as previous literature and studies that have been conducted within the area.

2.1 Main concepts

This section will introduce some main concepts that will be used throughout this thesis.

2.1.1 Function, procedure or method

When researching the subject of this thesis, three different terms for sub-routines are recurring. These terms are function, procedure and method. While a function refers to a sub-routine that returns a value, a procedure only has side effects. A method is a sub-routine that is executed in the context of an object and there are both function methods and procedure methods. However, they all refer to a repeatable piece of procedural code that can be called by name (Brett 2013) and will therefore be used interchangeably in this thesis.
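The distinction between the three terms can be illustrated with a minimal Python sketch. The names used here (`area`, `log_area`, `Rectangle`) are hypothetical and chosen only for illustration; they do not come from the thesis or its test code.

```python
def area(width, height):
    """Function: computes and returns a value, without side effects."""
    return width * height


def log_area(width, height, log):
    """Procedure: returns nothing; its only effect is appending to `log`."""
    log.append(f"area={width * height}")


class Rectangle:
    """Methods are sub-routines executed in the context of an object."""

    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        # Function method: returns a value computed from the object's state.
        return self.width * self.height

    def scale(self, factor):
        # Procedure method: changes the object's state and returns nothing.
        self.width *= factor
        self.height *= factor
```

All four sub-routines are repeatable pieces of code that are called by name, which is the property the thesis relies on when using the terms interchangeably.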

2.1.2 Eye tracking

A computer that is equipped with an eye tracker can log where and at what time a user looks at the screen, a concept which is called eye tracking. This is done by capturing high-frame-rate images of the user’s eyes with the eye tracker’s sensor (Tobii 2019b). Tobii Technology is a Swedish company founded in 2001 that develops products for both eye tracking and eye control (Tobii 2019a).


Eye tracking can be used together with engines such as Unity, which is a real-time engine used to create games and interactive experiences in 2D or 3D (Unity 2019), to make use of or gather data about the users' eye movements.

2.1.3 Code comprehension

Code comprehension, or program comprehension as it is commonly called, can refer both to theories about how developers comprehend software and to tools that developers use to help them comprehend software (Storey 2005). In this report, code/program comprehension refers to the former. Program comprehension is studied by giving a subject code and measuring how well they understand the program (Shneiderman 1976). Several approaches can be used to measure program comprehension, such as fill-in-the-blank or multiple-choice questions, and one approach is to ask for the output of the program. Apart from measuring only whether the answer is correct, the time it takes the subject to give the correct answer can also help measure program comprehension (Shneiderman 1976).

Recently, new methods have been introduced for measuring program comprehension. Siegmund et al. (2014) used functional magnetic resonance imaging (fMRI) and explored how it can be used to study how source code is understood. Eye tracking as a modern technique has also enabled a new means of measurement, as explored in a study by Bednarik & Tukiainen (2006). Some advantages they saw compared to traditional studies were that the subjects' ability to express their thoughts verbally does not play a part in the results, and that their thought process is not interfered with.

2.2 Previous literature

This section will outline some literature and previous studies on how variable names and function length affect code comprehension.

2.2.1 Variable names

In 2017, an empirical study was conducted on the effects of variable names on code comprehension (Avidan & Feitelson 2017). In the study, professional developers were given different functions to understand, where some of the functions had the variable names replaced with meaningless single letters. The study found that parameter names are more significant for comprehension than names of local variables. However, it was also found that there was no significant difference between the control group and the experiment group for some of the methods, as a result of poor and misleading variable names. As these variable names were not intended to be misleading, the discussion also acknowledges that a name one developer thinks is meaningful can be misleading to another developer (Avidan & Feitelson 2017).

A similar study to the one by Avidan & Feitelson (2017) compared the use of single letter, abbreviated and full-length names of identifiers, with the hypothesis that full English-word identifiers lead to better program comprehension (Lawrie et al. 2006). They found that full word identifiers were superior to single letter ones when evaluating code comprehension by how well test subjects could describe the code and how confident they were in their understanding. They also found that in many cases abbreviations were as understandable as full names.

A study from 2018 measured the time spent on program comprehension by developers in several programming sessions (Xia et al. 2018). The study found several factors that lead to increased program comprehension time, such as missing or insufficient comments, meaningless names, inconsistent code styling and a large number of lines of code in a class or method. It was found that in 21 percent of 200 programming sessions, inconsistent coding style was the cause of long program comprehension time. An example was mixed use of camelCase and under_score, which was a result of several developers working on the code without a strict naming convention. Meaningless naming of classes, methods and variables was also found to slow down program comprehension, and nine out of ten interviewees mentioned it as a reason for comprehension difficulties, as the semantic meaning of the identifiers was hard to understand. During the sessions, developers were found tracing back to the original definition statement when an unclear name was used.

2.2.2 Function length

The ideal function length according to Martin (2008) is just two to four lines of code. However, he suggests that the main focus should be that the function only does one thing. Fowler et al. (1999) place even less emphasis on function length and state that a new method should be written whenever it is necessary to comment something. This method should contain the same code, but instead of a comment describing how it is done, the new method should have a name that explains its purpose. Therefore, the length of the method is not that important, but rather that the name says exactly what the function does (Fowler et al. 1999).
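As an illustration of this advice, the following minimal Python sketch replaces an explanatory comment with an extracted, intention-revealing function. The shipping rule, the thresholds and all names are made up for the example and are not taken from Fowler et al. (1999) or from the thesis.

```python
def shipping_cost_with_comment(order_total, weight_kg):
    # Orders above 500 with a weight of at most 20 kg ship for free.
    if order_total > 500 and weight_kg <= 20:
        return 0
    return 49


def qualifies_for_free_shipping(order_total, weight_kg):
    """Extracted function: its name carries what the comment used to explain."""
    return order_total > 500 and weight_kg <= 20


def shipping_cost(order_total, weight_kg):
    # The condition now reads as a statement of purpose, so no comment is needed.
    if qualifies_for_free_shipping(order_total, weight_kg):
        return 0
    return 49
```

Both variants behave identically; only the place where the explanation lives has changed, from a comment to a function name.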

In Object-Oriented Metrics in Practice, Lanza & Marinescu (2006) bring up several identity disharmonies. They define them as design flaws that negatively affect a single entity, such as a class or method, when it is considered on its own. To avoid identity disharmonies and reach identity harmony, they suggest that operations and classes should be neither too large nor too small. This is unlike Martin, who claims that a function can always be made smaller, unless the new smaller function has a name that simply restates the original code without making it less complex (Martin 2008). Lanza & Marinescu (2006) argue that while short methods are desirable, splitting the code up too much could obstruct maintenance.

An identity disharmony mentioned by Lanza & Marinescu (2006) is the Brain Method, which is a method that controls too much of the functionality in a class. Such methods often start out small but are extended with more functionality until they become hard to understand and maintain. Other consequences of Brain Methods are that they become hard to debug and most likely impossible to reuse.

As mentioned in the previous section, the study by Xia et al. (2018) brings up large classes and methods as a reason behind program comprehension difficulties. When developers who took part in the study were interviewed, some stated that large classes or methods often have complex logic and are therefore hard to understand. If a class or method has many lines, it could be classified as a God method, meaning that the method controls too many processes in the program. A common way to fix this is to split the functionality of the method into several sub-methods (Xia et al. 2018).

Davis et al. (2011) performed a study in which they investigated how modularization affects the architectural design process in terms of how well the scripts are understood. Think-aloud interviews were performed with participants as they read both modularized and unstructured scripts. One of the conclusions drawn was that participants who were unfamiliar with the scripts had more trouble understanding the unstructured ones. When reading the modular scripts, participants relied more on the name and structure of the script than on the documentation, indicating that self-explanatory modules are preferred over documentation.


Chapter 3 Methods

This chapter will describe the tests that were performed to analyze our research question. It will also describe the collaboration with other groups and how the tests were based on previous work within the area, a student thesis from 2017 by Leif Tysell Sundkvist and Emil Persson.

3.1 Collaborations

During spring 2019, a total of five groups at the Royal Institute of Technology (KTH) were assigned to carry out theses on similar topics. The groups all analyzed different aspects of code comprehension using eye trackers at the Visualization Studio VIC in Stockholm. Because of this, the groups had to share resources (computers and eye trackers) as well as test persons. The collaboration was also motivated by the fact that a previous study, carried out by Tysell Sundkvist & Persson (2017), was not able to recruit a satisfactory number of test persons. The tests, outlined in section 3.3, took place over two days in which a total of 73 test persons were scheduled in different time slots. Of these 73 persons, 37 were assigned to do the tests that are included in this thesis.

3.2 Google Form

To gather more data and ensure that all test persons had basic knowledge of programming, the test persons were required to fill out a form with questions before the test. Most of the questions dealt with the programming experience of the test person, with the intention of helping to explain any anomalous test results. All test persons were given an anonymous number, so that each form could be connected to a test result without revealing the identity of the test person.

3.3 Outline of tests

3.3.1 Test code

To be able to analyze the research question, long and short functions needed to be compared in terms of how they are comprehended with respect to function names. Two different functions that test persons were to attempt to understand were created, referred to as "Function 1" and "Function 2". Function 1 and Function 2 had similar functionality yet performed different operations. Both were first written as one long function, stating all the operations one after another. They were then copied and split up into smaller functions, where each function aimed to do one thing, as Martin and Fowler et al. claim they should, as mentioned in section 1.1. This resulted in four different code snippets: one short version (where the function was split up) and one long version of the first function, as well as one short and one long version of the second function. The purpose of having two different functions was that both a long version and a split version could be tested on the same person, see figure 3.1, and thereby to collect as much data as possible from the available test persons. To measure the impact of intention-revealing function names, one of the split versions was written with intention-revealing names while the other was written with non-descriptive names. In this thesis, parameter names are included in what is referred to as function names. If the function name is intention-revealing, the same applies to the parameter names, and vice versa.

The code was then displayed as images in an interactive experience in Unity. As the split versions took up more space than the height of the screen, which was not compatible with the eye tracking device, the caller function was placed to the left and the called functions to the right, as can be seen in figure A.3 and figure A.4 in appendix A.
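The three kinds of code versions described in this section can be sketched as follows. This is a purely hypothetical Python illustration: the computation, the names and the language are assumptions made for the example and are not the test code used in the thesis.

```python
def long_version(numbers):
    # Long version: all operations written one after another in one function.
    total = 0
    for n in numbers:
        if n % 2 == 0:
            total += n * n
    count = 0
    for n in numbers:
        if n % 2 == 0:
            count += 1
    if count == 0:
        return 0
    return total // count


# Split version with intention-revealing function and parameter names.
def even_numbers(numbers):
    return [number for number in numbers if number % 2 == 0]


def sum_of_squares(numbers):
    return sum(number * number for number in numbers)


def mean_square_of_even_numbers(numbers):
    evens = even_numbers(numbers)
    if not evens:
        return 0
    return sum_of_squares(evens) // len(evens)


# Split version with non-descriptive function and parameter names.
def f1(a):
    return [x for x in a if x % 2 == 0]


def f2(a):
    return sum(x * x for x in a)


def f3(a):
    b = f1(a)
    if not b:
        return 0
    return f2(b) // len(b)
```

All three variants return the same output for the same input, so a test person asked for the output faces the same task in each case; only the structure and the naming differ.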

3.3.2 Execution of tests

The tests were built as two different interactive 2D experiences in Unity which the test persons could click their way through. Each experience consisted of two functions as well as instructions for each function. The instructions described what the test person should do and mentioned the unusual placement of the caller function, described in section 3.3.1. As a complement to the written instructions, verbal instructions were given before each test. The written instructions for the split versions can be viewed in appendix B. The test persons were given one of the two experiences and were thereby given two functions according to figure 3.1. They were then asked to state the output of each function, following the way of measuring program comprehension recommended by Shneiderman (1976), mentioned in section 2.1.3. As recommended in the same section, the time the test person took to find the correct answer was also noted to help measure comprehension. Since the tests had to take place within a specific time slot, the test persons were given a limit of 6 minutes to find the correct answer. If the test person had not been able to give the correct answer by then, they were asked to move on to the next function.

Figure 3.1: Distribution of test persons for tests

3.3.3 Use of eye tracking

During the attempts, an eye tracker was used to keep track of the test person's eyes. Eye tracking is a modern method for measuring code comprehension that does not interfere with the test person's thought process, as mentioned in section 2.1.3. The goal was to see whether there could be a correlation between what the test persons looked at most in the code and the time it took them to find the correct answer. The data from the eye tracker was extracted with Unity, using code written by Tysell Sundkvist & Persson (2017), to create heat maps of where the test persons looked on the functions.
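The general idea of turning gaze samples into a heat map can be sketched as counting samples per cell of a grid laid over the screen. The sketch below is written in Python under stated assumptions: gaze samples are taken to be (x, y) pixel coordinates, and the function name and grid approach are hypothetical. It is not the Unity code by Tysell Sundkvist & Persson (2017) that was used in the thesis.

```python
def gaze_heat_map(gaze_points, screen_width, screen_height, columns, rows):
    """Count gaze samples per grid cell; higher counts mean 'hotter' cells."""
    grid = [[0] * columns for _ in range(rows)]
    for x, y in gaze_points:
        if not (0 <= x < screen_width and 0 <= y < screen_height):
            continue  # ignore samples that fall outside the screen
        # Map pixel coordinates to a cell index, clamped to the last cell
        # to guard against floating-point rounding at the screen edge.
        column = min(int(x * columns / screen_width), columns - 1)
        row = min(int(y * rows / screen_height), rows - 1)
        grid[row][column] += 1
    return grid
```

Rendering such a grid with a color scale gives a picture of which parts of the code, for example function names or function bodies, attracted the most attention.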


Chapter 4 Results

This chapter will present the results from the tests described in chapter 3. This includes the amount of time that test persons required to find the correct answer as well as the heat maps, mentioned in section 3.3.3, generated from the eye-tracking data.

4.1 Function 1

This section will present the results of the tests of the long and the split versions of the first function. The split version of function 1 used intention-revealing names.

4.1.1 Summarized data for function 1

The table below presents summarized data from the tests for function 1. This includes both the percentage of test persons that were able to answer what the function returned within 6 minutes, as well as their average time to find the answer.

                      Long version    Split version
Correct answers (%)   0               63.1
Average time (s)      -               116

Table 4.1: Summarized data for Function 1

None of the 18 test persons who looked at the long version were able to give the correct answer within 6 minutes. The average time for finding the correct answer could therefore not be calculated. However, for the split version, 12 out of 19 test persons were able to give the correct answer within 6 minutes. This corresponds to 63.1 percent. The average time for the test persons who found the correct answer was calculated to be 116 seconds.

4.1.2 Heat maps for function 1

In this section, the heat maps of the eye tracking for function 1 will be presented. The heat maps have been generated separately, depending on whether or not the test persons were able to find the correct answer within 6 minutes. As no test person found the correct answer within 6 minutes for the long version, no heat map of successful test persons could be generated for that version of this function.

Figure 4.1: Function 1, long version. Heat map of test persons who did not find the answer within 6 minutes.


Figure 4.2: Function 1, split version. Heat map of test persons who found the correct answer within 6 minutes.

Figure 4.3: Function 1, split version. Heat map of test persons who did not find the answer within 6 minutes.

4.2 Function 2

This section will present the results of the tests of the long and the split versions of the second function. The split version of function 2 used non-descriptive function names.

4.2.1 Summarized data for function 2

The table below presents the summarized data for function 2. This includes the percentage of test persons that were able to answer what the function returned within 6 minutes, as well as their average time to find the answer.

                      Long version    Split version
Correct answers (%)   42.1            66.6
Average time (s)      213             233

Table 4.2: Summarized data for Function 2

Out of 19 test persons looking at the long version, 8 persons (42.1 percent) were able to find the correct answer within 6 minutes, with an average time of 213 seconds. For the split version, 12 out of 18 persons (66.6 percent) were able to find the answer, with an average time of 233 seconds.
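The percentages reported for both functions can be recomputed from the counts given in the text. The short Python sketch below does this; the helper name `percent_truncated` is hypothetical. The reported values match the counts when the percentage is truncated to one decimal, whereas conventional rounding would give 63.2 and 66.7 instead of 63.1 and 66.6. That truncation was used is an inference from the numbers, not something the thesis states.

```python
def percent_truncated(correct, total):
    """Share of correct answers in percent, truncated to one decimal."""
    tenths_of_percent = correct * 1000 // total  # integer arithmetic, no rounding up
    return tenths_of_percent / 10


# (correct answers, test persons) per function and version, as reported in the text.
results = {
    "function 1, long": (0, 18),
    "function 1, split": (12, 19),
    "function 2, long": (8, 19),
    "function 2, split": (12, 18),
}

for version, (correct, total) in results.items():
    print(f"{version}: {percent_truncated(correct, total)} %")

# Difference between the average times of the two split versions, in seconds.
split_time_difference = 233 - 116
```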

4.2.2 Heat maps for function 2

Figure 4.4: Function 2, long version. Heat map of test persons who found the correct answer within 6 minutes.


Figure 4.5: Function 2, long version. Heat map of test persons who did not find the answer within 6 minutes.

Figure 4.6: Function 2, split version. Heat map of test persons who found the correct answer within 6 minutes.


Figure 4.7: Function 2, split version. Heat map of test persons who did not find the answer within 6 minutes.

4.3 Correlation between programming experience and time

Using the data gathered through the form that each test person filled in before the tests, as mentioned in section 3.2, the correlation between self-estimated programming experience in years and the time to find the correct answer could be examined for the different functions and versions. Some simplifications had to be made to the data. Firstly, test persons who had filled in "more than 5 years of programming experience" were counted as having 6 years. Secondly, the time for test persons who did not succeed in finding the answer within 6 minutes was counted as 360 seconds, represented as red dots in the charts below. To determine whether the data sets exhibit a positive trend, a negative trend, or no trend at all, trendlines were added to the charts.

The R² value is a statistical measure of how close the data points are to the fitted trendline. A rough rule of thumb is that values of 0.25, 0.50, and 0.75 represent a weak, moderate, and substantial correlation, respectively. However, what is considered a "good" R² value depends on the field. In behavioral studies, R² values of 0.20 are considered high (F. Hair et al. 2013).
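The two simplifications and the R² value of a linear trendline can be sketched in Python as follows. The observations are made up for illustration, the encoding of "more than 5 years" as the string ">5" is an assumption, and the thesis does not state which tool was used to fit its trendlines. The sketch uses an ordinary least-squares line and R² = 1 - SS_res / SS_tot.

```python
TIME_LIMIT_SECONDS = 360   # time assigned to test persons who did not find the answer
MORE_THAN_FIVE_YEARS = 6   # value assigned to "more than 5 years" of experience


def simplify(experience, seconds):
    """Apply the two simplifications described in the text."""
    years = MORE_THAN_FIVE_YEARS if experience == ">5" else experience
    time = TIME_LIMIT_SECONDS if seconds is None else seconds
    return years, time


def linear_trendline(xs, ys):
    """Least-squares line y = slope * x + intercept (xs must not all be equal)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys):
    """R^2 = 1 - SS_res / SS_tot for the least-squares trendline."""
    slope, intercept = linear_trendline(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up observations: (years of experience, seconds, or None if not solved).
raw = [(1, None), (2, 300), (3, 250), (">5", 120), (4, 200)]
points = [simplify(experience, seconds) for experience, seconds in raw]
xs = [years for years, _ in points]
ys = [time for _, time in points]
slope, intercept = linear_trendline(xs, ys)
print(f"slope = {slope:.1f} s/year, R^2 = {r_squared(xs, ys):.3f}")
```

A negative slope means that more experienced test persons tended to answer faster, while R² indicates how much of the variation in time the trendline accounts for.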

As no test person managed to find the answer for the long version of function 1, the only observation that could be made was that, regardless of programming experience, test persons were unable to find the answer within 6 minutes. Scatter charts for the split version of function 1, as well as for both the split and long versions of function 2, are presented below.

Figure 4.8: Function 2, long version

For the long version of function 2, seen in figure 4.8, a slight correlation could be seen between the programming experience of the test persons and the time to find the answer. As seen from the trendline in figure 4.8, more experienced programmers had a slightly higher chance of finding the answer more quickly. However, the R² value of 0.003 indicates that the correlation is weak, if it exists at all.


Figure 4.9: Function 1, split version (intention-revealing function names)

The split version of function 1, seen in figure 4.9, has a similar trendline and R² value to the long version of function 2, indicating a weak correlation or no correlation at all. It can also be seen that the quickest answers were given by two programmers with varying experience, one with five years and one with one year of experience.


Figure 4.10: Function 2, split version (non-descriptive names)

For the split version of function 2, seen in figure 4.10, a more distinct correlation between programming experience in years and time to find the answer can be noticed. The test persons with longer programming experience found the answer within a shorter time when the split-up functions had non-descriptive names. This is also supported by the R² value of 0.404, which, as mentioned earlier in this section, can be considered high in behavioral studies and therefore indicates a strong correlation.


Chapter 5 Discussion

The first observation that can be made from the results is that function 1 can be claimed to be more complex to comprehend than function 2. The reason is that the percentage of test persons who could give the correct answer within 6 minutes for the long versions was 0 percent for function 1 and 42.1 percent for function 2. As the operations were simply written one after the other in the long versions, the results indicate that the long versions were generally less comprehensible. The percentage of test persons who found the answer was about the same for the split versions, with 63.1 percent for function 1 and 66.6 percent for function 2.

5.1 Intention-revealing function names

Although function 1 supposedly had more complex code than function 2, the split version of function 1 had a shorter average time to find the answer than the split version of function 2, with a difference of 117 seconds. This indicates that intention-revealing function names play a significant role in code comprehension, as using them increased the percentage of correct answers from 0 percent to 63.1 percent for function 1. It is also supported by the fact that a more complex function with intention-revealing function names could be comprehended faster than a less complex function with non-descriptive names. The heat maps for the split version of function 1 show that test persons who succeeded in giving the correct answer looked more at the function names and less at the content of the functions than the test persons who failed. This supports the indication above that the intention-revealing function names played a significant role in code comprehension, as it can be claimed that the test persons who succeeded drew information from the function names and were thereby able to give the correct answer within a shorter time. It is further supported by the results presented in section 4.3, which indicate that the time to find the answer in the version with intention-revealing function names did not vary with programming experience. Meanwhile, for the version with non-descriptive names, programming experience seemed to have a significant impact on how quickly the answer was found. It can therefore be claimed that test persons, when possible, rely on intention-revealing function names, which allows them to spend less time understanding the code. This can also be related to the study by Lawrie et al. (2006), mentioned in section 2.2.1, which found that full-length and abbreviated identifier names increased how confident test subjects were in their understanding of code.

As the parameter names were also adapted to be intention-revealing in the split version of function 1, the results can also be related to the study examining the effect of variable names on code comprehension, mentioned in section 2.2.1, which showed that parameter names were significant for code comprehension. However, the heat maps for the split version of function 1, with intention-revealing names, show that only a few test persons looked at the descriptive parameter names. This suggests that the parameter names did not have an extensive effect on the results of the tests in this thesis. One possible reason is that the intention of the parameter names in the functions could be guessed. A theory could be that the importance of intention-revealing parameter names varies between different functions.

5.2 Short and long functions

Another observation that can be made is that even without intention-revealing function names, there is a difference in results between the long and split versions of function 2. As the percentage of test persons who found the correct answer is 42.1 percent for the long version and 66.6 percent for the split version, it can be claimed that the split version was easier to comprehend.

However, the average time for finding the answer was 213 seconds for the long version and 233 seconds for the split version. In other words, a larger share of test persons was able to tell the correct answer for the split version of the function, though they required more time to do so. Looking at the heat maps for the split version of function 2, no definite difference could be seen when comparing test persons who succeeded and test persons who failed to give the correct answer. Therefore, no explanation can be drawn from these. A hypothesis could be that the code was perceived to be more structured, allowing the test persons to stay focused by reading one operation at a time. This structure could be claimed to be more intuitive in the split version, where the function is split up into smaller functions. This hypothesis is supported by the study carried out by Xia et al. (2018), mentioned in section 2.2.2, where developers who took part in the study stated in interviews that large methods were often harder to understand.

Having an extensive amount of functionality in one function, referred to as Brain Method or God Method in section 2.2.2 by Lanza & Marinescu (2006) and Xia et al. (2018), could by extension also worsen code comprehension, as it hinders reuse of code. As a result, the developer may have to read the same code multiple times, which can both cause confusion and take time.
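The reuse argument can be sketched with a small hypothetical example. The code below is an assumption made for illustration only and is unrelated to the functions used in the tests: the long version repeats the same conversion twice, while the split version extracts it into one helper that is read once and reused by name.

```python
# Hypothetical example; names and logic are assumptions for illustration.

def report_long(temperatures_c, thresholds_c):
    # The same conversion is written out twice, so a reader has to
    # read and understand it twice.
    temperatures_f = []
    for value in temperatures_c:
        temperatures_f.append(value * 9 / 5 + 32)
    thresholds_f = []
    for value in thresholds_c:
        thresholds_f.append(value * 9 / 5 + 32)
    return temperatures_f, thresholds_f


def celsius_to_fahrenheit(values_c):
    """Convert a list of Celsius values to Fahrenheit."""
    return [value * 9 / 5 + 32 for value in values_c]


def report_split(temperatures_c, thresholds_c):
    # The conversion is read once and then reused by name.
    return celsius_to_fahrenheit(temperatures_c), celsius_to_fahrenheit(thresholds_c)
```

Both versions return the same result for the same input; the difference lies only in how much code a reader has to go through to see that.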

The time difference between the long and split versions could be related to the study mentioned in section 2.2.1, in which meaningless naming of methods was found to slow down program comprehension. As shown in the heat maps for the split version of function 2, test persons spent time looking at the function calls, which are non-descriptive and therefore meaningless. Consequently, this could explain why test persons took longer to find the answer for the split version with non-descriptive function names.
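A minimal sketch of what a split version with non-descriptive names can look like is given below. It is a hypothetical example, not the code in Figure A.4; the names `f1`, `f2` and `f3` and the computation are assumptions chosen for illustration.

```python
# Hypothetical example; NOT the code in Figure A.4.
# The names f1, f2 and f3 are deliberately non-descriptive.

def f1(values):
    # Keeps only the even numbers.
    return [v for v in values if v % 2 == 0]


def f2(values):
    # Squares every number.
    return [v * v for v in values]


def f3(values):
    # Sums the squares of the even numbers.
    return sum(f2(f1(values)))
```

Without the comments, the result of a call such as `f3([1, 2, 3, 4])` cannot be predicted from the call site alone: the reader has to jump to each helper and read its body, which is consistent with the time the test persons spent looking at the function calls.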

5.3 Sources of error

There are a number of factors that could have affected the test persons and thereby disrupted the test results. For example, the test persons were not in their normal programming environment while reading the code. Some test persons expressed that they felt nervous as well as disappointed when they did not find the correct answer within 6 minutes, which could have affected their ability to understand the next function. After doing the test of the split version of function 1 (with intention-revealing names), some test persons expressed that they were suspicious of the function names and expected the test to try to mislead them. It is therefore possible that they would have found the answer faster if they had been in their normal programming environment. Another factor is that the sound level in the room was quite high, caused by the door opening and closing as well as people talking, which could have been distracting for the test persons. However, as these distractions were more or less constant, no extensive effect on the results was expected.

The outline of the tests could have been a disrupting factor by affecting how the test persons comprehended the code. An example is that the main function in the split versions was placed to the left on the screen, when it is usually placed above the called functions. To prevent this from causing confusion, the instructions shown before the code explained this. Another factor could be that the function names cause confusion by mistake, as in the study mentioned in section 2.2.1. To prevent this, the guidelines given by Robert Martin in the book Clean Code, mentioned in section 1.1, were followed when naming the functions. However, as mentioned in section 2.2.1, function names can mean different things to different programmers. Some test persons might happen to perceive the function names the same way as the creators of the code, in which case they have an advantage in comprehending the code compared to the test persons who do not.

Another source of error is the number of test persons. As the resources for the tests were rather limited, it is possible that too few test persons were gathered and that they lacked the diversity needed for the results to support definite conclusions.

Finally, the source of error that might be of most importance was caused by the time limit set for the tests. If a test person did not find the answer within 6 minutes, their result was counted as "failed". However, had they been allowed an unlimited amount of time, they would most likely have found the answer at some point. As an unlimited amount of time for each test person to perform the tests was not possible in this thesis, due to limited resources and sharing test persons with the other groups mentioned in section 3.1, this becomes an important source of error which limits the conclusions that can be drawn from the results.


Chapter 6 Conclusions

As the tests might not have been performed on a large enough number of test persons, and as there was a limited amount of time for each test person to perform the tests, which can lead to a misleading result, conclusions to answer the research question cannot be drawn with certainty. However, the results of the tests, discussed in chapter 5, indicate that intention-revealing function names seem to have a significant effect on both whether and how fast developers can comprehend code. The results also suggest that simply splitting up functions that perform multiple different operations into smaller functions, using non-descriptive names, aided the programmer in comprehending the code. However, this also caused the developers to spend more time finding the answer, most likely because time was spent looking at meaningless names.

The results of this thesis indicate that the answer to the research question is that the superiority of small functions, with regard to code comprehension, does not depend exclusively on their enabling the use of intention-revealing names.

However, to be able to draw more reliable conclusions, further studies would be necessary, where the two most important improvements would be to have an unlimited amount of time for each test as well as a larger quantity of test persons.


Bibliography

Avidan, E. & Feitelson, D. G. (2017), ‘Effects of variable names on comprehension: An empirical study’, ICPC ’17 Proceedings of the 25th International Conference on Program Comprehension, pp. 50–65.

Bednarik, R. & Tukiainen, M. (2006), An eye-tracking methodology for characterizing program comprehension processes, in ‘Proceedings of the 2006 Symposium on Eye Tracking Research & Applications’, ETRA ’06, ACM, New York, NY, USA, pp. 125–132.

URL: http://doi.acm.org/10.1145/1117309.1117356

Brett, A. (2013), ‘Function vs method vs procedure’, [Online] Available on: https://adamcod.es/2013/09/27/function-method-procedure.html. [Accessed on 2019-03-22].

Davis, D., Burry, J. & Burry, M. (2011), ‘Understanding visual scripts: Improving collaboration through modular programming’, The International Journal of Architectural Computing (IJAC) 9(4), 361–375. [Accessed on 2019-03-22].

Hair, J. F., Hult, T., Ringle, C. & Sarstedt, M. (2013), A Primer on Partial Least Squares Structural Equation Modeling.

Fowler, M., Beck, K., Brant, J., Opdyke, W. & Roberts, D. (1999), Refactoring: Improving the Design of Existing Code, Addison-Wesley Professional.

URL: http://www.amazon.ca/exec/obidos/redirect?tag=citeulike04-20&path=ASIN/0201485672

Göde, D. N. (2014), ‘The real benefits of short methods’, [Online] Available on: https://www.cqse.eu/en/blog/the-real-benefits-of-short-methods. [Accessed on 2019-03-22].


Lanza, M. & Marinescu, R. (2006), Object-Oriented Metrics in Practice, Springer, Berlin, Heidelberg.

URL: https://link-springer-com.focus.lib.kth.se/book/10.1007%2F3-540-39538-5authorsandaffiliationsbook

Lawrie, D., Morrell, C., Feild, H. & Binkley, D. (2006), What’s in a name? A study of identifiers, in ‘14th IEEE International Conference on Program Comprehension (ICPC’06)’, pp. 3–12.

Martin, R. C. (2008), Clean Code, Prentice Hall.

Nedhal, A.-S. A. (2017), ‘Source code comprehension analysis in software maintenance’, [Online] Available on: https://ieeexplore.ieee.org/document/8075175. [Accessed on 2019-03-22].

Shneiderman, B. (1976), ‘Exploratory experiments in programmer behavior’, International Journal of Computer & Information Sciences 5(2), 123–143.

URL: https://doi.org/10.1007/BF00975629

Siegmund, J., Kästner, C., Apel, S., Parnin, C., Bethmann, A., Leich, T., Saake, G. & Brechmann, A. (2014), Understanding understanding source code with functional magnetic resonance imaging, in ‘Proceedings of the 36th International Conference on Software Engineering’, ICSE 2014, ACM, New York, NY, USA, pp. 378–389.

URL: http://doi.acm.org/10.1145/2568225.2568252

Spolsky, J. (2000), ‘Things you should never do, part i’, [Online] Available on: https://www.joelonsoftware.com/2000/04/06/things-you-should-never-do-part-i/. [Accessed on 2019-04-08].

Storey, M.-A. (2005), Theories, methods and tools in program comprehension: past, present and future, in ‘13th International Workshop on Program Comprehension (IWPC’05)’, pp. 181–191.

Tobii (2019a), ‘Our business organization’, [Online] Available on: https://www.tobii.com/group/about/#Ourbusinessorganization. [Accessed on 2019-03-22].

Tobii (2019b), ‘What is eye tracking?’, [Online] Available on: https://www.tobii.com/tech/technology/what-is-eye-tracking/. [Accessed on 2019-03-22].


Tysell Sundkvist, L. & Persson, E. (2017), ‘Code styling and its effects on code readability and interpretation’, [Online] Available on: http://kth.diva-portal.org/smash/get/diva2:1112978/FULLTEXT01.pdf. [Accessed on 2019-03-25].

Unity (2019), ‘Game engines—how do they work?’, [Online] Available on: https://unity3d.com/what-is-a-game-engine. [Accessed on 2019-03-25].

Xia, X., Bao, L., Lo, D., Xing, Z., Hassan, A. E. & Li, S. (2018), ‘Measuring program comprehension: A large-scale field study with professionals’, IEEE Transactions on Software Engineering 44(10), 951–976. [Accessed on 2019-04-08].

Zelkowitz, M. V., Shaw, A. C. & Gannon, J. D. (1980), Principles of Software Engineering and Design, Prentice-Hall, Englewood Cliffs, N.J.


Appendix A

Code of functions


Figure A.1: Function 1, long version


Figure A.2: Function 2, long version


Figure A.3: Function 1, split version with intention-revealing method names

Figure A.4: Function 2, split version with non-descriptive method names


Appendix B

Instructions

Figure B.1 contains the first and overall instructions for the test that the test person read. The same information was also given verbally. Figure B.2 contains the instructions for the first function in the test. Depending on which test the test person was assigned, this was either the long version of function 1 or function 2. Figure B.3 contains the instructions for the second function in the test. Depending on which test the test person was assigned, this was either the split version of function 1 or function 2.


Figure B.1: Instructions for full test

Figure B.2: Instructions for first function


Figure B.3: Instructions for second function


www.kth.se

TRITA-EECS-EX-2019:338