Resolving Higher-Order Conflicts in Edit History Refactoring

DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS

STOCKHOLM, SWEDEN 2016

OSKAR BODEMYR

Master’s Thesis at NADA

Supervisor: Joel Brynielsson

Examiner: Jens Lagergren

Abstract

When committing source code changes to a version control system, these changes might affect more than one of several tasks connected to the project. This can have a bad impact on analysis of changes. It can also make it difficult to reuse or undo previous changes, or difficult to understand the evolution of software.

With edit history refactoring, the edit history of source code can be reconfigured to help the developer separate commits into smaller commits. Thereby, one can avoid commits that affect more than one task. However, separated commits can have unwanted effects on the source code. The aim of this thesis project is to resolve conflicts that may occur when refactoring an edit history.

Changes have been made to Historef, a tool created by Saeki Laboratory at Tokyo Institute of Technology. This tool uses the technique of edit history refactoring. The changes help the tool evaluate possible edit history refactorings in order to suggest one that avoids source code conflicts. By testing examples of edit histories, one can show that the tool can avoid different types of conflicts and that the separated commits can be committed to a version control system without unwanted effects.

Referat

Resolving Higher-Order Conflicts when Restructuring Code History

When code changes are uploaded to a version control system, these changes may be connected to more than one task in the project. This can have a negative effect when analyzing the project’s changes, reusing previous changes, or trying to understand how the software has changed over time.

With edit history refactoring, the history of code changes can be restructured to separate code changes into smaller changes. One can thereby avoid a change affecting more than one task. However, splitting changes can result in unwanted effects on the source code. The purpose of this degree project is to avoid conflicts that can arise when restructuring the history of code changes.

Historef is a tool created by Saeki Laboratory at Tokyo Institute of Technology. This tool uses the technique of edit history refactoring. By extending the tool’s functionality, it can now evaluate restructurings of the code change history in order to suggest a restructuring that avoids conflicts. By testing examples of code histories, one can show that the tool can now avoid different types of conflicts so that these changes can be uploaded without unwanted effects.

Acknowledgements

I would like to thank Professor Motoshi Saeki and Assistant Professor Shinpei Hayashi at Saeki Laboratory at Tokyo Institute of Technology for letting me take courses and conduct my thesis research while being a member of their laboratory. Thanks to the students of Saeki Laboratory for helping me set up the environment needed for Historef and for making me feel like a part of the laboratory. Thanks to the Scandinavia-Japan Sasakawa Foundation for helping me fund my studies and research in Japan. Thanks to Joel Brynielsson at KTH for taking the time to advise me and continuously give feedback on the report.

List of Figures

1.1 Evaluating different orders of changes.

1.2 A simple example demonstrating a conflict that might occur when separating commits.

2.1 Primitive refactorings defined in edit history refactoring.

2.2 Large refactorings defined in edit history refactoring.

3.1 A tree representation of the hierarchy of conflict-related relationships.

4.1 Simple example demonstrated using conditional transformations.

4.2 Using a composed conditional transformation instead of multiple conditional transformations.

5.1 Inequality Test Conflict: Evaluating the orders before (top) and after (bottom) implementation.

5.2 Ordering Test Conflict: Evaluating the orders before (top) and after (bottom) implementation.

5.3 Build Conflict: Evaluating the orders before (top) and after (bottom) implementation.

Chapter 1

Introduction

The following chapter introduces some basic concepts needed to understand the degree project. The first section explains the term tangled changes, a side effect that might occur when submitting code to a repository. The second section describes the rule of Task Level Commit and what advantages it has. The third section introduces a concept used to follow this rule, called edit history refactoring. The fourth and final section explains the purpose of the degree project.

1.1 Tangled Changes

A tangled change occurs quite often in version control systems, but what does it mean? When making changes to source code, a developer often makes several changes, some of which may not be related. When committing these changes, it often happens that they are grouped together into a single commit, creating a tangled change.

An unwanted effect of these tangled changes can occur during analysis of the code. If a single commit includes several tasks, analysis software might interpret this as the tasks being related, even though there is no guarantee that they are. In a manual evaluation of five open-source Java projects made by Herzig and Zeller [1], it was discovered that up to 15% of all fixes consist of multiple unrelated changes. This would create much noise in any analysis software. Imagine that a bug is being fixed for one task, but the commit also contains other changes for other tasks. All of these might now be marked as having had a defect, even though only one of the tasks did.

1.2 Task Level Commit

By applying the rule of Task Level Commit [2], many version control issues are managed. The rule prohibits mixed change sets by striving to only include one intentional change or task in a committed change set.

Reuse of code is improved since the process of extracting a separate task is eliminated. Since extraction is no longer needed, it should be possible to apply this change elsewhere without further preparatory work, e.g., in another branch of the source code. By the same reasoning, going back to an earlier version of the code is simplified. Understanding the code should also be improved. Imagine having a mixed change set, a commit including more than one change, with one commit message. Whoever reads this must try to identify which changes in the code apply to which tasks, but by enforcing a Task Level Commit policy, this no longer becomes a problem.

In reality, it is not that easy for a developer to keep change sets this well-managed. Developers often work on several tasks at once, and might notice flaws in code that are irrelevant to the current task, or even to any of the assigned tasks. Keeping up with this manually would have a severe impact on the developers’ workflow. Thus, the cost reductions mentioned earlier might not make up for the cost of the worsened workflow. In conclusion, in order to follow this policy, we must use other means.

1.3 History Refactoring

A solution to this problem is history refactoring. By recording the edit history of source code as atomic chunks, it is possible to refactor this history in order to get the desired results. Imagine that we can restructure the recorded edit history of a program. If we separate atomic changes by tasks, we end up with partial changes that can be committed individually to a software configuration management repository.

1.4 Purpose

The following section will first show a motivating example in order to clarify the purpose of this project. Secondly, it will explain what kind of conflicts this project aims to resolve.

1.4.1 Motivating Example

Imagine that a project enforces a commit policy saying that every state of the code should be clean according to some unit tests, i.e., all unit tests should pass for each commit. Say we have three changes, c₁, c₂ and c₃, that are applied in that order.

A history refactoring tool that is unaware of this policy might suggest the order c₃c₂c₁ as a possible candidate for how these changes could be committed. But since the policy is not taken into account, there is no guarantee that this order supports the policy. If any state of the code after applying c₃, c₃c₂ or c₃c₂c₁ does not pass the unit tests, this restructuring of the history should never have been suggested in the first place. There could however be another ordering where the tests pass in every state. Figure 1.1 visualizes this problem. If the tool were to evaluate the different states using the unit tests when suggesting an ordering, this problem would be resolved.

Figure 1.1. Evaluating different orders of changes.

Simple Example

Let us use a simple example to demonstrate this. Imagine there are two functions in a program, each of which returns a positive integer: x and y, respectively. In order for the test suite to pass, we have the requirement y > x.

These integers need to be changed, so a developer changes the value of x in change c₁ and the value of y in change c₂, in that order.

Figure 1.2. A simple example demonstrating a conflict that might occur when separating commits.

As seen in Figure 1.2, c₁ changes x to a value that is larger than the original value of y. c₂ then changes y to a value larger than the updated x. If these changes were to be submitted in multiple steps, one possible order would be c₁ followed by c₁c₂. If the test suite is ignored by the history refactoring tool, this order could quite possibly be suggested. But as we can see clearly, c₁ alone should not be committed since the state resulting from applying c₁ does not satisfy the requirement y > x.

By using the requirement when testing the order, the only possible order should be c₂ and then c₂c₁, because both of the versions after these commits fulfill the requirement individually.

1.4.2 Higher-Order Conflicts

The paper Proactive Detection of Collaboration Conflicts by Brun et al. [3] describes three kinds of conflicts that can occur in version control systems. Textual conflicts occur when changes conflict with one another on a textual level, e.g., occur on the same line. Build conflicts occur when the result of applied changes prevents the project from building. Test conflicts occur when the result of applied changes allows the project to build, but fails the test suite connected to the project. These latter two conflicts are referred to as higher-order conflicts. The purpose of this project is to resolve these higher-order conflicts in history refactoring. These conflicts will be explained in further detail in Sections 3.1 and 4.1.1.

Chapter 2

Background

The following chapter presents previous work in order to explain the current state of edit history refactoring. The first section explains the recording of edit operations, followed by a more in-depth explanation of edit history refactoring in the second section. The third section presents the tool Historef. The fourth and final section discusses tangled changes and their impact in more detail.

2.1 Recording Edit Operations

An important aspect of software maintenance is to understand a program in order to extend and/or modify it. It is possible to see the current state of a program by looking at the current snapshot of its source code, but in order to understand its evolution, more information is needed [4]. A developer often examines several snapshots stored in repositories of versioning systems, such as Git [5] or Subversion [6], and can therefore extract the difference between two successive snapshots. However, this analysis is often troublesome [7] and information such as individual changes is not represented. The reason why it is troublesome is because changes between two snapshots may be in a tangled state. The conventional versioning systems do not untangle changes or store changes individually. Therefore, useful information on past changes is lost [8]. There are several propositions on how to identify changes between two programs [9].

Although some of the above-mentioned propositions have high accuracy, the result cannot be guaranteed for every project. A problem with conventional versioning systems is that all changes between two versions of a program are gathered into one transaction. Another flaw is that many analysis techniques require the before and after snapshots to be compilable in order to be effective. This makes it very hard to capture individual changes. In order to gather these individual changes, recording the history of editing operations is important. Ritsumeikan University in Japan has created an Eclipse [10] plugin for recording editing operations [4]. This plugin, called OperationRecorder, records the editing operations as changes and stores them in a database for easy access. A change is composed of either one editing operation, or a composite of editing operations. OperationRecorder was inspired by SpyWare [8], which uses a very similar approach for the Squeak environment. The difference lies in how the plugins gather the changes. While SpyWare depends on an abstract syntax tree (AST), OperationRecorder uses the undo history managed by Eclipse. Since OperationRecorder does not depend on the existence of an AST, it can record uncompilable code as well as code that compiles.

Editing operations that directly affect source code in the editor are recorded by OperationRecorder. Both manual and automatic editing are captured. Manual editing comprises operations such as keystrokes and cut and paste of text, whilst automatic editing comprises operations such as completions and refactorings. Operations that do not affect the source code directly, such as cursor movement, selection and file opening, are ignored. It is also possible to gather operations at a lower level than Eclipse’s undo history, such as capturing individual keystrokes. However, this information is often too primitive to understand.

In order to be able to use these editing operations independently, OperationRecorder keeps track of the offset of each change. Say we have two changes. The first change is made and OperationRecorder saves its offset o. The second change is then made textually before the first one, so o is updated in order to still keep track of the first change.
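A minimal sketch of this offset bookkeeping is given below. The function name `shift_offset` and the example text are assumptions made for illustration; how OperationRecorder actually updates offsets is not described in the source.

```python
def shift_offset(tracked_offset, edit_offset, removed_len, added_len):
    """Return the tracked offset after a later edit at edit_offset.

    Assumes the later edit does not overlap the tracked change.
    """
    if edit_offset <= tracked_offset:
        # The later edit lies textually before the tracked change,
        # so the tracked change moves by the net length difference.
        return tracked_offset + (added_len - removed_len)
    return tracked_offset


text = "int a = 1;"
o = text.index("1")            # offset of the first change

# A second edit inserts "final " at offset 0, before the first change.
inserted = "final "
text = inserted + text
o = shift_offset(o, 0, 0, len(inserted))

print(text[o])                 # still points at the first change
```

An edit that lies textually after the tracked change leaves the offset untouched, which is why only the `edit_offset <= tracked_offset` case adjusts it.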

2.2 Edit History Refactoring

By recording and managing the edit history of source code, it is possible to increase software development productivity [11]. For instance, by recording edit operations, developers can undo and redo their actions. Other approaches also exist, such as replaying the edit operations as an attempt to support program comprehension [12][13]. As mentioned in the previous chapter, by following rules such as Task Level Commit, it is possible to reduce costs in version control. This rule also helps the developer to keep edit histories well-managed. In a study conducted by Saeki Laboratory [14], they discovered that tangled changes could jeopardize reuse, revert and understanding of previous changes, as explained in Section 1.2.

The history of changes of source code often becomes entangled. Several factors are at play, for instance noticing an unrelated bug while working on a specific task, or the general trial-and-error process of source code development. Because of this entanglement, the edit history is often difficult to understand. In their paper [11], Saeki Laboratory proposes a technique called edit history refactoring in order to refactor edit history to make it more understandable and usable during practical software development.

By restructuring an edit history, we can create a history refactoring. As with regular code refactoring, the goal is to improve understanding and/or usability without changing the overall effect. In order to separate tasks, their technique includes the concept of grouping. With grouping, changes can be assigned to different groups which are committed individually. In order to follow the rule of Task Level Commit, one group should contain one task.

2.2.1 Definitions

The most atomic part of an edit history is called a chunk. In tuple notation, a chunk is written as h := (t, f, o, r, a), where t is the time when the edit was performed, f the source code file, o the starting offset, r the removed string from the file as it was before the removal and a the added string to the file after the addition. One of the last two parameters of the tuple may be empty since edits are allowed to be pure removal or addition. An edit history consists of elements called changes, H := c₁c₂…cₙ. A change is a pair of a sequence of chunks and a group, c := (h₁h₂…hₙ, g). Groups are explained further in Section 2.3. Simple changes created only by keystrokes from the developer can be represented as a single chunk, while more complicated changes such as those created by built-in refactoring tools require several chunks.

While history refactorings add, remove or restructure the changes in an edit history, the goal is to preserve both the overall effects of all chunks included, and the added and removed strings in these chunks. To clarify, say we have the original source code S with its edit history H. By applying H to S, we attain S′. A history refactoring H′ should then also attain S′ when applied to S.
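The definitions above map naturally onto small data types. The following sketch is an assumption-laden illustration rather than Historef's data model: the class names, the single-file simplification (the file field f is stored but not used when applying), and the example history are all invented here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Chunk:
    t: int   # time when the edit was performed
    f: str   # source code file
    o: int   # starting offset
    r: str   # removed string (may be empty)
    a: str   # added string (may be empty)


@dataclass(frozen=True)
class Change:
    chunks: Tuple[Chunk, ...]
    g: str   # group


def apply_chunk(source, h):
    """Apply one chunk to the text of a single file."""
    if source[h.o:h.o + len(h.r)] != h.r:
        raise ValueError("removed string does not match the source")
    return source[:h.o] + h.a + source[h.o + len(h.r):]


def apply_history(source, history):
    """Apply every chunk of every change, in order."""
    for c in history:
        for h in c.chunks:
            source = apply_chunk(source, h)
    return source


S = "x = 1\ny = 2\n"
c1 = Change((Chunk(1, "A.java", 4, "1", "5"),), "task-x")
c2 = Change((Chunk(2, "A.java", 10, "2", "10"),), "task-y")

H = [c1, c2]
H_prime = [c2, c1]   # a reordering; offsets need no adjustment in this example

print(apply_history(S, H) == apply_history(S, H_prime))
```

In this example the reordered history needs no offset adjustment because c1 replaces one character with one character and lies textually before c2. In general a reordering has to recompute offsets, as the swap refactoring described later does.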

2.2.2 Refactoring Definitions

In order to guarantee that the goal is fulfilled, Saeki Laboratory has defined primitive refactorings that individually satisfy the conditions, and large refactorings that use the primitive refactorings in order to achieve a more substantial result.

Primitive Refactoring

The tool uses four primitive history refactoring operations. These include swap, merge, cancel and split. These are visually represented in Figure 2.1.

Swap changes the order of two changes. Merge combines multiple changes into one. Cancel can be seen as a special case of merge: a change is merged with its adjacent inverse and therefore results in no change at all, i.e., an empty change. Split is the opposite of merge, where one change is split up into multiple smaller changes.

Figure 2.1. Primitive refactorings defined in edit history refactoring.

In order for these procedures to avoid creating errors in code, preconditions have been implemented. This means that in order to apply the refactoring, some condition needs to be fulfilled before we can perform the procedure. For instance, in the case of swap, the changes must be independent of each other (no overlap). In the case of merge, changes need to be adjacent and belong to the same group.
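As an illustration of such a precondition, the sketch below swaps two adjacent single-chunk edits in one file and refuses to do so when they overlap. It is a simplified reading of the swap refactoring, not Historef's implementation: the tuple layout `(offset, removed, added)` and the function names are assumptions.

```python
def apply(text, edit):
    """Apply one edit (offset, removed, added) to text."""
    o, r, a = edit
    if text[o:o + len(r)] != r:
        raise ValueError("removed string does not match the text")
    return text[:o] + a + text[o + len(r):]


def swap(first, second):
    """Swap two adjacent edits, adjusting offsets.

    Returns the reordered pair, or None when the precondition
    (no overlap between the edits) does not hold.
    """
    o1, r1, a1 = first
    o2, r2, a2 = second
    if o2 >= o1 + len(a1):
        # second lies after the text added by first
        return (o2 - (len(a1) - len(r1)), r2, a2), first
    if o2 + len(r2) <= o1:
        # second lies before first
        return second, (o1 + (len(a2) - len(r2)), r1, a1)
    return None  # overlapping edits depend on each other


source = "x = 1\ny = 2\n"
first = (4, "1", "500")
second = (12, "2", "10")   # offset is relative to the text after first

original = apply(apply(source, first), second)
new_first, new_second = swap(first, second)
swapped = apply(apply(source, new_first), new_second)
print(original == swapped)
```

The overall effect is preserved because the swapped edits produce the same final text; an edit that touches the characters added by its predecessor makes `swap` return `None`.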

Large Refactorings

By using these primitive refactorings, it is possible to define larger refactorings. In Refactoring Edit History of Source Code [11] two large refactorings are defined, reorder and reconfigure. These are visually represented in Figure 2.2.

Figure 2.2. Large refactorings defined in edit history refactoring.

Reorder reorders tangled changes, using the swap history refactoring, in order to categorize them according to tasks. These can then be merged in order to obtain changes that each represent individual tasks. Reconfigure uses split on a larger change in combination with reorder in order to separate changes from a large change that is tangled. As for the case of reorder, merge might be used to finalize the results.

2.3 Historef

Saeki Laboratory has created a supporting tool, Historef [11], that uses the technique of edit history refactoring, as explained in the previous section. The tool works as a plugin for Eclipse and uses OperationRecorder, described in Section 2.1, to capture changes. Historef uses change grouping in order to support the complex configuration of history refactoring. Changes can be assigned groups both manually and automatically. The current change will be assigned the active group, selected in the Groups view. The group can also be changed manually at a later stage, or automatically by selecting a grouping mode. These modes include:

• Time Base - Based on when the change is made. Changes made shortly after each other should therefore belong to the same group.

• Method Base - Changes in the same method belong to the same group.

• Class Base - Changes in the same class belong to the same group.

• File Base - Changes in the same file belong to the same group.

• Comment Base - Changes referring to, e.g., the same bug report belong to the same group.
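To make one of these modes concrete, the following sketch groups changes by time. The source does not say how Historef decides that two changes were made "shortly after each other", so the `max_gap` threshold and the function name are assumptions made purely for illustration.

```python
def group_by_time(timestamps, max_gap):
    """Assign a group index to each timestamp (sorted ascending).

    A new group starts whenever the gap to the previous change
    exceeds max_gap (an assumed, configurable threshold).
    """
    groups = []
    current = 0
    for i, t in enumerate(timestamps):
        if i > 0 and t - timestamps[i - 1] > max_gap:
            current += 1
        groups.append(current)
    return groups


# Timestamps in seconds; the values are invented.
print(group_by_time([0, 5, 8, 120, 125], max_gap=60))
```

The other modes could follow the same pattern, replacing the time gap with a key such as the enclosing method, class or file of each change.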

Saeki Laboratory describes two applications of the Historef tool: the first is to support the Task Level Commit rule, and the second is selective undo. Usually an editor is limited to only undoing the most recent change, but since changes are recorded and listed, Historef makes it possible to select other changes to undo.

2.4 Tangled Changes and Their Impact

As previously mentioned, tangled changes (described in Section 1.1) can create much noise when analyzing changes applied to a repository. Such analysis can be used to predict related changes [15], predict future defects [16][17] and gain insight about projects. According to K. Herzig and A. Zeller, in their paper The Impact of Tangled Code Changes, tangled changes do not cause trouble in development [1]. This is however disputed by Saeki Laboratory, since they consider the reuse of changes to be part of development.

In Herzig and Zeller’s research, they answer the questions “How popular are tangled changes?”, “Can we untangle tangled changes?” and “How do tangled changes impact bug count models?”

To answer the first question, they manually checked over 7,000 change sets from five large open-source projects and came to the conclusion that

• up to 20% of all bug fixes contain multiple tangled changes,

• up to 12% of change sets related to issue reports are tangled,

• up to 16% of all change sets can be associated with bug reports addressing multiple concerns,

• 73% of all tangled changes have a blob size of two,

where blob size is the number of individual tasks in a single commit [1]. Another discovery they made was that they were unable to decide whether a change set was tangled or atomic in most cases.

To answer their second question, they developed a prototype of a heuristic untangling algorithm. Being heuristic, it cannot solve the untangling problem completely, but that is not what they are aiming for. What they aim to do is to verify whether untangling code changes is feasible and to see how accurate it is. The algorithm itself takes an arbitrary change set as input and returns a set of change set partitions. Changes in one partition are more closely related to each other than to changes in other partitions. Preferably, different tasks would end up in different partitions.

In order to measure how related two changes are, they base decisions on a feature vector using values from what they call confidence voters. Each voter represents a different aspect, including FileDistance, PackageDistance, CallGraph, ChangeCouplings and DataDependency. The ChangeCouplings voter is based on the concept of change couplings presented by Zimmermann et al. [15]. This measure is based on analyzing previous changes and detecting frequently occurring patterns in change sets. For instance, if two files are often changed together, it is likely that these two files are related.

In order to partition the changes, they create a triangular partition matrix with the initial size m × m, where m is the number of change operations. The confidence value, expressing how likely it is for two changes to be related, is calculated for each pair of changes by the confidence voters. After determining the highest confidence value for a pair of changes, discarding the diagonal elements, the columns and rows for these changes are removed and a new row and column are created for the new partition. The confidence values are then calculated between this new partition and the other elements in the matrix. This process is repeated until it reaches a stopping criterion: either a fixed number of partitions or a specified threshold of the confidence values. When comparing confidence values, the authors decided to use the maximum of all values, since changes can be related without having many properties in common.

To measure the result of their algorithm, they created artificially tangled change sets in four of the big open-source projects they had previously analyzed. They used blob sizes between two and four.
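The partitioning loop described above resembles greedy agglomerative clustering, and can be sketched as follows. This is not Herzig and Zeller's implementation: the `confidence` callback stands in for the aggregated confidence voters, the confidence values are invented, and partitions are kept in a plain list instead of a triangular matrix.

```python
from itertools import combinations


def untangle(changes, confidence, max_partitions, threshold=0.0):
    """Greedily merge the two most related partitions.

    confidence(a, b) returns how likely two individual changes are
    related. Merging stops at max_partitions partitions, or when the
    best remaining confidence drops below threshold.
    """
    max_partitions = max(1, max_partitions)
    partitions = [frozenset([c]) for c in changes]

    def score(p, q):
        # Maximum over all pairs, as chosen by the authors.
        return max(confidence(a, b) for a in p for b in q)

    while len(partitions) > max_partitions:
        p, q = max(combinations(partitions, 2),
                   key=lambda pair: score(*pair))
        if score(p, q) < threshold:
            break
        partitions.remove(p)
        partitions.remove(q)
        partitions.append(p | q)
    return partitions


# Invented pairwise confidence values for three change operations.
values = {
    frozenset(["a1", "a2"]): 0.9,
    frozenset(["a1", "b1"]): 0.2,
    frozenset(["a2", "b1"]): 0.1,
}


def lookup(a, b):
    return values[frozenset([a, b])]


result = untangle(["a1", "a2", "b1"], lookup, max_partitions=2)
print(sorted(sorted(p) for p in result))
```

Both stopping criteria from the text appear as parameters: `max_partitions` for the fixed number of partitions and `threshold` for the confidence cut-off.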

The results they gained were the following:

• They could untangle artificially tangled change sets with a mean precision between 0.58 (blob size four) and 0.79 (blob size two).

• They could untangle any two artificially tangled change sets with a precision between 0.67 and 0.93 (blob size two), depending on which open-source project was being untangled.

To answer their third and final question about the impact of tangled changes, they compared the analysis of the original repositories and modified ones, where they untangle the changes they previously classified manually. They compared these versions by analyzing how files were associated with bug reports and got the following results:

• Between 6% and 50% of the most defect prone files are falsely classified.

• At least 16.6% (on average) of all source files are incorrectly associated with bug reports.

Chapter 3

Theory

This chapter will present theory used in this degree project. The first section will present conflicts that can occur in version control systems and present the different states the code can be in. The relationships between these states will also be presented. The second section will briefly explain unit testing. The third and final section will explain conditional transformations, a representation that can be used for the application of changes.

3.1 Version Control Conflicts

As mentioned in Section 1.4.2, Brun et al. describe three kinds of conflicts that can occur in version control systems. The paper, Proactive Detection of Collaboration Conflicts [3], focuses on collaboration conflicts, meaning conflicts that occur when several developers are involved in the same project.

3.1.1 Proactive Detection of Collaboration Conflicts

In a collaborative project, developers work with their individual copies of the project files. The developers make changes to their own local copies, and from time to time share these changes with the other developers, or incorporate changes that other developers have made. This type of synchronization has both advantages and disadvantages, one advantage being that this can allow for rapid development [3]. One disadvantage, however, is that conflicting changes can occur, since developers are allowed to make changes to the same file simultaneously. Other factors can also affect the likelihood of conflicts. Fear of conflicts itself is one example. Conflicts can be separated into two groups: textual and higher-order conflicts. Brun et al. identify seven relevant relationships between two repositories of source code [3]. These include SAME, AHEAD, BEHIND, TEXTUAL✗, BUILD✗, TEST✗ and TEST✓. SAME is when both repositories have the same change sets. AHEAD is when one repository is a superset of the other, i.e., includes all the change sets of the other repository and has additional changes. BEHIND is the opposite of AHEAD. The remaining relationships will be described in the following sections.

3.1.2 Textual and Higher-Order Conflicts

Textual conflicts occur when developers make inconsistent changes to the same part of the code. In order to prevent one of these changes from overwriting the other, version control systems allow the first developer to publish their changes, but the second developer cannot publish until the conflict is resolved. This can happen either automatically by means of the version control system (merging), or manually through actions undertaken by the developer. When the version control system cannot automatically merge two repositories without human intervention, these repositories have the relationship TEXTUAL✗.

A build failure is one of the higher-order conflicts and it is represented by the relationship BUILD✗. The meaning of this relationship is that two repositories can be automatically merged by the version control system, and therefore are not in TEXTUAL✗. However, the result of this merge does not build.

Another higher-order conflict is test failure, represented by TEST✗. Here, the merged code builds successfully, but fails its test suite. When the merged code builds and the test suite passes, we have the relationship TEST✓. The relationships TEXTUAL✓ and BUILD✓ have obvious meanings, although they are not specific enough to be part of the original seven relationships.

[Figure: a tree in which TEXTUAL✓ branches into BUILD✓ and BUILD✗, and BUILD✓ branches into TEST✓ and TEST✗; TEXTUAL✗ is a separate leaf.]

Figure 3.1. A tree representation of the hierarchy of conflict-related relationships.

Looking at the tree demonstrating the hierarchy of the conflict-related relationships, shown in Figure 3.1, we see that neither TEXTUAL✓ nor BUILD✓ are leaves, meaning there might be an even more specific relationship between the repositories. In fact, neither of these cases can tell us whether there is a conflict or not.
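The hierarchy can be read as a decision procedure that becomes more specific as more is known about the merge. The sketch below writes the relationships with a ✓ suffix for "no conflict" and a ✗ suffix for "conflict"; the function name `classify` and its boolean parameters are assumptions introduced for illustration and do not come from Brun et al.

```python
def classify(merges_cleanly, builds=None, tests_pass=None):
    """Return the most specific conflict-related relationship known.

    builds and tests_pass may be None when that step has not been
    evaluated, in which case a non-leaf relationship is returned.
    """
    if not merges_cleanly:
        return "TEXTUAL✗"
    if builds is None:
        return "TEXTUAL✓"   # not a leaf: a higher-order conflict may remain
    if not builds:
        return "BUILD✗"
    if tests_pass is None:
        return "BUILD✓"     # not a leaf: the tests may still fail
    return "TEST✓" if tests_pass else "TEST✗"


print(classify(True, builds=True, tests_pass=False))
```

Returning a non-leaf relationship when a step has not been evaluated mirrors the observation that TEXTUAL✓ and BUILD✓ alone cannot tell whether a conflict exists.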

3.1.3 Conflict Occurrences

In a study conducted by Zimmermann on four open-source systems [18], he reported that between 23% and 47% of all merges had textual conflicts that could not be resolved by the version control system (TEXTUAL✗). The rest of the merges are in TEXTUAL✓. However, this does not say anything about the occurrences of higher-order conflicts.

In Brun et al.’s study [3], they analyze nine open-source systems. Here, on average one in six merges, about 17%, resulted in TEXTUAL✗. The reason why this is lower than in Zimmermann’s study may be the use of superior merging algorithms. The remaining 83% are in TEXTUAL✓, although this includes the higher-order conflicts. Out of these nine open-source systems, six could not be used to identify test conflicts due to the absence of a non-trivial test suite. The remaining three gave the following result:

• 76% of merges completed cleanly (excluding higher-order conflicts).

• 16% of merges resulted in a textual conflict, TEXTUAL✗.

• 1% of merges resulted in a build failure, BUILD✗.

• 6% of merges resulted in a test failure, TEST✗.

3.2 Unit Testing

Unit testing is a technique where tests are specified for units of source code. A program can be decomposed into units, where a unit is a collection of functions [19]. By specifying inputs for certain units, the developer can test whether the unit gives the expected output. The test passes if the unit gives the expected result, and fails otherwise. Unit testing has become an important part of software development. A contributing factor has been the use of Test Driven Development (TDD) [20]. In this development process, test cases are created before the actual software is created, in order to guarantee the intended behaviour of the software.

In Java software development, the framework JUnit is often used for unit testing [21]. The fundamental concepts of JUnit include test case, test method and test suite, to name a few. A test case is used to test specific usage of a Java class. This is done with a collection of test methods that test components of the class in question. By combining test cases, we can create a test suite that spans an entire project.


3.3 Conditional Transformation

When working with preconditions, one possible concept to use is conditional transformation, often shortened to CT. A conditional transformation is represented as a pair which consists of a precondition and a transformation. The transformation can only be performed if the precondition holds.

As mentioned by Kniesel and Koch in their paper titled Static composition of refactorings [22], a refactoring can be seen as a special form of a conditional transformation, namely a behaviour-preserving one.

In the same publication, they discuss the composition of conditional transformations. A sequence of conditional transformations can be composed into a new single conditional transformation with equal functionality. The main benefit of this composed CT is that instead of having to evaluate every individual precondition separately in several steps, a new combined precondition can be evaluated once. If this combined precondition holds, the combined transformation can be performed.
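The pairing of precondition and transformation, and the idea of a single combined precondition, can be illustrated with a minimal Java sketch. The class name ConditionalTransformation and its andThen method are hypothetical and are not taken from Kniesel and Koch [22]. As the title of their paper suggests, they derive the combined precondition statically. This sketch instead obtains an equivalent check by evaluating the second precondition on the state that the first transformation would produce, which assumes that transformations are free of side effects.

```java
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

/** A conditional transformation (CT): a precondition paired with a transformation. */
final class ConditionalTransformation<S> {
    final Predicate<S> precondition;
    final UnaryOperator<S> transformation;

    ConditionalTransformation(Predicate<S> precondition, UnaryOperator<S> transformation) {
        this.precondition = precondition;
        this.transformation = transformation;
    }

    /** Performs the transformation only if the precondition holds on the given state. */
    S apply(S state) {
        if (!precondition.test(state)) {
            throw new IllegalStateException("precondition does not hold");
        }
        return transformation.apply(state);
    }

    /**
     * Composes this CT with a following one (hypothetical helper). The combined
     * precondition is evaluated once, on the state before either transformation:
     * this precondition must hold, and the next precondition must hold on the
     * state that this transformation would produce.
     */
    ConditionalTransformation<S> andThen(ConditionalTransformation<S> next) {
        Predicate<S> combined = s -> precondition.test(s)
                && next.precondition.test(transformation.apply(s));
        UnaryOperator<S> both = s -> next.transformation.apply(transformation.apply(s));
        return new ConditionalTransformation<>(combined, both);
    }

    public static void main(String[] args) {
        ConditionalTransformation<Integer> addTen =
                new ConditionalTransformation<>(x -> x >= 0, x -> x + 10);
        ConditionalTransformation<Integer> halve =
                new ConditionalTransformation<>(x -> x % 2 == 0, x -> x / 2);
        // One precondition check covers both steps: 4 >= 0 and (4 + 10) is even.
        System.out.println(addTen.andThen(halve).apply(4)); // prints 7
    }
}
```

The composed CT is used exactly like a single CT: one call to apply checks the combined precondition and then runs both transformations.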


Chapter 4

Methodology

This chapter will explain methods used in this project. The first section will present properties of edit history refactoring. The second section will present the effects of separating commits in version control systems. The third and final section will explain the implemented changes to Historef.

4.1 Edit History Refactoring Properties

The following section will describe properties related to edit history refactoring. The first subsection will discuss previously mentioned conflicts applied to edit history refactoring. The second subsection will discuss behaviour preservation. Finally, the third subsection will apply the concept of conditional transformation to edit history refactoring.

4.1.1 Conflicts in Edit History Refactoring

The conflicts presented in Section 3.1 do not only apply to development in collaborative environments, but can also be applied to edit history refactoring with slight modifications. The way the conflicts occur is a bit different. Instead of comparing two repositories from different developers, we are working with an edit history that is divided into multiple commits. While the properties of the edit history refactoring technique guarantee that the end result of the edit history is the same, they cannot guarantee the behaviour of the individual commits before the final commit.

Textual conflicts in edit history refactoring differ slightly from the earlier definition. Now, a textual conflict occurs if manipulating the edit history results in invalidating changes on a textual level. For instance, if two changes occur at the same place in the code, swapping the order of these changes might not preserve their behaviour. The preconditions for applying the refactorings explained in Section 2.2.2 already take care of these conflicts. The higher-order conflicts, however, are addressed neither in edit history refactoring nor in Historef. To address them, we require testing in order to evaluate whether steps in the edit history are in BUILD✓/BUILD✗ or TEST✓/TEST✗.

4.1.2 Behaviour Preservation

In Section 3.3, we established that refactorings are behaviour-preserving transformations. In the case of edit history refactoring, the effect of all applied changes generates the same result with the same behaviour as the original changes of the source code. However, since a history refactoring can contain several changes that may be put in different order, the behaviour of the different snapshots can vary. For instance, looking back at the simple example in Figure 1.2, the results of c1c2 and c2c1 are the same, but the application of only c1 or only c2 generates unique results.

4.1.3 Conditional Transformations in History Refactoring

Kniesel and Koch [22] discuss the composition of preconditions, but for the case of edit history refactoring with testing, these preconditions are composed of the tests themselves. Despite the preconditions being the same in all cases, it is not as simple as combining them all into one precondition before applying the combined transformation. Let us once again use the simple example from Section 1.4.1 to demonstrate this.

Figure 4.1. Simple example demonstrated using conditional transformations.

As seen in Figure 4.1, the preconditions are the same, but let us look at what would happen if we were to create composed CTs with the same precondition. Transformation T1 corresponds to applying change c1 and transformation T2 corresponds to applying change c2. Note that the condition C1 in the CT is not the same as c1. The two possible orders are shown in Figure 4.2. In (a) the order c1c2 is used, and in (b) the order c2c1 is used. As we already know, the order used in (a) should not be valid, but since the unit tests are applied to the state before this combined transformation, i.e., the original state, the tests pass and therefore the transformation is allowed to be performed.

Figure 4.2. Using a composed conditional transformation instead of multiple conditional transformations.

The reason why this happens is that the precondition does not take into account the states in between the transformations. We already know that the state before the changes have been applied should be clean (test suite passes), and that the state after the application of all changes should be clean. The latter statement assumes that the developer knows and follows the conditions of the project. The important part, however, is that every state of the code should be clean, so we need to take every step into account.

$$C := \bigwedge_{i=1}^{n-1} C_1(v_i), \qquad T := v_n \tag{4.1}$$

The preconditions and transformations in edit history refactoring can be described as shown in Equation 4.1. The precondition C can be described recursively. The transformation T is the application of all changes resulting in vn, where n is the total number of changes. Partial changes are all changes between v1 and vn−1. v0 is the original state of the code. The notation C1(vi) denotes the evaluation of the precondition when the code is at state vi. Simply put: in order to evaluate the entire edit history, every state should pass the same condition. Because the evaluation is done recursively, this “composed” CT does not actually have the benefit of only having to evaluate the precondition once, and therefore it is not a true composed CT. What we gain from this, however, is a more generalized way of expressing edit history refactoring using conditional transformations.
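Equation 4.1 can be read operationally as a loop over the intermediate states. The following Java sketch shows one way to do this, under stated assumptions: the class HistoryPrecondition and its method holds are hypothetical names, states are modelled as plain values, and each change is modelled as a function from one state to the next. The demonstration uses the values of Example 1 in Section 5.1, where the condition is that the two return values differ.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

final class HistoryPrecondition {

    /**
     * Evaluates C from Equation 4.1: the conjunction of C1(v_i) for i = 1..n-1,
     * where v_i is the state after applying the first i changes to v0.
     * The final state v_n is not checked here.
     */
    static <S> boolean holds(S v0, List<UnaryOperator<S>> changes, Predicate<S> c1) {
        S state = v0;
        for (int i = 1; i < changes.size(); i++) {
            state = changes.get(i - 1).apply(state); // state is now v_i
            if (!c1.test(state)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // State: {value returned by getX(), value returned by getY()}.
        int[] v0 = {10, 15};
        UnaryOperator<int[]> c1 = s -> new int[] {15, s[1]}; // getX(): 10 -> 15
        UnaryOperator<int[]> c2 = s -> new int[] {s[0], 20}; // getY(): 15 -> 20
        Predicate<int[]> valuesDiffer = s -> s[0] != s[1];

        System.out.println(holds(v0, Arrays.asList(c1, c2), valuesDiffer)); // false
        System.out.println(holds(v0, Arrays.asList(c2, c1), valuesDiffer)); // true
    }
}
```

The loop makes the cost visible: the condition is evaluated once per intermediate state rather than once in total, which is why this is not a true composed CT.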

Post-condition

The CTs only take the preconditions into account, meaning that the state of the code is not checked after all changes have been applied. This can easily be checked, but in the case of edit history refactoring and commit policies, the importance lies in the snapshots created in between the initial state and the finished state of the code. The developer should know about the policy, so the state of the code after all changes should still fulfil the requirements.


4.2 Commit Separation Effects

When changes are made and committed to a software configuration management repository, we can observe two versions or snapshots of the code. v0 can be seen as the original state of the code, before any changes were applied, and vn can be seen as the final state of the code after application of n groups of changes. In order to follow the rule of Task Level Commit (explained in Section 1.2) [2], these groups should be committed separately to the repository, assuming that each group belongs to its own task.

By committing these groups separately, we create versions in between the original two. These versions can be referred to as v1, v2, ..., vn−1, where vi is the result of applying the ith group to vi−1. The states v0 and vn are always the same, regardless of the order of the change groups. However, the states v1 through vn−1 depend on the order of the groups. As an example, say we have three groups of changes, g1, g2 and g3. These groups are sets containing the changes c1, c2, ..., cN. Applying a group gk is represented as ^g^k.

Order         v1                 v2
g1, g2, g3    v1 := v0 ^g^1      v2 := v0 ^g^1^g^2
g1, g3, g2    v1 := v0 ^g^1      v2 := v0 ^g^1^g^3
g2, g1, g3    v1 := v0 ^g^2      v2 := v0 ^g^2^g^1
g2, g3, g1    v1 := v0 ^g^2      v2 := v0 ^g^2^g^3
g3, g1, g2    v1 := v0 ^g^3      v2 := v0 ^g^3^g^1
g3, g2, g1    v1 := v0 ^g^3      v2 := v0 ^g^3^g^2

Table 4.1: Possible snapshots.

As seen in Table 4.1, there are three unique outcomes for v1 and three unique outcomes for v2 (since ^g^x^g^y = ^g^y^g^x), identified by different colors, creating a total of six combinations.

As shown in the rightmost column, the order of changes does not affect the state, meaning that v0 ^g^i^g^j represents the same state as v0 ^g^j^g^i. As previously stated, vn will always be the same regardless of the order, meaning that all states shown in the leftmost column are identical. If we want to avoid higher-order conflicts, we need to find an order where every snapshot created avoids these conflicts.

Evaluating an order cannot be done in a single step. It is an iterative process where, in order to advance to vi, vi−1 must pass its evaluation. This creates a precondition on vi−1 for every vi.
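The counts in Table 4.1 can be reproduced with a short Java sketch. It rests on the assumption stated above that the order in which groups are applied does not affect the resulting state, so a snapshot can be identified by the set of groups applied so far. The class SnapshotTable and its methods are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class SnapshotTable {

    /** Returns all orders (permutations) of the given groups. */
    static List<List<String>> permutations(List<String> groups) {
        List<List<String>> result = new ArrayList<>();
        permute(new ArrayList<>(), groups, result);
        return result;
    }

    private static void permute(List<String> prefix, List<String> remaining,
                                List<List<String>> out) {
        if (remaining.isEmpty()) {
            out.add(new ArrayList<>(prefix));
            return;
        }
        for (int i = 0; i < remaining.size(); i++) {
            List<String> rest = new ArrayList<>(remaining);
            prefix.add(rest.remove(i));
            permute(prefix, rest, out);
            prefix.remove(prefix.size() - 1);
        }
    }

    /**
     * Collects the distinct states reachable as v_step over all orders.
     * A state is identified by the set of groups applied so far.
     */
    static Set<Set<String>> distinctSnapshots(List<String> groups, int step) {
        Set<Set<String>> states = new HashSet<>();
        for (List<String> order : permutations(groups)) {
            states.add(new TreeSet<>(order.subList(0, step)));
        }
        return states;
    }

    public static void main(String[] args) {
        List<String> groups = Arrays.asList("g1", "g2", "g3");
        System.out.println(permutations(groups).size());         // 6 orders
        System.out.println(distinctSnapshots(groups, 1).size()); // 3 outcomes for v1
        System.out.println(distinctSnapshots(groups, 2).size()); // 3 outcomes for v2
        System.out.println(distinctSnapshots(groups, 3).size()); // 1 outcome for v3 = vn
    }
}
```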


4.3 Implementation

This section will explain the changes that were made to Historef in order to account for higher-order conflicts. The implementation can be separated into three main parts: creating partial changes, evaluating these changes, and generating orders to evaluate.

4.3.1 Partial Changes

In order to evaluate different orders, the code needs to be tested in its different stages. When the code was separated into groups and these were reordered, the original version of Historef did not apply or undo the changes.

In order to extend Historef to include higher-order conflict prevention, the different possible snapshots need to be evaluated. To create these snapshots, functionality was added for applying and undoing groups. Historef already had functionality for applying and undoing individual changes, to support the selective undo feature of the tool. Using this functionality, group manipulations were implemented. The first, group apply, applies all the changes of a group, in the same order as they appear in the history refactoring. The second, group undo, reverts all the changes that belong to a certain group, in the reverse order of the changes in the history refactoring.

The -notation is used for the group apply manipulation. For instance, in vi ^g^j, the changes of group gj are applied to snapshot vi. Group undo is represented by the -notation. Therefore, vi ^g^j^g^j would mean that we first apply the changes of group gj to vi and then undo those same changes, which gives the result vi. This means we have the functionality necessary to try a certain order of changes and then go back to a previous stage if evaluation fails, by undoing the latest applied changes. To prevent changes from being applied or undone twice, every group has its latest manipulation type saved. Therefore, a group can only be undone if it has previously been applied, and a group cannot be applied again until the previous application has been undone.
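The group manipulations described in this subsection can be sketched in Java as follows. This is not Historef's actual code: the types ChangeGroup, Change and ReplaceChange are hypothetical, and a change is simplified to a textual replacement on a StringBuilder. The sketch shows the three properties named above: apply runs the changes in history order, undo runs them in reverse order, and the saved latest manipulation prevents a group from being applied or undone twice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** A group of changes that remembers its latest manipulation type. */
final class ChangeGroup {
    private final List<Change> changes; // in edit history order
    private boolean applied = false;    // latest manipulation: true = apply, false = undo

    ChangeGroup(List<Change> changesInHistoryOrder) {
        this.changes = new ArrayList<>(changesInHistoryOrder);
    }

    /** Group apply: applies all changes in history order. */
    void apply(StringBuilder source) {
        if (applied) {
            throw new IllegalStateException("group is already applied");
        }
        for (Change change : changes) {
            change.apply(source);
        }
        applied = true;
    }

    /** Group undo: reverts all changes in reverse history order. */
    void undo(StringBuilder source) {
        if (!applied) {
            throw new IllegalStateException("group has not been applied");
        }
        for (int i = changes.size() - 1; i >= 0; i--) {
            changes.get(i).undo(source);
        }
        applied = false;
    }

    public static void main(String[] args) {
        StringBuilder source = new StringBuilder("return 10;");
        ChangeGroup group = new ChangeGroup(Arrays.asList(
                new ReplaceChange("10", "12"), new ReplaceChange("12", "15")));
        group.apply(source);
        System.out.println(source); // return 15;
        group.undo(source);
        System.out.println(source); // return 10;
    }
}

/** A single recorded change on a text buffer (hypothetical interface). */
interface Change {
    void apply(StringBuilder source);
    void undo(StringBuilder source);
}

/** Replaces the first occurrence of one text with another; undo swaps it back. */
final class ReplaceChange implements Change {
    private final String before;
    private final String after;

    ReplaceChange(String before, String after) {
        this.before = before;
        this.after = after;
    }

    @Override public void apply(StringBuilder source) { swap(source, before, after); }
    @Override public void undo(StringBuilder source) { swap(source, after, before); }

    private static void swap(StringBuilder source, String from, String to) {
        int at = source.indexOf(from);
        if (at < 0) {
            throw new IllegalStateException("text not found: " + from);
        }
        source.replace(at, at + from.length(), to);
    }
}
```

The two changes in the demonstration depend on each other, which is why undo has to run in reverse: undoing the first change before the second would look for the text "12" in a buffer that contains "15".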

4.3.2 Snapshot Evaluation

With the different snapshots available, the next step is to evaluate them. Projects often have test suites in use, in which case conflicts can potentially be avoided at test level, meaning we strive to achieve TEST✓. However, if a test suite is not in use, there are no tests to evaluate, meaning we can only avoid conflicts at build level (striving to achieve BUILD✓).

Since test suites for Java projects in an Eclipse environment are usually written as JUnit tests, JUnit was chosen for test-level evaluation. If a JUnit test is run once in Eclipse, its run configuration is saved. By re-using this run configuration, it is possible to run these tests on command. When these tests are run, the results can be captured by a listener. These results are then saved to a collection and can be accessed by Historef at any time.

When there is no test suite in use, instead of using JUnit, Historef checks to see if there are any errors in the project. If there are none, this means evaluation passes. Warnings are ignored since they do not render a project unbuildable.
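The choice between the two evaluation levels can be summarised in a small Java sketch. It deliberately uses no Eclipse or JUnit API: the collected test results are modelled as a list of booleans, the problems reported for the project as a list of severities, and all names (SnapshotEvaluation, Severity, passes) are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class SnapshotEvaluation {

    /** Severity of a problem reported for the project (hypothetical type). */
    enum Severity { WARNING, ERROR }

    /** Test level: the snapshot passes if every collected test result is a pass. */
    static boolean passesTestLevel(List<Boolean> testResults) {
        return !testResults.contains(Boolean.FALSE);
    }

    /** Build level: the snapshot passes if there are no errors; warnings are ignored. */
    static boolean passesBuildLevel(List<Severity> problems) {
        return !problems.contains(Severity.ERROR);
    }

    /**
     * Uses test-level evaluation when a test suite is in use, and build-level
     * evaluation otherwise. A null testResults list models "no test suite".
     */
    static boolean passes(List<Boolean> testResults, List<Severity> problems) {
        return testResults != null
                ? passesTestLevel(testResults)
                : passesBuildLevel(problems);
    }

    public static void main(String[] args) {
        // One failing test: the snapshot fails at test level.
        System.out.println(passes(Arrays.asList(true, false),
                Collections.<Severity>emptyList()));
        // No test suite and only a warning: the snapshot passes at build level.
        System.out.println(passes(null, Arrays.asList(Severity.WARNING)));
    }
}
```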

4.3.3 Generating the Order

Historef previously had functionality to generate an ordering of groups. However, the state of the code did not change depending on the generated order. Since we cannot compose the tests into a single precondition, established in Section 4.1.3, the source code needs to reflect the generated order. Therefore, the first thing done before generating an order is undoing all groups. This means the code is reverted to its original state v0, before any editing was done. From here, possible permutations of the group order are generated, while the code is changed simultaneously to reflect the snapshots that the order creates.

To attain a suitable order, all groups must be included in the order while also fulfilling all the constraints put on the order. The constraints are created while generating the order. If a failed evaluation after applying group gi is directly followed by a passed evaluation after applying group gj, we add the constraint gj < gi, meaning group gj should be applied before group gi. This means that while generating a new order, even if it does not include all changes, we can discard it if it does not fulfil the constraints. If all possible permutations fail, the order will remain unchanged from the original, and the log will show a message stating that no suitable ordering can be found.

Once again, the simple example in Section 1.4.1 can be used for demonstration. If the first change belongs to group g1 and the second to group g2, Historef will first start with the order g1, g2, if the groups have been created in that order. v0 ^g^1 will fail its evaluation, while v0 ^g^1^g^2 will pass its evaluation. Therefore, the constraint g2 < g1 will be added as a heuristic. The state of the code will be reverted by applying ^g^2^g^1. The next order tested will be g2, g1. Evaluation passes here on both steps, and since the constraint is fulfilled, we have found an order that is suitable.
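The search described in this subsection can be sketched in Java under several simplifying assumptions, none of which are confirmed details of Historef. Orders are tried in the permutation order that follows the original group order. A constraint is recorded only at the first failure of an order, when the directly following group repairs the snapshot. Constraints are checked on complete orders rather than while an order is being built. Evaluation is abstracted into a hypothetical predicate that receives the groups applied so far, standing in for applying those groups to v0 and running the tests or the build. The class name OrderSearch is likewise hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

final class OrderSearch {

    /**
     * Returns the first order in which every snapshot passes evaluation,
     * or the original order if no such order is found.
     */
    static List<String> findOrder(List<String> groups, Predicate<List<String>> passes) {
        Set<List<String>> constraints = new HashSet<>(); // [a, b] means a before b
        List<List<String>> candidates = new ArrayList<>();
        permute(new ArrayList<>(), groups, candidates);
        for (List<String> order : candidates) {
            if (fulfils(order, constraints) && evaluate(order, passes, constraints)) {
                return order;
            }
        }
        return groups;
    }

    /** Evaluates v1..vn of one order; may record a constraint at the first failure. */
    private static boolean evaluate(List<String> order, Predicate<List<String>> passes,
                                    Set<List<String>> constraints) {
        for (int i = 1; i <= order.size(); i++) {
            if (passes.test(order.subList(0, i))) {
                continue;
            }
            // Failure after applying gi = order[i - 1]. If the next group gj
            // directly repairs the snapshot, record the constraint gj < gi.
            if (i < order.size() && passes.test(order.subList(0, i + 1))) {
                constraints.add(Arrays.asList(order.get(i), order.get(i - 1)));
            }
            return false;
        }
        return true;
    }

    private static boolean fulfils(List<String> order, Set<List<String>> constraints) {
        for (List<String> c : constraints) {
            if (order.indexOf(c.get(0)) > order.indexOf(c.get(1))) {
                return false;
            }
        }
        return true;
    }

    private static void permute(List<String> prefix, List<String> remaining,
                                List<List<String>> out) {
        if (remaining.isEmpty()) {
            out.add(new ArrayList<>(prefix));
            return;
        }
        for (int i = 0; i < remaining.size(); i++) {
            List<String> rest = new ArrayList<>(remaining);
            prefix.add(rest.remove(i));
            permute(prefix, rest, out);
            prefix.remove(prefix.size() - 1);
        }
    }

    public static void main(String[] args) {
        // Values of Example 1 in Section 5.1: getX() and getY() must differ.
        Predicate<List<String>> valuesDiffer = applied -> {
            int x = applied.contains("gA") ? 15 : 10;
            int y = applied.contains("gB") ? 20 : 15;
            return x != y;
        };
        System.out.println(findOrder(Arrays.asList("gA", "gB"), valuesDiffer)); // [gB, gA]
    }
}
```

With the values of Example 1, the order gA, gB fails after gA and is repaired by gB, so the constraint gB < gA is recorded and the order gB, gA is accepted.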


Chapter 5

Results

The following chapter will present the results from different tests. In order to test the implementation, a history recording is required. Since a long edit history is hard to keep up with, short edit history examples are used as a proof of concept. If a developer were to use this tool during development, it would not be unthinkable to use short edit histories since the edit histories investigated by the tool would be as large as one commit.

The examples will be presented in Java code. Firstly, v0, i.e., the original state of the code, will be presented. Secondly, vn, i.e., the final state after all changes have been applied will be presented. Changes are assigned groups and the order of these groups will then be modified and/or evaluated.

The output, that is, the ordering of the groups, will be presented in two versions. The first is the ordering as suggested by Historef before higher-order conflicts are considered. The second is the output from Historef with these changes implemented. These are colored according to the evaluations made in each step by the newer version of Historef. Below each state of the code, v0 is used to show the original state and gi is used to show that group gi has been applied to the previous state.


5.1 Example 1: Inequality Test Conflict

The first example is a short class containing two methods, getX() and getY(), each of which returns an int. The project uses a test suite, with the following requirement:

• The getX()-method and getY()-method must return different values.

The developer is aware of this fact.

v0

    public class ExampleOne {

        public int getX() {
            return 10;
        }

        public int getY() {
            return 15;
        }
    }

vn

    public class ExampleOne {

        public int getX() {
            return 15;
        }

        public int getY() {
            return 20;
        }
    }

The changes are done in the following order:

• c1: The developer changes the return value of getX() from 10 to 15. This change belongs to group gA.

• c2: The developer changes the return value of getY() from 15 to 20. This change belongs to group gB.

The changes have been assigned to different groups since they are made in different methods. Since this example has a test suite, it is possible to evaluate this on test-level.

Output

Figure 5.1. Inequality Test Conflict: Evaluating the orders before (top) and after (bottom) implementation.


The time-lines in Figure 5.1 show the different evaluations of the outputs from Example 1. The first output, suggested by old Historef, first applies the changes of group gA to v0. However, the state v0 ^g^A is in TEST✗ since both methods return the same value in this state. Committing the groups individually in the order suggested by old Historef would therefore result in a change set with a conflict. The next step in the order, where we arrive at state v0 ^g^A^g^B, is valid, since it is vn. A good ordering, however, requires that all states are conflict free.

The group ordering suggested by new Historef first applies group gB to v0. This is acceptable, since the state v0 ^g^B has different return values for getX() and getY(). The next state, v0 ^g^B^g^A, is also valid, since it is the result the developer intended.


5.2 Example 2: Ordering Test Conflict

The second example is a class similar to the first example, with the addition of yet another int-method, getZ(). This project also uses a test suite, with the following requirements:

• The getY()-method must return a value larger than the getX()-method.

• The getZ()-method must return a value larger than the getY()-method.

v0

    public class ExampleTwo {

        public int getX() {
            return 25;
        }

        public int getY() {
            return 35;
        }

        public int getZ() {
            return 45;
        }
    }

vn

    public class ExampleTwo {

        public int getX() {
            return 40;
        }

        public int getY() {
            return 45;
        }

        public int getZ() {
            return 50;
        }
    }

The changes are done in the following order:

• c1: The developer changes the return value of getX() from 25 to 40. This change belongs to group gA.

• c2: The developer changes the return value of getY() from 35 to 45. This change belongs to group gB.

• c3: The developer changes the return value of getZ() from 45 to 50. This change belongs to group gC.

Just like in the previous example, the changes have been assigned to different groups since they belong to different methods. Since there is a test suite, we can run new Historef at test level.

Output

Figure 5.2. Ordering Test Conflict: Evaluating the orders before (top) and after (bottom) implementation.

Evaluation of the output steps for Example 2 is shown in Figure 5.2. Starting with the output from old Historef, we have two steps that fail the evaluation. Both of these are in TEST✗. The reason for this is that in v0 ^g^A, getX() returns a larger value than getY(). In v0 ^g^A^g^B, this issue is resolved, but now getY() returns the same value as getZ(), which fails the test suite. Applying the last group of changes resolves the issue, but we want to avoid these previous failures in order to be able to commit the groups individually.

The output from new Historef results in an ordering that passes evaluation in every step. Simply put, by running the tests the algorithm chooses an ordering where the value that should be the largest is increased first, followed by the second largest, and lastly the lowest value.


5.3 Example 3: Build Conflict

The third example differs from the first two. Here we start out with a simple String-method foo(). This time there is no test suite in use.

v0

    public class ExampleThree {

        public String foo() {
            return "foo";
        }
    }

vn

    public class ExampleThree {

        public String foo() {
            return bar();
        }

        public String bar() {
            return "bar";
        }
    }

The changes are done in the following order:

• c1: The developer changes the return value of foo() from "foo" to a call to method bar(). This change belongs to group gA.

• c2: The developer then proceeds to write the String-method bar(), returning "bar". This change belongs to group gB.

These changes have been assigned different groups, not only because they are in different methods, but also because the first change is the editing of an existing method, while the second change is an addition of a new method. Since there is no test suite connected to this project, the highest level we can run Historef on is build-level.

Output

Figure 5.3. Build Conflict: Evaluating the orders before (top) and after (bottom) implementation.

Evaluation of the third example is shown in Figure 5.3. Even though we did not have any test suite, we still found a failed step. In the ordering suggested by the old Historef, group gA is first applied, followed by group gB. The state v0 ^g^A fails to evaluate due to a BUILD✗. If we imagine what this state would look like in the code, we would see that the new method call would be added to the old method, without the new method being added. This means that we would get a compilation error, since the call to the undefined method bar() cannot be resolved, which in turn means that we cannot build the project. The ordering suggested by new Historef avoids this, by first applying the changes in group gB, followed by the changes in group gA.


Chapter 6

Discussion

As seen in all examples, the new Historef has provided an ordering of groups where all states of the code pass evaluation. The previous version of the software could not detect these conflicts since none of them are in TEXTUAL✗. The new version of Historef can detect conflicts even when the state of the code is in TEXTUAL✓, i.e., higher-order conflicts. For instance, in the first example when v0 ^g^A is evaluated, the TEST✗ is detected. Therefore, this step is undone (v0 ^g^A^g^A), whereupon v0 ^g^B is evaluated. When this state passes, the only group remaining is gA, so this group is applied afterwards.

Committing the code as suggested by old Historef could jeopardize both reuse of code and the ability to revert code. By reusing code, we might apply changes that result in a state where conflicts exist. By the same reasoning, we might revert changes to end up in a previous state where conflicts exist.

The examples are very simple, but all of these methods could be part of a larger project. The changes themselves are also small, but they are well suited to show that it is possible to avoid higher-order conflicts. For instance, the third example could well be used in a real project. Imagine the first method, foo(), being larger and the developer realizing that some of the functionality in this method could also be used in another method. Therefore, the developer decides to create and call method bar() where this functionality is put. The evaluation of old and new Historef would be the same as in Example 3.

Looking at the individual states of the code, the usefulness of the states between v0 and vn can vary. In the first two examples, all tests pass in all states, as they should. In the third example, the state we attain between v0 and vn avoids a potential build conflict, but the code itself would most likely result in a warning due to an unused method, bar(). In a real project, the usefulness of these individual commits would primarily be that they can follow the rule of Task Level Commit, previously discussed in Section 1.2.


When looking at the changes made throughout these different examples, a common factor is that the first change made in each of them will produce some sort of conflict. However, it is important to have a developer’s coding habits in mind. Even if a developer is aware of the conditions set on a project, the developer will most likely not take the commit separation into account. Referring to the first example, the developer might know that these values should be different, but since both are changed it should not matter. In all examples, code has been changed from the top of the file to the bottom of the file, which might also be a habit. In all of the examples, conflicts could be avoided by changing the behaviour of the developer, but as mentioned in Section 1.2, this could have a severe impact on the developer’s workflow.
