Continuous integration pipelines to assess programming assignments

Bachelor Degree Project

Continuous integration pipelines to assess programming assignments

- Test like a professional

Author: Anton Strand

Supervisor: Johan Leitet

Abstract

Examiners of programming assignments in higher education and people in the software industry both need to test and review code. However, the assessment techniques they use are often quite different. The IT industry often uses agile work methods like continuous integration and automated tests, while examiners either do manual assessments or rely on code grading tools. The students will most likely become developers and work using agile processes.

Therefore, there are possible benefits of universities trying to imitate the work processes of the software industry. The purpose of this study was to develop a workflow for programming assignments inspired by continuous integration, Scrum, and GitLab flow. The workflow was developed based on the requirements of Linnaeus University and tested on one of their programming assignments. The demonstration fulfilled all of the predefined requirements, which shows that a simplified agile work process is suitable for programming assignments. However, examiners might miss some of the workflow’s benefits if the programming assignment cannot be tested automatically, since grading will then require more manual work.

Keywords: continuous integration, CI pipeline, agile work process, programming assignment, code assignment, automated testing, code review

Preface

I want to thank Johan Leitet and Mats Loock from Linnaeus University, who had a big part in the making of this project. They shared their knowledge from an examiner's point of view and helped define the requirements of a successful programming assignment workflow.

I'm also thankful that Johan Leitet took the time to be my thesis supervisor as well as being one of the project’s clients. In that regard, I want to thank Daniel Toll for allowing me to be a part of his seminars even though I technically was not part of his group.

Finally, I want to thank my partner Victoria for allowing me to vent and ask questions when needed.

1 Introduction

Computer science courses often have programming assignments, which might result in a large amount of code to review, test, and grade. Therefore, manually assessing code assignments is often a time-consuming task [1], which has resulted in many different implementations of automated code grading tools [2]. However, these tools are often opinionated and do not allow the teacher enough freedom to tweak them to fit their needs [2].

The problem of validating and verifying source code is not limited to examiners of computer science courses. The IT industry faces similar issues as well [3]. A popular industry solution is to set up a Continuous Integration (CI) pipeline [4]. A CI pipeline can be described as a set of instructions to automate the process of building and testing the project [5]. The pipeline can include testing the functionality and making sure that all code follows the style policies [5].

This thesis aims to create a workflow for programming assignments based on a commonly used work process, continuous integration, both to leverage the benefits of automatically testing code and to give students practical experience of CI.

1.1 Background

Some articles argue that the amount of work and time it takes to assess a programming assignment manually is comparable to grading foreign language assignments, since there is often a large amount of code that needs to be read, reviewed, and tested manually [1], [6]. According to these articles, computer science courses generally have more hand-ins than foreign language courses, which is problematic since each assignment increases the examiners’ workload [1], [6]. However, many programming tasks lend themselves well to automated tests [7].

There are hundreds of available code grading tools designed to provide programming exercises and give automated feedback [2]. Such tools would reduce the workload for the teachers [1], [6]. However, a systematic literature review compared 101 of these tools, and it concluded that a majority give the teachers limited control to adapt the tools to their needs [2], which might not be a bad thing but can be problematic if the tool limits the design of the assignment.

Still, there is also research claiming that automated assessment systems increase the students’ interest and that there are statistically significant differences in the scores between experimental and control groups [7].

Software companies also face the problem that a large amount of code needs testing before new changes are released. According to Ståhl and Bosch [4], many IT companies have adopted an agile work process, in particular methods like Continuous Integration (CI). CI is a software practice where the developers should integrate changes at least daily [4]. Often, the companies use so-called CI pipelines to build and test the source code automatically [5].

Since the agile workflow has become widely used [4], it would arguably be a good thing to integrate it as a natural part of a computer science course.

Implemented correctly, this could benefit both the students, since they will be familiar with the concept after graduating, and the teachers since it will reduce the amount of effort and time spent to grade an assignment [1], [6], [7].

Automating parts of the code assessment would streamline the examining process. By automatically testing the functionality and checking that the code follows the predefined rules, the examiner could spend less time testing each assignment and instead focus on the code review and on giving better feedback [1].

Ultimately, a code assignment workflow that is inspired by the continuous integration idea and that implements a CI pipeline could lead to a better education, since the students will get more practical knowledge. At the same time, the examiners will get more time to focus on the parts of a code review that cannot be automated and can use the extra time to improve the course and be more available for tutoring.

The following sections of this chapter describe concepts, roles, and tools that are important for the design of the workflow and for understanding the project.

1.1.1 Version control systems and Git

A vital part of CI is to keep track of and integrate changes in a project. This is done by using a Version Control System (VCS), whose primary purpose is to keep track of all files and their history in a project. A commonly used VCS is Git.

VCSs allow developers to duplicate the source code to make modifications in parallel. Git refers to this action as branching. The parent branch is generally called the mainline in VCS terminology and is referred to as the master branch in Git. Integrating the changes from the branch back to the mainline is called merging [8].

A Git repository is a virtual storage that tracks all files and the changes in a project [9].

An online source code management (SCM) tool is often used to share the code with everybody involved in a project [10]. By using an online SCM service, the project has a single source of truth and backup [10].

Git requires three steps to share changes with the rest of the team. The first step, “add”, is the action of tracking the changes. “Commit” is the action of saving the tracked changes to the local repository, while “push” is the action of sharing the changes with the remote repository [11].
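The three steps can be illustrated with a small toy model. The sketch below is not Git itself and does not call any Git API; the class `ToyRepository` and the file name `statistics.js` are hypothetical and only show how a change moves from the working files, via the tracked (staged) changes and the local repository, to the remote repository.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyRepository:
    """A deliberately simplified model of the add, commit, and push steps."""

    working_tree: Dict[str, str] = field(default_factory=dict)   # edited files
    staged: Dict[str, str] = field(default_factory=dict)         # after "add"
    local_commits: List[Dict[str, str]] = field(default_factory=list)   # after "commit"
    remote_commits: List[Dict[str, str]] = field(default_factory=list)  # after "push"

    def add(self, path: str) -> None:
        """Track the current content of one file."""
        self.staged[path] = self.working_tree[path]

    def commit(self) -> None:
        """Save the tracked changes to the local repository."""
        if not self.staged:
            raise ValueError("nothing staged to commit")
        self.local_commits.append(dict(self.staged))
        self.staged.clear()

    def push(self) -> None:
        """Share the local commits that the remote does not have yet."""
        missing = self.local_commits[len(self.remote_commits):]
        self.remote_commits.extend(missing)


repo = ToyRepository()
repo.working_tree["statistics.js"] = "function mean() {}"  # hypothetical file
repo.add("statistics.js")
repo.commit()
repo.push()
```

Until `push` is called, the change only exists in the local repository, which is why the rest of the team cannot see it before the third step.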

The SCM often provides a service to request that one branch be merged into another [11]. GitHub and Bitbucket call this a pull request, while GitLab calls it a Merge Request (MR) [11]. An MR allows the other team members to do a code review and approve the changes before they are integrated [11]. These requests are often a central part of a CI workflow [12]. The role of MRs in a CI workflow is explained in more detail in section 1.1.5 GitLab flow.

1.1.2 Continuous integration

To be able to create a programming assignment workflow based on CI, it is essential to first understand what CI is.

Martin Fowler [5] describes CI as a software development process that mainly focuses on how to work in a team. The core concept is that each member should integrate, or merge, their changes to the source code at least daily. The rationale is to reduce the impact of integration errors when there are conflicts between the changes different developers have made. By integrating often, the risk of conflicts is smaller, and any conflicts that do occur are generally quicker to fix.

Often an automated verification system is used to make it possible to integrate the changes frequently. These systems are often referred to as CI pipelines. A CI pipeline is a set of instructions to automate the process of building and testing the artifact. These instructions are called jobs. The repository should include everything needed to build and run the application.

The pipeline runs on every new integration to ensure that the changes do not break the build. The core concepts of continuous integration are that the mainline always should be stable, and the pipeline should provide rapid feedback by informing the developers if the CI pipeline failed or succeeded.

Therefore, the pipeline ideally should be fast to run or be divided into multiple steps.
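The idea of a pipeline as an ordered set of jobs can be sketched in a few lines. The example below is a minimal illustration, not GitLab's pipeline engine; the function `run_pipeline`, the stage names, and the job names are all hypothetical. It assumes that every job in a stage runs, and that later stages are skipped once a stage contains a failed job, so the developer gets feedback as early as possible.

```python
from typing import Callable, Dict, List, Tuple

Job = Callable[[], bool]  # a job returns True on success, False on failure


def run_pipeline(stages: List[Tuple[str, Dict[str, Job]]]) -> Tuple[bool, List[str]]:
    """Run the stages in order and stop after the first stage with a failed job."""
    log: List[str] = []
    for stage_name, jobs in stages:
        stage_ok = True
        for job_name, job in jobs.items():
            passed = job()
            log.append(f"{stage_name}/{job_name}: {'passed' if passed else 'failed'}")
            stage_ok = stage_ok and passed
        if not stage_ok:
            return False, log  # later stages are skipped
    return True, log


# Hypothetical stages and jobs; real jobs would build, lint, and test the code.
example_stages = [
    ("build", {"install": lambda: True}),
    ("test", {"lint": lambda: True, "unit": lambda: False}),
    ("report", {"summary": lambda: True}),
]
succeeded, log = run_pipeline(example_stages)
```

In this run the `report` stage never starts because the `unit` job failed, which mirrors the principle that a change that breaks the build should be reported before any further work is done.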

Another quality assurance activity called code review is the practice of manually reading the source code to find mistakes and make sure all requirements are fulfilled [13]. Code reviews are often intertwined with CI, and an exploratory study showed that CI pipelines encouraged code reviews [14]. The activity of doing a code review is similar to the process examiners have while assessing programming assignments.

Even though there are many requirements to fulfill, there are no general rules for how to implement CI in practice, which results in various implementations of the process at different companies [4]. Therefore, this project cannot follow a universal formula to ensure that the students’ experience of the suggested programming assignment workflow will be identical to their experience as developers.

1.1.3 Agile project management and Scrum

While CI is a process to catch errors early, it does not solve the problem of managing a project. Therefore, CI can be used as a complement to a project management framework. Including concepts from one of these frameworks might benefit both the students and the design of the workflow.

A popular agile project management framework is called Scrum. Scrum’s idea is to be a structured process and make it easy to see what needs to be done while still being agile. Scrum makes this possible by using a product backlog - a to-do list of requirements - and, to still be agile, work in iterations [3]. These iterations are called sprints and should last a finite time and have a set of tasks, picked from the product backlog, that the developer should implement before the end of the sprint [3]. Optimally, the end of a sprint means the technical and design requirements have been met, the code has been tested, and the application is ready to be deployed [15].

1.1.4 GitLab

There are many different online SCM services available. Some of them are GitHub, Bitbucket, and GitLab. These services often provide more features than just backup and being a place to share projects.

According to the team behind GitLab, it is an application that provides a complete continuous integration toolchain [12]. Most of the concepts of Scrum can be converted into features in GitLab. For instance, the items in the backlog are often presented as “issues” and sprints as “milestones” [16]. Many of the other services provide similar features, but GitLab is also open source and can be self-managed [12]. Linnaeus University hosts an instance for its students and staff. Therefore, the thesis focuses on GitLab.

The report uses the vocabulary set up by GitLab. Most of the terms are quite self-explanatory, but to reduce the risk of misunderstandings they are explained here.

A Project is where everything related to a project is stored. This includes the repository where all the code is hosted but also things like settings, team members, and permissions.

Sometimes multiple projects are related in some way and should be grouped. Therefore, GitLab has Groups that can contain one or more projects or groups, sometimes referred to as Subgroups.

1.1.5 GitLab flow

The tool Git is by itself un-opinionated and does not have any predetermined rules on how to use it [11]. Companies using Git therefore have to set up a workflow that fits their needs. However, according to GitLab [11], this often leads to over-complicated or not clearly defined strategies. Over the years, people have created and adapted multiple suggestions for best practices and workflows. One of the first workflows created for Git is called Git flow. However, Git flow has its drawbacks. Even though it is clearly defined, the workflow is still over-complicated for the vast majority of organizations, since it promotes using many different branches with different purposes that solve problems many companies never have [11].

Git flow uses a develop branch with branches for features, hotfixes, and releases. The team should reserve the master branch for code in production and only merge code that should be released.

The team behind GitLab has presented its own workflow, called GitLab flow, which is a streamlined version of Git flow [11]. As seen in Figure 1.1, GitLab flow promotes using the master branch as mainline, with feature branches where the development happens. The developers then make an MR to the master branch to trigger the CI pipeline’s automated tests. If the changes pass the tests and the code review, they are integrated into the master branch. The main idea is that the master branch should be up-to-date and preferably be ready to be deployed.

GitLab flow recommends creating MRs in GitLab instead of merging directly in the command line, since an MR also serves as a code review tool [11]. When the developer thinks the changes are ready to be integrated, they should assign the MR to someone who knows the codebase well. If the assignee agrees, they will merge the request. Otherwise, they will request more changes or, in some cases, close the MR without merging [11].
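The review process described above can be summarized as a small state machine. The sketch below is a toy model, not GitLab's actual data model or API; the class `MergeRequest`, its fields, and the branch name are hypothetical. It assumes that an MR can only be merged while it is open, after it has been assigned to a reviewer, and when the CI pipeline has passed.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class MRState(Enum):
    OPEN = "open"
    MERGED = "merged"
    CLOSED = "closed"


@dataclass
class MergeRequest:
    """Toy model of a merge request and its review outcomes."""

    source_branch: str
    target_branch: str = "master"
    assignee: Optional[str] = None
    state: MRState = MRState.OPEN
    review_comments: List[str] = field(default_factory=list)

    def _require_open(self) -> None:
        if self.state is not MRState.OPEN:
            raise ValueError(f"merge request is already {self.state.value}")

    def assign(self, reviewer: str) -> None:
        """Hand the MR to someone who knows the codebase."""
        self._require_open()
        self.assignee = reviewer

    def request_changes(self, comment: str) -> None:
        """The reviewer asks for more changes; the MR stays open."""
        self._require_open()
        self.review_comments.append(comment)

    def merge(self, pipeline_passed: bool) -> None:
        """Integrate the changes once the tests and the review have passed."""
        self._require_open()
        if self.assignee is None:
            raise ValueError("assign a reviewer before merging")
        if not pipeline_passed:
            raise ValueError("the CI pipeline must pass before merging")
        self.state = MRState.MERGED

    def close(self) -> None:
        """Reject the changes without merging."""
        self._require_open()
        self.state = MRState.CLOSED


mr = MergeRequest(source_branch="feature/median")  # hypothetical branch name
mr.assign("reviewer")
mr.request_changes("Handle an empty list of numbers.")
mr.merge(pipeline_passed=True)
```

The three possible outcomes for the assignee (merge, request more changes, or close) map directly onto the three methods `merge`, `request_changes`, and `close`.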

Figure 1.1: A graphical representation of what GitLab flow with releases might look like. The master branch is used as the mainline and the other branches are created from master.

GitLab flow promotes creating stable release branches based on the master branch in projects where the software should be released. GitLab recommends creating a tag for each release as well [11]. Tags mark essential points in the repository’s history [17].

GitLab created GitLab flow; therefore, it is reasonable to assume that GitLab has excellent support for the workflow. Furthermore, since GitLab flow is a simplified workflow, which probably is more suitable for beginners, GitLab flow will be the foundation of the assignment workflow.

1.1.6 Linnaeus University

This section is a short presentation about Linnaeus University (LNU) and their current situation and motivation for the new workflow.

LNU is the result of the merger of the former Växjö University and the University of Kalmar [18]. The university has a faculty of technology that provides programs in computer science. Two of these programs, Software Development and Operations and Web Development Programme, share multiple courses, which results in many students taking the same courses simultaneously. According to the institute [19], there were 217 people in the first course, Introduction to Programming, 7.5 credits (1DV021), in 2019. Due to the volume of students, the examiners could theoretically reduce their workload by automatically assessing parts of the assignments. 1DV021 has two required assignments that examiners have to assess and grade.

This thesis project is a collaboration with the course coordinators for 1DV021, who strive to prepare the students for the IT industry. Therefore, the course coordinators wanted to set up a CI pipeline to leverage both the perks of having automated tests for coding assignments as well as teaching the students a well-known work process.

1.1.7 Roles

The project focuses mostly on two roles: the students and the examiners. However, it also mentions teachers and teaching assistants. Therefore, this section will clarify the roles and their differences.

The students refer to the people taking the course and should solve and submit the assignment.

The teacher is responsible for the lectures and for teaching the students. The examiner, on the other hand, is the one reviewing and grading the submissions. At LNU, the same person is often both a teacher and an examiner, but since that is not always the case, these two roles are separated.

The teaching assistants are often students who help the teachers by, for instance, tutoring other students. In some cases at LNU, they also help the examiners by assessing assignments. They are, however, not authorized to officially grade submissions. Nevertheless, the examiners can use the teaching assistants’ assessment as a foundation when grading.

While the previously mentioned roles already exist within academia, the last role is an umbrella term for whoever sets up and configures course material and assignments. It can be a teacher, examiner, teaching assistant, or anybody else. In this report, they will be referred to as a creator.

1.1.8 Current workflow

This section provides information about how the students and personnel at LNU are currently working, the expectations of what the students should be able to do in their first assignment of the course, as well as what technologies LNU is currently using and teaching.

Currently, the course coordinators of 1DV021 use a linear workflow for all of their programming assignments. The students get a list of requirements and a deadline. The university is already using Git. The students work directly in the master branch and are required to make at least 20 commits per assignment.

LNU has created a command-line interface (CLI) tool to speed up the process of creating courses and adding the registered students to the courses. The CLI tool also ensures that the courses and the assets follow a predefined structure. The command to add registered students to a course runs every hour to automate the process of generating course material for each student.

The CLI tool allows the creator to specify the assignment names. When a new student is added to a class, the corresponding assignments are added as projects in GitLab. These projects are currently empty and require the student to create a new repository.

In the first assignment in 1DV021, the students are using an empty pre-generated repository as their starting point. Therefore, the student has to make a copy of the template project and add it to their assignment repository.

The student submits their assignment for examination by creating a release tag.

The examiners then have to download the submission and run it locally on their computer to check if the requirements are fulfilled and review the code.

A flowchart of the current workflow can be seen in Figure 1.2.

Figure 1.2: A color-coded flowchart to visualize the current workflow and the responsibilities of each role.

1.2 Related work

The research behind using IT-industry tools for assessing programming assignments is somewhat limited.

In 2015, Kral and Capek [20] proposed a study on automatically assessing programming assignments using CI tools. They planned to run experiments with students in a programming course for beginners and an advanced object-oriented course the following semester. However, to date, no such paper has been published.

Verkleij [21] did a study called “Teaching programming using industry tools,” where he tested different tools that the IT industry is using to see whether they fulfill the needs of programming educators. The research was done by first interviewing programming teachers to determine the requirements. Then, Verkleij explored different professional tools to decide which toolchain should be used in the test. The toolchain was evaluated regarding usability and educational needs. The conclusion was that industry tools could be used to assess programming assignments automatically.

However, the solution has some problems. For instance, it is possible for a student to destroy the toolchain by changing or removing important files in the repository. Verkleij also concluded that the use of GitHub and GitHub Education limits the workflow to these specific tools, since neither GitLab nor Bitbucket provides a service similar to GitHub Education.

1.3 Problem formulation

Universities have rules for how to perform examinations. These rules are likely to differ between universities, and each university might have local rules and regulations [22]. In this project, LNU’s rules for examinations and the course coordinators’ requirements are referred to as academic assignment requirements.

As previously mentioned, the exact implementation of CI differs between companies; therefore, the goal is to imitate the experience of working with CI and CI’s core concepts. In this project, the word “imitate” means modeling the workflow on CI without it having to be an exact copy of the work process.

Both the academic assignment requirements and the core concepts of CI are available in chapter 2.

With that being said, the question this thesis is trying to answer is as follows: Can a programming assignment workflow imitate a continuous integration process, including automated tests, and still fulfill the academic assignment requirements?

To answer the question, a workflow, with a CI pipeline fulfilling the examiners’ requirements, should be designed and then demonstrated on the course Introduction to Programming (1DV021).

1.5 Objectives

Table 1.1 shows the objectives needed to answer whether a continuous integration workflow would be a good foundation for a code assignment.

The expected result is that some aspects will not fulfill all the requirements of CI. For instance, a big part of CI is to commit changes at least daily to reduce the risk of merge conflicts. The university cannot dictate when and how often a student should work on their assignments.

Since the students will mostly work alone, they will probably also miss out on the most significant benefit of daily integrations.

Meanwhile, there are other useful aspects of CI. The main benefit in the case of this project is automated testing, which will help guide the students to implement the assignment correctly and reduce the examiners’ workload.

O1 Research continuous integration
O2 Research Git workflows
O3 Confirm the requirements
O4 Plan the assignment workflow
O5 Communicate the workflow
O6 Demonstrate the workflow
O7 Evaluate the workflow

Table 1.1. List of the objectives for the project with the corresponding name.

1.6 Scope/Limitation

Since Linnaeus University has a self-managed instance of GitLab EE version 12.9.1-ee, with an “Ultimate” license, this project will have access to all of GitLab’s features.

The project is limited to following the guidelines of GitLab flow. The rationale is that GitLab flow is a simplified version of Git flow, and GitLab promotes GitLab flow. Since the tools are from GitLab, it also makes sense to use their promoted workflow.

The result of the project is not only a workflow but also the base for a reusable CI pipeline. However, it will be specific for the first assignment of the course Introduction to Programming, 7.5 credits (1DV021), and will use predefined unit tests and linting rules. The assignment is in JavaScript and uses Node.js as the runtime platform. Therefore, the CI pipeline is designed to only cater to the needs of a Node.js project.

This project does not include changing the assignment description in any way to make it better fit the mindset of CI.

1.7 Target group

The target groups of this thesis are teachers and examiners in computer science.

1.8 Outline

The following chapter is Method, which will explain how design science is a suitable research method to solve the problem and how it has been used in this project. The chapter also includes all the requirements that the workflow should fulfill.

The chapter Result demonstrates the workflow when used to handle the assignment Descriptive statistics. It also explains the workflow and the motivation behind the design choices made and how the implementation fulfills the requirements.

In Evaluation, the workflow and the demonstration are evaluated to see if the requirements were satisfactorily met.

The Discussion is about whether or not the question was answered and if the problem has been solved. The result is also compared to the results of related works.

Finally, the Conclusion summarizes the findings and what could have been done differently to improve the results. The chapter also includes a section for future work.

2 Method

To be able to answer the question of whether a programming assignment workflow can imitate a continuous integration process and still fulfill the requirements of an academic assignment, the requirements have to be made concrete. These requirements can differ somewhat between examiners and courses; therefore, this project will implement the requirements given by the course coordinators of the course Introduction to Programming (1DV021) at Linnaeus University. I will implement the first assignment, “Descriptive statistics”, to test and demonstrate the workflow. The course coordinators and I chose this assignment because it is well suited for CI: it lends itself well to automated testing and has prewritten unit tests that the demonstration reuses. The assignment also includes code style rules, or linting rules, that should be tested. The word “tests” will, from here forward, refer to both the unit tests and the linting tests for 1DV021. The full assignment description is available in Swedish in Appendix A; however, the elements essential for this project have already been mentioned here or will be clarified when needed.

The method for this project is an exploratory process called design science (DS). While many other methodologies try to understand our reality, DS is a research method to create and evaluate artifacts to solve problems [23].

Hevner et al. [24] defined a set of rules that characterize well-conducted DS research. The research should create an artifact to solve a yet unsolved but essential problem. The solution must be evaluated and should be based on existing knowledge. Finally, the research and the artifact have to be communicated. Based on these criteria, DS is a well-suited research method for the needs of this study.

DS is a process of six activities. The first activity is to identify the problem and motivate the need for a solution. The problem and the motivation for the project have already been addressed in the previous chapter.

The second activity is to define the requirements. The workflow requirements are a combination of user stories based on the discussions with the course coordinators, continuous integration, and GitLab flow. The user stories result from multiple iterations and discussions with the course coordinators regarding possible assignment workflow solutions. The requirements are listed in sections 2.1.1 and 2.1.2.

The third activity is to design and develop the artifact. This project’s goal is not to develop new software but rather a workflow and a set of instructions on how to use available features in GitLab in the context of assessing programming assignments, specifically at Linnaeus University (LNU).

The fourth activity is to demonstrate the artifact and how it can solve the problem. The demonstration is done by following the workflow to implement the first assignment, “Descriptive statistics,” in the course Introduction to Programming (1DV021) at LNU.

The fifth activity is to evaluate the results, both the artifact and the demonstration. How well does the artifact solve the problem, and are all requirements met? The suggested workflow, including the CI pipeline, should fulfill both the user stories as well as follow GitLab flow to be considered successful. To see if the demonstration fulfills all of the requirements it will be tested by me acting in the roles of both student and examiner to verify that all requirements are implemented. The workflow and the demonstration will then be presented for the course coordinators of 1DV021. The evaluation is based on whether all the requirements are met and if the course coordinators approve how the academic assignment’s requirements are fulfilled.

The last activity is to communicate the research, which in this case is done through this report. LNU uses its GitLab instance to store and share instructions regarding working with assignments, Git, and GitLab with both students and staff. Therefore, the artifacts of this project, both the instructions and the demonstration, will be available and communicated through LNU’s GitLab instance as well.

The design and development activity, as well as the demonstration activity, are presented in the Result chapter.

2.1.1 The course coordinators’ requirements as user stories

This section lists the academic assignment requirements and describes the process of collecting them.

A user story (US) is a short informal sentence describing one or more requirements in a software system [25]. A US should include "who," "what" and "why." The "who" should be an end-user of the product. The "what" is the description of what the user wants. Furthermore, the "why" should motivate the benefit of the requirement [25].
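The three parts of a user story can be captured in a small data structure. The sketch below is only an illustration of the "who," "what" and "why" format; the class `UserStory` and its `sentence` method are hypothetical and not part of any tool used in the project. The example values are taken from US 2 in the list later in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserStory:
    """A user story split into its three parts."""

    who: str   # the end-user of the product
    what: str  # what the user wants
    why: str   # the benefit that motivates the requirement

    def sentence(self) -> str:
        """Render the story in the form used in this section."""
        return f"As {self.who}, I want {self.what} {self.why}."


story = UserStory(
    who="a student",
    what="to know if my implementation fulfills the tests",
    why="to get rapid feedback on my submission",
)
rendered = story.sentence()
```

Keeping the three parts separate makes it easy to check that no story is missing its motivation, which is the part most often left out.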

As previously mentioned, the user stories are the product of multiple discussions with the course coordinators of 1DV021. Most of the user stories were defined early on, while others were added later when it became clear that not all requirements were covered. The formulations have either been written together with the course coordinators or been approved by them.

US 1 "As a student, I want to be able to submit my assignment explicitly so the examiner will not grade my unfinished solution."

US 2 "As a student, I want to know if my implementation fulfills the tests to get rapid feedback on my submission."

US 3 "As a student, I want to be able to fix any errors in my submission before the deadline to submit the best version of my solution."

US 4 "As an examiner, I want to save time assessing assignments by not having to download all submissions and test them manually."

US 5 "As an examiner, I want to be able to give feedback directly in the code to more easily point out the specific item I am commenting on."

US 6 “As an examiner, I want to be able to give general feedback regarding the submission to give a summary of the assessment.”

US 7 “As an examiner, I want to be able to inform the student about their grade to let them know about the result of the assignment.”

US 8 “As an examiner, I want to know the time of the submission in case the student sent in the assignment too late.”

US 9 “As an examiner, I want to be able to differentiate the student’s submissions to know which submission should be assessed in case the student makes multiple submissions or retries.”

US 10 “As an examiner, I want to know which submissions are complete in order to avoid spending time assessing submissions that are not ready for examination.”

US 11 “As a teacher, I want to be able to track the students' progression of an assignment to be able to adjust the course and its planning if needed.”

US 12 “As an examiner, I want to get an overview of all submissions and their grading status to not have to check each assignment one at a time.”

US 13 “As an examiner, I want to be notified when a student has submitted a supplement so that I do not have to check manually whether they are done.”

US 14 “As an examiner, I don’t want the students to be able to change the tests so I can be able to trust the results of the tests.”

US 15 “As an examiner, I want a distinct difference between how a failed submission and a submission that needs to be complemented is handled since these are two different scenarios.”

2.1.2 GitLab flow requirements

The requirements are defined based on GitLab’s description of GitLab flow, which has been summarized in section 1.1.4 GitLab flow. There are minor variants of GitLab flow depending on how the project should be published.

The workflow also includes optional steps and recommendations to improve the workflow. However, the requirements for this project are limited to the foundation of a GitLab flow where the project uses release branches.

1. The master branch should be preserved as mainline.

2. When preparing a new release, create a stable branch from master.

3. Use a Git management application, like GitLab, to make MRs.

4. Each integration should be tested.

5. Each MR should be reviewed by someone knowledgeable of the codebase.

2.2 Reliability and Validity

Flaws in the method might reduce the trustworthiness of the study. It is essential to be proactive, find these problems, and figure out how to minimize their effect on the result. The problems are often classified as reliability and validity problems. Reliability concerns whether the study can be reproduced, while validity concerns whether the conclusion of the study can be trusted. This section describes possible problems with the study and the actions taken to reduce their effect.

There are many different types of validity problems. A construct validity problem arises when the reader can misinterpret the problem formulation or the method. Therefore, it is crucial to clearly define everything that can be misunderstood and affect the validity of the results. In the case of this project, words such as “imitate” can be too vague and have different meanings to different people. In an attempt to increase the study’s construct validity, such essential words have been defined. However, the definitions might still be unclear, and the validity can be affected.

Another common validity problem, called internal validity, is whether the result and conclusion are backed up by the collected data. For this study, there is potentially one major internal validity problem: bias. Ultimately, the requirements are defined by the same people who will decide if the workflow fulfills them. Having stakeholders who want a good workflow can make them more critical towards the results, but their need for a new workflow can also lead them to approve suboptimal solutions. To reduce the risk of bias and to allow readers to draw their own conclusions, the solution for each requirement will be tested, evaluated, and presented separately.

The result’s validity could be improved by having one half of the class do the assignment using the current approach, while the other half uses the suggested workflow. Both the students and the examiners would then be interviewed to determine if they got the expected benefits. Unfortunately, an experiment like that is not possible within the time frame of this project. It also raises ethical concerns since it might negatively affect the course experience for half of the students.

The choice of doing a study catering to LNU’s needs might also result in external validity problems. External validity is whether the results are general enough to justify the conclusion outside the studied setting. As mentioned, the study mostly aims to fulfill the needs of LNU, and different countries, universities, or even courses might have other requirements that the proposed workflow cannot fulfill. However, by focusing on more general requirements, other universities can hopefully benefit from the results, and the conclusion can be more general than just for LNU and the course 1DV021.

Since the method is design science, the result might not be reliable because many factors are in play. If the implementation fails to fulfill all the requirements, it might be because CI is unsuitable for programming assignments. However, it can also be because of the limitations of the chosen technologies or insufficient knowledge on the part of the person who designs and implements the solution. Since the research method is exploratory, the finite time frame can also reduce the validity. It might leave too little time to implement all the requirements and too little time to test different solutions to reach the best result. The reliability might also be affected since the report might not include all paths explored. There has been an active attempt to include all reasoning behind the choices made, but there can still be uncertainties that reduce the project’s reliability. Another aspect that might reduce the reliability is that new features and changes to GitLab might change the project’s result. Therefore, the project is limited to a self-managed instance of GitLab EE version 12.9.1-ee with an “Ultimate” license.

Defining requirements as user stories is an excellent way to explain the purpose of a feature and who will benefit from it. By not focusing on the specifics of the requirement but instead why it is needed, it allows us to elaborate and find better solutions. However, a downside is that the user stories might be vague and, therefore, be misinterpreted, and the implementation might not fulfill the requirement properly. Having regular meetings with the client and discussing the project and the workflow will hopefully reduce the risk of misunderstandings.

2.4 Ethical Considerations

Except for the course coordinators, no other people will participate in this project. Therefore, there are no ethical considerations to be made regarding participants. The only consideration that might come into play is making sure that the students cannot see each other's solutions or grades. This can be prevented by using proper permission settings when generating the students’ course content.


3 Results

The result of the project is two artifacts. The first is a workflow, including instructions for how to set up an assignment (see Appendix B), how students should work (see Appendix C), and how the examiners should assess the assignments (see Appendix D).

The second artifact is the demonstration of the workflow, made by following the instructions and setting up the first assignment, “Descriptive statistics,” in LNU’s course 1DV021. This chapter will focus on the demonstration to show the results of the suggested workflow, the motivation behind the decisions made, and how the workflow fulfills the requirements.

User stories will be referred to by US, followed by the user story’s number. For instance, US 1 refers to the user story, “As a student, I want to be able to submit my assignment explicitly so the examiner will not grade my unfinished solution.”

The complete list of user stories is available in the previous chapter, Method, or in section 4.1, where the user story is followed by a short explanation and evaluation of how the implementation meets the requirement.

The workflow has some aspects that must be implemented in a specific way to work properly or to fulfill the requirements. Other aspects are recommendations that should be implemented to improve the experience for the end-users or to provide a consistent experience for the students if the workflow is used in multiple assignments and courses. The words should and must will be used in the report to differentiate the significance of each recommendation.

3.1 Workflow overview

This section contains a condensed explanation of the suggested workflow. All details, motivation, and explanations will be found in separate sections later in the chapter.

When a student registers for a course, an assignment project is automatically generated based on an assignment project template. Based on the configuration in the template, the student’s assignment project is preconfigured with an external CI pipeline, tests, a branch called “release”, and a merge request template for submitting.

Every time the student pushes to the GitLab project, the CI pipeline is triggered and evaluates the student’s current progress on the assignment based on the tests in the assignment project. To submit the solution for assessment, the student has to make a Merge Request (MR) to the “release” branch with the milestone matching the submission. The MR will trigger another set of pipeline jobs that use an external set of tests.

The test results are presented in the MR. If there are any errors, the student can commit new changes. Otherwise, if the student made the MR before the milestone’s due date, the solution has been submitted successfully.

It is possible to perform all the grading directly on GitLab in the student’s MR. Git makes it possible to track any changes, so it is easy to see what the student has done and when. The examiners can give input directly in the code or as comments in the MR. If the assignment has a passing grade, the examiner must express this in the feedback and then merge the MR. If the submission needs complementing, the examiner must change the milestone and either refer to the failed pipeline or list everything missing for a passing grade. When the student has fixed the errors, they must comment in the MR to inform the examiner.

If the examiner fails the submission, they must give feedback as a comment in the MR and close the MR. When the student is ready to make a new submission, they must make a new MR and select one of the assignment's retake milestones. A flowchart of the workflow can be seen in Figure 3.1.

Figure 3.1: A flowchart with color-coded roles to show which role is responsible for each step and decision in the workflow.


3.2 Create a course

To make sure that all courses follow the same structure and to reduce the time to set up a new class, LNU had already created a CLI tool to generate new courses. I have extended the tool to allow a structure supporting the new workflow. The difference is that the structure now also includes the subgroups “Pipelines” and “Templates”, as seen in Figure 3.2.

Figure 3.2: The default subgroup structure of a course using the workflow.

The “Pipelines” group stores all the projects with the CI pipelines and external tests that the assignments need. For this prototype, there is only one pipeline project, but preferably there should be one project per assignment, since different assignments have different sets of tests and therefore different CI pipeline requirements and implementations.

The “Templates” group contains custom project templates. A new project based on a custom project template copies all the settings from the template. Therefore, these templates are valuable for preconfiguring the students’ assignment projects.

Since each assignment is different with different requirements, it is hard to generate pipelines and assignment templates automatically. Therefore, these types of artifacts have to be created manually by the creator. Figure 3.3 shows how the course structure on GitLab is after the assignment’s pipeline project and template project are added.


Figure 3.3: The structure after adding a pipeline project and a template for assignment 1.

3.2.1 Create milestones

As mentioned in section 1.1.3, developers working with Scrum should, within the time frame of a sprint, implement the technical and design requirements, test the code, and make sure that the application is ready to be deployed [15]. Students solving a programming assignment have similar requirements. They should implement a solution, test the solution, and submit it before the assignment’s due date. Therefore, tools the IT industry uses to handle sprints can, to some extent, be used by examiners to handle assignments.

GitLab provides a feature called “milestones,” whose purpose is to represent sprints [16]. A milestone can include a title, a start date, a due date, and a description.

In the workflow, the milestones represent each assignment and retake. For the demonstration, I created three milestones: the first is the main assignment milestone, the second represents the retake, and the third is for complementing assignments. Each assignment milestone must have a title and a due date. However, to help the students plan their workload, the creator can also include a start date to inform the students when they should start working on the assignment.

The creator should follow a naming convention for the milestone’s title to make it easier for the students to choose the correct milestone when submitting. The recommended title structure is the name of the assignment followed by the due date in parentheses, e.g., “Assignment 1, Statistics (2020-05-25)”. The reason to include the due date in the title is that GitLab does not show the milestone’s due date in the view for creating an MR. Including the due date in the title makes it easier for the students to differentiate the milestones and reduces the risk of students submitting to the wrong assignment milestone. Each retake title should follow a similar structure but also include the suffix “Retake”, e.g., “Assignment 1, Statistics - Retake (2020-06-30)”. The due date in the title differentiates the retakes. However, sometimes there are only minor mistakes that the student needs to fix, which do not classify as a retake. Therefore, there should also be a milestone called “Complement due date,” followed by the due date in parentheses, which should have a due date after the last retake. The amount of time between each retake and the complement due date depends on local rules for examination. In the case of LNU, the minimum amount of time before the next retake is ten working days after the student receives the feedback [22]. Figure 3.4 shows how milestones are listed. The list is available to the students too, which allows them to plan their workload and get a clear view of the tasks during the course.
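To illustrate the naming convention, the titles could be generated rather than typed by hand. The following is a minimal Python sketch and not part of the CLI tool; the function names and the complement due date are hypothetical and chosen only for this example.

```python
from datetime import date


def milestone_title(assignment: str, due: date, retake: bool = False) -> str:
    """Build a title such as 'Assignment 1, Statistics (2020-05-25)'."""
    suffix = " - Retake" if retake else ""
    return f"{assignment}{suffix} ({due.isoformat()})"


def complement_title(due: date) -> str:
    """Build the title of the milestone used for complementing submissions."""
    return f"Complement due date ({due.isoformat()})"


titles = [
    milestone_title("Assignment 1, Statistics", date(2020, 5, 25)),
    milestone_title("Assignment 1, Statistics", date(2020, 6, 30), retake=True),
    # Hypothetical date; it only has to fall after the last retake.
    complement_title(date(2020, 8, 31)),
]
```

Generating the titles from the due date keeps the date in the title and the date stored in the milestone from drifting apart.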

Figure 3.4: The milestone view, listing all assignments and retakes during the course instance.

3.2.2 Create a pipeline project

US 14 says, “As an examiner, I don’t want the students to be able to change the tests so I can be able to trust the results of the tests.” Therefore, it is vital that the examiners can trust the test results to benefit from the automated tests when assessing an assignment. The CI pipeline and the tests must be stored somewhere the students do not have permission to alter them in any way. For that reason, the CI pipeline and the release tests can not be a part of the assignment project but rather be stored in a separate project.

The unit tests and linting rules of the original “Descriptive statistics” assignment are used in this implementation as well. The CI pipeline has jobs running the tests. Therefore, both the pipeline project and the assignment project will have copies of the tests, as seen in Figure 3.5. The reasoning behind having two sets of tests is that the student has access to the tests locally and can verify their implementation before pushing it to GitLab. By providing the tests in the student project, the student becomes familiar with workflows using unit tests and, hopefully, gets a clearer picture of software development. This approach also supports “hidden” tests, meaning that the tests in the pipeline project can differ from those provided in the student projects. Still, it is encouraged to keep the pipeline project public to allow the students to see the tests and the CI pipeline, both for the sake of transparency and to allow the students to learn more about the elements of the workflow and CI.

Figure 3.5: The pipeline project contains the CI pipeline and all the tests to assess the submission automatically.

I have configured the CI pipeline to run different jobs once the student has made an MR, which allows the pipeline to use the other set of tests after submission. Code 3.1 shows the rules for the jobs that run when the student commits and there is no MR to the release branch, while Code 3.2 shows the rules that trigger the jobs that run when the student makes an MR to the release branch. The CI pipeline is a YAML file named “.assignment-1-ci.yml” and is stored in the root of the pipeline project, “Assignment 1”, as seen in Figure 3.5. The CI pipeline scripts are mostly written in Bash, a command language, to perform actions. The full pipeline is available in Appendix E.

rules:
  # Run on every push that is not an MR to the release branch
  - if: $CI_MERGE_REQUEST_TARGET_BRANCH_NAME != 'release'

Code 3.1: The rules for the development pipeline, which runs for each new push to GitLab as long as it is not a merge request to the “release” branch.

rules:
  # Only run if it is an MR to the release branch
  - if: $CI_MERGE_REQUEST_ID && $CI_MERGE_REQUEST_TARGET_BRANCH_NAME == 'release'

Code 3.2: The release pipeline runs when the student creates an MR to the release branch and for each new commit added to that MR.
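The effect of the two rules can be modeled outside GitLab. The Python sketch below only mirrors the two `if` expressions in Code 3.1 and Code 3.2; it is not how GitLab evaluates rules internally, and it assumes that an unset variable (here `None`) compares as not equal to 'release'. The function name is hypothetical.

```python
from typing import Optional


def selected_pipeline(mr_id: Optional[str], target_branch: Optional[str]) -> str:
    """Return which set of jobs the two rule expressions would select.

    mr_id models $CI_MERGE_REQUEST_ID and target_branch models
    $CI_MERGE_REQUEST_TARGET_BRANCH_NAME; None stands for an unset variable.
    """
    # Code 3.2: an MR exists and it targets the release branch.
    if bool(mr_id) and target_branch == "release":
        return "release"
    # Code 3.1: anything that does not target the release branch.
    if target_branch != "release":
        return "development"
    return "none"


push_without_mr = selected_pipeline(None, None)
mr_to_release = selected_pipeline("42", "release")
mr_to_other_branch = selected_pipeline("43", "feature")
```

Under these assumptions, an ordinary push and an MR to any other branch both select the development jobs, and only an MR to “release” selects the release jobs.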

GitLab can show the test results directly in the MR, either with failing tests, as shown in Figure 3.9, or without errors, as shown in Figure 3.11. However, this requires that the pipeline saves the test results in the JUnit report format as report artifacts [26]. It is possible to present the test results from the development pipeline using JUnit as well, but this feature is disabled by default due to performance issues [26]. Therefore, the test results in the development pipeline are presented directly in the job console in GitLab. By presenting the test results to the student, the implementation satisfies US 2.

The release pipeline has an extra job compared to the development pipeline. The job compares the date of the latest commit with the due date of the MR’s milestone. If no milestone is selected or if the due date has passed, the test fails. Otherwise, if the commit date is before or the same as the due date, the job succeeds. The script used in the job has to have access to the course’s GitLab group id and a private access token to get information from the GitLab API. These two variables are not available by default and have to be added manually. To be able to re-use these variables for all assignments, they should be added by an administrator in the root course group’s settings.

The access token only needs read-only permission and should be added as a masked variable to ensure that it does not appear in the job logs. As long as the students do not have maintainer permission, the variables will not be accessible to them. Figure 3.6 shows the settings page with the variables added.

Figure 3.6: The manually added CI variables used in the check deadline job.
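The decision made by the deadline job described above can be sketched as follows. The actual job is a Bash script (see Appendix E) that fetches its data from the GitLab API; this Python version only illustrates the comparison. It assumes the commit timestamp is an ISO 8601 string with a UTC offset and the milestone due date is a plain YYYY-MM-DD string, and the function name is hypothetical.

```python
from datetime import date, datetime
from typing import Optional


def submitted_in_time(commit_timestamp: str, milestone_due: Optional[str]) -> bool:
    """Return True if the latest commit is dated on or before the due date.

    A missing milestone (None or an empty string) counts as a failed check.
    """
    if not milestone_due:
        return False
    # Accept a trailing "Z" as well as an explicit offset such as "+02:00".
    parsed = datetime.fromisoformat(commit_timestamp.replace("Z", "+00:00"))
    # The calendar day is taken in the commit's own UTC offset (an assumption).
    commit_day = parsed.date()
    due_day = date.fromisoformat(milestone_due)
    return commit_day <= due_day


on_time = submitted_in_time("2020-05-25T23:10:00+02:00", "2020-05-25")
too_late = submitted_in_time("2020-05-26T00:05:00+02:00", "2020-05-25")
no_milestone = submitted_in_time("2020-05-20T10:00:00+02:00", None)
```

Which time zone decides the calendar day is a design choice; a commit made shortly after midnight local time could otherwise still count as on time in UTC.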

3.2.3 Create a custom project template

The template group contains all templates used in the course. GitLab has a feature called Custom project templates. The project templates make it possible to create a template that is not limited to the information in the repository. It means that it is possible to preconfigure the assignment to use a specific pipeline and already have a release branch. By using a template, the students do not need to configure the project before starting the assignment. This will hopefully reduce the risk that a student fails to submit because of a missed setup step or a misspelled release branch name rather than because of the assigned task itself.

GitLab has decided that for a project to be a custom project template, it must be within a group that is set to be used as a custom project template group. Therefore, the Templates group should be assigned to be such a group.

Assigning a group to be a custom project template group has to be done manually for each new course since GitLab’s API currently does not support assigning the group programmatically. GitLab only treats projects directly inside the Templates group as custom project templates. Therefore, projects outside the group or in a subgroup cannot be used as templates [27]. These templates will not be available for other courses. It is, however, possible to make custom project templates available everywhere in the GitLab instance using “Instance templates.” Nevertheless, because most courses have specific assignments, most templates are course-specific. Therefore, in this case, group templates are a better solution.

The “Descriptive statistics” assignment is designed to give the student the foundation of the application already set up, and the student must fill in the gaps. Therefore, the old assignment template is used as the base for the new assignment template. However, the new template is also connected to the external CI pipeline, has an MR template, and has a new branch called “release.” The MR template, see Appendix F, is a document that the student must fill in when submitting their assignment. The predefined “release” branch is a way to reduce the risk that the student misspells the branch name and therefore fails to submit the assignment correctly.

3.2.4 Generate assignment projects for each student

The CLI tool has a command for adding students to a course. I have altered the tool to also generate the students' assignment projects from a template. Each registered student gets a copy of the Assignment 1 template, already configured to use the pipeline in the pipeline project.

A custom project template does not transfer the merge approval rules. Therefore, the script also configures the assignment project's merge approval rules so that only GitLab users who are members of the course root group have permission to approve MRs. The CLI tool adds all users who are examiners or teachers of a course to the course’s root group. The approval rules are essential to make sure that the students cannot approve their own submissions and that only staff have permission to approve the MRs.

3.2.5 Add Runner

GitLab uses Runners to run the jobs specified in the CI pipeline. GitLab.com has multiple Runners configured already. However, self-hosted instances of GitLab, such as the one LNU uses, also require self-hosted Runners. According to GitLab, the runner has to be on a different server than the GitLab instance. Therefore, another instance on the university’s cloud has been set up for hosting the runner.

Runners can either be dedicated to a single project or group (including subgroups) or be available for all projects, so-called shared runners [28]. In contrast to the other runners, a shared runner uses a fair usage queue to process the CI pipeline jobs. The fair usage queue makes sure that a single project cannot monopolize the runner [28]. Due to the queuing system and the possibility of using the runner in other courses, a shared runner is used.

However, the runners can only run one job at a time, which means that depending on the number of students, multiple runners are needed to reduce the time queuing to run the pipeline. Since this project is a prototype, only one runner is in use.
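A rough estimate shows why the number of runners matters close to a deadline. The Python sketch below is only back-of-the-envelope arithmetic: it assumes every job takes the same time and that each runner handles one job at a time, and the figures (60 queued jobs, two minutes per job) are hypothetical rather than measured.

```python
import math


def worst_case_wait_minutes(pending_jobs: int, runners: int, minutes_per_job: float) -> float:
    """Estimate how long the last queued job waits before it starts."""
    if runners < 1:
        raise ValueError("at least one runner is required")
    # Jobs are processed in rounds of `runners` jobs; the last job waits for
    # every round before its own to finish.
    rounds_before_last = max(math.ceil(pending_jobs / runners) - 1, 0)
    return rounds_before_last * minutes_per_job


one_runner = worst_case_wait_minutes(60, 1, 2.0)
four_runners = worst_case_wait_minutes(60, 4, 2.0)
```

With these assumed numbers, the last student would wait about two hours with one runner and under half an hour with four.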


3.3 Student workflow

Since the student already has an assignment project set up at the beginning of the course, they only have to clone, or download using Git, the repository to work on the assignment locally.

To keep the process from becoming too advanced for beginners, the students are not required to use multiple branches while working on the assignment, as seen in Figure 3.7. However, the students still have to commit regularly and write descriptive commit messages since it is one of the requirements of the “Descriptive statistics” assignment. The commit history is available in GitLab, and the examiners must check the commit messages while assessing the assignment.

Figure 3.7: A branch diagram of the suggested Git workflow. The workflow does not require the student to use any feature branches as GitLab flow does.

The main differences from the previous assignment workflow, explained in section 1.1.6, are the submission process and that the tests run automatically when the student pushes the changes to GitLab. As seen in Figure 3.8, the results of the CI pipeline jobs are presented in GitLab.


Figure 3.8: A failed development pipeline after the student pushed the commit “Add maximum.”

3.3.1 Submit the assignment

When it is time to submit the assignment, the student has to make an MR to the “release” branch in their assignment project on GitLab. The student will select the branch they want to submit, probably master, as the source branch, and then the “release” branch as the target branch.

The MR title should be “Submit:” followed by the name of the assignment. In the case of the demonstration, the title is “Submit: Assignment 1”. The naming convention differentiates submissions from regular MRs and makes it easier for the examiner to get an overview of the submissions. Misspelled titles are still valid, as long as the examiner deems the MR to be an apparent submission.
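The title convention lends itself to a simple first filter when listing MRs. The Python sketch below is illustrative only; the function names are hypothetical, and because misspelled titles can still be valid submissions, a failed match should be reviewed by the examiner rather than rejected automatically.

```python
def submission_title(assignment: str) -> str:
    """Build the recommended MR title for an assignment."""
    return f"Submit: {assignment}"


def looks_like_submission(title: str) -> bool:
    """Check for the 'Submit:' prefix, ignoring case and surrounding spaces."""
    return title.strip().lower().startswith("submit:")


title = submission_title("Assignment 1")
matches = [looks_like_submission(t) for t in (title, " submit: assignment 1", "Fix typo")]
```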

Making an MR is an explicit action and therefore fulfills US 1.

Nevertheless, to reduce the risk of a student submitting by mistake, the release template must be used and filled in by the student. In the template, the student has to confirm that the assignment is complete, as in US 10, and that they have done the assignment alone. In the last step before submitting the assignment, the student has to select the milestone corresponding to the assignment. The milestone fulfills multiple purposes. It categorizes the MR, which makes it clear that the MR is a submission and helps the examiner filter the submissions when assessing. The milestone also makes it possible for the CI pipeline to check if the student submitted the assignment in time by comparing the commit date with the milestone due date.

When the student submits, the release jobs in Code 3.2 are triggered, and the implementation fulfills US 2 by showing the test results in the MR. If the tests fail, as in Figure 3.9, the student has until the deadline to fix these issues by committing changes to the source branch. The student does not have to make a new MR since the new commits are included in the current MR automatically. As seen in Figure 3.10, more information about the error is presented if the student clicks on it.

If the student fails to submit in time, they must change the milestone to the next retake for the assignment.

Figure 3.9: The student’s view of an MR with failed test results after submitting before the deadline. The examiner’s view is identical, but they can also approve the MR.

Figure 3.10: Detailed information about one of the failing tests.

Figure 3.11 shows how it looks if the tests succeed. It means that all the student has to do now is to wait for the examiner to do a code review, give feedback, and grade the assignment. Just because the tests are successful does not mean that the student is guaranteed a passing grade. It is ultimately up to the examiner to decide.

Figure 3.11: An extract from an MR where all tests pass.

If a student needs to complement their submission, the MR will have a list of items to fix to get a passing grade. To add the changes, the student has to commit to the source branch, and the MR will be updated as well. As requested in US 13, the student informs the examiner that the issues are fixed by commenting in the MR.

If the submission fails, the student has to create a new MR and use the appropriate retake milestone.

When the student gets a passing grade, they will receive an email saying the examiner has approved and merged the MR. The MR will be updated to say that the examiner has merged it (see Figure 3.12) and will be classified as merged rather than open. The examiner will also have given feedback, including the grade, as a comment in the MR. The student’s different end states are visualized as a flowchart in Figure 3.13.

Figure 3.12: The examiner will approve and merge the MR if the submission has a passing grade.

Figure 3.13: A flowchart showing the different possible end states for a student.

3.4 Examiner workflow

When it is time to assess the assignments, the examiner can find all the submissions listed under the "Merge Requests" tab in the course group. As seen in Figure 3.14, each list item contains information about who made the MR, the result of the CI pipeline, when the MR was created, when it was updated, and the milestone. All the MRs in the "Open" tab are either not yet assessed or not approved. The MRs under the "Merged" tab, on the other hand, have already been approved and merged by an examiner. The information in the list view helps to solve US 8, US 10, and US 12.

Figure 3.14: The list of submitted assignments.

Different examiners might have different preferences on how to divide the work between them, but the recommended way is to start with the oldest MR and work toward the newest. To make sure that multiple examiners do not assess the same MR, the examiner should begin the assessment by assigning the MR to themselves. As seen in Figure 3.14, the assigned examiner's avatar will appear next to the pipeline status in the list view and signal to the rest of the examiners that the MR is already taken.

Figure 3.14: The examiner’s avatar shows that the assignment is under assessment.

The other examiners should instead assess the next available submission.
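The recommended order of work can be expressed as a small selection rule: among open MRs for the milestone that have no assignee, pick the oldest. The Python sketch below works on plain dictionaries; the keys, usernames, and timestamps are hypothetical and only loosely modeled on what the list view shows, not on a specific GitLab API response.

```python
from datetime import datetime
from typing import Dict, List, Optional


def next_submission(merge_requests: List[Dict], milestone: str) -> Optional[Dict]:
    """Return the oldest open, unassigned MR for the milestone, or None."""
    candidates = [
        mr for mr in merge_requests
        if mr["state"] == "opened"
        and mr["milestone"] == milestone
        and mr["assignee"] is None
    ]
    # Oldest first, compared as parsed timestamps rather than raw strings.
    return min(candidates,
               key=lambda mr: datetime.fromisoformat(mr["created_at"]),
               default=None)


MILESTONE = "Assignment 1, Statistics (2020-05-25)"
merge_requests = [
    {"author": "student-a", "state": "opened", "milestone": MILESTONE,
     "created_at": "2020-05-24T18:00:00+02:00", "assignee": None},
    {"author": "student-b", "state": "opened", "milestone": MILESTONE,
     "created_at": "2020-05-22T09:30:00+02:00", "assignee": "examiner-1"},
    {"author": "student-c", "state": "opened", "milestone": MILESTONE,
     "created_at": "2020-05-23T12:00:00+02:00", "assignee": None},
]
chosen = next_submission(merge_requests, MILESTONE)
```

Here the oldest MR is already assigned to another examiner, so the next examiner would pick the submission from student-c.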

The MR presents an overview of the commits, the results of the pipeline's release jobs, and the changes made in the source code.

As seen in Figure 3.15, US 5 is solved by the fact that GitLab has a feature for giving feedback directly connected to any line in the repository. The feedback is then added as comments in the MR’s overview. When the code review is finished, the examiner must give some general feedback and a grade. This is done by commenting on the MR. If the student needs to complement the assignment, the examiner must list all issues in the comment or refer to the pipeline results, change the milestone to the “complement” milestone, and inform the student to comment when they have fixed all the issues.

Figure 3.15: During the code review, the examiner can give specific feedback for any line.

In case of a failure, the examiner must close the MR after explaining how the student should proceed. The recommended way is that the student has to do a new MR with the appropriate milestone assigned when they have completed the assignment. However, it might differ depending on local rules or if, for instance, a new assignment has to be solved instead.

If the student passed, this must be clearly stated in a comment together with the grade, and the MR must be approved and merged. A flowchart of the examiner’s decisions is available in Figure 3.16.


Figure 3.16: A visualization of the examiner’s possible decisions while grading a submission.
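The examiner's three possible decisions and the actions that follow from each can also be written down as a lookup. The Python sketch below simply restates the steps of this section; the names are hypothetical and nothing here calls GitLab.

```python
from enum import Enum
from typing import List


class Verdict(Enum):
    PASS = "pass"
    COMPLEMENT = "complement"
    FAIL = "fail"


def examiner_actions(verdict: Verdict) -> List[str]:
    """List, in order, what the examiner must do for a given verdict."""
    if verdict is Verdict.PASS:
        return ["comment: general feedback and grade",
                "approve and merge the MR"]
    if verdict is Verdict.COMPLEMENT:
        return ["comment: list the issues or refer to the pipeline results",
                "change the milestone to the complement milestone",
                "ask the student to comment when the issues are fixed"]
    # Verdict.FAIL
    return ["comment: feedback and how the student should proceed",
            "close the MR"]


pass_steps = examiner_actions(Verdict.PASS)
fail_steps = examiner_actions(Verdict.FAIL)
```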

3.4.1 Teaching assistants

The CLI tool requires a configuration file where the usernames of the teachers and the teaching assistants are in two separate lists. Only the users in the teacher list will have permission to approve the MR, but there are cases where teaching assistants help assess the assignment. On these occasions, the teaching assistants should still comment and give their recommended grade. However, instead of closing the MR, reassigning the milestone, or approving the MR themselves, they must assign the MR to one of the examiners. The examiner can then verify the assessment and either agree with the teaching assistant and finalize the grading or add a comment to explain why the assessment was wrong and how the student must proceed.
