Test Framework Quality Assurance: Augmenting Agile Processes with Safety Standards


Västerås, Sweden

Thesis for the Degree of Master of Science in Engineering - Dependable Systems, 30.0 credits


Jonathan Thörn

jtn14004@student.mdh.se

Examiner: Wasif Afzal

Mälardalen University, Västerås, Sweden

Supervisor: Daniel Sundmark

Mälardalen University, Västerås, Sweden

Company supervisor: Per Erik Strandberg,

Westermo Network Technologies AB, Västerås

May 19, 2020


Abstract

Quality of embedded systems is often demonstrated by performed tests and guaranteed by the quality of the tools used to perform them. Test automation is important in agile development and test frameworks can be considered mission-critical. Thus, it is important to ensure the quality of tools used for quality assurance.

This thesis explores how industries with agile processes can learn from safety-related development with plan-driven processes for increased test framework quality. Safety standards often rely on plan-driven processes, focused on discipline in long term prospects with substantial documentation and extensive upfront plans and designs. Agile approaches instead focus on quick adaptation, where software is evolved, undergoes continuous improvements and is delivered incrementally.

A case study was performed as an industry collaboration. A literature study extracted approaches from articles and safety standards. Analysis and processing resulted in candidate solutions, principles and practices iteratively refined for general applicability and the industrial context. Insights on implications and perceived industrial value resulted from a focus group, with qualitative and quantitative data collected through moderated group discussions and complementary activities.

Finally, this thesis proposes guidelines intended to be generally applicable, with a suggested augmented agile process of sequential “mini V-models” inherently controlled by Definitions of Done. A case-specific set of proposed guidelines extends the suggestion while embracing insights from the focus group.

Also identified was the importance of perceiving the framework as a tool-chain and not a single tool, where interaction sequences and intermediate results can be identified and utilized for analysis and applicable measures. Future work could refine the proposed guidelines with an industrial dynamic validation, and also extend the literature study and expand the focus group for diverse contexts and industrial perspectives.


Acknowledgements

This thesis concludes my studies for a M.Sc.Eng. in Dependable Systems at Mälardalen University. The described study was performed together with Westermo Network Technologies AB in Västerås 2020.

I would like to thank my supervisor and colleague, Ph.Lic. Per Erik Strandberg, for his supervision, informed suggestions, and all the interesting discussions we have had. He always made himself available when I was in doubt. I would also like to thank my supervisor, Professor Daniel Sundmark, for all the time invested in me, the comments I received, and for steering me in the right direction when I needed it. You have both provided me with a lot of new knowledge and helped me when my demands on myself became too high.

I would also like to thank Westermo for providing the possibility to conduct this thesis and for making me feel welcome and included in the team. With a special thanks to all of you who participated in the focus group, without your commitment and input the study could not have been successfully completed. Thank you also Petra Wernkvist, my manager both before, during and after the thesis, for providing me with interesting opportunities.

I also extend a big thank you to my family for all the support I received when working on this thesis, and also during the six years of hard studies that have taken me here. Countless thanks to my wife Annelie who has always been there for me, without you I would have given up a long time ago. Also deserving a special thank you is my son Melvin, who so generously lent me his room and big desk for the time I needed to work from home, and my daughter Nova, who has endured so many missed bedtime stories. To all of you, I am sorry for being even grumpier than normal in the past months.

Thanks also to my class of 2020 for all the things we experienced together, especially ”Study-Group Elite”, you know who you are. I guess we finally made it.


Acronyms

ASIL Automotive Safety Integrity Level
BI Basic Integrity
CI Continuous Integration
COTS Commercial Off The Shelf
DAL Development Assurance Level
DoD Definition of Done
DUT Device Under Test
EASA European Aviation Safety Agency
FAA Federal Aviation Administration
FMEA Failure Mode and Effect Analysis
FTA Fault Tree Analysis
GUI Graphical User Interface
HAZOP Hazard and Operability study
ISTQB International Software Testing Qualifications Board
SEooC Safety Element out of Context
SIL Safety Integrity Level
TAF Test Automation Framework
TCL Tool Confidence Level
TD Tool error Detection
TI Tool Impact
TQL Tool Qualification Level


List of Figures

1 A conceptual V-model illustration.
2 Illustration of an agile board.
3 Simplified illustration of a test automation architecture.
4 Visualization of the process workflow.
5 Conceptual visualization of a framework tool-chain model.
6 A suggested mini V-model controlled by DoDs.
7 Simplified development trace of V-augmented agile development.
8 Worst-case relation and dependency between DoD activities.

List of Tables

1 Offline tool classes according to IEC 61508, EN 50128, and EN 50657.
2 Tool confidence level determination according to ISO 26262.
3 DO-330 tool criteria definitions and tool qualification levels.
4 Summary of directly extracted approaches according to C1.
5 Summary of information and created approaches according to C2.
6 Summary of extracted approaches for tool confidence from standards.
7 List of initial candidate solutions. Details available in Sections 5.2.1 to 5.2.5.
8 Candidates further processed and refined for the focus group.
9 Results of the focus group quantitative appraisal.
10 Quantitative appraisal aspect distribution.
11 Additional candidates identified by focus group.


1 Introduction

The basis of this thesis is derived from the recursive problem: How can the quality of a tool used for quality assurance in agile processes be assured? Claims of high quality impose high demands on the quality assurance process and the tools and frameworks used. Testing is important for the development process and is the most used quality assurance activity in software development [1]. To increase efficiency and reliability, execution of tests can be automated and facilitated by purpose-built frameworks and tools [1,2,3]. Since decisions taken during development rely on the correctness of, and confidence in, the produced test results, mission criticality is imposed on the frameworks and tools used for the execution of automated tests. Test automation migrates test execution from humans to software and is a crucial factor for successful implementation of Continuous Integration (CI). It is, therefore, an important factor in agile development as the enabler of fast feedback to developers and stakeholders [1]. Unreliable tests due to deficient quality of a test system have a direct effect on developers’ ability to receive feedback, which in turn impedes the development process [4]. A software development tool can be a complete program, or a functional part of a program, that is used to aid the development of another program [5]. These may be Commercial Off The Shelf (COTS) products or purpose-built to provide means for e.g. verification, control and tests [3,6]. A test framework is in this case a software development tool for automated software testing. It contains testware with software, documentation, test cases, test data and test environments, which may include physical test-systems that run the software under test [3,7]. Identified risks to a case-specific framework were considered generally applicable and were related to masking of problems from detection, erroneous test-system hardware configurations and omitted feedback on failed tests.

The concept of functional safety is about freedom from unacceptable risks and protection against human errors, hardware failures and environmental factors. It involves the identification of possible failures and assigning a tolerance to those [8]. Standards for functional safety often rely on plan-driven processes with a sequential flow through predefined phases. Substantial amounts of documentation and validated artefacts from conducted activities are used as arguments and evidence that the system was built correctly. Standards used in the transportation domain were identified as state of practice and include guidelines on software development tools, addressing the concepts of tool qualification and certification [9,10]. Available research on these concepts mainly proposes different methods to utilize the guidelines for qualification of tools in different contexts. Most identified research also requests or suggests improvements and identifies issues with possible interpretations of the guidelines. Further, the amount of practical guidance provided by the standards is perceived as varying, loopholes have been detected related to classification of tools, and a lack of common approaches applicable across standards has been emphasised.

Agile and plan-driven development approaches have historically been seen as each other’s counterparts. Plan-driven approaches are focused on discipline in long term prospects and agile approaches on improvising and using history to adapt to new environments and opportunities. Agile approaches are based on a model where software is evolved and continuously delivered through short iterative cycles. Therefore, the extensive upfront plans, designs and documentation related to plan-driven development are not considered as valuable [11,12]. Available research on combining agile and plan-driven methods is mostly from the perspective of utilizing agile practices in existing plan-driven processes. Implementing plan-driven practices in agile processes to increase confidence and quality is on the other hand identified as a relatively unexplored perspective.

The performed research aimed to find efficient ways for non-safety related development with agile processes to be inspired by safety-related development with plan-driven processes to develop reliable frameworks for software testing. Thus, the research assumes that strategies for increased confidence and quality regarding tools used for automated software testing in non-safety development can be found or created from concepts and strategies related to safety-critical development, while maintaining agile and efficient processes.

This has potential value in industrial contexts that are not compelled to comply with any safety standard entailing Safety Integrity Level (SIL) classification of products or of the tools used to assure them, but that instead utilize recommendations present in standards, or approaches derived from means of functional safety, in a practical way to assure that test frameworks can be trusted as a guarantee of product quality.

The study was an industry collaboration with a company developing different devices for robust data communication, such as robust Ethernet switches. Each device is an embedded system running a software solution also developed by the industry partner. Both manual and automated tests are performed by a purpose-built test framework with testware and several physical test-systems, which is developed in-house according to the agile Kanban methodology.

The research was performed as an academic case study validated against industrial needs and based on a model for technology transfer proposed by Gorschek et al. [13]. The proposed model was instantiated into a suitable customized process consisting of the six steps: (i) problem formulation, (ii) literature study, (iii) candidate solutions, (iv) candidate validation and refinement, (v) focus group, and (vi) technology transfer. Problem formulation was conducted in collaboration with industry to identify an area with potential for improvement based on industry needs. The literature study aimed at identifying how quality assurance of software development tools is performed with regards to functional safety development, as well as applied methodologies in agile or mixed development philosophies. Inspired by guidelines for snowballing in literature, proposed by Wohlin [14], approaches were extracted from both relevant published research and also from three safety standards identified as state of practice. The extracted approaches and additional knowledge gained from the literature study were further processed and analysed into a compiled set of candidate solutions. A candidate solution is a principle or practice for increasing quality of and confidence in an automated software test framework. The candidates were iteratively validated and refined both to suit the industrial context of an intended application and to increase general applicability. The focus group aimed to derive case-specific implications from the proposed candidate solutions and to gain feedback on the industry’s perception of them. Further, the focus group was also aimed at deciding on the feasibility of the different candidates and to prioritize between them. Finally, the technology transfer was dedicated to a proof of concept implementation utilizing previously gained results and aimed at presenting a suggestion with practical applicability, which resulted in a set of proposed development guidelines.

The key findings resulting from the performed study are: (i) the identified and constructed candidate solutions and the related insights derived from the extracted approaches; (ii) the implications of the candidates and their perceived value in industry, resulting from the focus group; and (iii) proposed guidelines which augment the agile process with plan-driven elements, further extended to a case-specific solution by instantiating candidates as a set of Definitions of Done (DoD).

The candidates provide practical guidance by proposing activities and measures for increased tool quality. Processing of extracted approaches also identified the importance of perspective regarding the proposed candidates. This perspective entails perceiving the test framework not as one tool, but as a sequential tool-chain of several inherent tools, where interaction sequences can be identified for analysis and applicable measures. In relation to standards, sub-tools can be classified on an individual basis and confidence argued as the sum of applied measures throughout the tool-chain. The focus group results provide valuable insights on the implications of the candidates and their perceived value in an industrial context. This gives an indication of aspects considered important in an agile context, and thus where initial efforts should be placed. Insights gained by the focus group may also provide a good basis for further discussions and research on the implications of candidates in other contexts. As a suggested technology transfer, the study proposes a set of development guidelines. These constitute a suggested solution with both general and case-specific applicability, indicating a possible practical application of the candidates while embracing insights gained from the focus group.

The report is structured as follows. The introduction, including problem formulation and research questions, is presented in Section 1, followed by the background and related work combined in Section 2. The overall methodology and a presentation of the studied case are presented in Section 3. Section 4 presents the literature study. Section 5 presents the processes related to candidate solutions. Section 6 presents the conducted focus group. Section 7 presents the proposed guidelines as a suggestion for technology transfer. Sections 4 to 7 all have dedicated segments for method, result, and threats and limitations related to their specific area. Discussion and future work are presented in Section 8, followed by conclusions presented in Section 9. A list of references is presented at the end.


1.1 Problem Formulation

The quality of embedded systems, both the software solution and the hardware platform, is often demonstrated by the results of performed tests and guaranteed by the quality of the solution used to perform them. Frameworks for software testing can also be considered mission-critical, since development decisions rely on the correctness of and confidence in the produced results. This imposes demands that the quality of any test framework, testware and executed tests is sufficiently assured. Together with the industry partner, five risks were identified for the case-specific framework, which should also be generally applicable to other test framework implementations. These potential risks to the quality and reliability of the results produced by a test framework, which should be mitigated, were identified as:

1. Deficient quality in the software test framework, that may potentially lead to masking of problems from detection.

2. Deficient quality in the test cases, that may potentially lead to problematic areas not being tested at all.

3. Test cases not executed due to e.g. erroneous test-system hardware configuration or representation.

4. Missing test cases caused by e.g. insufficient requirements.

5. Feedback on failed test cases does not reach the relevant stakeholder.

Risks 1, 3 and 5 can be seen as directly related to the test framework and testware, while risks 2 and 4 are related to the construction of test cases for the software solution that is to be tested by the framework. The study primarily focuses on addressing identified risk 1 in the above list. An underlying assumption is that strategies for increased confidence and quality regarding tools used for automated software testing in non-safety development can be found or created from concepts and strategies related to safety-critical development, while maintaining agile and efficient processes.
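As an illustration of how framework-related risks such as 3 and 5 could be made observable, the following minimal sketch compares what a test run was planned to do with what it actually did. It is not part of the studied framework; the `RunReport` structure, its field names and the test case names are hypothetical and chosen only for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RunReport:
    """Hypothetical summary of one automated test run."""
    planned: frozenset[str]    # test cases the plan expected to run
    executed: frozenset[str]   # test cases the framework actually ran
    failed: frozenset[str]     # executed test cases that failed
    notified: frozenset[str]   # failed test cases reported to a stakeholder


def find_gaps(report: RunReport) -> dict[str, set[str]]:
    """Return test cases that dropped out of the run or out of its feedback."""
    return {
        # Risk 3: planned but never executed, e.g. due to a misconfigured test-system.
        "not_executed": set(report.planned - report.executed),
        # Risk 5: failed, but the failure never reached a stakeholder.
        "unreported_failures": set(report.failed - report.notified),
    }


# Example run with made-up test case names.
report = RunReport(
    planned=frozenset({"ping", "vlan", "failover"}),
    executed=frozenset({"ping", "vlan"}),
    failed=frozenset({"vlan"}),
    notified=frozenset(),
)
gaps = find_gaps(report)
```

A check of this kind only reveals that something went missing; it does not address risk 1, where the framework itself masks a problem while appearing to run correctly.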

1.1.1 Research Questions

In the context of industries with agile development processes and not bound to comply with a safety standard:

RQ1: Based on approaches proposed in relation to relevant safety standards, what strategies for increased confidence in software tools can be found or constructed?

RQ2: Which of the above strategies are applicable and practical regarding quality assurance of frameworks for automated software testing?


2 Background and Related Work

This section provides state of the art, state of practice, and work related to the problem of quality assuring software development tools. Concepts and definitions that are relevant for understanding what is presented in the following sections, and the work performed, are also introduced. Some of the related work presented here, especially that presented in Sections 2.3.3 and 2.1.3, is covered in more detail in relation to the performed literature study presented in Section 4. Due to interdependence, some publications were treated as both related work and as part of the literature study. Thus, for applicable publications in the mentioned sections, important aspects and insights not brought up in relation to the performed literature study are summarized here. Motivations for the industrial safety standards included for survey during the literature study are presented in Section 2.3.

2.1 Development Philosophies

Agile and plan-driven development approaches have historically been seen as each other’s counterparts, where plan-driven approaches are focused on discipline in long term prospects and agile approaches on improvising and using history to adapt to new environments and opportunities [11].

2.1.1 Plan-driven Development

Plan-driven development is a formal and specific approach for development of, for example, software. Based on a well-defined incremental process to achieve a predictable result, great emphasis is placed on risk management and on validation and verification. Detailed plans, up-front system architecture, extensive documentation and continuous monitoring of the process are important factors. Plans are generally created by breaking down a project into stages and then further decomposing these into their inherent activities. Even before any construction has begun, the properties of the final product are known and can be precisely defined, which entails that projects can be planned in detail for the entire process. Further, roles are defined for each member of the team, where it is often required that certain roles are executed independently of each other as a factor of control [15,16]. Standards for functional safety, such as those presented in Section 2.3, assume the use of documentation-heavy plan-driven processes where development is a strictly sequential process through predefined phases [9]. It is popular among safety standards to describe this sequential flow through the development life cycle with what is called the V-model [10]. A conceptual illustration of the V-model process based on a combination of standard definitions can be seen in Figure 1.

Figure 1: A conceptual V-model illustration.

This model has similarities to a waterfall structure, in which development activities flow from top to bottom. Perceiving the V-model as a waterfall with the second half bent upwards would however omit the basis on levels that the model represents, and thereby also the entire idea of the model. Activities on both sides that are horizontally parallel to each other form a level and have a controlling relationship. Levels in this case most commonly refer to system levels, where levels closer to the center of the V are lower levels. Time can be seen as flowing from top-left to top-right, where all activities depend on documentation and artefacts produced during the preceding activity. For each defined level there are corresponding activities on each side of the V that separate the control activities needed to prevent defects at that specific level. The levels represent system levels that may have different problems and therefore require different types of control and analysis measures. On the left side of the V, elicited requirements are decomposed and allocated to lower levels and validated against higher levels. On the right side, lower level items are integrated into higher-level subsystems, and finally the top-level system, with verification against corresponding activities on the same level on the left side. From a software development perspective, the actual implementation takes place at the bottom part of the V [6,17]. Although they are derived from the same concept, the actual models defined in different standards provide different descriptions of the levels and inherent activities to be performed. There also exist double V-models and triple V-models, where software and hardware are developed according to individual Vs as inherent processes of a system-level V, e.g. [18]. Common for plan-driven models and methods is however the production of documentation and validated artefacts from all activities at each level, which are finally used to argue and provide evidence that the system was built correctly [10].
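The level structure described above can be sketched as a small data model, in which each level pairs a left-side activity with the right-side activity that is verified against it. This is only an illustrative sketch: the three level names and their activities are assumptions made for the example, since the standards define their own levels and activities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    name: str
    left: str    # specification activity on the descending (left) side
    right: str   # verification activity on the ascending (right) side


# Illustrative levels only (assumed names), ordered from the top of the V downwards.
V_MODEL = [
    Level("system", "system requirements", "system validation"),
    Level("subsystem", "architecture design", "integration testing"),
    Level("unit", "detailed design", "unit testing"),
]


def activity_order(levels):
    """Time flows down the left side, through implementation, then up the right side."""
    down = [lvl.left for lvl in levels]
    up = [lvl.right for lvl in reversed(levels)]
    return down + ["implementation"] + up


def controlled_by(levels, right_activity):
    """Return the left-side activity that a right-side activity is verified against."""
    for lvl in levels:
        if lvl.right == right_activity:
            return lvl.left
    raise KeyError(right_activity)


order = activity_order(V_MODEL)
partner = controlled_by(V_MODEL, "unit testing")
```

The point of the sketch is that the ordering alone would describe a waterfall; it is the `controlled_by` relation between the two sides of each level that makes it a V-model.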

2.1.2 Agile Development

In contrast to plan-driven development, agile development, also sometimes referred to as flexible development, does not rely on high degrees of documentation and rigid processes as the basis for the development life-cycle. Instead, agile approaches are based on a model where software is evolved and continuously delivered through short iterative cycles with continuous feedback. To embrace, value, and respond to changes, flexibility and responsiveness are important aspects. Therefore, extensive upfront plans and designs are not considered as valuable. Working software adding concrete value to the product, and ultimately the customer and the business, is more highly regarded than comprehensive documentation. Important aspects of agile approaches are continuous improvements and code integration, resulting in continuous delivery. Rituals like daily stand-up meetings, demonstrations and reflections provide progress tracking, feedback and process improvements [12,19]. Having minimal formal processes and the ability to adapt the development processes as needed are also elements of any agile approach. Furthermore, practices where requirements can be altered at any moment throughout the development process are encouraged. Continuous feedback from stakeholders and customers is promoted by active involvement in the development [19]. Thus, value is delivered to the customer in small increments, in contrast to the single delivery in plan-driven development. In the context of an in-house software development tool, the customer can be regarded as internal, i.e. the team that utilizes the tool and benefits from the functionality it provides. The terminology used in agile development includes many concepts. Among the more commonly used are user stories, use cases and scenarios. User stories are used to elicit requirements from user and stakeholder perspectives. Use cases describe behaviour in a more technical way than user stories, and scenarios describe user interactions from the perspective of a specific system context [20].

There exist many different methodologies to implement the agile philosophy, many of which can be derived from the agile manifesto published in 2001 [21]. Among the best-known methodologies are Scrum and Kanban, which are similar but still different. Both have a product owner and a prioritized product backlog, and both utilize an agile board to track progress. The scrum-board, or kanban-board respectively, has columns representing phases that an item from the product backlog traverses before being packaged for release, such as build, test and done, as illustrated in Figure 2. The scrum master, or agile coach in the case of Kanban, helps steer development and maintain good routines, and both practice daily stand-up meetings and demonstrations. What differentiates the methodologies are the events occurring between the product backlog and the customer [15,22,23].

In Scrum, development is conducted as a series of time-boxed development efforts known as sprints, normally 1-4 weeks in length. During planning at the beginning of each sprint, highly prioritized items from the product backlog are selected according to what is believed feasible to deliver during the time-frame of the sprint. Therefore, a sprint-backlog column is added to the agile board in Scrum for holding the selected items. These items are the only items worked on during the time of the sprint, and any new requirements will not be handled until subsequent sprints. During the time-boxed effort, the items progress through the board and at the end of the sprint, all those completed are packaged and released. Any incomplete items are returned to the product backlog. The sprint is finalized with a sprint-review to demonstrate the outcome and a sprint-retrospective to evaluate the performed effort and improve the next.
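The sprint cycle described above can be sketched in a few lines of code. This is a simplified illustration rather than a definition of Scrum: the `Item` fields, the use of effort points as team capacity, and the greedy selection that skips items that do not fit are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Item:
    title: str
    priority: int      # assumption: lower number means higher priority
    estimate: int      # assumption: effort in arbitrary points
    done: bool = False


def plan_sprint(backlog, capacity):
    """Move the highest-priority items that fit the capacity into a sprint backlog."""
    sprint = []
    for item in sorted(backlog, key=lambda i: i.priority):
        if item.estimate <= capacity:
            sprint.append(item)
            capacity -= item.estimate
    for item in sprint:
        backlog.remove(item)
    return sprint


def close_sprint(sprint, backlog):
    """Release completed items and return incomplete ones to the product backlog."""
    released = [i for i in sprint if i.done]
    backlog.extend(i for i in sprint if not i.done)
    return released


# Example with made-up items and a capacity of 8 points.
backlog = [Item("logging", 2, 3), Item("parser", 1, 5), Item("report", 3, 4)]
sprint = plan_sprint(backlog, capacity=8)
sprint_titles = [i.title for i in sprint]
sprint[0].done = True                      # only the first selected item is completed
released = close_sprint(sprint, backlog)
```

In the example, "parser" and "logging" fill the sprint, only "parser" is released, and "logging" returns to the product backlog behind "report".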

Figure 2: Illustration of an agile board.

Kanban has no equivalent of sprints and therefore no time-boxing or additional backlog. It is instead a continuous process, limited by a maximum number of allowed work-in-progress items for each column on the agile board. This limit is related to the team capacity and acts as a trigger for pulling new items from one column to the next, traversing through the board. Having fewer items than the limit in the first column triggers selection of a new highly prioritized item from the product backlog. When done items are to be packaged and released is decided by the team [15,22,23]. Kanban can, therefore, be seen as less process-heavy and more flexible when compared to Scrum.

2.1.3 Attempted Combinations of Agile and Plan-driven Philosophies

There exist several publications studying approaches, challenges, and impediments related to how plan-driven and agile methods can be combined. In “Challenges in Flexible Safety-Critical Software Development – An Industrial Qualitative Survey”, Notander et al. [24] conclude that agile development can co-exist with plan-driven development provided that identified challenges are addressed. This publication is also further evaluated in Section 4.2. Heeager identifies nine practice areas of meshing methods from the different development processes in “How can agile and documentation-driven methods be meshed in practice?” [25]. These areas are management strategy, customer relations, people-issues, documentation, requirements, development strategy, communication and knowledge sharing, testing, and culture. The authors claim a contribution to six areas by identifying a level of difficulty in combining each area. Documentation is determined to be the hardest, while requirements, testing and customer relations are considered difficult to combine. Development strategy, and communication and knowledge sharing, were found to be combinable without impeding challenges. In the later publication “Meshing agile and plan-driven development in safety-critical software: a case study”, Heeager et al. [26] focus on the four areas of documentation, requirements, life-cycle, and testing. Challenges and proposed approaches related to these areas are identified to enable understanding of possibilities and difficulties in performing safety-critical software development using agile methods. In “An Assessment of Avionics Software Development Practice: Justifications for an Agile Development Process”, Hanssen et al. [27] present an approach outline for extending agile methods, in particular Scrum, to achieve the objectives of DO-178C. The main idea is a distribution of the DO-178C process steps as sprints with the sequenced Scrum phases: preparation, development, and closure. Each sprint has inputs, activities, and outputs related to what is required by DO-178C, and all outputs from each sprint are verified and then used as input to subsequent sprints. It is concluded that extensive investments in having sufficiently detailed high-level requirements before starting the sprint-based development are a necessity for the approach to be applicable. In “Seeking Technical Debt in Critical Software Development Projects: An Exploratory Field Study”, Ghanbari [28] suggests that accumulated technical debt can be identified and managed, or even avoided, by utilizing agile practices in critical plan-driven software development. The authors identify that debt caused by e.g. requirement ambiguity, diversity of projects, inadequate knowledge management, and resource constraints may be mitigated by applying common agile practices such as small releases with continuous testing, iterative development, burndown charts and backlogs, and stand-up and review meetings.

Common for all identified publications on the subject of combining agile and plan-driven methods is the perspective of introducing agile practices into an already existing plan-driven development process. Efforts made were unable to find any studies with the inverted scenario of implementing plan-driven practices into an agile development process with the objective to increase confidence and quality of the produced product or applied processes and methods.

2.2 Software Development Tools

A software development tool can be a complete program, or a functional part of a program, that is used to aid the development of another program. Tools can be used for e.g. development, modelling, tests, analysis, production, or modification. Such use takes place in the context of the actual program, the data processed in the program, or documentation regarding the development of the program [5]. Typical development efforts utilize a vast variety of different tools at different stages and for different purposes, many of which are Commercial Off-The-Shelf (COTS) products that support several different use cases. Examples of such tools are compilers, debuggers, modelling tools, code generators, and tools used for configuration management [29,30]. Tools may also be purpose-built to perform specific tasks or aid the development in specific ways, such as providing means for e.g. verification, tests, control and monitoring [3,6].

2.3 Industrial State of Practice

There exists a vast number of industrial standards, making a survey of them all infeasible within the limited scope of this thesis. Within the transportation domain, there are high-profile standards for functional safety that address and include guidelines on the usage of tools during development [6,10,24,29,31,32,33].

IEC 61508:2010 [34] “Functional safety of electrical/electronic/programmable electronic safety-related systems”, is a generic industrial standard covering the lifecycle activities for systems comprised of components that perform safety-related functions. The standard can be used in its original state for development of any electrically-based safety-related system, but also serves as a template that facilitates the development of international domain-specific functional safety standards.

ISO 26262:2018 [18] “Road vehicles - Functional safety”, is the domain-specific adaption of IEC 61508 for electrical and/or electronic systems within road vehicles. It is emphasised that since the second edition is fairly new, several of the included articles in studied literature will be based on the first edition ISO 26262:2011 [35]. However, any approaches based on the first edition are to be extracted based on their ability to increase quality and confidence in the usage of tools, and the second edition parts related to tools are also surveyed individually as part of the literature study.

EN 50128:2011 [36] “Railway applications - Communication, signalling and processing systems - Software for railway control and protection systems”, is the domain-specific adaption of IEC 61508 for development of safety-related software for railway control and protection applications. Derived from this standard is EN 50657:2017 [37] “Railway Applications - Rolling stock applications - Software on Board Rolling Stock”, which is an adaption of EN 50128 for application in the rolling stock domain of the railway industry. EN 50657 was partially created to ease work with non-safety-related software after the changed definition of SIL 0 made in EN 50128:2011 compared to EN 50128:2001. The former definition of SIL 0, “no safety impact”, was changed to “lowest level of safety impact”, causing some confusion on how to handle products with no safety impact. EN 50657 therefore replaces SIL 0 with Basic Integrity (BI) for software that is not safety related [38]. Although EN 50128 is more commonly referred to by the reviewed publications, EN 50657 was selected to be individually surveyed during the literature study, motivated by its use at the industry partner.

RTCA/DO-178C “Software Considerations in Airborne Systems and Equipment Certification”, is a set of recommendations for achieving compliance with regulations set by civil aviation authorities, such as the Federal Aviation Administration (FAA) and the European Aviation Safety Agency (EASA). These guidelines are not derived from IEC 61508. The C-version was released in 2011 as the successor of DO-178B and simultaneously introduced the stand-alone document DO-330 “Software Tool Qualification Considerations”, which provides guidance on tool qualification. DO-330 is very similar to DO-178C but adapted with objectives and requirements suitable to software tools [5]. It is emphasised that DO-178C and DO-330 were not available in full to the performed study. Therefore, these are not cited directly but instead referenced through the research published by Rierson in “Developing safety-critical software: a practical guide for aviation software and DO-178C compliance” [5].

As identified by e.g. Asplund [10] and Notander et al. [24], safety standards can be divided into two main groups based on their view on how trust in a tool shall be ensured. The first group contains IEC 61508 and standards derived thereof, where trust is established by applying generic measures including thorough specifications and assessments of the tool. Standards in this group focus on “means” [24], and provide suggestions and methods that are recommended or required to be included during development. Of the standards mentioned, DO-178 and the related DO-330 belong to the second group, where trust in a tool is ensured by the constraints applied to its development process. Standards in this group focus on “objectives” [24] to be fulfilled, but provide limited practical guidance on how that is to be achieved. It can be derived from these conclusions that standards belonging to the first group are the most applicable within the scope of this thesis. Applying heavy constraints on the development process is less compatible with agile and flexible development than implementing specific methods or suggested activities only affecting parts of the development. Therefore, the survey of standards within the performed literature study primarily focuses on standards in the first group for extraction of approaches. If it should be concluded at a later stage that development of the framework should be conducted in direct accordance with a safety standard, and not only be inspired by one, a closer evaluation of the second group of standards could be warranted.

2.3.1 Safety Integrity Levels

All previously mentioned safety standards use some form of scale to classify the risk and criticality of a system or system part, where criticality is a notation of the required assurance against failure. The position of the system or system part on the scale then determines what integrity is required to prevent failures and what activities and measures need to be applied to achieve that. These scales are similar but still differ between standards. IEC 61508 defines four Safety Integrity Levels (SILs), 1 to 4, where SIL 1 is the lowest and SIL 4 the highest level of safety integrity. ISO 26262 similarly defines four Automotive Safety Integrity Levels (ASILs), A to D, where A is related to the lowest risk and D to the highest [6]. EN 50128 defines five levels from 0 to 4, where SIL 0 is the lowest and SIL 4 the highest level of safety impact [38]. EN 50657 uses the same scale as EN 50128 but replaces the lowest level SIL 0 with Basic Integrity (BI), applicable to non-safety-related functions [37]. DO-178C has a five-level scale of Development Assurance Levels (DALs) from E to A, where E is the lowest classification of severity with no safety effect, and A is the highest with catastrophic severity. The scale and levels defined in DO-178C are directly mirrored in DO-330 [5].

2.3.2 Tool Qualification

The difference between certification and qualification is defined by Asplund [10]: certification is a complete set of activities aimed to confirm that an end product possesses a set of predefined characteristics, while qualification is a subset of those activities that ensures tool confidence is at least equal to that of the process it eliminates, reduces or automates. Qualification of a tool may be required by the above standards based on an assessment of whether malfunctions in the tool can result in the introduction of errors, or failure to detect errors, in the system the tool is used to develop. Tools are therefore classified into different categories used to determine appropriate qualification measures in the context of the determined SIL of the tool, or of the system or system part the tool is used to develop. The method of classification and the different categories vary between the standards.

IEC 61508 [34], EN 50128 [36] and EN 50657 [37] all make a first division of tools into either on-line tools or off-line tools. On-line tools are defined as tools that can have a direct influence on the system during run-time, and off-line tools as tools that support a phase of the development but cannot have a direct influence on the system during run-time. Tools categorised as off-line are then further divided into the three classes T1, T2 and T3, as can be seen in Table 1.

Table 1: Offline tool classes according to IEC 61508, EN 50128, and EN 50657.

| Class | IEC 61508 / EN 50128 / EN 50657 definitions | Given examples |
|---|---|---|
| T1 | Generates no outputs which can directly or indirectly contribute to the executable code (including data) of the safety-related system [34] / software [36,37] | Text editor; requirements or design support tool; configuration control tools |
| T2 | Supports the test or verification of the design or executable code, where errors in the tool can fail to reveal defects but cannot directly create errors in the executable software | Test harness generator; test coverage measurement tool; static analysis tool |
| T3 | Generates outputs which can directly or indirectly contribute to the executable code of the safety-related system | Source code compiler; optimizing compiler |
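The class definitions in Table 1 can be read as a short decision procedure. The following is a minimal sketch of that reading, not text from any of the standards; the function name and its two boolean parameters are hypothetical and introduced only for illustration.

```python
from enum import Enum


class ToolClass(Enum):
    T1 = "T1"  # no output that can contribute to the executable code
    T2 = "T2"  # supports test or verification; may fail to reveal defects
    T3 = "T3"  # output can contribute to the executable code


def classify_offline_tool(contributes_to_executable: bool,
                          supports_verification: bool) -> ToolClass:
    """Map the two properties used in Table 1 onto a tool class.

    Both parameters are illustrative assumptions; a real assessment
    follows the full wording of the applicable standard.
    """
    if contributes_to_executable:
        return ToolClass.T3
    if supports_verification:
        return ToolClass.T2
    return ToolClass.T1
```

Under this reading, a compiler falls in T3, a test coverage measurement tool in T2, and a text editor in T1, matching the examples given in the table.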

ISO 26262 [18] has a different approach where tools are instead assigned a Tool Confidence Level (TCL), based on the determined Tool Impact (TI) and Tool error Detection (TD). TI is the possibility that a malfunction in the tool can introduce or fail to detect errors in the developed item and has two levels: TI1 when it can be argued that no such possibility exists, and TI2 for all other cases. TD is a measure of the confidence that a malfunction and related erroneous output will be prevented or detected and has three levels: TD1 when there is a high degree of confidence, TD2 for a medium degree of confidence, and TD3 for all other cases. If the determination of TI or TD is not clear, estimation should be performed conservatively. The TCL is then determined as TCL1, TCL2 or TCL3 by combining TI and TD as described in Table 2. Applicable qualification measures for TCL2 and TCL3 are listed in individual tables for each category. Tools classified as TCL1 need no further qualification.

Table 2: Tool confidence level determination according to ISO 26262.

| Tool Impact | TD1 (high confidence) | TD2 (medium confidence) | TD3 (other cases) |
|---|---|---|---|
| TI1 (no such possibility) | TCL1 | TCL1 | TCL1 |
| TI2 (other cases) | TCL1 | TCL2 | TCL3 |
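The determination in Table 2 amounts to a table lookup. The sketch below encodes the table as a dictionary; the function name is hypothetical, and treating a missing TI or TD as the worst case is one possible way to express the conservative estimation mentioned above, not a rule quoted from ISO 26262.

```python
from typing import Optional

# Rows and columns of Table 2: (Tool Impact, Tool error Detection) -> TCL.
TCL_TABLE = {
    ("TI1", "TD1"): "TCL1", ("TI1", "TD2"): "TCL1", ("TI1", "TD3"): "TCL1",
    ("TI2", "TD1"): "TCL1", ("TI2", "TD2"): "TCL2", ("TI2", "TD3"): "TCL3",
}


def tool_confidence_level(ti: Optional[str] = None,
                          td: Optional[str] = None) -> str:
    """Return the TCL for a TI/TD pair.

    Assumption for illustration: an undetermined value (None) is
    estimated conservatively as TI2 or TD3 respectively.
    """
    ti = ti or "TI2"
    td = td or "TD3"
    try:
        return TCL_TABLE[(ti, td)]
    except KeyError:
        raise ValueError(f"unknown TI/TD combination: {ti}, {td}") from None
```

For example, `tool_confidence_level("TI2", "TD1")` yields `"TCL1"`, which reflects how improved error detection removes the need for further qualification.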

With some similarity to ISO 26262, DO-330 assigns a Tool Qualification Level (TQL) to the tool, determining the required rigour during qualification. There are five TQLs and assignment is based on the classification of the tool to one of three tool qualification criteria combined with the determined software DAL. These tool criteria are a more elaborate replacement of the categorisation of tools, as either related to development or verification, made in DO-178B. For each TQL, DO-330 provides eleven tables, T-0 to T-10, defining the applicable sections and objectives of the standard to the particular TQL. The most stringent level with the highest demands of a compliant process is TQL-1 and the lowest level of stringency is TQL-5. Based on the particular provided guidance the tool development then goes through a similar life cycle as the airborne software [5]. Tool qualification criteria and determination of TQL can be seen in Table 3.

Table 3: DO-330 tool criteria definitions and tool qualification levels.

| Tool criteria | Tool qualification criteria definitions | Software level D | Software level C | Software level B | Software level A |
|---|---|---|---|---|---|
| 1 | A tool whose output is part of the resulting software and thus could insert an error | TQL-4 | TQL-3 | TQL-2 | TQL-1 |
| 2 | A tool that automates verification process(es) and thus could fail to detect an error, and whose output is used to justify the elimination or reduction of: verification process(es) other than that automated by the tool, or development process(es) that could have an impact on the airborne software | TQL-5 | TQL-5 | TQL-4 | TQL-4 |
| 3 | A tool that, within the scope of its intended use, could fail to detect an error | TQL-5 | TQL-5 | TQL-5 | TQL-5 |
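As with the ISO 26262 scheme, the assignment in Table 3 can be expressed as a lookup keyed on tool criterion and software level. The sketch below transcribes only the cells shown in the table; the function name is hypothetical, and level E is rejected because the table does not list it.

```python
# Table 3 transcribed: criterion -> software level -> TQL number.
TQL_TABLE = {
    1: {"A": 1, "B": 2, "C": 3, "D": 4},
    2: {"A": 4, "B": 4, "C": 5, "D": 5},
    3: {"A": 5, "B": 5, "C": 5, "D": 5},
}


def tool_qualification_level(criterion: int, software_level: str) -> str:
    """Return the TQL label for a tool criterion (1-3) and software level (A-D)."""
    try:
        return f"TQL-{TQL_TABLE[criterion][software_level.upper()]}"
    except KeyError:
        raise ValueError(
            f"no entry for criterion {criterion!r}, level {software_level!r}"
        ) from None
```

A verification tool under criterion 3 thus always maps to TQL-5 regardless of software level, while a criterion 1 tool used for level A software requires the most stringent TQL-1.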

2.3.3 Performed Evaluations of Industrial State of Practice

In the paper “Qualifying Software Tools According to ISO 26262”, Conrad et al. [32] contrast differences in activities required for tool qualification or certification in transportation domain standards for functional safety. The publication is further evaluated in Section 4.2. The authors conclude DO-178 to be the most stringent among the studied safety standards and emphasize the differentiation between development and verification tools, where verification tools are less demanding to qualify than development tools. Regarding IEC 61508, it is concluded that confidence in tool output should be achieved by certifying a tool when possible. However, the standard provides limited guidance on how to actually certify a tool in practice. Thus, large differences can be seen in practical tool certification approaches, and limited requirements for tool certification in the software life-cycle may result in a lack of incentive to perform the related activities. Compared with ISO 26262, significant differences are found in the methodology regarding tool qualification. In the automotive implementation, detailed guidance is provided aimed at providing evidence that the tool is suitable for safety-related development.

The authors also report on perceived issues with tool qualification according to ISO 26262. It is found problematic that no distinction is made between development tools and verification tools. The authors claim that there are significant differences between the two categories that ISO 26262 fails to manage. It is also stated that the absence of certification credits for tool qualification may lead to dressed-up tool classifications in order to avoid the need to apply qualification methods. Possible reuse of arguments for several tools, in order to lower the tool confidence level of the overall tool-chain, is also considered problematic, and specific arithmetic to deal with tool combinations is requested. In general, the practical feasibility of qualifying tools on an individual basis is questioned. Finally, the lack of commonly accepted approaches and established methods for best practice regarding tool qualification applicable across standards is emphasized.

In “Tool Qualification for Safety Related Systems”, Ekman et al. [6] analyze the concept of qualifying existing tools as an alternative to the regular certification process provided by different transportation domain standards. This publication is further evaluated in Section 4.2. The study mainly focuses on ISO 26262 and EN 50128 but also mentions DO-178 and IEC 61508. According to the authors, tools used for development and test are commonly not developed according to the processes depicted in safety standards meant for certification. Thus, the less effort-intensive alternative of tool qualification was introduced and can, according to the authors, be summarized as answering two questions: (i) “What harm can the tool-set do to the system?” (ii) “Is there a way to detect or even prevent that a fault in the tool-set leads to a hazard?” It is concluded that most tool qualification processes focus on the detection of potential failures by measures outside the tool itself. Potential failures with possible effect on tool output must be identified and actions to avoid such failures specified. Similar to Conrad et al. [32], the authors remark on the lack of distinction between development and verification tools in ISO 26262.

number of needed requirements for tool development can be minimized. Regarding e.g. ISO 26262, error detection implies that the Tool error Detection is improved to TD1, subsequently lowering the Tool Confidence Level to TCL1, which in turn eliminates the need for tool qualification. This way of completely avoiding tool qualification by dressed-up classifications is also mentioned by Conrad et al. [32] as an identified issue with ISO 26262.

In “Qualifying Software Tools, a Systems Approach”, Asplund et al. [29] propose a method for qualifying software tools as part of tool-chains based on nine identified safety goals. The method is based on the concept of tool integration, which is defined by the authors as “what supports the development process in correctly moving from one engineering tool to another in a tool-chain.” For the proposed approach, a hierarchy of organisation levels is defined, where lower levels are controlled by constraints imposed by higher levels, giving reduced complexity at descending levels. Inspired by the reference workflow proposed by Conrad et al. [32] and combined with the concept of Safety Element out of Context (SEooC) from ISO 26262, Asplund et al. suggest four steps for guiding and limiting the qualification effort. These include pre-qualification of both tools and tool-chain by representative use cases and requirement deduction respectively. Thus, the qualification of the tool-chain differentiates between assumptions on one hand and actual use cases on the other. Tool qualification instantiates assumptions to actual use cases and can be performed according to a relevant standard. Asplund et al. claim that their approach, by viewing software as entities that can be reused in different tool-chains, solves the problem of standards dictating either too wide or too narrow qualification efforts, and that the approach is further applicable to different standards. In the later publication “The Future of Software Tool Chain Safety Qualification”, Asplund [33] studies the relation of software faults to weaknesses in the support environments used, focusing on the high-profile safety standards used in the transportation domain as mentioned in Section 2.3. The author concludes that standards commonly concern best practices regarding separate tools addressed in isolation, which may lead to risks introduced by tool integration being ignored. This was also questioned by Conrad et al. [32] regarding the practical feasibility of qualifying individual tools. It is further identified that both too high and too low levels of automation may be a causal factor of introduced software faults. Therefore, the author proposes a change to a top-down qualification approach to be included in the next generation of safety standards, one that focuses on tasks instead of technology and broadens the view of risks related to tools used in development.

2.4 Software Testing

In an embedded system, the software is a major component, making structured, suitable and sufficient testing an important part of the development. The main purposes of software testing are quality assessment and reducing the risk of software failures, by detecting and subsequently fixing defects in the test object. Other typical objectives of testing are verification of specified requirements, validation of complete and correct functionality, enabling informed decisions with confidence in the quality level, and verification of compliance with regulatory requirements or standards [39,40]. To achieve efficient and correct testing, many different strategies, tools and frameworks have been proposed over the years [40]. Testing is an important part of the development process and the most used activity for quality assurance and quality control [1].

Besides the actual execution of predefined test cases, the process includes activities such as planning, analysis, design and implementation of tests, reporting test results, and assessing the quality of the tested object. When execution of the component or system is part of the testing process, it is referred to as dynamic testing, in contrast to static testing, which only involves reviews of work products such as implemented source code and requirements. The concept of quality assurance focuses on compliance with suitable processes to provide confidence in the achieved level of quality, and should not be confused with testing, which is one of several inherent activities. Testing is a means to achieve quality in different ways, while quality assurance deals with the entire process and is the enabler of correct testing [39].

The definition, the purpose, and the scope of software testing vary between sources. According to the International Software Testing Qualifications Board (ISTQB) [41], testing is defined as “the process consisting of all lifecycle activities, both static and dynamic, concerned with planning, preparation and evaluation of a component or system and related work products to determine that they satisfy specified requirements, to demonstrate that they are fit for purpose and to detect defects”. In his licentiate thesis “Automated system level software testing of networked embedded systems”, Strandberg [7] provides a definition that partially overlaps with ISTQB [41] and ISO/IEC/IEEE 29119-1 [42] as “the act of manually or automatically inspecting or executing software with or without custom hardware in order to gather information for some purpose: feedback, quality control, finding issues, building trust, or other.” In EN 50657 [37] software testing in relation to its objective is defined as “the objective of software testing, as performed by the tester and/or integrator, is to ascertain the behaviour or performance of software against the corresponding test specification to the extent achievable by the selected test coverage”.

2.5 Test Automation

By migrating test execution from human to software, available resources can be utilized more efficiently, repeatability increased, costs decreased, and development efficiency improved. Test automation is, therefore, an important factor in agile development, enabling fast feedback to developers and stakeholders, and allowing tests to be performed by a diverse pool of employees [1]. Common concepts in agile development such as “continuous integration” [43] and “automated acceptance testing” [44] heavily rely on test automation [1]. For the implementation of test cases, monitoring and control of execution, and reporting and logging of results, it is necessary for test automation to involve the design of testware. This should include software, documentation, test cases, test environments and test data. The concept of test automation includes using purpose-built tools for control and setup, test execution, and evaluating differences between required and actual results [3].
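The core loop of automated test execution described above (run a test, compare required and actual results, record the outcome) can be illustrated with a small sketch. All names below are hypothetical and chosen for illustration only; this is not the testware of the industry partner or of any particular framework.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable


@dataclass
class AutomatedCase:
    """One automated test case: a callable plus its required result."""
    name: str
    run: Callable[[], Any]
    expected: Any


def run_suite(cases: Iterable[AutomatedCase]) -> Dict[str, str]:
    """Execute each case and record a verdict per case name.

    "pass"  : actual result equals the required result
    "fail"  : the case ran but the results differ
    "error" : the case raised an exception during execution
    """
    verdicts: Dict[str, str] = {}
    for case in cases:
        try:
            actual = case.run()
        except Exception:
            verdicts[case.name] = "error"
            continue
        verdicts[case.name] = "pass" if actual == case.expected else "fail"
    return verdicts
```

Separating "fail" from "error" is a design choice made here for illustration: it distinguishes a deviation in the test object from a problem in executing the test itself, which is one of the places where the quality of the testware becomes visible.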

2.5.1 Example Architecture

Based on the generic test automation architecture provided by [3] combined with an abstracted view of the implemented architecture at the industry partner [7], an example of a test automation architecture can be seen in Figure 3. This illustrates a Test Automation Framework (TAF), which can be seen as a set of different tools with specific tasks that interact with each other. In the image, DUT is the abbreviation of Device Under Test.

3 Methodology

The research was intended to be performed as an academic case study validated against industrial needs. The aim was to yield new knowledge with value for both academia and the industrial partner, supplemented with a practically implementable outcome. Therefore, the research methodology was inspired by the “Research Approach and Technology Transfer model” proposed by Gorschek et al. in “A Model for Technology Transfer in Practice” [13], augmented with an extended literature study. The original model proposes seven sequential steps, as listed below.

1. Problem/issue
2. Study state of the art, and problem formulation
3. Candidate solution
4. Validation in academia
5. Static Validation
6. Dynamic Validation
7. Release Solution

Steps 1 to 5 were considered within the scope of the thesis, and the remaining steps 6 and 7 were therefore left for future work. Dynamic validation, step 6, includes live tests of actual implementations in the industrial context, which was not considered feasible within the limited scope of a thesis. Consequently, releasing a solution, step 7, was also considered infeasible. The remaining five steps were adapted to suit the context of the planned research by transforming them into a new process containing six steps. The proposed model was thereby treated as a template and instantiated into a suitable process, as further described in Section 3.1 and visualized by Figure 4. This process maps fairly well to the proposed parts of the original model, but some parts, e.g. step 4, were not obviously mappable. The adaption of included steps in the process was based on interpretation into the context and scope of the work to be performed. Most steps in the process have individual associated methods, which are further elaborated in the context of the individual step. The last step is the overall goal of the proposed model. In the context of the adapted process it serves as a manifestation of the actual technology transfer as proposed by Gorschek et al. in [13].

3.1 Process

The adapted process is described below step by step, with the respective methods used. For applicable steps, the methods used are further elaborated in the dedicated section addressing the specific step. It should be noted that the process was not entirely linear, since steps 3, 4 and 5 were to some extent executed iteratively. The iterative approach refined the results of the literature study in step 2, and the derived candidate solutions in step 3, into the context they were aimed to fit into. The process visualized in Figure 4 is conceptual. The actual last step before final technology transfer was step 4.

Step 1 - Problem Formulation: The first step was performed in both academia and industry. This involved identification of an area with potential for improvement, and formulation of the overall problem based on industry needs. Conducted in collaboration with the industry partner, the outcome of this step was the initial problem formulation and research questions. These were further refined during the process of the literature study in step 2 and are presented in Section 1.1.

Step 2 - Literature Study: Conducted in academia, the second part of the research was performed as a literature study, including state of the art and state of practice. This constituted a big part of the work performed during the thesis and was aimed at identifying how qualification and quality assurance of software development tools are performed with regard to functional safety development as well as applied methodologies in agile or mixed development philosophies. Based on the findings presented in the related work section, another aim was to evaluate if the definition of a test tool/framework has any implications regarding available methods for assuring its quality. The purpose of this step was to build a basis for addressing question RQ1 presented in Section 1.1.1. The specific methods used for performing the literature study are further elaborated in Section 4.1.

Step 3 - Candidate Solutions: In this step, performed in academia, all knowledge gained from the literature study was compiled into a set of candidate solutions with potential for assuring the quality of a software development tool. The derived candidates include e.g. new software implementations, added features for increased reliability, error mitigation strategies, and proposed changes to the development process. The result as presented in Section 5.2, and later refined in upcoming sections, is one of the main outcomes and contributions of the thesis. The purpose of this step was to further address and refine previous results concerning RQ1. The specific method used is further elaborated in Section 5.1.

Step 4 - Candidate Validation and Refinement: The fourth step was performed in academia and industry, but slightly shifted towards academia. This involved considerations of the current methodology applied at the industry partner concerning the specific candidates. The current state of the candidates was refined both to suit the industrial context of an intended application, and to increase general applicability. The candidates were further processed in collaboration with, and based on input from, the industrial supervisor. This enabled utilization of provided domain-specific knowledge to validate and increase the industrial feasibility of the candidates. This step was aimed to address RQ2 and has some overlap with both the preceding and subsequent steps. Therefore, this step has no dedicated section, but can be considered exemplified in Sections 6.2 and 6.3.7.

Step 5 - Focus Group: Performed in academia and industry, but shifted towards industry, the fifth step aimed to further address RQ2 and validate and refine the previous results. The step was executed as a focus group, conducted in collaboration with the industry partner. This activity constituted a big part of the work performed during the thesis. The focus group aimed to derive case-specific implications from the proposed candidate solutions and to gain feedback on the industry’s perception of them. Further, the focus group was also aimed at deciding on the feasibility of the different candidates and to prioritize between them. Prioritization of the candidates aimed to provide insights, but also to allow for scalability of the continued work to be performed during the project. The result as presented in Section 6.3 is one of the main outcomes and contributions of the thesis. The specific methods used for preparing and conducting the focus group are further described in Section 6.1.

Step 6 - Technology Transfer: The final activity was mainly performed in industry, while also generating generalizable academic value. This was dedicated to a proof of concept implementation utilizing previously gained results, aimed at suggesting a possible practical application according to RQ2. As indicated by Figure 4, this step can be seen as the collected result of previously performed steps by instantiation and realization based on their results. The result as presented in Sections 7.2 and 7.3 is one of the main outcomes and contributions of the thesis. This contains both general and case-specific proposed guidelines, derived from the outcomes and contributions of previously conducted steps. These proposed guidelines were aimed to provide insights and to be used as a stimulating basis for further discussions on increased confidence and quality of software development tools. The specific method used is further elaborated in Section 7.1.

3.2 Presentation of the Studied Case

The case study was performed as an industry-academia collaboration. The industry partner, Westermo Network Technologies AB (Westermo), specializes in industrial communication equipment for domains with high demands on robustness and availability, such as train, oil and gas, maritime, and water treatment. Thus, many customers have to comply with a functional-safety standard, which imposes demands of high quality on products acquired.

Different devices for robust data communication are developed, e.g. robust Ethernet switches. Each device is an embedded system, running the Westermo Operating System (WeOS), a software solution also developed by the industry partner. While based on the Linux kernel, WeOS also includes both proprietary code and open-source software libraries. This accumulates to a source code base of millions of lines of code.

To ensure the quality of the products, Westermo applies a quality assurance method based on automated testing, conducted on several test systems each night. Requirement-based functional testing is the foundation of the WeOS software test strategy, where each developer writes test cases for the implemented functionality. This is followed by regression testing based on previously developed test cases, to ensure that new features have not affected existing functionality. Further, there is risk-based testing, where identified risks are used to construct new test cases, and release testing using third-party tools in combination with reviews and formal verification.

A test framework has been developed, implemented and maintained over several years. The framework consists of testware and different setups of devices arranged into several physical test systems with varying layouts, each containing 4 to 25 devices with hardware, firmware and software. The in-house developed testware is used to configure and control the devices, each of which runs some version of WeOS. Further, the testware contains all test scripts, configurations and procedures, and is also used for activities surrounding the tests, such as test case selection, setup, tear-down, and logging. The framework allows for both manual and automated testing, simulating installation scenarios and hardware/software combinations to test e.g. a software feature, a physical device, or a customer-specific case [2, 45, 46].

Development of both WeOS and the test framework is performed according to the Kanban methodology, with a philosophy to always maintain a stable current version of the software. New features are developed and tested in isolated branches before being merged with the stable master branch. After implementation and merge of new functionality to the current stable version, a release is created that then becomes the stable master for the next iteration [47]. Decisions on when to merge a development branch with the stable master or when to release a new version are highly dependent on the produced test results. The capability to make correct decisions is therefore highly dependent on the quality of, and confidence in, the test framework.

The studied case in the research is defined as the industrial partner and the products developed. The unit of analysis is defined as the development and maintenance of the test framework, utilized at the industrial partner for the execution of manual and automated tests of produced products.

³ https://www.westermo.com
⁴ https://www.kernel.org


4 Literature Study

This section describes the execution of the literature study, presented as step 2 in Section 3.1. First, the method used for conducting the literature study is presented, covering the constructed snowballing algorithm and the applied inclusion criteria. The results are then presented in three categories, based on the inclusion criteria they derive from, and are summarized in tables within each category. Identified threats and limitations are summarized at the end.

4.1 Literature Study Method

The method used for conducting the literature study in step 2 was based on the paper "Guidelines for Snowballing in Systematic Literature Studies and a Replication in Software Engineering" by Wohlin [14]. The author proposes an approach that complements previous guidelines for conducting systematic literature reviews, provided by Kitchenham et al. in e.g. [48], by detailing the concept of snowballing as a research approach. The approach uses the collective term systematic literature study for the combination of systematic literature review and systematic mapping study, and is based on six steps. Based on these steps, an algorithm for the snowballing literature study was created, as presented in Section 4.1.2. An initial set of papers was identified using Google Scholar with search terms relevant to the research project, complemented by papers recommended by the collaborating company supervisor. According to Wohlin, using Google Scholar reduces the risk of search results being biased towards a specific publisher or database. The initial set was preliminary: a paper was only included in the actual snowballing set after a review had determined its relevance. The review was to some extent performed in combination with data extraction, as suggested by Wohlin. Whether a publication should be included was determined in two phases of inclusion criteria, as detailed in Section 4.1.1. For forward snowballing, the 'cited by' function in Google Scholar was used, and for backward snowballing, the list of references provided in the paper was reviewed. The snowballing procedure is further explained in Section 4.1.2.

Further included in the literature study was the review of three high profile standards used in the transportation domain, specifically the sections/clauses regarding software development tools. The relevance of these selected standards is considered to be motivated in Section 2.3. Therefore, no inclusion criteria as described in Section 4.1.1 were applied regarding the reviewed standards.

4.1.1 Inclusion criteria

In the first review iteration to determine initial inclusion, the abstract was read in search of the content listed below. For inclusion at this stage, only one of the listed criteria had to be fulfilled.

I1. Discusses tool qualification in relation to a relevant safety standard.

I2. Proposes a method or approach related to tool quality or confidence in tool use.

I3. Defines or reports on challenges related to test automation, tools, or frameworks.

I4. Discusses challenges or obstacles in combining safety-critical plan-driven development with agile processes.

After initial inclusion had been concluded the publications were more thoroughly examined to determine inclusion in the final set. This determination was based on two criteria, C1 and C2, leading to either excluding the publication or sorting it into one of two categories derived from the criteria below.

C1. An approach or possible candidate can be directly extracted.

C2. Contains information that could potentially be used to create an approach or candidate.

4.1.2 Snowballing Algorithm

The snowballing procedure involves the concepts of backward snowballing and forward snowballing. It also involves identifying an initial set of publications and deciding on inclusion or exclusion according to the criteria, as previously mentioned in Section 4.1. For backward snowballing, the reference list of an already included publication is studied to identify additional publications to be included. For forward snowballing, later publications that cite the included publication are identified. The citing publications are then evaluated for inclusion or exclusion [14]. Based on this, a snowballing algorithm was constructed as described below.

1. Establish set of studies S
2. Initialize set of relevant findings F
3. For each study s in S:
   – Extract relevant finding f to F
   – For each relevant study i citing s:
      – Add i to S
   – For each relevant study j cited by s:
      – Add j to S
4. While S is growing:
   – Repeat step 3 for all unprocessed studies
5. Additional exit criteria:
   – Theoretical saturation, where new studies do not contribute new information
   – The time plan does not allow any new processing before conducting the focus group

4.1.3 Included publications

The initial set of publications consisted of 10 articles [4,10,24,28,32,40,49,50,51,52]. After the snowballing process and applied inclusion criteria, a total of 9 publications were included in the final extraction of data, of which 3 were part of the initial set. It should be noted that additional Google Scholar searches were conducted in parallel with the snowballing process aimed at finding new publications to be used as starting points, a method supported by Wohlin [14] to find missed clusters of publications.

A total of 32 publications, including those in the initial set, were reviewed as a result of the snowballing process and the additional searches. Tables 4 and 5 list the finally included publications. Further, three standards for functional safety were also reviewed with regard to the sections covering confidence in the use of software development tools. These are listed in Table 6.
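As an illustration only, the snowballing algorithm of Section 4.1.2 can be sketched as a worklist loop in Python. This is a minimal sketch and not the tooling used in the study, where the steps were carried out manually. The function name `snowball`, the callbacks `forward`, `backward`, `is_relevant` and `extract`, and the parameters `max_studies` and `saturation_after` are all hypothetical. In particular, modelling theoretical saturation as a run of consecutive studies without new findings, and the time plan as a cap on processed studies, are assumptions made for the example.

```python
from collections import deque


def snowball(start_set, forward, backward, is_relevant, extract,
             max_studies=None, saturation_after=None):
    """Sketch of the snowballing loop (steps 1 to 5 in Section 4.1.2).

    forward(s)  -> studies citing s   (forward snowballing)
    backward(s) -> studies cited by s (backward snowballing)
    All callbacks and both exit parameters are hypothetical stand-ins
    for judgments that were made manually in the study.
    """
    start = list(dict.fromkeys(start_set))  # step 1: S, without duplicates
    seen = set(start)
    queue = deque(start)                    # unprocessed studies in S
    processed = []
    findings = set()                        # step 2: F
    barren_streak = 0                       # consecutive studies adding nothing

    while queue:                            # step 4: repeat while S is growing
        # Step 5 (assumed modelling): time plan and theoretical saturation.
        if max_studies is not None and len(processed) >= max_studies:
            break
        if saturation_after is not None and barren_streak >= saturation_after:
            break

        s = queue.popleft()                 # step 3: next unprocessed study
        processed.append(s)
        new_findings = set(extract(s)) - findings
        findings |= new_findings
        barren_streak = 0 if new_findings else barren_streak + 1

        for candidate in list(forward(s)) + list(backward(s)):
            if candidate not in seen and is_relevant(candidate):
                seen.add(candidate)
                queue.append(candidate)

    return processed, findings


# Toy citation graph (invented data): each key cites the listed studies.
REFERENCES = {"A": ["B"], "B": ["C"], "C": [], "D": ["A"], "E": ["D"]}
RELEVANT = {"A", "B", "D"}


def backward(s):
    return REFERENCES.get(s, [])


def forward(s):
    return [p for p, refs in REFERENCES.items() if s in refs]


processed, findings = snowball(
    ["A"], forward, backward,
    is_relevant=RELEVANT.__contains__,
    extract=lambda s: {f"finding from {s}"},
)
print(processed)  # ['A', 'D', 'B']
```

Using a queue of unprocessed studies makes the "while S is growing" condition explicit: the loop ends when no relevant, unseen study remains, or earlier if one of the additional exit criteria is met.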

4.2 Literature Study Results

All included publications were sorted during the inclusion process according to whether an appropriate approach could be directly extracted from the content, or whether such an approach could be derived from information provided in the publication, as specified in Section 4.1.1. For clarity, the results are presented in Sections 4.2.1 and 4.2.2, separated by the criterion whose fulfillment led to inclusion. Results from the reviewed standards are presented separately in Section 4.2.3, since the standards were excluded from the criteria-based inclusion process. The first two categories are based on studying other authors' interpretations of the standards, and the last category is based on reviews of the standards themselves. It should be noted that one publication was included in both criteria-based categories.
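The two-phase inclusion logic of Section 4.1.1 can be illustrated with a short Python sketch. It only records the decision structure; the judgments on which criteria a publication fulfills were made by manual review in the study. The `Review` class, its field names and the `screen` function are hypothetical and introduced for this example only.

```python
from dataclasses import dataclass, field

INITIAL_CRITERIA = {"I1", "I2", "I3", "I4"}  # judged from the abstract
FINAL_CRITERIA = {"C1", "C2"}                # judged from a thorough examination


@dataclass
class Review:
    """Hypothetical record of a reviewer's judgments for one publication."""
    title: str
    abstract_criteria: set = field(default_factory=set)
    full_text_criteria: set = field(default_factory=set)


def screen(review):
    """Return the result categories for a publication.

    Phase 1: at least one of I1 to I4 must be fulfilled.
    Phase 2: the publication is sorted by C1 and/or C2.
    An empty set means the publication is excluded.
    """
    if not (review.abstract_criteria & INITIAL_CRITERIA):
        return set()
    return review.full_text_criteria & FINAL_CRITERIA


# Invented examples, not publications from the study.
both = Review("Paper X", {"I1", "I3"}, {"C1", "C2"})
rejected_early = Review("Paper Y", set(), {"C1"})
rejected_late = Review("Paper Z", {"I2"}, set())

print(sorted(screen(both)))  # ['C1', 'C2']
```

Returning a set rather than a single label reflects that the categories are not mutually exclusive, since one publication was included under both C1 and C2.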

4.2.1 Included According to Criteria C1

Conrad et al. [32] investigate some of the main standards within the transportation domain in an attempt to qualify two existing tools in accordance with ISO 26262. The aim was to develop a practical tool qualification approach. The tools in question were a code generation tool and a C/C++ code verification tool, for which the authors report on their experiences from tool qualification. The approach that was deemed to be directly extracted is to utilize a reference

