Automated software testing for cross-platform systems


Gustaf Brännström

January 29, 2012

Master’s Thesis in Computing Science, 30 credits

Supervisor at CS-UmU: Mikael Rännar

Examiner: Fredrik Georgsson

Umeå University

Department of Computing Science

SE-901 87 UMEÅ


Abstract

SILK is the preferred audio codec to use in a call between Skype clients. Every time the source code has been changed there is a risk the code is no longer bit-exact between all the different platforms. The main task for this thesis is to make it possible to test bit-exactness between platforms automatically to save resources for the company.

During this thesis a literature study about software testing has been carried out to find a good way of testing bit-exactness between different platforms.

The advantages and disadvantages of the different testing techniques were examined during this study.

The result of the thesis is a framework for testing bit-exactness between several different platforms. Based on the conclusions from the literature study the framework is using a technique called data-driven testing to carry out the


List of Figures

3.1 The cost for fixing a bug depending on the time.

3.2 The relation between the cost of performing the tests and the number of defects.

3.3 The figure shows what is included in white box testing.

3.4 The figure shows how data-driven testing could work.

5.1 A system overview. The output from the recordings is used as input to the functions the replay module is testing.

5.2 An example how the network of the computers participating in the test could look like.


List of Tables

3.1 Example of how the data could look like in a simple example for data-driven testing.

3.2 Example of how the data could look like in a simple example for keyword-driven testing.

4.1 Preliminary schedule for the thesis work.


Chapter 1

Introduction

When a company develops new software, problems and bugs will occur in the source code. As the number of lines of code increases, it becomes more difficult to locate where in the source code the bugs are. Since bugs cannot be avoided entirely, locating them becomes a common and very time-consuming task, and thus a large expense for the company. If it were possible to automatically locate all the incorrect functions in the source code, a company could save a lot of resources.

This thesis focuses on automated testing for cross-platform software. During the thesis a framework will be implemented to solve the task of testing bit-exactness between functions in Skype’s open source audio codec SILK.

1.1 Skype

Skype is a Luxembourg-based company that was founded in 2003 by Niklas Zennström and Janus Friis [15]. Skype develops software for both video and voice communication. The software can be used on regular desktop computers, phones and TVs. During 2010 Skype users talked for 207 billion minutes in total, and in the last quarter of the same year an average of 180 million users were connected every month [15].

1.2 SILK

SILK is the default audio codec between two devices running Skype [17] and is open source code. The codec can be used in real-time applications and supports several different sampling frequencies and can adjust to both network and CPU changes [16]. SILK is implemented in C, but some of the functions are optimised with assembly code and the codec supports several different platforms.


1.3 Definitions

The definitions below will be used throughout the entire thesis:

• Bug

A bug is the same thing as a software defect, error or fault. The word bug has long been used for defects in products, not only in computer software.

• Bit-exact

Two binary units are considered bit-exact when the bits they are represented by are exactly the same. For instance, the outputs from two programs are bit-exact if the outputs are identical.

• Codec

A codec consists of two parts, an encoder and a decoder. The encoder encodes data, which can then be decoded by the decoder. A codec can be used for several reasons, e.g. an audio codec can be used to compress and decompress audio data.
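As an illustration of the bit-exact definition above, the following minimal sketch compares two outputs byte by byte. The function names are hypothetical and are not taken from SILK or from the framework described later.

```python
def first_difference(a: bytes, b: bytes):
    """Return the offset of the first differing byte, or None if bit-exact."""
    if a == b:
        return None
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    # All shared bytes match, so one output is a strict prefix of the other.
    return min(len(a), len(b))


def is_bit_exact(a: bytes, b: bytes) -> bool:
    """Two outputs are bit-exact only if every byte is identical."""
    return first_difference(a, b) is None
```

Reporting the first differing offset, instead of only a yes or no answer, gives a developer a starting point when the outputs diverge.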

1.4 Outline

A brief description of the following chapters in the thesis:

• Chapter 2, Problem description, a description of the problem, the goals and the purpose with the thesis

• Chapter 3, Software testing, an in-depth study of existing software testing methods

• Chapter 4, Accomplishments, a description of the work process

• Chapter 5, Results, a system overview of the test framework

• Chapter 6, Conclusions, the reached conclusions during the thesis

• Chapter 7, Acknowledgements


Chapter 2

Problem description

In the following sections the problem of this thesis is stated and the goal and the purpose of the thesis are described.

2.1 Problem statement

When the developers at Skype have modified the existing source code or have written new functions they want to make sure SILK is still bit-exact. It is important SILK remains bit-exact between all the different platforms it will be executed on. If not, the resulting output signal from SILK will differ between the platforms. The following steps describe one way to test if the codec is bit-exact:

1. Prepare all the devices

2. Build the codec for all devices

3. Copy the binaries to the devices

4. Run the test for the component on all devices

5. Analyse the results

SILK supports several platforms, which makes it a time-consuming task to test it manually on each platform. Since the electronics market is growing fast, the number of platforms the codec has to support is increasing. With each new platform the test procedure takes longer to perform.

If the codec is not bit-exact, the tester needs to report the problem to the developers. The developers then need to do the tedious work of finding the function that causes the codec not to be bit-exact.


Each step that can be automated, including the debugging step done by the developers, reduces the time the testing procedure takes.

In the best case all of the steps above are automated, which would give the company a certain level of quality assurance.

2.2 Goal

The goal of this thesis is to implement a test framework that checks whether the output from different functions executed on different platforms is bit-exact.

The framework should automate the steps in the test procedure stated in 2.1.

2.3 Purpose

The main purpose of this thesis was to find different types of existing testing methods and decide which of these could be used in an automated test framework for cross-platform systems. This framework is supposed to help the audio developers at Skype to verify that all the functions in SILK continue to be bit-exact between platforms after new code has been committed. The developers at Skype can currently only test whether the entire encoder and decoder are bit-exact between platforms, not a specific function.


Chapter 3

Software testing

This chapter is the result of the literature study about software testing that was carried out during the thesis.

3.1 Introduction

In today’s society people come into contact with computers almost every day, and all computers contain some sort of software. As software becomes larger and more complex, the probability that bugs occur increases. As the number of lines of code grows, it also becomes more difficult to find a bug. According to [12] it is also very important to find bugs as early as possible during development, as it only becomes more expensive to fix them later in the project. This is illustrated in figure 3.1 from [12, p. 9].

Figure 3.1: The cost for fixing a bug depending on the time.


Companies should therefore test the software as early as possible during development to lower the cost. According to the author of [3], these are some of the consequences a company with a low quality product can expect:

• protracted delays in delivering new applications,

• loss of customers to competing companies,

• high maintenance costs due to poor quality and

• high customer support costs due to poor quality.

If the company does not test the software thoroughly and extensively enough, the software is under-tested and several bugs will probably be missed. On the other hand, the software can be over-tested if a company puts too many resources into testing it, and the expense will be unnecessarily high. These two scenarios are shown in figure 3.2 from [3, p. 48].

Figure 3.2: The relation between the cost of performing the tests and the number of defects.

3.2 Unit testing

The idea with unit testing according to [7] is that each unit test should verify that a specific part of the source code has a particular behaviour. Each unit test should clearly state whether it passed or failed given a certain input, and then quickly give that feedback to the developer or tester. Today it is common to use unit tests, and they are a key element in test driven development [7].

Companies that apply this development method should implement the functions after the tests have been written. The idea is that the output of each function should be predictable given a certain input, and therefore it should be possible to write the tests before implementing the functions. There exist many different unit test frameworks; here is a list of some of the members of the xUnit family:

• SUnit (Smalltalk)

• JUnit (Java)

• CppUnit (C++)

• PyUnit (Python)

Even though unit tests are commonly used, the concept has several drawbacks. It is a time-consuming task to write all the tests and to run all of them manually. If the project is big and contains a lot of branches, it will not be feasible to write tests that cover all the branches. Another drawback with unit tests, according to the author of [8], is the difficulty of writing good tests. If the tests are badly written, they will either cover the function poorly or unnecessarily many tests will need to be written. It should be straightforward to write unit tests for functions with simple input and output. But as the input or output of the function under test gets more complex, it becomes harder for the tester to write the test cases and there is a risk that the function will be poorly tested.

In [1] the authors describe the psychology behind testing. If the developer writes the tests, he or she may become blind to their own errors. But if someone else writes the tests instead, this will probably be even more time-consuming due to poor knowledge of the code.

If a lot of test cases have been written for a software application and the application needs to be redesigned, all the original test cases might also need to be rewritten. This is not only because interfaces might change, but also because the behaviour of functions might change, which makes the original test cases invalid. It is therefore important that the code under test is stable.

3.3 White box testing

The author of [14] describes white box testing as ”a way of testing the external functionality of the code by examining and testing the program code that re-


code will be required since it will be analysed in different ways. Software companies can use this type of testing to get an overview of which parts of the source code are being used and which parts should be improved.

White box testing can be divided into two parts, static testing and structural testing, which is shown in figure 3.3 from [14, p. 48].

Figure 3.3: The figure shows what is included in white box testing.

The analysis of the code during the static testing can be done either manually or automatically according to [14]. During the static testing it is possible to see e.g. whether the code fulfils the functional requirements, whether any part of the source code is unreachable and whether all the declared variables are being used. The source code will never be compiled during the static testing.

The structural testing includes, as shown in figure 3.3, unit testing (described in 3.2), code coverage testing and code complexity testing. The code coverage testing runs the program with predefined test cases and profiles the software. There are four different ways of measuring the code coverage according to [14]:

• function coverage, how many times each function has been called,

• statement coverage, how many of the statements have been executed,

• path coverage, how many of the different paths have been executed and

• condition coverage, how many of the conditions have been evaluated.

The information returned from the code coverage tool helps the developers to see e.g. whether there is any code that has not been executed during the tests. The developers can then decide if such code should be removed or if they should add more test cases to cover that code.

To measure the code complexity one can use cyclomatic complexity [10]. It “measures program unit complexity in term of control flows, specifically branching” according to [2, p. 284]. To compute the cyclomatic complexity the control flow graph is required. The complexity for a single function is calculated as:

M = E − N + 2

where

• M = complexity

• E = number of edges in the control flow graph

• N = number of nodes in the control flow graph

The complexity M will represent the number of independent paths through the function, and can be helpful when writing unit tests.
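As a worked example of the formula, the sketch below computes M for a small, hypothetical control flow graph of a function containing a single if/else statement. The node names are illustrative only.

```python
def cyclomatic_complexity(edges, nodes):
    """Compute M = E - N + 2 for the control flow graph of one function."""
    return len(edges) - len(nodes) + 2


# Hypothetical graph: entry -> cond, cond -> then | else, then/else -> exit.
nodes = {"entry", "cond", "then", "else", "exit"}
edges = {
    ("entry", "cond"),
    ("cond", "then"),
    ("cond", "else"),
    ("then", "exit"),
    ("else", "exit"),
}

complexity = cyclomatic_complexity(edges, nodes)  # 5 - 5 + 2
```

With five edges and five nodes the complexity is 2, which matches the two independent paths through the function (the "then" branch and the "else" branch).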

3.4 Black box testing

In [14] the authors explain black box testing as testing “without knowledge of the internals of the system under test” (p. 74). Black box testing can be used to test the software against a list of specifications or requirements. This type of testing should be used on software that is ready for delivery. The tester should be able to look at the software specifications, not the source code, and then give the software some input. The output from the software should then be verified against the expected output according to the software specifications.

Black box testing should not be used for finding errors in the software, only for verifying it. This is why this type of testing can be useful for a company that has paid another company to develop the software application.

When performing black box testing the tester should use both positive and negative testing. Positive testing can be used to assure the customer that the software is working as expected and negative testing can be used to show that the software does not crash due to unexpected input.

3.5 Regression testing

When new source code has been developed or old source code has been modified, new bugs can occur and old bugs might reoccur. To detect these types of bugs a quality engineer can use regression testing. This is described by [14] as “regression testing is done to ensure that enhancements or defect fixes made to the software works properly and does not affect the existing functionality”


the software as soon as something, e.g. the hardware the software is executed on, has changed. This is important since some parts of the source code might be hardware dependent and therefore some bugs might only occur on certain hardware.

Usually it is not feasible to run all the test cases for the software every time new source code is committed. The first step in regression testing is therefore usually to perform a smoke test. A smoke test is essentially a test of all the basic functionality of the software. If this test fails, the bug that causes the failure needs to be fixed before more detailed testing can be done. If the tester does not have time to run all the tests, the tester should instead select and run a subset of tests that covers the new source code.
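The procedure described above, a smoke test followed by a subset of tests that covers the changed code, can be sketched as follows. This is a minimal sketch under assumptions: the mapping from test names to the source files they cover is hypothetical, and a real system would have to obtain it from e.g. a coverage tool.

```python
def select_tests(test_coverage, changed_files):
    """Pick the tests whose covered files overlap the changed files."""
    changed = set(changed_files)
    return sorted(
        name for name, files in test_coverage.items() if changed & set(files)
    )


def run_regression(smoke_test, tests, test_coverage, changed_files):
    """Run the smoke test first; only run the selected tests if it passes."""
    if not smoke_test():
        return {"smoke": False}
    results = {"smoke": True}
    for name in select_tests(test_coverage, changed_files):
        results[name] = tests[name]()
    return results


# Hypothetical test suite for a codec with an encoder and a decoder.
coverage = {"test_encoder": ["enc.c"], "test_decoder": ["dec.c"]}
tests = {"test_encoder": lambda: True, "test_decoder": lambda: True}
results = run_regression(lambda: True, tests, coverage, ["enc.c"])
```

Here only the encoder test is run, since only `enc.c` changed.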

3.6 Automated software testing

The testing techniques in the previous sections in this chapter are traditionally carried out manually during the scripted testing done by a quality engineer [4]. Another way of doing the testing is to automate the testing procedure.

The execution of the test cases can be automated by using a specialised test framework or software. The authors of [14] describe a test framework as a “module that combines ‘what to execute’ and ‘how they have to be executed’” (p. 398).

According to [5] these are some of the benefits of automating the test procedure:

• Saves time and resources

An automated test framework is most likely more efficient than a quality engineer who is running the same tests manually.

• More reliable testing

Test cases which include many steps can be hard for a manual tester to carry out without making mistakes.

• Run more tests

A company will get better test coverage of the software since an automated framework is likely to be more efficient than manual testing. The quality engineer can write new test cases or improve existing test cases instead of running the tests manually.

• Run other types of tests

With an automated test framework it is possible to run other types of tests, e.g. stress tests and long-running tests. Tests like these can be hard to do manually.

However, there are also drawbacks with automated software testing. Here is a list of some of the drawbacks:


• The return on investment

It is not always worth setting up an automated test framework for the testing according to [5]. If the return on investment is not big enough the company should continue with the manual testing. The company should keep in mind that it may take a while before the investment pays off.

• Not all tests can be automated

Tests such as different types of ad hoc testing [14] and verification of graphical user interfaces are hard to automate.

• Tests that require human interaction

Some tests require human interaction, such as connecting and disconnecting different hardware, and are thus not suitable for automation according to [14].

• Often changing software

If the software under test changes often, it will be a large overhead to set up the automated test framework for the software every time it has been changed according to [6].

As described in [5], the need for manual testers does not disappear when the tests become automated. The manual testers are experts on how to test the software. Their knowledge could instead be used to improve the test cases the automated test framework is executing.

The following subsections describe different techniques that can be used in an automated test framework.

3.6.1 Recorded testing

This method records all the interactions with the software according to the author of [11]. The interactions could e.g. be mouse clicks and keyboard actions. The recorded interactions can later be replayed to a new version of the software to test whether it still behaves in the same way.

This method can e.g. be used to test graphical user interfaces, but it is important to do the recordings in the correct way. Depending on how the recordings with the graphical user interface are done, the playback step could be sensitive to changes in the user interface. Thus, it will require different recordings for each version of the graphical user interface.

3.6.2 Data-driven testing

According to the authors of [6] data-driven testing consists of two parts, the


runs, it first reads data, which could be stored in a database or in files, and then runs the test and compares the results against a database or a file containing the expected results. This is illustrated in figure 3.4.

Figure 3.4: The figure shows how data-driven testing could work.

The data for the testing needs to be recorded or generated in one way or another before the testing can begin. If data-driven testing is done with a large and varied database of test data, the test coverage of the software is likely to be good.

Table 3.1 below is an example of how the data could look when a calculator’s addition function is tested:

Input 1   Input 2   Result
1         1         2
1         2         3
2         3         5
3         5         8

Table 3.1: Example of how the data could look like in a simple example for data-driven testing.

In this example the script would read the element in the first row in the two first columns and then run the function under test and compare the result from the function with the expected result in the third column. This procedure is repeated for each row in the table.
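The procedure can be sketched in a few lines. This is a minimal sketch, assuming the rows of table 3.1 are stored as comma-separated values; the names `run_data_driven` and `add` are hypothetical stand-ins for the test script and the calculator function under test.

```python
import csv
import io

# The rows of table 3.1: two input columns followed by the expected result.
TABLE_3_1 = """1,1,2
1,2,3
2,3,5
3,5,8
"""


def add(a, b):
    """Stand-in for the calculator's addition function."""
    return a + b


def run_data_driven(function, data_file):
    """Call `function` once per row and collect the rows that fail."""
    failures = []
    for row_number, row in enumerate(csv.reader(data_file), start=1):
        *inputs, expected = (int(cell) for cell in row)
        actual = function(*inputs)
        if actual != expected:
            failures.append((row_number, inputs, expected, actual))
    return failures


failures = run_data_driven(add, io.StringIO(TABLE_3_1))
```

The test logic is written once, so adding a test case only requires adding a row to the data.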

3.6.3 Keyword-driven testing

Keyword-driven testing is similar to data-driven testing. These two methods are sometimes referred to as table-driven testing. Both methods use databases or files that contain the input and the result. The difference is that the table for keyword-driven testing also contains a keyword; table 3.2 is an example of how this could look if the system under test is a calculator. Each keyword corresponds to a predefined action and when the test script reads a keyword it knows what to do.

Keyword   Input 1   Input 2   Result
add       1         1         2
sub       2         1         1
mul       3         2         6
div       6         2         3

Table 3.2: Example of how the data could look like in a simple example for keyword-driven testing.

Each row in the table is representing one test case. The script will first read the keyword on the current line and translate the keyword to a specific set of actions. The next two columns will be used as input to the actions and the result from the actions will be compared against the result in the fourth column.

In this example the script will translate add to call the calculator’s function for addition with 1 and 1 as input. The result from the function will be compared against the expected result value from the table, which is 2.
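A minimal sketch of such a script is shown below. The mapping from keywords to actions is an assumption made for illustration; here each keyword maps to a function from Python's `operator` module, and `div` is assumed to be integer division.

```python
import operator

# Each keyword corresponds to a predefined action (assumed mapping).
ACTIONS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "div": operator.floordiv,
}

# The rows of table 3.2: keyword, two inputs, expected result.
TABLE_3_2 = [
    ("add", 1, 1, 2),
    ("sub", 2, 1, 1),
    ("mul", 3, 2, 6),
    ("div", 6, 2, 3),
]


def run_keyword_driven(table, actions):
    """Translate each keyword to an action and compare with the expectation."""
    results = []
    for keyword, a, b, expected in table:
        action = actions[keyword]
        results.append((keyword, action(a, b) == expected))
    return results


results = run_keyword_driven(TABLE_3_2, ACTIONS)
```

Compared with the data-driven case, the extra keyword column lets one table exercise several different functions.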

3.7 Conclusions

Not all types of testing are suited to automation. Stress, reliability, regression and functional testing are four types of testing that are well suited for automation since they are repetitive tasks. As stated earlier, ad hoc testing and different types of static testing are on the other hand not suited for automation, mainly because they require human interaction.

But if a company is thinking about setting up a test framework for its application, it should consider how expensive it would be to fix a bug compared to how many resources it would have to use to avoid such bugs. An advanced and complex application would probably require more testing resources than a very simple application. For a simple application it would probably be enough to do exploratory testing and unit testing, which is less expensive than implementing or buying an automated testing framework.

If it is decided to set up a test framework for the application, it should be


code should automatically be tested on each code commit, since it is easier and cheaper to fix bugs in an early phase than in a late phase. The results of the tests should be easily accessible and visualised to the developers that are fixing the bugs. If they do not know that the bugs exist, they cannot fix them.

It is also important to remember to do integration testing, since it is very likely that new bugs occur when the sub components interact with each other. Even if a sub component passes all the unit and regression tests, it may fail when it is integrated with the other sub components. For instance, this can happen in an application that runs each sub component in a different thread with some shared resources. The threading issues this application could have are not possible to test with unit tests. This should instead be tested in an automated test framework which could stress test the application.

All three automated methods for testing software have similarities. They all require predefined input to the software they test. Data-driven testing is suitable for testing the lower level of the software, e.g. a specific function or an entire component. Keyword-driven testing adds another layer to data-driven testing, since each keyword could correspond to a set of components and functions that will be tested. If the keywords in the database test things at the higher level of the software, this could in fact test similar things as recorded testing. But recorded testing cannot be used to test the lower level of the application, since it records the input to the application at the higher level.

To test bit-exactness between functions, the same input should be given to all the functions that will be compared. This data could either be automatically generated or recorded from an execution, in which case the input to the functions under test is recorded every time they are called. If recorded data is used, the input to the functions under test is input they might actually get in a real situation and not automatically generated dummy data. If the functions under test are called with varying input data during the recording, the test coverage is likely to be good when the data is used during the bit-exactness test. For an audio codec it is possible to use a long and varying audio file as input during the execution to generate recordings with good code coverage.

With a large set of varying input it is possible to use data-driven testing to test a system. With data-driven testing it is possible to run regression testing and different types of functional testing which is exactly what should be done when SILK is tested for bit-exactness. These are the reasons why the framework that will be implemented during this thesis will be using data-driven testing.


Chapter 4

Accomplishment

This chapter describes the preliminaries for the thesis, how the work was planned, and how the work was actually done.

4.1 Preliminaries

Table 4.1 below shows an outline of the preliminary schedule that was written for the project plan. The table does not include the preparation work, which includes writing a proposal and a project plan for the thesis, that was done before the thesis started. All the work during the thesis was planned to be done at Skype’s office in Stockholm. Even though it is not visible in table 4.1, the plan was to start writing the report when the implementation reached the final stage.

Weeks   Work tasks
4       Literature study, study current test system and designing the test framework
9       Implementation and testing
6       Write report and evaluate the framework
1       Prepare presentation and opposition

Table 4.1: Preliminary schedule for the thesis work.

4.2 How the work was done

This section describes how the separate parts of the thesis were done.


Literature study, study of current test systems and designing the framework

I was supposed to do the literature study, study the current test systems and design the test framework during the first four weeks of the thesis. Due to work related travelling and a conference the literature study was delayed a little. As a consequence I had to do the literature study in parallel with some parts of the implementation instead. Luckily this did not cause any big problems. Thanks to a deal the company where I did my thesis had with a company that provides e-books, it was easy to find a lot of literature about software testing. The outcome of this part of the thesis was a design of the test framework and knowledge of how the company does its testing.

Implementation and testing

The next nine weeks of work I was supposed to spend on implementing the test framework. During these weeks I first ran into some problems with pointers in C and then some problems with setting up Cygwin to work with the framework. The framework had to work with Cygwin since this was a requirement from the company. Due to the problems with Cygwin the implementation phase took me one extra week and thus ten weeks in total. During this phase of the thesis the framework was also tested, and the outcome of this phase was the test framework.

Write the report and evaluate the test framework

After the implementation was done I started to write the report. I did not do this in parallel with the implementation as planned, but thanks to different kinds of documentation and notes this was not a problem.


Chapter 5

Results

The outcome of the in-depth study and the implementation was a framework. It consists of two separate modules, the record module and the replay module. The record module makes it possible to capture input data to a function. The data captured by the record module is later used as input to the functions the replay module tests. The result from the execution of the function under test is captured by the replay module and compared against the result from other functions. This is shown in figure 5.1.

Figure 5.1: A system overview. The output from the recordings is used as input to the functions the replay module is testing.

The framework can check whether the results from two or more functions are the same. This makes it possible to test whether a function that has been implemented differently for different platforms, e.g. due to optimisations, is bit-exact between the platforms. By giving the same input to all the different implementations of a function and comparing the results, it is possible to make sure the function is bit-exact between the implementations.
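The comparison idea can be sketched as follows. This is a minimal sketch, not the framework's actual implementation: the two saturating-add functions are hypothetical stand-ins for a reference implementation and a platform-specific optimised implementation, and the second one deliberately wraps around instead of saturating.

```python
def compare_implementations(implementations, recorded_inputs):
    """Feed identical inputs to every implementation; report disagreements."""
    mismatches = []
    for index, args in enumerate(recorded_inputs):
        outputs = {name: f(*args) for name, f in implementations.items()}
        if len(set(outputs.values())) > 1:
            mismatches.append((index, outputs))
    return mismatches


def sat_add_reference(a, b):
    """16-bit saturating addition (hypothetical reference version)."""
    return max(-32768, min(32767, a + b))


def sat_add_optimised(a, b):
    """Hypothetical optimised version that wraps around on overflow."""
    return (a + b + 32768) % 65536 - 32768


mismatches = compare_implementations(
    {"reference": sat_add_reference, "optimised": sat_add_optimised},
    [(1, 2), (32767, 1), (-5, 5)],
)
```

Only the second input overflows, so it is the only one for which the two implementations disagree. This also illustrates why varied recorded input matters: the defect is invisible for the other two inputs.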


5.1 Test specifications

XML files are used to specify which functions to test and how to build and run the executables. The framework uses three different types of XML files. The first file contains information about the functions and their parameters, and the second file contains the information that is needed to build and execute the files. These two files are used by both the record and the replay module. The last file contains the test cases, i.e. which functions the test framework will compare and on which platforms the functions should be executed. This file is only used by the replay module.
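The thesis does not show the layout of these XML files, so the following sketch uses an entirely hypothetical schema (the element names `tests`, `comparison` and `run`, and the attributes `name`, `function` and `platform` are assumptions). It only illustrates how a test case file of this kind could be parsed into a list of comparisons.

```python
import xml.etree.ElementTree as ET

# Hypothetical test case file; the real schema is not given in the text.
TEST_CASES_XML = """
<tests>
  <comparison name="my_function">
    <run function="my_function" platform="x86"/>
    <run function="my_function" platform="arm"/>
  </comparison>
</tests>
"""


def parse_comparisons(xml_text):
    """Map each comparison name to its (function, platform) pairs."""
    root = ET.fromstring(xml_text.strip())
    comparisons = {}
    for comparison in root.findall("comparison"):
        runs = [
            (run.get("function"), run.get("platform"))
            for run in comparison.findall("run")
        ]
        comparisons[comparison.get("name")] = runs
    return comparisons


comparisons = parse_comparisons(TEST_CASES_XML)
```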

5.2 Record

The recording module captures input to a function. The idea with this module is to generate input data for the functions the replay module will test. It is important that all the different implementations of the function that the replay module will test use the same recording as input. These are the main steps in the recording module:

1. Rename functions

2. Create new functions

3. Build and run program with input

During the first step all the functions that have been specified in the XML file will be renamed. A prefix (in this case “__”) will be added to the original function name for each function. All the occurrences of the function name in the file will be changed, not only the function head. The framework will also check whether the directory where the original C code is located contains a file with optimised code. It will search for a file with the same file name but ending with “_arm.S” instead of “.c”. If such a file exists, all the occurrences of the function name in it will be renamed in the same way as in the C code.
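The renaming step could be sketched with a regular expression that matches whole identifiers only. This is an assumption about one possible approach, not a description of how the framework does it; note that this simple version would also rename occurrences inside comments and string literals.

```python
import re


def rename_function(source, name, prefix="__"):
    """Prefix every whole-word occurrence of `name` in the source text."""
    pattern = re.compile(r"\b" + re.escape(name) + r"\b")
    return pattern.sub(prefix + name, source)


c_source = (
    "int my_function(int a);\n"
    "int other(void) { return my_function(1) + my_function_2(); }\n"
)
renamed = rename_function(c_source, "my_function")
```

Because an underscore counts as a word character, `my_function_2` is left untouched, and running the rename a second time does not add a second prefix.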

The second step for the recording module is to create new functions that are replacing all the original functions since they have been renamed. All the new functions will write to two files for each parameter. In the first file the function will write the parameter’s binary data and in the second file the function will write how many bytes it wrote to the first file. When this is done the function will call the renamed function. The second step is illustrated in the following pseudo code:

    function __my_function(a, b):
        ...
        ...
    end

    function my_function(a, b):
        write_to_file(a)
        write_to_file(sizeof(a))
        write_to_file(b)
        write_to_file(sizeof(b))
        return __my_function(a, b)
    end

Every time the original function would have been called, the new function is called instead. For all other functions there is no difference, since the new function is called in the same way and returns the same value as the original function.
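The recording scheme with two files per parameter can be sketched as follows. The sketch is written in Python rather than C and is not the code the framework generates. The file names (`<name>.bin`, `<name>.size`) and the choice to store each byte count as a decimal number on its own line are assumptions, and the arguments are assumed to be bytes-like objects.

```python
from pathlib import Path


def record_parameter(directory, name, data):
    """Append one recorded value of the parameter `name`.

    The raw bytes go to <name>.bin and the byte count goes to
    <name>.size (both file names and the text format are assumptions).
    """
    directory = Path(directory)
    with open(directory / f"{name}.bin", "ab") as data_file:
        data_file.write(data)
    with open(directory / f"{name}.size", "a") as size_file:
        size_file.write(f"{len(data)}\n")


def recording_wrapper(function, directory, parameter_names):
    """Return a function that records its arguments, then calls `function`."""
    def wrapper(*args):
        for name, value in zip(parameter_names, args):
            record_parameter(directory, name, bytes(value))
        return function(*args)
    return wrapper
```

The wrapper returns whatever the wrapped function returns, so callers cannot tell the difference, which mirrors the role of `my_function` in the pseudo code.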

The last step in the recording module is to build and run the project with the modified code. The framework will build it with the command specified in the XML file. After a build has finished the framework will execute the binary and all the parameters will be recorded.

Before the module starts modifying the code it will first create a backup of all the files it will modify. When the framework has built the modified code it will restore the files.
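The backup and restore behaviour can be sketched as a Python context manager. This is a minimal sketch, not the framework's implementation; the name `backed_up` and the `.orig` suffix are assumptions.

```python
import os
import shutil
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def backed_up(paths, suffix=".orig"):
    """Copy each file aside, run the body, then restore the originals.

    The originals are restored even if the body raises an exception.
    """
    backups = []
    try:
        for path in map(Path, paths):
            backup = path.with_name(path.name + suffix)
            shutil.copy2(path, backup)
            backups.append((path, backup))
        yield
    finally:
        for path, backup in backups:
            os.replace(backup, path)  # overwrite the modified file
```

Restoring in a `finally` block means a failed build does not leave renamed functions behind in the source tree.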

5.3 Replay

The replay module can be used to call specific functions with prerecorded data as input. It is possible to run everything on the local computer, but the local computer can also start the replay module on a remote computer. This gives the user of the framework the opportunity to test whether a function is bit-exact between different platforms and architectures. An example of such a setup is shown in Figure 5.2.

Figure 5.2: An example of how the network of computers participating in the test could look.

When the replay module is started on the local computer it first parses the XML files. This gives the module a list of all the comparisons to do. Each comparison is done between two or more functions or lists of files containing prerecorded data. Since it is possible to compare files containing prerecorded data, it is possible to use reference output files.

The first step after the parsing is to send all the new files to all the remote computers specified in the XML files. The module uses the secure copy command (scp) to transfer the files to the remote computers.

After the remote computers have received the new files, the module starts handling each comparison separately. These are the steps the module performs to generate output from each function in the comparison:

1. Send XML files to specified computer
2. Start a subprocess on the specified computer
3. Retrieve the files from the specified computer

Depending on whether a function is specified to be executed on the local or on a remote computer, the module handles these steps slightly differently. For a remote computer it uses secure copy to transfer the XML files between the local computer and the remote computer. To start a subprocess on a remote computer the module uses the remote login program ssh.
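The remote steps above could be driven from Python roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, and the real framework may pass additional options to `scp` and `ssh`. The helpers only build the argument lists; `run` is the single place where a command is actually executed.

```python
import subprocess


def scp_command(local_path, host, remote_path):
    """Argument list that copies a local file to `host` with scp."""
    return ["scp", local_path, f"{host}:{remote_path}"]


def scp_fetch_command(host, remote_path, local_path):
    """Argument list that copies a file from `host` back to this computer."""
    return ["scp", f"{host}:{remote_path}", local_path]


def ssh_command(host, remote_command):
    """Argument list that runs `remote_command` on `host` through ssh."""
    return ["ssh", host, remote_command]


def run(command):
    """Run the command and raise CalledProcessError if it fails."""
    subprocess.run(command, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems on the local computer.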

The subprocess that will be started will do the following:

1. Create the new main function
2. Build the new code
3. Run the new code

To avoid executing unnecessary code when generating output from a function, the original main function is replaced with a new main function. The new main function first creates a variable for each parameter and then loads the prerecorded data into it. It first reads the number of bytes from the file with the sizes and then reads that amount of data from the file containing the actual binary data.

The next step is to call the function that is supposed to be tested with the parameters and then record the output from the function. The framework writes all the parameters and the return value to files. Since the parameters might be pointers and may have been changed inside the function, it is important to also write the parameters to file, not only the return value. This is pseudo code for how the new main function can look:
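The way the prerecorded data is loaded can be sketched in Python. The sketch assumes that the size file stores one decimal byte count per line, which is an assumption about the file format, and the name `read_records` is hypothetical.

```python
import io


def read_records(data_stream, size_stream):
    """Yield one recorded value per entry in the size file.

    For each entry, the byte count is read first and then exactly
    that many bytes are read from the binary data stream.
    """
    for line in size_stream:
        count = int(line)
        chunk = data_stream.read(count)
        if len(chunk) != count:
            raise ValueError("data file is shorter than the size file claims")
        yield chunk
```

For example, a data stream holding three bytes and a size stream holding the lines `2` and `1` yield a two-byte record followed by a one-byte record.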

    function main():
        a = read_from_file(nr_of_bytes(a))
        b = read_from_file(nr_of_bytes(b))
        ret_val = call my_function(a, b)
        write_to_file(a)
        write_to_file(b)
        write_to_file(ret_val)
    end

After the code has been replaced, the subprocess builds and executes the new code. The result of executing the new main function is one file for each parameter, containing the binary data. These output files are retrieved to the local computer, and when the output files from all the executions have been retrieved, the framework tests whether the files are bit-exact.
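A bit-exactness check amounts to a byte-for-byte comparison of the output files. The following Python sketch shows one way to do it and also reports where the first difference is; the function names are hypothetical and the framework's own comparison may be implemented differently.

```python
def first_difference(path_a, path_b, chunk_size=65536):
    """Return the offset of the first differing byte, or None if bit-exact."""
    offset = 0
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        while True:
            chunk_a = file_a.read(chunk_size)
            chunk_b = file_b.read(chunk_size)
            if chunk_a != chunk_b:
                for index, (byte_a, byte_b) in enumerate(zip(chunk_a, chunk_b)):
                    if byte_a != byte_b:
                        return offset + index
                # One file is a prefix of the other.
                return offset + min(len(chunk_a), len(chunk_b))
            if not chunk_a:
                return None  # both files ended without a difference
            offset += len(chunk_a)


def bit_exact(paths):
    """True if every file in `paths` is identical to the first one."""
    reference, *others = paths
    return all(first_difference(reference, other) is None for other in others)
```

Reporting the offset of the first difference, rather than only a yes or no answer, gives the developer a starting point when two platforms disagree.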

This module also makes a backup of the file containing the original main function before it replaces the code. When all the comparisons are done the original file is restored.


Chapter 6

Conclusions

All software development companies need to test their software to be able to guarantee a certain level of quality. Some software needs to be tested more thoroughly than other software, and thus the companies need to find a balance between the cost of running a lot of tests and the cost of missing a bug. Automating the testing might be a good way of achieving good test coverage at a low cost, but sometimes the return on automating the testing is not bigger than the investment.

The level of automation depends on which programming language is used for the system under test. SILK is mainly implemented in C, which has caused some problems for the automation procedure. A pointer in C may refer to a single element or to a list of elements, and the number of elements it refers to is not connected to the pointer. In other programming languages, e.g. Java, the size of an array is always known. This would make it easier to implement a framework that requires less information from its user and that can therefore automate more steps.
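The pointer problem can be demonstrated with Python's `ctypes` module, which mirrors the C type system. The buffer length of 480 samples is an arbitrary value chosen for this sketch.

```python
import ctypes

# An array type carries its length, so its size in bytes is known.
Frame = ctypes.c_int16 * 480
frame = Frame()
array_bytes = ctypes.sizeof(frame)  # 480 elements of 2 bytes each

# A pointer to the same memory only knows the element type. Its size
# is the size of an address, whatever the length of the buffer is.
pointer = ctypes.cast(frame, ctypes.POINTER(ctypes.c_int16))
pointer_bytes = ctypes.sizeof(pointer)
```

A function that only receives the pointer has no way of recovering `array_bytes` from it, which is why the user of the framework has to state the number of bytes to record.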

The outcome of this thesis was a test framework, which fulfills the goals of the thesis by automating the testing procedure stated in section 2.1. The framework is currently being used by audio developers at Skype to further improve the quality of the source code and the final client.

Overall the thesis was successful and it was possible to keep to the time plan made in the project plan. All the requirements from the company could be fulfilled within the 20-week time frame of the thesis. The main reason why the time plan could be kept is that the time estimate for each phase took into account that problems would occur. With this extra time for solving the problems properly, it was possible to avoid quick fixes that probably would have caused bigger problems later on.


6.1 Limitations

The framework has one major drawback. For each function the user of the framework wants to test, the user has to specify how many bytes should be recorded. Even though it is possible to use C code, such as variables and the sizeof() function, to specify the number of bytes, and the user only has to specify it once for each function, this is still a drawback. If this was not required, the entire test procedure could be automated except for an initial setup.

6.2 Future work

Although the goals of the thesis were reached and fulfilled, the framework can be improved in several different ways.

Currently the framework needs to be tested more thoroughly. It has only been tested with a Windows 7 machine as the local computer, but it should also be tested on more platforms, such as Linux, Mac OS X and other versions of Windows. It is important that the framework is stable and robust. If the test framework crashes frequently on a platform, the tester will not trust its results. The question is how to test the test framework: by implementing another test framework, or should this work be done manually?

It would also be valuable if the framework could be used on other components and other software than SILK. This was kept in mind during the design and the implementation to make it possible in the future. The framework currently expects the component or software under test to be implemented in C, but with some minor changes it would be possible to add support for more programming languages.

The framework could also be improved by extending it to be executed as soon as new source code has been committed. For each step that is automated, less work is required of the tester. The problem described in section 6.1 is another thing that should be automated in the future. It would make the framework more stand-alone.


Chapter 7

Acknowledgements

I would like to thank everybody in the Audio team at Skype, with special thanks to Jon Bergenheim and my external supervisor Yao Yi. It has been very inspiring and worthwhile to work with all of you. I would also like to thank my internal supervisor at the university, Mikael Rännar, and my family and friends for all the support.


References

[1] H. Schaefer A. Spillner, T. Linz. Software Testing Foundations: A Study Guide for the Certified Tester Exam. Rocky Nook, 2011.

[2] R. Blacks. Pragmatic Software Testing: Becoming an Effective and Effi- cient Test Professional. John Wiley & Sons, 2007.

[3] J. Subramanyam C. Jones. The Economics of Software Quality. Addison- Wesley Professional, 2011.

[4] B. Pettichord C. Kaner, J. Bach. Lessons Learned in Software Testing: A Context-Driven Approach. John Wiley & Sons, 2001.

[5] B. Gauf E. Dustin, T. Garrett. Implementing Automated Software Testing:

How to Save Time and Lower Costs While Raising Quality. Addison- Wesley Professional, 2009.

[6] J. McKay G. Bath. The Software Test Engineer’s Handbook. Rocky Nook, 2008.

[7] P. Hamill. Unit test frameworks. O’Reilly Media Inc., 2004.

[8] C. Johansen. Test-Driven JavaScript Development. Addison-Wesley Pro- fessional, 2010.

[9] A. P. Mathur. Foundations of Software Testing: Fundamental Algorithms and Techniques. Pearson Education India, 2007.

[10] T. J. McCabe. A complexity measure. IEEE Transactions on software engineering, 2:308–320, 1976.

[11] G. Meszaros. xUnit Test Patterns: Refactoring Test Code. Addison-Wesley Professional, 2007.

[12] R. Patton. Software Testing, Second edition. Sams, 2005.

[13] W. E. Perry. Effective Methods for Software Testing. John Wiley & Sons, 2006.

(38)

28 REFERENCES

[14] G. Ramesh S. Desikan. Software Testing: Principles and Practices. Pear- son Education India, 2006.

[15] Skype. About skype. http://about.skype.com/ (visited 2011-06-20).

[16] Skype. Silk speech codec. http://developer.skype.com/resources/draft- vos-silk-01.txt (visited 2011-06-20).

[17] Skype. Silk: super wideband audio codec.

http://developer.skype.com/silk (visited 2011-06-20).
