LIU-ITN-TEK-A--15/059--SE

Cross-Platform Post-Mortem Analysis in a Distributed Continuous Integration System

Master's thesis in Computer Engineering
at the Institute of Technology, Linköping University

Karl Johan Krantz

Supervisor: Alexander Bock
Examiner: Karljohan Lundin Palmerius

Norrköping 2015-09-22


Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a considerable time from the date of publication barring exceptional circumstances.

The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for your own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its WWW home page: http://www.ep.liu.se/


Abstract

This thesis aims to improve the cross-platform reliability of software components tested on distributed CI systems. More specifically, it is centered on extracting crash information from cross-platform crashes. Crash information was generated and parsed on Mac, Linux and Windows, transformed into stack traces, and further work went into visualizing these traces as graphs. The crash information proved to be valuable for developers in their day-to-day work, especially in its raw form. However, the visualizations proved to be less than satisfactory for developers, as they increased their mental load.


Preface

First of all, thanks to Skype for having me during the five months of the thesis. The amount of resources and raw skill at the Skype office in Stockholm is incredible, and I am extremely thankful for the help. However, none of it would have been possible without my dear friend and staffing colleague Ken Liu, who processed all the formalities regarding the thesis on Skype's end.

A giant thanks to the CI-Frameworks team as well for having me be a part of their team during the thesis. Everyone on the team made me feel at home in the office and made sure I got all the help I needed. Thanks to Selcuk for being my manager and advisor during the thesis, helping me get in touch with stakeholders and helping spearhead my future career inside of Skype. A special thanks to Johan Losvik, who helped a ton with my introduction to the team and my development setup. Thanks to Markus for being the other thesis student and a great friend to discuss things with and complain to when things did not go so well. Alexander, thanks for all the help with the thesis and this report; I could not have done it without you!


Contents

1 Introduction
  1.1 Purpose
  1.2 Research Questions
  1.3 Limitations
2 Background
  2.1 Testing
  2.2 Crashes
  2.3 Continuous Integration
  2.4 Transient Errors
3 Theory
  3.1 Post-Mortem Analysis
  3.2 Visualization
4 Implementation
  4.1 Post-Mortem Data Creation
  4.2 Data Parsing
  4.3 CI Integration
  4.4 Visualization
5 Evaluation
  5.1 Structure
  5.2 Interviews
  5.3 Team Feedback & Current Usage
6 Result
  6.1 Data Acquisition
  6.2 UI Results
7 Discussion
  7.1 Cross Platform Data
  7.2 Data Correlation
  7.3 Visualizations
8 Future Work & Conclusions
  8.1 Future Work
  8.2 Conclusions
Appendices
  A CDB Output
  B GDB Output
  C LLDB Python Output
  D CrashReport Output
  E JSON Structure
  F Base Parser Code

1 | Introduction

Today at Skype, reliability is a key metric that developers need to track. It is one of the larger issues developers face in their day-to-day work as the product becomes more and more complex with each added feature.

This becomes even more problematic as the number of devices per person increases each year [10]. This extends the need for more reliable software, since everything from TVs to fridges is getting internet connectivity, and there is a high likelihood that some of these devices will need to run Skype.

Expanding on this point, in emerging markets such as India, low-range phones are vitally important to the development of the country. If the designed software is primarily tested against higher-end phones, it could have severe defects on lower-end phones. This could leave a significant part of the world's population disconnected from the communication we take for granted each day.

1.1 Purpose

This raises an important question: can developers use infrastructure and tools to help their day-to-day work of increasing the reliability of the product? A Continuous Integration (CI) server provides automatic test execution on code check-in, which improves the reliability of the product by checking for regressions in functionality.

CI servers often rely on test executions being black boxes without any output information on how the tests actually ran, except the logical value of pass or fail for each test case. This causes issues when tests crash, as the root cause of the software fault can be hard to identify without further information.

In this thesis, the primary focus is the role of the continuous integration server and the extra information it could provide to developers in order to increase the reliability of the product. The thesis will include discussions and trials of methods to help developers with root cause fault detection in a distributed cross-platform CI system. This includes discussing and interviewing developers about non-functional requirements to improve the developer infrastructure in place at Skype.

1.2 Research Questions

We create two concrete research questions, RQ 1 and RQ 2, that bring together the larger topic of the thesis.

RQ 1 Can analyzing the crash data from multiple platforms help in root cause fault detection?

RQ 2 Do visualizations of these crashes help developers in building a greater understanding of the fault?

These two questions are then summarized into two larger areas of research and work, Post-Mortem Analysis and Visualization. Post-Mortem Analysis will be presented in sections 3.1, 4.1 and 4.2, where the crash cause of a process will be investigated. Visualization will be presented in sections 3.2, 4.3 and 4.4, and will investigate methods that can be used to visualize and utilize crash information.

1.3 Limitations

In order to develop a workable solution for Skype, certain limitations need to be placed on the thesis. Certain work and implementation details are simply out of scope due to time constraints and to keep the thesis focused and interesting.

Platform Limitations

In order to properly test the research questions, the supported platforms of this thesis do not need to be the same as all the ones Skype supports. The goal is rather to show that the methods and theories work cross-platform. As such, this thesis focuses on the following platforms:

• Windows
• Linux
• Mac


Adding to this, there will be no work done on the build processes. This is due to the wildly varying processes between teams and platforms, and because the method and process described should be generic enough for any team to use. The work within the thesis also assumes native programming languages, as these are the languages targeted by the CI system. The process involved would not be different for a managed language such as Java, excluding the engineering effort of extracting the information from the crashes.

2 | Background

In this chapter, an outline of the background premises for the thesis will be presented. Abbreviations that are necessary for the rest of the thesis will be explained here as well.

2.1 Testing

Testing is the process of increasing the confidence in code working as intended. It is not the process of validating that a particular piece of code is running correctly. Olan [25] says the following:

“Validation is a process designed to increase confidence that a program functions as intended. Generally, validation involves testing. The desired outcome of testing is a guarantee that a program satisfies its specifications, but as noted by Edsger Dijkstra, testing can prove the presence of errors but not their absence. Nevertheless, careful testing greatly increases the confidence that a program works as expected.”

So while testing cannot determine with total certainty that the code runs correctly, certain agile methodologies increase our confidence in the code.

2.2 Crashes

Crashes are often regarded as a critical failure of a system. A crash is unrecoverable and there is a large risk of data loss. Because of this, crashes are usually categorized as critical bugs and have a high priority to fix if they are reliably reproducible. Shipping a product with these reproducible critical bugs decreases the reliability of the product. This is even more critical as software faults can incur severe economic costs [8] and even cause physical harm to people, depending on the area of development.


Crash Causes

Crashes are programming errors in nature, but they are also based on programmers being unable to foresee certain errors. This might be errors such as a pointer having a null value and then dereferencing it, causing a segfault which crashes the application.

In this case the crash is undesirable, but there are cases where a crash is desirable. Examples of such cases are invoking functions in the wrong order, or allocating an array that is too large to fit in memory. Such cases cause the corresponding ASSERT function to be called, raising an abort signal. This crashes the application, but the crash itself is desirable as it allows the developer to catch a fault that would otherwise be harder to detect.

When an application crashes, it receives a signal from the operating system. The signal orders the application to immediately crash or handle the fault causing the signal. The type of signal [20, 27] depends on the type of fault detected, as the application may want to act upon the different signals in a number of ways.
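To make the relationship between signals and crash detection concrete, the sketch below shows how a test runner could tell a crashed process apart from a merely failing one by inspecting the exit status; on POSIX systems, Python's subprocess module reports termination by a signal as a negative return code. This is an illustrative sketch and not code from the CI system described in this thesis; the test binary name is hypothetical.

import subprocess

def run_test(command):
    # On POSIX, a negative return code -N means the process was terminated
    # by signal N, e.g. -11 for SIGSEGV or -6 for SIGABRT.
    returncode = subprocess.call(command)
    if returncode < 0:
        print("test crashed with signal %d" % -returncode)
    elif returncode != 0:
        print("test failed with exit code %d" % returncode)
    else:
        print("test passed")

run_test(["./crashing_test"])  # hypothetical test executable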

Crash Information

In order to accurately find the reason why something has crashed, the developer needs information. There are systems in place on some platforms such as Mac OS X’s CrashReporter [2] or the Windows Error Reporting system [21]. These systems send the data off to a classification server [18], where the server picks crash buckets to place crashes into. Using these buckets, developers can identify the crash cause [11] by checking system states and hardware setups for the bugs.

If using these systems is not possible for some reason, there are native solutions that can fetch similar information manually from the operating system of choice. On POSIX-compliant systems, the shell is the tool responsible for allowing the kernel to generate a core dump for crashing processes. The data available in these core dumps, if they are ELF (Executable and Linking Format) files, can be found in the Linux Programmer's Manual [9].

The information contained within these reports differs from system to system. Examples of this information could be:

• stack trace
• name of the application
• stop reason

The one item that needs further explanation here is what a stack trace is. A stack trace is a list of all the stack frames in the application runtime at a certain time in the application's lifetime. A stack frame contains the arguments that the function was called with and can also contain local variables.

def a(value):
    b(value*2)

def b(value):
    c(value*5)

def c(value):
    error(value)

a(10)

Listing 2.1: Python code that will cause a crash in the Python interpreter.

The code in listing 2.1 will cause a crash when it is run with the Python interpreter. This causes the Python runtime to print the stack trace at the point of the crash; this stack trace is shown in listing 2.2.

Traceback (most recent call last):
  File "crash.py", line 9, in <module>
    a(10)
  File "crash.py", line 2, in a
    b(value*2)
  File "crash.py", line 5, in b
    c(value*5)
  File "crash.py", line 8, in c
    error(value)
NameError: global name 'error' is not defined

Listing 2.2: The stack trace printed by the Python interpreter when running listing 2.1.


Root Cause Crash Detection

The center of research in this field has been the discovery of the root cause of crashes that occur. Wu et al. [30] describe an empirical approach to analyzing and deciphering crashes. By analyzing a large set of crashes, the method finds the most common crashing threads by ranking these against

• the proximity between the crash and the method in the stack trace
• the time since modification of the method's source code
• the length of the method's source code

By correlating these different stack traces with a method called Crash Stack Expansion, the method generates a static call graph. This call graph contains all function calls possible in the application. By performing a control flow analysis, the method ensures that unreachable code is not included in the final call graph. Although the method is not strictly relevant to the thesis, it brings up an important point: a stack trace is one path through the call graph of an application.

2.3 Continuous Integration

Continuous Integration is a tool and an agile process [29] that mandates that developers continuously check code into the central repository. This improves the agile workflow by allowing developers not to have to worry about build systems or testing setups.

By integrating the integration workflow into the daily routine of developers, a noticeable productivity improvement can be reached. This productivity increase is both perceived, as the team watches builds go through the integration workflow [26], and real, as developers do not need to go through the integration workflow on their local development machines. This means that an agile team can push out releases faster, as any new code revision triggers the regression workflow necessary for deployment.

Creating supporting infrastructure for agile teams, such as continuous integration solutions, makes the day-to-day work of developers more effective. A CI system allows teams to have a staging environment for checked-in code. The code must pass certain criteria such as static code analysis, code coverage and no failing tests. By checking in the code continuously during the day, there is a larger confidence that the next large check-in will not break the build. By conforming to this principle of the code always being functional, running a production deployment is possible at any time.

Crashes under CI

CI systems are often seen as black boxes: there is a certain input (some form of code change or build trigger), and there is an output (was the input accepted?). Just from looking at the output, it is difficult for developers to gain any form of insight into why something went wrong, but there is usually some metadata to go along with the logical "pass" or "fail" values. The data usually involves which test cases failed and the expected value from the test case.

Crashes in a test case remove the possibility of extracting any metadata from the test case, as there is no way of getting any structured output since the executable crashed before writing any output. Certain testing frameworks [12] have solutions in place for trapping exceptions within the framework. By doing this, exceptions that would normally crash the execution will not interfere with the rest of the test cases. However, not all frameworks support this feature. This causes problems for developers, as there are cases where crashes cannot be reproduced locally.
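To illustrate the idea of trapping failures inside the framework (a Python-level sketch, not the framework referenced in [12]), a runner can catch exceptions per test case so that the remaining tests still execute. A native crash such as a segmentation fault would still terminate the whole process, which is exactly the gap this thesis addresses.

import traceback

def run_test_cases(test_cases):
    # test_cases: list of (name, callable) pairs; the structure is hypothetical.
    results = {}
    for name, test in test_cases:
        try:
            test()
            results[name] = "pass"
        except Exception:
            # Trap the exception so one failing test does not abort the run.
            traceback.print_exc()
            results[name] = "fail"
    return results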

CI at Skype

In Skype, the CI-FW team has built a solution that runs component level tests on multiple devices, platforms and OS versions. Triggering the tests is done either reactively, when a new revision of the code is available, or timed, so that they run during nights. These components are smaller libraries of the larger Skype product that can be device and platform agnostic. Verifying this agnosticism is important, which is where cross-platform integration testing becomes important.

In order for this testing framework to work across 10-15 teams, the system is designed to support multiple devices and platforms. By facilitating a Master → Agent architecture, multiple tests for specific platforms or devices can be spawned by a build. These tests are then picked up by an agent that fits the requirements of the test. These requirements can be platform, operating system version, device, or even that certain peripherals are connected. By running tests on multiple such agents, edge-case errors for certain platforms can be found and fixed before the code is distributed to customer devices.

The status from the test cases on the agents is then posted back to the database which contains the data for the frontend. This data can then be viewed via a browser, through an interface built around the Flask Python framework. This frontend handles the dashboard that teams use in order to see the current build and the status of the respective tests. By building these quick-glance visualizations for developers, the team gains insight into the status of their integration pipeline and visibility into potential issues on certain platforms.
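The thesis does not reproduce the frontend code, but as a rough illustration of the kind of interface described above, a minimal Flask route serving per-build test status could look like the sketch below; the route, data layout and values are hypothetical.

from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical stand-in for the CI database described above.
TEST_RESULTS = {
    "build-1234": [
        {"test": "component_test", "platform": "linux", "status": "crashed"},
        {"test": "component_test", "platform": "windows", "status": "passed"},
    ],
}

@app.route("/builds/<build_id>/results")
def build_results(build_id):
    # The dashboard can poll this endpoint to refresh its quick-glance view.
    return jsonify(results=TEST_RESULTS.get(build_id, []))

if __name__ == "__main__":
    app.run()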

2.4 Transient Errors

As mentioned in Section 2.3, reproducing crashes from the CI environment on the local development machine can be difficult. This issue is further worsened by transient errors in tests. These tests only fail in certain circumstances, such as resource contention, concurrency, networking, etc. As mentioned by Luo et al. [14], most ways to deal with a flaky test are severely unsatisfactory. By simply removing the flaky test, the error is at least not shown. This is however a poor solution, as removing tests reduces code coverage and removes a test that is potentially revealing an error in the production code.

Even detecting these flaky tests is difficult. As an example, Google runs any failing test 10 extra times [14], and if the test passes any of those runs, the test is labeled as flaky. For certain causes of flakiness, such as threading race conditions, there are solutions for detecting the race condition [13]. These are not satisfactory for the general case, and as such they do not give any options for how CI systems should detect and deal with these tests.

The Continuous Integration system is supposed to validate that the code is running correctly; if transient errors in tests exist, they dismantle the purpose of the system. By not knowing when the code is running correctly, developers cannot reliably identify whether it is an issue with the testing code or the production code.
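As a sketch of the rerun heuristic mentioned above (not code from the thesis, nor Google's actual implementation), a CI system could classify a test that has just failed roughly like this:

def classify_failed_test(run_test, reruns=10):
    # run_test() re-executes the failing test and returns True if it passes.
    # If any rerun passes, the failure is labelled flaky; otherwise it is
    # treated as a consistent failure worth investigating in the product code.
    for _ in range(reruns):
        if run_test():
            return "flaky"
    return "consistent failure"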

3 | Theory

This chapter will describe the base theory of this thesis. The theory covers the broader strokes of the thesis idea and describes the plan of the implementation in more detail.

3.1 Post-Mortem Analysis

Post-Mortem Analysis for this particular implementation is the process of gaining insights into why an application crashed. This insight could be everything from local memory summaries to stack traces.

By leveraging this kind of information over multiple crashes, correlations of which functions crashed could potentially be built in order to help developers increase the reliability of the software. Since Skype as a product is cross-platform, the tests that are running at Skype should also be cross-platform. Correlating between these platforms could be important in order to localize the conditions under which the bugs occur. To further explain Post-Mortem Analysis, certain topics need to be discussed, namely Preparation and Data Extraction.

Preparation

Some preparation needs to be performed in order to enable data extraction from the crashing applications. Because of the way C++ code is built, there is only a limited set of build options that allow us to analyze crashes. The crash can be reliably analyzed by not enabling any optimization settings and by providing debugging symbols.

The issue with optimizations is that the compiler can inline certain functions and alter the program flow. This makes it hard to deduce whether the originally compiled function is the one we are crashing in. But even if these optimizations have been turned off, debugging symbols need to be provided by the compiler in order to give names to the functions that are contributing to the crash.
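As a concrete example of such a build configuration (a minimal sketch; the compiler invocation, file and binary names are illustrative and not taken from Skype's build system), a test binary could be compiled with debugging symbols and without optimizations like this:

import subprocess

# -g emits debugging symbols, -O0 disables optimizations such as inlining,
# so that stack traces from core dumps can be symbolized reliably.
subprocess.check_call([
    "g++", "-g", "-O0",
    "-o", "component_test",
    "component_test.cpp",
])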

Data Extraction

In order to do any form of crash analysis, we need to extract data from the crash. Some setup may also be needed in order to generate the appropriate information from the crash. The bash Unix shell implements the ulimit [28] command, which allows supported platforms to generate core dumps. This functionality exists for both Mac OS X and Ubuntu, which are the two POSIX-like systems running in the CI environment.

On Windows, there are multiple options to generate core dumps, such as using a registry entry [16], or by running the application with a debugger [17] to generate the core dump on crash. There are also options to generate the core dump through native C++ code [19].

Since a core dump is the state of the application when it crashed, we have access to local variables, member variables and function names from the crashing scope. This, in turn, means that we can generate a stack trace as clear text, which can be handed off to another parsing system that transforms the raw stack trace into some format that is usable by the frontend.

3.2 Visualization

Visualizations are what make data usable, and in this case they could be vitally important for developers to gain insight about the crashes. They could in this case vary between UI improvements, in order to show the most important data for developers at any single point, and graph-based visualizations. More specifically, this approach would be about emphasizing that there are crashes for a team's product. The utility of this would be if it was used at the different levels where a test exists, such as for a specific platform or build.

User Interface

As mentioned in section 2.3, the software will be used in a CI environment by multiple teams. Teams have monitors set up that continuously refresh the latest build page in order to monitor test and build statuses over multiple platforms. As such, the UI needs to allow quick-glance information in order to inform developers that something has crashed.


In order for the UI to be usable by developers, the information needs to be concentrated in places where developers can find crashes. This information could be in the form of status indicators, color changes or tables describing the situation when a test-case crashed.

Graphs

There is also the possibility of using graph visualizations in order to further identify what conditions are causing the crash. By showing information about the stack traversal of crashing tests, insight could potentially be reached about what kind of error could be causing the crash.

Using graph visualizations, a way of conveying information on why something has crashed could exist. Using the visual hints possible in graphs, some information that might otherwise be hard to understand can be more easily understood. This insight may be that certain crashes occur only on a single platform, or that a crash on multiple platforms is related to one specific function.

4 | Implementation

In this chapter, the main implementation of the thesis is presented. The implementation is grouped into sections based on major areas of work. This includes topics that were implemented but removed due to defects, or because they simply did not work as planned.

4.1 Post-Mortem Data Creation

In order to have any meaningful data for the frontend, we need to create data from the crashing application. This crash data needs to be in a format that can be parsed by the other parts of the application. As was mentioned in section 3.1, there are documented ways on all supported desktop platforms to generate core dumps. The chosen method for getting a working solution is to try different implementations on each platform and evaluate them separately.

Windows

Looking at the options for Windows-based dump extractors, there are two ways to implement these: registry- or debugger-based methods. A debugger can catch first-chance exceptions, which are triggered as soon as an exception is raised. Even though the exception was caught within the application, the debugger can generate a core dump of the process. Comparatively, second-chance exceptions are triggered when the application itself does not handle the first-chance exception, and the debugger gets a second chance to handle the exception.

Interfacing against ADPlus [17], a Windows-based application monitor, was done with the subprocess.check_output function in Python. This method allowed dump creation to be enabled only for selected teams and test cases. However, an issue surfaced with this method, as ADPlus spawns a separate process in a separate command prompt that it uses to track the application. Because of this, we are unable to track the output of the test executable.


Moving on to the registry-based approach, we can add the registry entry LocalDumps [16]. This key enables the operating system to generate core dump files. There were, however, issues integrating this process with the other dump generator techniques already in place for specific teams. A hybrid approach was implemented instead, as the previous two methods failed.

This approach relies on the AEDebug [1] registry key to start the CDB debugger. The value of the nested DbgManagedDebugger key is set to the command shown in listing 4.1. This ensures that we can pass any command line settings to the debugger by changing the registry value.

cdb.exe -pv -p %ld -c ".dump /u /ma c:\crash_dumps\crash.dmp; .kill; qd"

Listing 4.1: Dump Generator Command. It attaches CDB to the process and generates a dump file in the crash_dumps directory.

In order to extract the necessary information out of the dump file, the debugger is attached to it together with the original executable. By adding commands to the -c parameter and ending with qd, CDB will not enter interactive mode for the user.

cdb.exe -z <dump_file> -i <executable_folder> -y <executable_folder> -c ".symfix; .reload; ~*kp; qd"

Listing 4.2: Command to extract a stack trace from a dump file via CDB.

After listing 4.2 has been run, a stack trace similar to the one shown in Appendix A is generated. The output can then be parsed in a later part of the process.
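A minimal sketch of how the CDB invocation from listing 4.2 could be driven from Python is shown below; the helper name and paths are illustrative, and cdb.exe is assumed to be available on the PATH via the Debugging Tools for Windows.

import subprocess

def extract_stack_trace(dump_file, executable_folder):
    # Mirrors listing 4.2: open the dump, fix the symbol path, reload symbols,
    # print a stack trace for all threads (~*kp) and quit without detaching.
    command = [
        "cdb.exe",
        "-z", dump_file,
        "-i", executable_folder,
        "-y", executable_folder,
        "-c", ".symfix; .reload; ~*kp; qd",
    ]
    return subprocess.check_output(command)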

Linux

Linux distributions come with bash preinstalled, which supports the ulimit command mentioned in section 3.1. The command sets user-level limits on the current shell, such as the size of core dumps. By setting this size to be as large as possible, the operating system will ensure that a core dump is created in the correct location.


ulimit -c unlimited

Listing 4.3: Sets the maximum size of core files to be as large as possible The command shown in listing 4.3 can be run through Python by using the resource package as can be seen in listing 4.4.

resource.setrlimit(resource.RLIMIT_CORE, (resource.RLIM_INFINITY, resource.RLIM_INFINITY))

Listing 4.4: Triggers the same behaviour as listing 4.3 describes but for processes spawned from Python.

The location the dump file is created in is dependent on the settings in sysctl; the default in Ubuntu is to create a file named core in the current working directory. To extract the information from the core file, a similar approach to the one described in section 4.1 is implemented. By attaching GDB (the GNU Project debugger) to the executable with the core file, a stack trace for all threads can be extracted. In order to run these commands, the Python library pexpect was included. The library allows Python to interact with a spawned process as a user would; this includes waiting for the correct prompt to appear before returning control back to Python. Using this library, a GDB interface was created which allows running multiple commands against the debugger, appending the results together and returning them to the caller. The commands run in sequence are shown in listing 4.5.

p/r "THREAD INFO"
info threads
p/r "BACKTRACE"
thread apply all bt
p/r "INFO SOURCES"
info sources

Listing 4.5: The commands that are run in order to extract necessary data from linux core dumps.
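A minimal sketch of such a pexpect-based GDB interface is shown below; it is not the thesis implementation, and it assumes GDB's default "(gdb) " prompt.

import pexpect

GDB_PROMPT = "(gdb) "

def gdb_backtrace(executable, core_file):
    # Spawn GDB against the executable and core dump, run the commands from
    # listing 4.5 one by one, and collect the output of each command.
    gdb = pexpect.spawn("gdb %s %s" % (executable, core_file))
    gdb.expect_exact(GDB_PROMPT)
    output = []
    for command in ['p/r "THREAD INFO"', "info threads",
                    'p/r "BACKTRACE"', "thread apply all bt",
                    'p/r "INFO SOURCES"', "info sources"]:
        gdb.sendline(command)
        gdb.expect_exact(GDB_PROMPT)
        output.append(gdb.before.decode("utf-8", errors="replace"))
    gdb.sendline("quit")
    return "".join(output)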


OS X

As OS X is a mostly POSIX-compliant system, it has a lot of similarities to Linux. By using the method described in section 4.1, large parts of the process could be re-used. The one difference would be to use LLDB instead of GDB to interact with the core dumps. This implementation proved to be unstable, however, as LLDB outputs ANSI control sequences as part of its prompt. These ANSI characters make it difficult to integrate LLDB with pexpect, as the prompt is difficult to identify.

By instead turning to Python to write a more stable interface, we can programmatically interact directly with the LLDB interface [4]. As can be seen in Appendix C, thread #1 is selected; this does however not mean that the thread has crashed, as the LLDB Python interface automatically selects the first thread and cannot identify the crashing thread.
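A rough sketch of what such a programmatic LLDB interface could look like is shown below, assuming the lldb Python module is importable (for example from Xcode's toolchain); it simply prints the function name of every frame in every thread of a core dump and is not the extraction script referenced in [4].

import lldb

def print_backtrace(executable, core_file):
    debugger = lldb.SBDebugger.Create()
    debugger.SetAsync(False)
    target = debugger.CreateTarget(executable)
    process = target.LoadCore(core_file)
    for thread in process:
        print("thread #%d" % thread.GetIndexID())
        for frame in thread:
            print("  frame #%d: %s" % (frame.GetFrameID(), frame.GetFunctionName()))
    lldb.SBDebugger.Destroy(debugger)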

By instead activating Apple CrashReporter [2] and setting it to use Developer mode, crash logs will be generated for terminal applications as well as for GUI applications. These crash logs contain crash information such as which thread crashed, which is necessary to get better results in the later part of the thesis. This was activated by using the command in listing 4.6.

defaults write com.apple.CrashReporter DialogType developer

Listing 4.6: Sets CrashReporter to developer mode.

The created CrashReport logs are stored in the ~/Library/Logs/DiagnosticReports/ folder in clear text, so there is no need to attach any debuggers or other external tools. An example of a shortened version of one such file can be seen in Appendix D.

4.2 Data Parsing

Looking at the output of the previous steps, it is clear that some parsing will be needed to extract the data that is important for the later part of the thesis. The syntax of the stack traces is straightforward, as can be seen in Appendices A to D. After an overview of the structures, it was clear that building a general parser would be a good option.


Base Parser

Building a base parser means less code duplication between parsers, so the amount of code to write for each concrete parser decreases. A general solution can be achieved by designing the parser to deal with the two primary entities existing in a stack trace: threads and stack frames.

The base parser starts off by splitting up the stack trace into a list of strings. This list can be easily iterated over, and indices are well defined when specifying the ranges within the list that certain information exists in. The base parser is also responsible for specifying the data that is going to be exported from the stack trace. The data exported is the following:

• Thread
  – Thread ID
  – Did the thread crash?
  – Stack Frames
• Stack Frame
  – Frame ID (Optional)
  – Function Name
  – Binary Name (Optional)
  – Source Location (Optional)

Optional means that the concrete parser can choose not to export certain variables; this could be due to the debugger not supporting a certain feature, or the debugging symbols not accurately giving the source location. The exported data can then be exported to a JSON object, the structure of which is shown in Appendix E. Any concrete parser can be exported to this format.

In order to build a concrete parser, a class needs to be created that inherits from the base parser. The concrete parser needs to provide a get_backtrace(raw_data) function, and the following regexes:

• STACK_FRAME_REGEX
• THREAD_REGEX


The get_backtrace function is specific for each concrete implementation as it returns a range of the larger list of strings that contains the entire stack trace. The code in Appendix F is an excerpt from the BaseParser class that shows the base algorithm that the concrete parsers plug themselves into.
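As an illustration of this structure (a simplified sketch, not the actual BaseParser from Appendix F; the regexes are placeholder patterns), a base parser and a concrete parser could look roughly like this:

import re

class BaseParser(object):
    THREAD_REGEX = None
    STACK_FRAME_REGEX = None

    def parse(self, raw_data):
        # Walk the backtrace line by line, opening a new thread whenever the
        # thread regex matches and attaching stack frames to the current one.
        threads = []
        current = None
        for line in self.get_backtrace(raw_data):
            thread_match = re.match(self.THREAD_REGEX, line)
            if thread_match:
                current = {"threadId": thread_match.group("threadId"),
                           "crashed": False, "stackFrames": []}
                threads.append(current)
                continue
            frame_match = re.match(self.STACK_FRAME_REGEX, line)
            if frame_match and current is not None:
                current["stackFrames"].append(frame_match.groupdict())
        return threads

    def get_backtrace(self, raw_data):
        raise NotImplementedError

class ExampleParser(BaseParser):
    # Placeholder patterns; the real parsers use debugger-specific regexes
    # such as the ones in Appendix G.
    THREAD_REGEX = r"^Thread (?P<threadId>\d+)"
    STACK_FRAME_REGEX = r"^#(?P<frameId>\d+)\s+(?P<functionName>\S+)"

    def get_backtrace(self, raw_data):
        return raw_data.splitlines()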

CDB Parser

The CDB stack traces seen in Appendix A can be parsed by looking at the defining marks of the stack frames and threads. The threads can be identified by the thread ID and the "Suspend" string. However, there are no indicating features of the threads that suggest that any of the threads have crashed. Because of this, we leave the CRASHED_THREAD_REGEX empty, in order to alert the base parser that the crashing thread cannot be identified. This regex can be found in listing G.1. Stack frames are similarly identified; there are clear identifiers for the function address, the binary that the function comes from, and the function name itself. In this particular case, there are no identifiers for the stack frame ID, but the base parser automatically generates the IDs if they are missing by keeping an incrementing counter to keep track of how many stack frames have been detected. This regex can be seen in listing G.2.

GDB Parser

Investigating the backtrace for GDB in Appendix B shows that the threads are primarily identified by the text "Thread" followed by a thread ID number, which can be used to build the THREAD_REGEX shown in listing G.3.

By looking at the info threads section of the stack trace, the pre-selected thread is the crashed thread. A regex that returns the ID of the crashed thread is enough for the base parser to enable the crashed parameter on the thread object. This regex can be seen in listing G.4.

Further, we need to build the regex for each stack frame. Looking over the structure, the format appears to consist of a frame ID, followed by a hexadecimal address, the function name, and then a dynamic library or source file that the function belongs to. The regex in listing G.5 makes no distinction between 32- or 64-bit operating systems, as the capture group uses the + operator, which matches one or more hexadecimal characters.


LLDB Python Parser

The output from the LLDB Python interface in Appendix C has a structure that is controlled internally by the stack trace extraction script [4]; as such, the regexes can be less complex than those of the other parsers.

Each thread is primarily identified by the thread string; by finding this string, the threadID can be extracted. Each stack frame is identified by the frame string, which through the regex allows the parser to extract the frameID, binary, functionName, file and line. These two regexes can be seen in listings G.6 and G.7.

It is worth noting here that this parser was not selected as the primary choice for OS X, as the method in section 4.2 allowed more features to be extracted from the crash.

CrashReporter Parser

A CrashReport is a raw text file, so the content is easy to inspect by looking into Appendix D. Identifying the threads is done using a similar regex to listing G.3. It can be seen in listing G.8.

This regex is further extended to identify which thread crashed. The string "Thread" followed by a number and then the string "crashed with" is a good match to identify the thread ID. This regex is shown in listing G.9.

A stack frame in CrashReporter has the format of the frame ID followed by the binary name, the address of the function and the function name. The optional parameters, source file and line number, are the last part of a stack frame. This can be translated into the regex in listing G.10.

4.3 CI Integration

When the parsed data from each parser has reached the frontend, the frontend needs to show this data. A JSON blob rendered without formatting is not helpful to the user in any way, so more meaningful representations need to be created. By building overviews of this information in places that allow for quick-glance information, value can be brought to stakeholders of the developer teams.

The first such overview is the test case overview page, which shows the status of the test cases for all platforms under a certain code revision. This is the common way to interact with the test cases. By going to a certain platform's build page, and then digging deeper into a certain test case, an indication that a test case has crashed is given by the corresponding stack trace column.

Figure 4.1: Excerpt of the testcase overview page. The columns are named as follows, from left to right: Status, stack_trace, verifier, returncode

By clicking on the STACK TRACE link in fig. 4.1, the stack trace renderer is shown in a modal window. An example of this modal is shown in fig. 4.2.


Figure 4.2: The stack trace renderer modal that shows a stack trace in its entirety.

The resource that allowed creation of the stack trace itself can be retrieved from the same view as fig. 4.1. This allows the developers to use these files in their own development environments to debug and reproduce the same conditions as when the crash happened.

A crash overview was also created on the build overview page. This view allows the user to click on links to open the stack trace renderer. An example of this is shown in fig. 4.3.


Figure 4.3: Overview of the crashes for a particular build.

4.4 Visualization

Visualizing the stack traces is important in this thesis, as visual cues from representations other than plain UI changes are central to the thesis. This means an entry point for the main visualization must be designed. This involves setting up options for the visualization to give the user further control over what is being visualized; these options are shown in fig. 4.4.

Figure 4.4: Control form for the visualization overlay. It shows an example of the build overview page for a build where only the "mac" and "linux" platforms have crashing tests.

Because of the need to design visualizations that developers can visually interpret, and with what was discussed in section 2.2, a graph visualization was decided to be the visualization of choice. The procedure starts by injecting metadata about the crash into the stack trace. The metadata is the name of the executable and the platform the test was run on. It then proceeds to clean all stack frames in the stack trace in order to clean up differences in output from the debuggers. The procedures doing the cleanup are shown in listing 4.7.

stack_frame = re.sub(r"<.*>", "<>", stack_frame)
stack_frame = re.sub(r"\s?\(.*\)", "()", stack_frame)

Listing 4.7: Clean stack frames, to remove discrepancies between platforms. The first regex removes template parameters, the second one removes function parameters and any whitespace in front of the parenthesis.

In order to get the stack traces into a format that can be rendered using a graph, the procedure flattens the stack traces into their component threads. If the crashing thread can be identified, that is the only thread that gets returned from the stack trace. Otherwise, all threads are returned. This ensures that the processed information is the most valuable data for the visualization.

Since stack traces are reversed, i.e. the last function the application entered before crashing is at the top, we reverse the stack trace so that the crash is the last element. This ensures that the graph will point in the correct direction, with downwards being closer to the crash. The functions contained within this reversed stack trace are then used as the nodes for the graph. These are then connected together into edges by using the zip function in Python, as shown in listing 4.8.

edges = set(zip(stack_trace, stack_trace[1:]))

Listing 4.8: Python code to zip function names into edges; it also ensures that identical edges will only be counted once.

The edges can then be added to the graph by iterating over the edge set and adding them one by one. The graph library of choice was the Python library NetworkX [24]. The library allows attributes to be stored with edges and nodes, which allows the procedure to store the number of times a certain path appeared in all the stack traces and which platforms the path has been part of.
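A minimal sketch of how such a graph could be assembled with NetworkX is shown below; the input layout (a list of platform and function-name pairs) and the attribute names are illustrative, not the thesis code.

import networkx as nx

def build_crash_graph(stack_traces):
    # stack_traces: list of (platform, [functions ordered entry point -> crash]).
    graph = nx.DiGraph()
    for platform, functions in stack_traces:
        edges = set(zip(functions, functions[1:]))
        for source, target in edges:
            if graph.has_edge(source, target):
                # Count how often the call edge appeared and on which platforms.
                graph[source][target]["count"] += 1
                graph[source][target]["platforms"].add(platform)
            else:
                graph.add_edge(source, target, count=1, platforms={platform})
    return graph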

When the graph has been completed, some base colorization is done in order to help the identification of certain features in the graph. The entrypoint node of a stack trace is colored green, while the last node in a crashing stack trace is colored red.


The graph then enters the preparation step, where the graph edges get labeled; the size of the edges is based on the number of times the edge has contributed to a crash. This ensures a visual representation of how comparatively common a crash path is.

Since the graph is not interactive, a layout that brings a logical flow to the application is needed in order to visualize the data properly. NetworkX does not contain any internal algorithms that provide hierarchical graph layouts. Instead, it leans on other libraries such as PyGraphviz, which uses the Graphviz library to provide layout algorithms. The layout chosen is the dot layout, which provides a hierarchical graph with the correct downward pull to visually represent crashes being further down. This is shown in fig. 4.5, where fig. 4.5a uses the "dot" layout.

Figure 4.5: (a) Graphviz "dot" layout. (b) Graphviz "neato" algorithm.
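For completeness, the sketch below shows how a hierarchical layout could be obtained; it assumes pygraphviz and Graphviz are installed and that graphviz_layout is available under networkx.drawing.nx_agraph (older NetworkX versions expose it at the top level), and it uses matplotlib only as an example renderer.

import matplotlib.pyplot as plt
import networkx as nx
from networkx.drawing.nx_agraph import graphviz_layout

graph = nx.DiGraph([("main()", "helper()"), ("helper()", "crashing_function()")])
positions = graphviz_layout(graph, prog="dot")   # "neato" gives the layout in fig. 4.5b
nx.draw(graph, positions, with_labels=True, node_color="lightgray")
plt.savefig("crash_graph.png")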

5 | Evaluation

This chapter will describe the evaluation of the thesis, primarily by informal interviews with developers that use the CI system and stakeholders of the CI-FX team.

5.1 Structure

The thesis has been primarily about bringing value to developers. As such, informal interviews were performed in order to bring developers into the process and evaluate the result of the thesis. The structure of the evaluation was to bring stakeholders, developers and managers into informal interviews. The interviewees were selected by asking the stakeholders, and through questions in the informal interviews about whether any other people would be interested in the thesis topic.

5.2 Interviews

Four interviews were done in order to evaluate the thesis. The following subsections describe the work each interviewee is involved in, their opinions, and general comments about future work.

Interviewee A

The first interviewee is a Principal Software Engineer and an individual contributor to the media platforms in Skype. The interviewee is heavily involved in increasing the reliability of the test and build systems of Skype and is thus a primary candidate to interview regarding new systems of that type. The interviewee is also one of the people in the group which raised the primary issue that this thesis aims to solve, increasing the reliability of Skype's internal components.

Presenting the features of the work of the thesis was met with mixed responses from the interviewee. On the positive side, the stack trace modal view was appreciated as it allows developers to get feedback on why something crashed, something which was not possible or accessible before the thesis. The fact that core dumps are generated for all supported platforms, allowing developers to download them and attach a debugger to the core dump in order to further examine the state of the crash, was also hailed as a useful feature.

However, the primary concern is that the graph visualization currently increases the mental load of developers in their day-to-day job. The tool currently only allows manual checks of certain builds, and by using the tool, developers will need to check more things manually on code check-in. The fact that the visualization is not interactive, which makes it hard to work with, was also brought up as a negative point. A solution to the issue regarding mental load was discussed during the interview. By integrating parts of the thesis into the existing project and issue tracking tool Jira [3], the data would be seen by a higher number of people. Since crashes are a lot more severe than normal test failures, it would also be possible to open bug reports for crashes in Jira. A link to the graph could also be embedded into the bug report, which can be used by the product manager of the team to gain insight into the crash and prioritize the bugs better for the developers.

Interviewee B

The second interviewee is the PM (Product Manager) for the Distributed Testing System (DTS), a system for GUI level testing on multiple types of devices. The DTS system and the CI Framework overlap in terms of what the respective systems are doing. With the interviewee being the PM of a team with similar responsibilities, a lot of relevant thought and feedback was received.

As such, a lot of the discussions with the interviewee focused on the data presentation, as he is not working in a developer capacity. The impression of the graph representation was very good, as navigating multiple crashes in graph form rather than text form was a lot more valuable for the PM. The improvements discussed in section 5.2 were mentioned as well, and were met with a positive response, as the transparency of crashes would increase, making them more actionable for the PM and developers.

A downside to the current approach was however raised: it does not show any historical data on when something crashed for the first time, making developers work harder when a crash is to be dealt with, as the number of code revisions increases the longer the crash has not been taken care of. There would also be value in marking certain tests as consistently crashing when they are not flaky tests, as this would help prioritize the crashing tests with regard to how hard they are to reproduce.

Interviewee C

The third interviewee is a Senior Software Engineer in one of the Audio teams at Skype. During the interview a lot of interesting techniques were discussed, some similar to the ones described in the thesis but bringing a different point of view. As such, improvements and new techniques were the primary focus of the discussions. A case that was discussed is when normal debugging tools do not work. One such case is when the stack gets overwritten, which removes the ability of a debugger to print the stack trace normally. By instead using a method similar to the one described by Wu et al. [30], a stack trace can be reconstructed. The start of the algorithm is similar: create a static call graph of all possible routes in the application by parsing the assembly code. After this, the core dump can be used to extract the incomplete call graph and intersect it against the static call graph. The output is a normal stack trace, but one that needs to be symbolized manually.

The second discussion point that was brought up was to identify other interesting features of the stack trace. In some cases it is one particular input that causes the application to crash, but in others it is simply the state the application is currently in that causes the crash. By creating a grouping system for the different crashes, so that new paths are created only if they have different input arguments, further insights could be reached into why certain paths cause crashes in different parts of the application. One example of values that would be of interest is null values that propagate throughout the stack trace, which could potentially cause crashes.

Interviewee D

The fourth interviewee is a Principal Software Engineer Manager for the real-time media part of Skype, which includes audio and video. The interviewee is one of the people in the organization who is pushing heavily for more stability in tests over all teams in the organization. As such, the interview was based on how the thesis could be used to further that goal.

The interviewee was adamant that the graph renderer as it looks right now is not a useful solution. The main problem with the graph is that it is not interactive, and as such it was agreed that interactivity and better visualization would make the graph more useful. Some feedback was also brought forward regarding having a crash history overview for each test case. This is similar to what was discussed with Interviewee B; such a chart would help in getting an overview of when the crash started to occur. Helping to navigate the crashes in other ways would also help a lot, such as an overview of all crashes that have occurred in the past weeks for a team. Some of the feedback in this interview led to real improvements to the work: before this interview the number of stack traces that had contributed to a certain edge in the graph was represented as color, but afterwards it was decided to switch the representation to edge thickness.

5.3 Team Feedback & Current Usage

As the thesis progressed, team members in the CI-FX team gave feedback regarding ideas to improve the thesis. This feedback has in some cases been incorporated into the thesis, but is also mentioned here for future reference.

A member of the team noted that parts of the thesis can be extended to help debug certain situations that occur within CI. As the CI system runs tests, these tests can potentially time out, and even though the timeout is variable, derived from previous test runs and possible to increase, tests sometimes still reach the timeout limit. In some cases this is simply a time limit that needs to be increased; in other cases it is in fact a frozen process, where developers want more information than a simple time-out message. By using the core dump generation and stack trace extraction processes from the thesis, information regarding these processes would become available to developers.

General feedback from users was not available during the thesis. As such, the only information available is from developers facing issues with certain parts of the tools. One primary example of this is that certain teams noted that dumps were not being generated for their crashing tests. After some debugging it was discovered that their own dump generation method was interfering with the core dump generation implemented in the thesis. As a result, their tool was replaced with the core dump generator from the thesis.

6 | Result

This chapter will present the results of the thesis in terms of tool capabilities and how the current UI looks.

6.1 Data Acquisition

In section 4.1, the resulting data was briefly stated. This data will be further interpreted in this section. The raw output data can be seen in Appendices A to D. These listings are used by the different parsers that were created in section 4.2 to translate the output data into a common data structure. The result of this can be seen in tables 6.1 and 6.2. The tables show the different features that are exported by the parsers, based on what data is available through the data generator.

Parser          Thread ID   Did Crash
GDB             Yes         Yes
LLDB-RAW        Yes         No
LLDB-Python     Yes         No
CrashReporter   Yes         Yes
CDB             Yes         No

Table 6.1: Supported features of thread parsers.

Parser          Frame ID   Function Name   Binary Name   Source Location
GDB             No         Yes             No            Yes
LLDB-RAW        Yes        No              Yes           Yes
LLDB-Python     Yes        No              Yes           Yes
CrashReporter   Yes        Yes             Yes           Yes
CDB             No         No              Yes           Yes

Table 6.2: Supported features of stack frame parsers.


As shown in tables 6.1 and 6.2, the only parser that supports all features of the base parser is CrashReporter. This does, however, come with the downside that developers cannot attach a debugger to a core dump, as these files are not generated. The other debuggers only partially support the features of the base parser. The largest negative entry in the table is that CDB cannot identify which thread crashed. Analyzing the stack trace from CDB therefore becomes a lot more difficult than with the alternative debuggers, as the analysis must be done on all threads.

6.2 UI Results

A large component of the thesis work was about showing information from the parsers. This was done partially through the web UI presented in section 4.3, but also through the graph visualizations presented in section 4.4.

The result of the UI work can be summarized in figs. 4.1 to 4.3, which show the work that has been done.

Graph

A significant part of the work regarding the UI has been centered on visualizations of the crashes, specifically focused on graphs. These graphs start with the user entering visualization options, as shown in fig. 4.4. Graphs are then created by the procedure in section 4.4, and the results are shown in fig. 6.1.


Figure 6.1: Example visualization of two stack traces, one from Linux and one from OS X. The text within the nodes consists of function names that have been obfuscated.

The figure presented in fig. 6.1 shows the major functionality of the graph visualizer. Starting at the top, we see that Mac has a single entry point. All entry points are marked with green text, in order to distinguish the start of a stack trace from the rest. Two steps further down the graph, a loop is shown. These loops do not count how many recursions happened in that particular loop.

As the graph continues downwards, a merge happens between the "MAC" and "LINUX" stack traces. This is shown by both platforms being represented on the call edge. The merge also thickens the edge; the thickness is interpolated based on how many of all stack traces contributed to this edge. In this particular case, all stack traces contributed to the edge, so the maximum thickness is set.

The edges afterwards are then split up again into platform specific crash locations. We can here see that the edge got thinner, as the number of contributing stack traces is now lower. The final nodes of the stack traces are marked as red, in order to distinguish the crashing function from the rest of the graph.

6.3 Evaluation

In chapter 5, the thesis was evaluated by performing informal interviews. These interviews can be summarized into points of feedback.

Stack Trace Modal The stack trace modal is appreciated as it gives developers quick access to see the location of a crash on a certain platform. It does not give all the information a developer would expect from interactive debugging on their development machine, but it suffices to get an indication of why a crash occurred.

Graph Visualization The graph visualizations in their current form were not liked by most interviewees. However, looking at the reasons behind the dislike, those who liked the graph liked the concept behind it but not the current iteration. It could prove to be an effective way of visualizing the different stack traces, but as users cannot interact with the graph in any way except panning in the modal, it is not a valuable solution right now.

Mental Load The mental load was a negative point given by one interviewee. Developers do not need more tools to check every day; they need fewer, and a tool that requires developers to continuously check a website is not a good tool. A fire-and-forget mentality is what developers should be able to aim for: they should not need to worry about the build or the tests until something fails, rather than continuously monitoring the test results.


7 | Discussion

In this chapter, the results from chapters 5 and 6 will be discussed and summarized.

7.1 Cross-Platform Data

During the thesis, a significant part of the work has been put into generating data from different platforms: Windows, OS X and Linux. Although these are not all of the platforms supported by Skype, they are still a significant addition to the data that could previously be fetched from CI. The parsed data is in a human-readable JSON format that is available through the CI database; the format is shown in Appendix E.

However, an issue with the current data storage solution is performance. The system is currently not usable in practice, as the query to fetch all stack traces for certain builds took over 10 minutes to complete. Since each test case can have an arbitrary number of properties, a search has to be performed over all properties of all test cases in order to determine whether a test case has a stack trace. This issue could potentially be solved by marking test cases as “crashed” and finding the corresponding stack traces that way.
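A minimal sketch of the proposed fix is shown below, assuming a relational CI database accessed through Python's sqlite3 module. The table and column names (test_cases, test_case_properties, crashed) are hypothetical, as the actual CI schema is not part of this thesis; the point is that the query only touches test cases already flagged as crashed instead of scanning every property of every test case.

import sqlite3


def stack_traces_for_build(conn: sqlite3.Connection, build_id: int):
    # Proposed variant: filter on the explicit "crashed" flag first, then
    # fetch only the stack-trace property for those test cases.
    cursor = conn.execute(
        """
        SELECT p.value
        FROM test_cases t
        JOIN test_case_properties p ON p.test_case_id = t.id
        WHERE t.build_id = ? AND t.crashed = 1 AND p.name = 'stack_trace'
        """,
        (build_id,),
    )
    return [row[0] for row in cursor.fetchall()]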

7.2 Data Correlation

During the evaluation, it was shown that the data is interesting, especially the fact that it is cross-platform. The interviewees also all agreed that analyzing the data would give interesting feedback to developers, but that the way RQ 2 currently attempts to address this is not the right one.

An important point to make is that the stack trace modal was the primary positive for all interviewees, even though it does not show data from multiple platforms. A speculation behind this is that the stack trace modal is a simple tool to use and is recognized by developers as a tool they already use. This is further corroborated by the fact that the ability to get core dumps from the testing system was also mentioned as a positive; similarly, it is a tool that developers already know how to use.

As such, the answer to RQ 1 is that the results in chapter 6 are inconclusive but promising. Not enough work on cross-platform correlation was done in the thesis beyond the work for RQ 2.

7.3 Visualizations

For the target developers, the methods implemented did not bring a greater understanding, as demonstrated by the evaluation in chapter 5. However, at least one user who is not a developer did report a greater understanding of the crashes that occurred within a certain build. This indicates that there could be ways of redesigning the tool so that developers actually have a reason to use it. As was mentioned in section 7.2, the recognizability of a tool seems to be important to developers, and this could be the reason why the tool is not useful right now: developers are simply not used to interacting with a stack trace as a graph.

However, the future usefulness of the tool was at least indicated by the feedback for improvements from the evaluations. As such, RQ 2 can be answered with a yes, with the caveat that higher interactivity is needed to make it something that developers will actually use.


8 | Future Work & Conclusions

Although the thesis delivers on what it set out to investigate, there are points that could be developed further: more platforms, the regex-based parsing, improvements to the debugger interfaces and graph interactivity.

8.1 Future Work

Platforms

This thesis has been about generating cross-platform data. At the time of writing, integration of the crashes on the three major desktop platforms has been completed. Although covering all supported Skype platforms was not the most important part of the thesis, more platforms help since there is more data to work with. The remaining platforms would be iOS [7], Android [15] and Windows Phone [23] devices. Since these are all connected devices with their own sandboxes that the tests run in, the test runner needs to extract the crash information from the device using the native features of the device operating system.

Regex

Currently, regex is used as the language to communicate with the base-parser. Regex itself is fine for parsers, but as a domain-specific language (DSL) it can be unreadable and lead to unmaintainable code. As such, an alternative would be to rewrite the base-parser to use a parsing library instead.
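As an illustration of the kind of expression this concerns, the sketch below shows a regex that a concrete parser might register for a GDB-style backtrace line. The pattern, the group names and the example line are assumptions based on typical GDB output, not the expressions actually used by the parsers; a parsing library would replace such patterns with named, composable rules that are easier to read and maintain.

import re

# Hypothetical frame pattern for lines such as:
#   #1  0x00007ffff7a3b1c2 in handle_request (req=0x55) at src/server.c:42
GDB_FRAME = re.compile(
    r"^#(?P<frame_id>\d+)\s+"
    r"(?:0x[0-9a-f]+\s+in\s+)?"
    r"(?P<function>[\w:~<>]+)\s*\(.*\)\s+"
    r"at\s+(?P<source>[\w./-]+:\d+)$"
)

line = "#1  0x00007ffff7a3b1c2 in handle_request (req=0x55) at src/server.c:42"
match = GDB_FRAME.match(line)
if match:
    print(match.groupdict())  # {'frame_id': '1', 'function': 'handle_request', 'source': 'src/server.c:42'}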

Debuggers

While the interfaces against the debuggers are currently stable and working, they do not share a common code base. A custom-made debugger interface would make further debugger integration easier, as the code would be specifically suited to the debuggers. There is also potential for further research in this area, such as starting a test with a debugger attached and, upon a crash, sending a message to a developer containing a link. The developer could then open this link and interact with the debugger through an interactive prompt shown in a web browser.

Graph Interactivity

Interacting with the graph representation was one of the largest points of feedback received in chapter 5. By replacing the current pygraphviz-based graph image generation with the D3.js [22] data visualization framework and Dagre [5, 6], a purely JavaScript-based solution could be achieved. This would mean that all interactions could take place on the client, including filtering on all the different parameters, modifying the graph layout in real time and changing the visual appearance of the graph.


8.2 Conclusions

The results of this thesis show how crash information can be extracted from a distributed continuous integration system. They also show how this data can be parsed with minimal code integration using a base-parser → concrete-parser architecture. The system is reliable enough to be run on more than 60 agents in a production system without any incident, and while it does not support all platforms right now, it shows that it can in the future.

The visualizations described in the thesis were not as well received as initially thought, but they demonstrate that there is room for further research and development in the following areas:

• Familiarity with visualizations
• Interactivity of visualizations

• Integration within existing tools to lessen mental load

Finally, even the bare minimum of work in the thesis, extracting stack traces and generating core dumps, has been shown to give developers immediate value. Although more work could be done on visualizations to help facilitate root-cause detection, the familiarity of the current tools is helpful to the developers at Skype.


Bibliography

[1] Abhishek Mondal. Automatically Capturing a Dump When a Process Crashes. Accessed: 2015-05-26. url: http://blogs.msdn.com/b/dotnet/archive/2009/10/15/automatically-capturing-a-dump-when-a-process-crashes.aspx.

[2] Apple Inc. Mac OS X Crash Reporter. Accessed: 2015-05-18. url: https://developer.apple.com/library/mac/technotes/tn2004/tn2123.html.

[3] Atlassian. Jira - Issue & Project Tracking Software. Accessed: 2015-06-09. url: https://www.atlassian.com/software/jira.

[4] Bwmat. Scripting LLDB to obtain a stack trace after a crash. Accessed: 2015-06-09. url: http://stackoverflow.com/questions/26812047/scripting-lldb-to-obtain-a-stack-trace-after-a-crash.

[5] Chris Pettitt. dagre - Graph layout for JavaScript. Accessed: 2015-06-25. url: https://github.com/cpettitt/dagre.

[6] Chris Pettitt. dagre-d3 - A D3-based renderer for dagre. Accessed: 2015-06-25. url: https://github.com/cpettitt/dagre-d3.

[7] Chromium Project. Retrieving Crash Reports on iOS. Accessed: 2015-06-12. url: https://www.chromium.org/developers/how-tos/retrieving-crash-reports-on-ios.

[8] Sandeep Dalal and Rajender Singh Chhillar. “Empirical Study of Root Cause Analysis of Software Failure”. In: SIGSOFT Softw. Eng. Notes 38.4 (July 2013), pp. 1–7. issn: 0163-5948. doi: 10.1145/2492248.2492263. url: http://doi.acm.org/10.1145/2492248.2492263.

[9] elf - format of Executable and Linking Format (ELF) files. Accessed: 2015-05-18. url: http://man7.org/linux/man-pages/man5/elf.5.html.

[10] Dave Evans. “The Internet of Things: How the Next Evolution of the Internet Is Changing Everything”. Cisco Internet Business Solutions Group (IBSG) white paper, 2011.

[11] Kirk Glerum, Kinshuman Kinshumann, Steve Greenberg, Gabriel Aul, Vince Orgovan, Greg Nichols, David Grant, Gretchen Loihle, and Galen Hunt. “Debugging in the (very) large: ten years of implementation and experience”. In: Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles. ACM. 2009, pp. 103–116.

[12] Google. Disabling Catching Test-Thrown Exceptions. Accessed: 2015-06-15. url: https://code.google.com/p/googletest/wiki/V1_6_AdvancedGuide#Disabling_Catching_Test-Thrown_Exceptions.

[13] Google. ThreadSanitizer, a data race detector for C/C++ and Go. Accessed: 2015-08-06. url: https://github.com/google/sanitizers.

[14] Qingzhou Luo, Farah Hariri, Lamyaa Eloussi, and Darko Marinov. “An Empirical Analysis of Flaky Tests”. In: Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering. New York, NY, USA: ACM, 2014, pp. 643–653. isbn: 978-1-4503-3056-5. doi: 10.1145/2635868.2635920. url: http://doi.acm.org/10.1145/2635868.2635920.

[15] Manoel Ramon. Debugging tombstones with ndk-stack and addr2line. Accessed: 2015-06-12. url: http://bytesthink.com/blog/?p=133.

[16] Microsoft. Collecting User-Mode Dumps. Accessed: 2015-05-26. url: https://msdn.microsoft.com/en-us/library/windows/desktop/bb787181(v=vs.85).aspx.

[17] Microsoft. How to use ADPlus.vbs to troubleshoot "hangs" and "crashes". Accessed: 2015-05-26. url: https://support.microsoft.com/en-us/kb/286350/.

[18] Microsoft. How WER collects and classifies error reports. Accessed: 2015-05-18. url: https://msdn.microsoft.com/en-us/library/windows/hardware/dn641147.aspx.

[19] Microsoft. MiniDumpWriteDump function. Accessed: 2015-05-26. url: https://msdn.microsoft.com/en-us/library/windows/desktop/ms680360(v=vs.85).aspx.

[20] Microsoft. signal. Accessed: 2015-05-18. url: https://msdn.microsoft.com/en-us/library/xdkz3x12.aspx.

[21] Microsoft. Windows Error Reporting. Accessed: 2015-05-18. url: https://technet.microsoft.com/en-us/library/cc754364.aspx.

[22] Mike Bostock. D3.js - Data-Driven Documents. Accessed: 2015-06-25. url: http://d3js.org.


[23] Mike Taulty. Windows/Phone 8.1 Debugging: Getting a Crash Dump File From a Device. Accessed: 2015-06-12. url: http://mtaulty.com/CommunityServer/blogs/mike_taultys_blog/archive/2015/02/19/windows-phone-8-1-debugging-getting-a-crash-dump-file-from-a-device.aspx.

[24] NetworkX developer team. NetworkX. Accessed: 2015-06-02. url: https://networkx.github.io.

[25] Michael Olan. “Unit Testing: Test Early, Test Often”. In: Journal of Computing Sciences in Colleges 19.2 (Dec. 2003), pp. 319–328. issn: 1937-4771. url: http://dl.acm.org/citation.cfm?id=948785.948830.

[26] R Owen Rogers. “Scaling continuous integration”. In: Extreme Programming and Agile Processes in Software Engineering. Springer, 2004, pp. 68–76.

[27] signal - overview of signals. Accessed: 2015-05-18. url: http://man7.org/linux/man-pages/man7/signal.7.html.

[28] User limits - Limit the use of system-wide resources. Accessed: 2015-05-22. url: http://ss64.com/bash/ulimit.html.

[29] Dave West, Tom Grant, M Gerush, and D D’silva. “Agile development: Mainstream adoption has changed agility”. In: Forrester Research 2 (2010), p. 41.

[30] Rongxin Wu, Hongyu Zhang, Shing-Chi Cheung, and Sunghun Kim. “CrashLocator: Locating Crashing Faults Based on Crash Stacks”. In: Proceedings of the 2014 International Symposium on Software Testing and Analysis. ISSTA 2014. New York, NY, USA: ACM, 2014, pp. 204–214. isbn: 978-1-4503-2645-2. doi: 10.1145/2610384.2610386. url: http://doi.acm.org/10.1145/2610384.2610386.

