
Deployment failure analysis using machine learning


Academic year: 2021


IT 20 032

Degree project, 30 credits

June 2020

Deployment failure analysis using machine learning

Joosep Franz Moorits Alviste

Department of Information Technology


Faculty of Science and Technology, UTH unit
Visiting address: Ångströmlaboratoriet, Lägerhyddsvägen 1, Hus 4, Plan 0
Postal address: Box 536, 751 21 Uppsala
Telephone: 018 – 471 30 03
Fax: 018 – 471 30 00
Website: http://www.teknat.uu.se/student

Abstract

Deployment failure analysis using machine learning

Joosep Franz Moorits Alviste

Manually diagnosing recurrent faults in software systems can be an inefficient use of engineers' time. Faults are commonly diagnosed manually by inspecting the system logs from the time of the failure. The DevOps engineers at Pipedrive, a SaaS business offering a sales CRM platform, have developed a simple regular-expression-based service for automatically classifying failed deployments. However, such a solution is not scalable, and a more sophisticated solution is required.
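To make the starting point concrete, a regular-expression-based classifier of the kind described above might look like the minimal sketch below. The category names and patterns are hypothetical, chosen only for illustration; the abstract does not list the expressions used by Pipedrive's service.

```python
import re

# Hypothetical failure categories and patterns (assumptions for illustration;
# the actual rules of Pipedrive's service are not given in the abstract).
RULES = [
    ("image_pull_error", re.compile(r"failed to pull image", re.IGNORECASE)),
    ("health_check_timeout", re.compile(r"health check .* timed out", re.IGNORECASE)),
    ("out_of_memory", re.compile(r"\bOOM\b|out of memory", re.IGNORECASE)),
]


def classify(log_text: str) -> str:
    """Return the label of the first rule that matches, else 'unknown'."""
    for label, pattern in RULES:
        if pattern.search(log_text):
            return label
    return "unknown"
```

In a design like this, every new failure mode needs another hand-written rule, which is one way to read the scalability concern raised above.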

In this thesis, log mining was used to automatically diagnose Pipedrive's failed deployments based on the deployment logs. Multiple log parsing and machine learning algorithms were compared based on the F1 score of the resulting log mining pipeline. A proof-of-concept log mining pipeline was created that consisted of three steps: parsing the logs with the Drain algorithm, transforming the log files into event count vectors, and training a random forest model to classify the deployment logs. The pipeline achieved an F1 score of 0.75 on the testing data and a lower score of 0.65 on the evaluation dataset.
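The two middle ideas of the pipeline, event count vectors and the F1 score, can be sketched with the standard library alone. The sketch rests on several assumptions: `to_template` is a crude masking function standing in for a real log parser and is not the Drain algorithm, the log lines in the usage note are invented, and `f1_score` computes F1 for a single class, whereas the abstract does not say how scores were averaged across failure classes. The random forest step is omitted; a classifier would be trained on the vectors produced here.

```python
import re
from collections import Counter


def to_template(line: str) -> str:
    """Crude stand-in for a log parser (NOT Drain): mask hex ids and numbers
    so that lines differing only in parameters share one template."""
    line = re.sub(r"\b0x[0-9a-fA-F]+\b", "<*>", line)
    line = re.sub(r"\b\d+(\.\d+)*\b", "<*>", line)
    return line.strip()


def event_count_vector(log_lines, vocabulary):
    """Count how often each template in `vocabulary` occurs in one log file."""
    counts = Counter(to_template(line) for line in log_lines)
    return [counts[template] for template in vocabulary]


def f1_score(y_true, y_pred, positive):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, with the vocabulary `["Starting container <*>", "Pull failed after <*> retries"]`, a log containing two "Starting container N" lines and one "Pull failed after N retries" line maps to a vector with one count per template. Each deployment log becomes one such fixed-length vector, which is the form a random forest classifier expects as input.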

Printed by: Reprocentralen ITC
IT 20 032

Examiner: Mats Daniels
Subject reviewer: Justin Pearson
Supervisor: Jevgeni Demidov

