
An Online Fault Detection Approach

for Web Applications

Niklas Åholm

Faculty of Science and Technology, UTH unit. Visiting address: Ångströmlaboratoriet, Lägerhyddsvägen 1, Hus 4, Plan 0. Postal address: Box 536, 751 21 Uppsala. Telephone: 018 – 471 30 03. Fax: 018 – 471 30 00. Homepage: http://www.teknat.uu.se/student

Abstract

An Online Fault Detection Approach for Web

Applications

Niklas Åholm

When the user interface of a web application reacts sluggishly, or simple tasks such as signing forms or saving data take unnecessarily long, the user experience is diminished. Hence, one needs to monitor the behavior of the application and take suitable action when such behavior is detected. This report explores how a method of Online Fault Detection (FADO) can be used to monitor the condition of a web application. After reviewing the algorithm, the results of a case study are presented.

The key insight is that the system is not so much interested in identifying individual requests with an abnormally large response time, but in detecting situations where such long response times are consistently present. That is, individual requests are aggregated into blocks which are evaluated together. We explore how to do this, and how this scheme interacts with the FADO algorithm. The resulting system is compared to the existing system consisting of static thresholds, and it is shown how the solution adapts to changing situations. Finally, this report presents some of the challenges and the different choices one could make when deploying the FADO algorithm.

Printed by: Reprocentralen ITC. UPTEC IT 17 017

Examiner: Lars-Åke Nordén. Subject reader: Kristiaan Pelckmans. Supervisor: Jonathan Jonsson

When new features are delivered, it sometimes happens that unexpected errors, so-called bugs, follow. Bugs can be more or less serious and can sometimes be hard to discover. Bugs can affect users negatively in various ways; for example, the system's interface may feel slower than usual, or some functions may not work as they should. It is therefore important to be able to identify bugs as early as possible.

The idea of this report is to describe and evaluate a method for detecting when a system is not behaving as usual. The work in this report is based on the user logs of one particular system. The logs are made up of many text messages, all with the same structure, containing various values that describe how the system is used and how it performs. The user logs are updated with new messages as users use the system. By monitoring these values, those responsible can get an overview of how the system is working. Previously, those responsible for the system have used static thresholds, automated so that the system notifies them when these limits are exceeded. This report describes an alternative method in which a continuously updated model of what these messages should look like is built, and those responsible are warned when newly arrived messages differ markedly from the model. The model in this report is based on the system's response times, the time it takes for the system to process various tasks. Finally, two methods are evaluated and compared: one that uses a continuously updated model, and another

Acknowledgments

This thesis was supported by Decerno AB. I would like to thank everyone at Decerno for all the great help and support. In particular I would like to thank my supervisor Jonathan Jonsson for giving me valuable feedback and helping me collect data for my experiments.

From Uppsala University I would like to thank my reviewer Kristiaan Pelckmans, who has provided guidance and valuable feedback during the writing of this report.

Contents

1 Introduction 8
1.1 Project Goal 9
1.2 Delimitations 9
2 Related Work 10
2.1 Intrusion Detection 10
2.1.1 Open Source IDSs 10
2.2 Principal Component Analysis 11
3 Theory 12
3.1 Unsupervised Learning 12
3.1.1 Clustering 12
3.1.2 Fault Detection 12
4 Method 16
4.1 Requests 16
4.2 Setup 17
4.2.1 Features 17
4.2.2 Datasets 19
4.2.3 Epsilon 21
4.3 Experiments 23
5 Results 25
5.1 Experiment 1 25
5.2 Experiment 2 27
6 Discussion 30
7 Conclusion 32

List of Figures

4.1 HTTP requests from 1 hour visualized as transactions with a 1 second interval. 18
4.2 HTTP requests from 1 hour visualized as transactions with a 10 second interval. 19
4.3 Visualization of requests between 8am and 7pm for one day; the y axis shows the average of get and post for every 10 seconds. 20
4.4 Visualization of requests between 8am and 7pm for one day; the y axis shows, for each minute, the average of get and post for the past 5 minutes. 21
4.5 All transactions from ds1; the blue dots represent the non-faulty and the red ones the faulty transactions, when using a threshold of 150 for both y_g and y_p. 22
4.6 All transactions from ds1; the green circle represents the initial model w_0 starting at the origin, with radius epsilon. 22
5.1 s_t visualized on a time line. 25
5.2 f_t visualized on a time line. 25
5.3 Transactions over time considered faulty by FADO but non-faulty by the threshold method. 26
5.4 Transactions over time considered non-faulty by FADO but faulty by the threshold method. 26
5.5 A visualization of how the model evolved from w_0 to w_t. 27
5.6 s visualized on a time line. 27
5.7 f visualized on a time line. 28
5.8 Visualization of get and post at peaks A, B and C. 28
5.9 Transactions over time considered non-faulty by FADO but faulty by the threshold method. 28
5.10 A visualization of how the model evolved from w_0 to w_t. 29

Abbreviations

HIDS Host-Based Intrusion Detection System. 7
IDS Intrusion Detection System. 7
IPS Intrusion Prevention System. 7

Chapter 1

Introduction

Modern web applications are often complex and can be hard to monitor. This report explores how fault detection can be used to monitor such applications, using log data from end users to find anomalies. Negative events causing system downtimes and performance degradations affect end users but also a company's business continuity, and are therefore important to find. System failures can have various causes and can be hard to detect. Programming bugs, broken hardware and bad configurations are some examples of what can cause failures.

Decerno AB is a consulting firm in Sweden specialized in developing enterprise applications. Besides development they also maintain their clients' applications; this includes responsibilities such as deployment, managing servers, adding new features, and fixing bugs and errors. FasIT is a system developed for Fastighetsbyrån[1], a web application that is part of a business solution used by real estate agents in Sweden. User requests made to FasIT's server are stored in Elasticsearch[2], a JSON-based search and analytics engine. Today Decerno uses a system that monitors a few key values from these requests and alarms when they exceed certain thresholds. The thresholds are static and based on the mean values for a time period. The problem with this solution is that it is sensitive to noise and not always accurate when the number of user requests fluctuates. This leads to many false alarms that take unnecessary time from the operators. To avoid false alarms due to noise one can set higher thresholds, but this can lead to undetected failures; this is a trade-off the operators need to consider. Therefore this thesis presents an alternative method for fault detection in the domain of web applications, using online machine learning.

[1] Fastighetsbyrån: https://www.fastighetsbyran.se/
[2] Elasticsearch: https://www.elastic.co/elasticsearch

1.1 Project Goal

The purpose of this project is to evaluate how online machine learning can be used to monitor a web application. To narrow down the scope, one particular algorithm will be used, called FADO (Online Fault Detection). The algorithm was introduced by Kristiaan Pelckmans at Uppsala University; it is a purely deterministic way of detecting faults and comes with proven worst-case guarantees [7]. During this project a fault detection method using FADO will be developed with the purpose of alarming when user requests deviate too much from the typical normal request. The online machine learning approach will be evaluated and compared to a method using static thresholds for detecting faults.

1.2 Delimitations

Today FasIT has up to 2000 simultaneous users, and because each user request is stored the system can generate a vast amount of data in a short time. Real-time analysis of high-throughput data demands a lot of computational power. In this report, real-time scenarios have been simulated by iterating over historical data in sequential order instead of streaming data in real time. For evaluation, FADO was compared with the current method, which is based on static thresholds. FasIT was chosen by Decerno because of the large amount of data available and because of the limitations of the already deployed fault detection system. No data from any other applications were used.

Chapter 2

Related Work

2.1 Intrusion Detection

Intrusion Detection Systems (IDSs) are used to identify malicious behavior and to strengthen a system's security together with firewalls, anti-virus software and access control schemes. IDSs operate in real time, and when a suspicious activity is detected an alarm is raised; the operators in charge can then take appropriate actions. There are two kinds of IDSs: Host-Based Intrusion Detection Systems (HIDSs) and Network Intrusion Detection Systems (NIDSs). The first works with data located at the host, such as system calls and log files, to prevent attacks. The latter instead monitors the whole network by analyzing the ongoing traffic. IDSs can also be categorized as misuse-based and anomaly-based, where misuse-based systems rely on a set of rules written by domain experts. The benefit of misuse-based systems is that they provide a low false positive rate. In this context a false positive is non-malicious behavior being flagged as malicious, and the false positive rate is the proportion of all alarms that were not caused by malicious behavior.

Anomaly-based IDSs instead try to differentiate between normal and abnormal behavior in a system and alarm when something differs severely from the normal behavior. One problem with anomaly-based IDSs is the increased number of false positives. For example, under heavy traffic load a 1% false positive rate in a network would generate a huge number of false alarms. Intrusion Prevention Systems (IPSs) also try to prevent attacks, for example by blocking suspicious connections, whereas IDSs only alarm when suspicious behavior is detected and leave it to the operators to take appropriate actions. Because of the high false positive rate, anomaly-based IPSs are not viable [2].

2.1.1 Open Source IDSs

Bro is an open source system for network analysis, used primarily for network intrusion detection but also supporting pure traffic analysis [6]. Bro monitors a network link passively and can detect intrusions in real time. Bro uses a scripting language, the Bro language, to define security policies. This allows users to define their own policies. Bro is neither solely misuse-based nor anomaly-based but supports both approaches, and is shipped with several default policy scripts of both kinds.

Snort is an open source misuse-based intrusion detection system, using signature matching to recognize byte patterns in packets to detect threats [8]. Snort logs incoming traffic and alerts system operators when there is a violation of the present rule set. Snort is shipped with a default set of rules; additionally, users can also define their own rules. Snort uses the Boyer-Moore string search algorithm to detect patterns. Boyer-Moore's algorithm is especially efficient when the task is to repeatedly find a specific pattern in texts much longer than the pattern.
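To illustrate why this kind of search is fast, here is a minimal sketch of the bad-character skip idea behind Boyer-Moore (the simplified Horspool variant, not Snort's actual implementation):

```python
def horspool_search(text: str, pattern: str) -> int:
    """Boyer-Moore-Horspool: index of the first match in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return -1 if m > n else 0
    # Bad-character table: how far the last occurrence of each character
    # (except the final one) sits from the end of the pattern.
    shift = {c: m - i - 1 for i, c in enumerate(pattern[:-1])}
    i = m - 1
    while i < n:
        j, k = m - 1, i
        while j >= 0 and text[k] == pattern[j]:  # compare right to left
            j -= 1
            k -= 1
        if j < 0:
            return k + 1                          # full match found
        i += shift.get(text[i], m)                # skip ahead using the table
    return -1

print(horspool_search("GET /index.html HTTP/1.1", "index"))  # → 5
```

The skip table lets the search advance up to the full pattern length per mismatch, which is what makes repeated scans of long inputs cheap.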

2.2 Principal Component Analysis

Principal component analysis (PCA) is a machine learning technique that transforms a set of values of possibly correlated variables into a set of values of uncorrelated variables. PCA uses an orthogonal transformation, which makes the variables in the resulting set orthogonal to each other. Typically, one uses PCA to reduce the number of dimensions in a data set. The new variables created are called principal components. PCA can be used for anomaly detection and is well suited to problems where normal data is easily acquired but no labels are available.

In the paper Experience Mining Google's Production Console Logs [9], Xu et al. describe how PCA can be used to detect anomalies in console logs from a large system at Google called GX. The logs consisted of plain text files including a huge number of messages. They used two different features. One, called the state ratio vector, had one variable per possible state, for example different HTTP status codes, and the values were the counts of each state collected during a time period. The second, called the message count vector, had one variable per message type, similar to an execution trace with information on different events in a system. They then used PCA to discover the dominant pattern for each feature. By calculating the abnormal component of an incoming data vector using the dominant pattern from the training set, new messages could be categorised. If the abnormal component was large enough, the message was flagged as an anomaly.

Because no true labels were available they could not evaluate their methods counting true/false positives. Instead they relied on reasoning and compared the result to other metrics such as latency from the same time period.
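A small sketch of this style of PCA-based anomaly scoring, on made-up count vectors (the column layout and the 99th-percentile threshold are illustrative assumptions, not details from [9]):

```python
import numpy as np

# Toy "message count" vectors: rows are time windows, columns message types.
rng = np.random.default_rng(0)
normal = rng.normal(loc=[100, 50, 10], scale=[5, 3, 1], size=(200, 3))

# Fit the dominant pattern on normal data: the top principal components.
mean = normal.mean(axis=0)
centered = normal - mean
_, _, vt = np.linalg.svd(centered, full_matrices=False)
components = vt[:2]                     # keep the 2 dominant directions

def abnormal_component(y: np.ndarray) -> float:
    """Norm of the residual after projecting onto the dominant subspace."""
    c = y - mean
    residual = c - components.T @ (components @ c)
    return float(np.linalg.norm(residual))

# Threshold chosen from the training residuals (e.g. the 99th percentile).
threshold = np.percentile([abnormal_component(x) for x in normal], 99)
print(abnormal_component(np.array([100, 50, 40])) > threshold)  # → True
```

A vector whose deviation lies off the dominant pattern (here, a spike in the third count) yields a large residual and is flagged, even though its other components look normal.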

Chapter 3

Theory

3.1 Unsupervised Learning

Unsupervised learning is the machine learning task of drawing inferences from data where no labels or categories are available. Since there are no labels, it is difficult to measure the accuracy of unsupervised methods, and when evaluating them one needs to rely on heuristic arguments. For supervised learning one can measure the accuracy with cross-validation. The lack of possibilities to measure accuracy is one way of distinguishing unsupervised from supervised methods [4].

3.1.1 Clustering

Clustering is one of the most common types of unsupervised learning and is the task of finding hidden patterns or groupings in data [4]. The central goal of all clustering algorithms is to divide data into different subsets called clusters. The elements inside each cluster should be as similar to the other elements in the same cluster as possible. Common clustering algorithms are, for example, k-means, Gaussian mixture models and hierarchical clustering. All clustering algorithms need a method to measure similarity. Euclidean distance is a simple and common metric, used in k-means, that measures the distance between two points in Euclidean space. Equation 3.1 is the formula for Euclidean distance, where u and v are vectors of n elements.

d(u, v) = \sqrt{\sum_{i=1}^{n} (u_i - v_i)^2} \quad (3.1)
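As a quick illustration, equation 3.1 translates directly to code:

```python
import math

def euclidean(u, v):
    """Euclidean distance between two n-dimensional points (equation 3.1)."""
    return math.sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)))

print(euclidean((0, 0), (3, 4)))  # → 5.0
```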

3.1.2 Fault Detection

Fault detection is the task of monitoring a system and alerting when faults are detected. A fault can be defined as an abnormal behavior that can lead to failures. Fault detection is closely related to anomaly detection and has evolved from research areas such as machine learning, statistical learning and data mining.

Limit Checking

One common fault detection method is to monitor a measured variable y(t) and alarm when it exceeds a static threshold. This is referred to as limit checking [5]. Generally there is a lower limit y_min and an upper limit y_max. Inequality 3.2 describes the normal state of such a system: when y(t) is less than y_min or greater than y_max, the system should indicate that there is a fault somewhere in the system.

y_{\min} < y(t) < y_{\max} \quad (3.2)

Limit checking is widely used in process automation systems. Depending on context one can be interested in an upper limit, a lower limit, or both. For example, in a vehicle one warning light lights up when the oil pressure drops too low (lower limit) and another lights up when the engine overheats (upper limit). A disadvantage of fault detection systems based solely on thresholds is that they are very sensitive to noise. Another issue is that such systems only alarm when the thresholds are exceeded, that is, when faults have already occurred. If one wants to detect faults before they occur, one can set tighter thresholds, but this will lead to more false alarms. Setting the thresholds too high will make them less sensitive to noise but instead make some failures go undetected; this is a trade-off operators of such systems need to consider.
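Limit checking amounts to testing inequality 3.2; a minimal sketch (the 150 ms band is borrowed from the GET response time threshold used later in this report):

```python
def limit_check(y: float, y_min: float, y_max: float) -> bool:
    """True means 'fault': y lies outside the allowed band (inequality 3.2)."""
    return not (y_min < y < y_max)

# An average GET response time of 180 ms against a 0-150 ms band:
print(limit_check(180.0, 0.0, 150.0))  # → True
```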

Trend Checking

Another method of fault detection is to measure a variable's derivative y'(t) = dy(t)/dt. This is referred to as trend checking [5]. Similar to limit checking, the inequality in 3.3 should be satisfied in a non-alarming state.

y'_{\min} < y'(t) < y'_{\max} \quad (3.3)

Trend checking can detect faults earlier when the thresholds are small; one can also combine limit checking with trend checking for better results.
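In the same spirit, a finite-difference sketch of inequality 3.3 (the 5 ms/s limit is an invented example value):

```python
def trend_check(y_prev: float, y_curr: float, dt: float,
                d_min: float, d_max: float) -> bool:
    """Approximate y'(t) by a finite difference and test inequality 3.3.

    Returns True when the trend alarms, i.e. the derivative is outside
    the allowed band.
    """
    dydt = (y_curr - y_prev) / dt
    return not (d_min < dydt < d_max)

# Response time jumped 60 ms in 10 s: a derivative of 6 ms/s exceeds 5 ms/s.
print(trend_check(90.0, 150.0, 10.0, -5.0, 5.0))  # → True
```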

Online Fault Detection

Online learning comprises methods of machine learning where data arrive and are processed in sequential order, as opposed to batch learning where all data are already available. Online fault detection is a method of online learning in which one monitors a stream of transactions and tries to detect the faulty ones in real or near real-time. For these problems there are two general approaches, one based on supervised learning and another based on unsupervised learning. The supervised approach tries to decide whether a transaction is faulty using a predictor trained on a set of previously labeled transactions. This report is about the unsupervised approach. Instead of working with labels, a model w_t \in \mathbb{R}^n is built over time that tries to predict what the normal transaction looks like. At each iteration the incoming transaction is compared with the model. For each non-faulty transaction the inequality in 3.4 should hold; otherwise an alarm is raised and the model is updated.

\| y_t - w_{t-1} \|_2 < \epsilon \quad (3.4)

An alarm can be raised for two reasons: either the current transaction really is faulty, or the model is inadequate. In either case the model is updated.

Algorithm 1 Online Fault Detection
Initialize w_0 \in \mathbb{R}^n and \epsilon \ge 0.
for t = 1, 2, ... do
  (1) Receive transaction y_t \in \mathbb{R}^n
  (2) Decide whether the transaction is faulty, as \| y_t - w_{t-1} \|_2 \ge \epsilon, or not
  (3) If so, raise an alarm and update w_{t-1} \to w_t
end for

FADO, see Algorithm 2, is a purely deterministic online fault detection algorithm, created by Kristiaan Pelckmans and proved to come with worst-case guarantees [7]. Each transaction y_t \in \mathbb{R}^n and model w_t \in \mathbb{R}^n are vectors of n dimensions. Epsilon, \epsilon \in \mathbb{R}, should be a non-negative number, so that for each non-faulty transaction inequality 3.5 holds,

\| y_t - w_{t-1} \|_2 < \epsilon \quad (3.5)

and for each faulty transaction inequality 3.6 holds.

\| y_t - w_{t-1} \|_2 \ge \epsilon \quad (3.6)

Algorithm 2 FADO(\epsilon)
Initialize w_0 = 0_d
for t = 1, 2, ... do
  (1) Receive transaction y_t \in \mathbb{R}^n
  (2) Raise an alarm if \| y_t - w_{t-1} \|_2 \ge \epsilon and set v_t = (y_t - w_{t-1}) / \| y_t - w_{t-1} \|_2
  (3) If so, update w_t = w_{t-1} + v_t; otherwise set w_t = w_{t-1}
end for

It is important to choose an appropriate \epsilon for best results. If \epsilon is too small, FADO will raise too many alarms; if \epsilon is too large, the detection capacity will be low. Equation 3.7 shows how the model is updated when an alarm is raised. Gamma, \gamma \in \mathbb{R}, in equation 3.7 decides how much v_t should influence the new model. This is basically a case of the exploration-exploitation trade-off, where one must choose between exploiting what is known and exploring what is unknown [1].

w_t = w_{t-1} + \gamma v_t \quad (3.7)
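Algorithm 2, together with the \gamma-weighted update of equation 3.7, fits in a few lines; the stream below is made-up data, not the FasIT logs:

```python
import numpy as np

def fado(transactions, epsilon, gamma=1.0):
    """FADO (Algorithm 2): return alarm indices and the final model w."""
    w = np.zeros(transactions.shape[1])      # w0 = 0
    alarms = []
    for t, y in enumerate(transactions):
        dist = np.linalg.norm(y - w)         # ||y_t - w_{t-1}||_2
        if dist >= epsilon:                  # step (2): raise an alarm
            alarms.append(t)
            v = (y - w) / dist               # unit vector towards y_t
            w = w + gamma * v                # step (3) / equation 3.7
    return alarms, w

# Toy stream around (64, 67) with three injected slow transactions.
stream = np.tile([64.0, 67.0], (50, 1))
stream[30:33] = [300.0, 250.0]
alarms, w = fado(stream, epsilon=100.0)
print(alarms)  # → [30, 31, 32]
```

With w_0 = 0 the normal points here sit about 93 units from the model, just inside \epsilon = 100, so only the injected transactions alarm; a smaller \epsilon would also reproduce the early alarms from an inadequate initial model discussed in chapter 5.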

Chapter 4

Method

In this chapter the datasets used in this report are explained, together with the settings and parameters chosen for the evaluation of FADO. The current approach Decerno uses for fault detection is also described.

4.1 Requests

HTTP is an application layer protocol that today is the de facto standard for data communication in web applications. HTTP is a request-response protocol used between clients and servers: when a client sends a request to a server, the server should send back a response. All requests include a request line that has a URI and a method, and if a client wants to send data to a server it can be enclosed in the request. RFC 2616 [3] defines several HTTP methods, each specifying a desired action. Decerno stores requests with methods of the following types:

- GET: request a resource to retrieve, for example an HTML file or a representation of an object in a database.

- POST: request that the server accept the enclosed data, for example fields from a submitted web form.

- PUT: request that the enclosed data be stored at the supplied URI. If a resource is already available under the URI it should be updated, else a new resource should be created.

- DELETE: request that the resource available at the specified URI be removed.

FasIT handles 3 million HTTP requests every day, of which 91.77% are GET requests, see table 4.1. Decerno does not differentiate between POST, PUT and DELETE in their current fault detection solution; therefore they will be referred to simply as POST requests in the rest of this report.

Table 4.1: HTTP request distribution

GET     POST    PUT     DELETE
91.77%  4.96%   3.20%   0.07%

The requests are stored as records in Elasticsearch, a JSON-based search engine, and each record consists of several features. One key feature is the response time, the time it takes for the server to handle a request. The average response time for GET requests is approximately 64 ms and for POST requests 672 ms. Deadlock is another feature stored in Elasticsearch: if true, the process serving the request was halted because it entered a deadlock state, else this feature is false. Deadlocks can arise, for example, due to software bugs. Although important to detect, this rarely happens.

4.2 Setup

The purpose of this report is to evaluate whether online learning can be used to monitor a web application. The idea is to build up a model of a normal HTTP request and raise alarms when incoming requests deviate too much from the model. Decerno already monitors requests using an approach with static thresholds, and today alarms can be raised for three reasons:

1. The average response time for GET requests exceeds 150 ms during 5 minutes.

2. The average response time for POST requests exceeds 1500 ms during 5 minutes.

3. More than 20 deadlocks during 5 minutes.

Each minute the application checks these thresholds and raises an alarm if one of them is exceeded.

4.2.1 Features

In the rest of this report the response time for GET requests is denoted get and the response time for POST requests is denoted post; get and post are measured in milliseconds (ms). Initially, several features were considered to represent the typical request, but among all requests the only values that varied significantly were get and post. To model the typical request it was decided that a transaction should include these two features. The purpose of this report is not to find individual faulty requests but to monitor the overall state of the system. To reduce the noise from individually very high get and post values, each transaction is a representation of all requests during a time-based interval. In the following sections each transaction is written as a pair (g, p). For k GET requests, g is:

g = \begin{cases} \frac{1}{k} \sum_{i=1}^{k} \text{get}_i & \text{if } k > 0 \\ 0 & \text{otherwise} \end{cases} \quad (4.1)

and for m POST requests, p is:

p = \begin{cases} \frac{1}{10m} \sum_{i=1}^{m} \text{post}_i & \text{if } m > 0 \\ 0 & \text{otherwise} \end{cases} \quad (4.2)

Because the mean of all post is approximately 10 times larger than the mean of all get, each post was scaled down by a factor of 10. Figure 4.1 is a visualization of transactions received during one hour when using a 1 second interval, and figure 4.2 is a visualization of transactions from the same hour using a 10 second interval.

Figure 4.1: HTTP Requests from 1 hour visualized as transactions with a 1 second interval.


Figure 4.2: HTTP Requests from 1 hour visualized as transactions with a 10 second interval.

For each transaction FADO decides whether an alarm should be raised. The larger the interval, the less each individual GET and POST request influences the decision. A smaller interval results in more transactions over time, while a larger interval makes FADO less sensitive to noise.
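The aggregation in equations 4.1 and 4.2 can be sketched as follows (the sample response times are invented):

```python
def make_transaction(gets, posts):
    """Aggregate one interval's requests into a transaction (g, p).

    g is the mean GET response time (equation 4.1); p is the mean POST
    response time scaled down by a factor of 10 (equation 4.2). Empty
    intervals contribute 0, as in the report.
    """
    g = sum(gets) / len(gets) if gets else 0.0
    p = sum(posts) / (10 * len(posts)) if posts else 0.0
    return (g, p)

print(make_transaction([60, 70, 62], [700, 640]))  # → (64.0, 67.0)
```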

4.2.2 Datasets

Figure 4.3 shows the average of get and post for every 10 seconds; the data are from a weekday between 8am and 7pm. Figure 4.4 shows, for every minute, the average of get and post over the past 5 minutes. Both figures cover the same period. For the experiments two datasets of transactions were created, called ds1 and ds2. ds1 includes 3960 transactions composed of all pairs of get and post from figure 4.3, and ds2 includes 660 transactions composed of all pairs of get and post from figure 4.4. For both datasets post was scaled down by a factor of ten, see equation 4.2.


Figure 4.4: Visualization of requests between 8am to 7pm for one day, the y axis shows for each minute the average of get and post for the past 5 minutes.

4.2.3 Epsilon

In a two-dimensional space, the Euclidean distance is the length of the straight line between two points. For each non-faulty transaction y_t in FADO, inequality 4.3 should hold.

\| y_t - w_{t-1} \|_2 < \epsilon \quad (4.3)

Epsilon (\epsilon) is thus also a threshold: the maximum distance a non-faulty transaction y_t can be from the model w_{t-1}. Similar to setting a static threshold, one must choose a suitable epsilon based on domain expertise and experience. Figure 4.5 shows all transactions from ds1 except a few outliers. This is a visualization of a threshold-based fault detection approach: the transactions inside the square are non-faulty and the ones outside are considered faulty, so the transactions outside the square raise alarms and the ones inside do not. Figure 4.6 also shows the transactions from ds1. If the circle centered at (0, 0) represents a model w in FADO and the radius of the circle is epsilon, then all transactions inside the quarter circle would be considered non-faulty. When a faulty transaction arrives, the model gets updated and the circle's center moves towards the faulty transaction. In the experiments epsilon has been set to 100, to make sure that relatively low, but not too high, values fit inside the circle.

Figure 4.6: All transactions from ds1; the green circle represents the initial model w_0 starting at the origin, with radius epsilon.

4.3 Experiments

To evaluate FADO and compare it against static thresholds, a method was defined based on Decerno's current fault detection solution; it is called the threshold method in the rest of this report. For each iteration t = 1, 2, 3, ... FADO decides whether an alarm should be raised or not. The decision made at each iteration is denoted f_t and equals 1 if an alarm is raised, else 0. The maximum distance between a model and a non-faulty transaction, epsilon (\epsilon), is set to 100, and gamma (\gamma) in FADO is set to 1.

f_t = \begin{cases} 1 & \text{if } \| y_{t,(g,p)} - w_{t-1,(g,p)} \|_2 \ge 100 \\ 0 & \text{otherwise} \end{cases} \quad (4.4)

The threshold method also makes a decision for each transaction, denoted s_t:

s_t = \begin{cases} 1 & \text{if } y_{t,p} \ge 150 \text{ or } y_{t,g} \ge 150 \\ 0 & \text{otherwise} \end{cases} \quad (4.5)

For the final evaluation two experiments were created: Experiment 1 using dataset ds1 and Experiment 2 using ds2. For each experiment the initial model's values were set to 0.

- Experiment 1: FADO and the threshold method iterate over 3960 transactions; each transaction y_{(g,p)} is based on the mean of all get and one tenth of the mean of all post from the past 10 seconds, see equations 4.1 and 4.2.

- Experiment 2: FADO and the threshold method iterate over 660 transactions, six times fewer than in Experiment 1. Each transaction y_{(g,p)} is based on the averages of get and post over the past 5 minutes, with post scaled down by a factor of 10.
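Restating the two decision rules 4.4 and 4.5 as code (the example transaction and the model value (80.9, 82.2), borrowed from the final model in chapter 5, are for illustration only):

```python
import math

def fado_decision(y, w, epsilon=100.0):
    """f_t (equation 4.4): 1 when the transaction is far from the model."""
    return 1 if math.dist(y, w) >= epsilon else 0

def threshold_decision(y, limit=150.0):
    """s_t (equation 4.5): 1 when either feature exceeds the static limit."""
    g, p = y
    return 1 if g >= limit or p >= limit else 0

# A moderately slow POST block: the static rule alarms, FADO does not,
# because the transaction is still within epsilon of an adapted model.
y, w = (80.0, 160.0), (80.9, 82.2)
print(fado_decision(y, w), threshold_decision(y))  # → 0 1
```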

To compare the two methods, two indicator variables are defined. A fault the threshold method detects but FADO does not is denoted m:

m_t = \begin{cases} 1 & \text{if } s_t = 1 \text{ and } f_t = 0 \\ 0 & \text{otherwise} \end{cases} \quad (4.6)

A fault FADO detects but the threshold method does not is denoted n:

n_t = \begin{cases} 1 & \text{if } s_t = 0 \text{ and } f_t = 1 \\ 0 & \text{otherwise} \end{cases} \quad (4.7)

The number of m prior to time t is denoted q_t and defined as:

q_t = \sum_{i=1}^{t} m_i \quad (4.8)

and the number of n prior to time t is denoted r_t and defined as:

r_t = \sum_{i=1}^{t} n_i \quad (4.9)
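The counters q_t and r_t of equations 4.6-4.9 can be computed directly from the two decision sequences (the short sequences below are invented):

```python
def compare(s, f):
    """Cumulative q_t and r_t (equations 4.6-4.9) from decisions s_t, f_t."""
    q = r = 0
    qs, rs = [], []
    for st, ft in zip(s, f):
        q += 1 if (st == 1 and ft == 0) else 0   # m_t: threshold-only fault
        r += 1 if (st == 0 and ft == 1) else 0   # n_t: FADO-only fault
        qs.append(q)
        rs.append(r)
    return qs, rs

print(compare([1, 0, 0, 1], [1, 1, 0, 0]))  # → ([0, 0, 0, 1], [0, 1, 1, 1])
```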

Chapter 5

Results

This chapter presents the results of the two experiments. The outcome of the threshold method and FADO is visualized and explained. In addition, a visualization and explanation of how the model evolved in each experiment is given.

5.1 Experiment 1

Figures 5.1 and 5.2 show the decisions made over time by FADO and the threshold method. The latter raised 97 alarms and FADO raised 133.

Figure 5.1: s_t visualized on a time line.

Figure 5.2: f_t visualized on a time line.

Figure 5.3 is a visualization over time of the number of transactions FADO considers faulty that the threshold method considers non-faulty.


Figure 5.3: Transactions over time considered faulty by FADO but non faulty by the threshold method.

Figure 5.4: Transactions over time considered non faulty by FADO but faulty by the threshold method

Figure 5.5: A visualization of how the model evolved from w_0 to w_t.

Figure 5.5 shows how the model evolved from w_{(g,p)} = (0, 0) to w_{(g,p)} = (80.9, 82.2). Figure 5.4 shows that after 240 minutes (1140 transactions) FADO started to detect faulty transactions that were not detected by the threshold method. Because there are no actual labels, a faulty transaction is simply a transaction that triggers an alarm. In the beginning FADO detected several faulty transactions that were not considered faulty by the threshold method. This was a direct consequence of the initial model w_{0,(g,p)} being set to (0, 0). One could also base the initial model on previous knowledge of the system and expertise, to ensure a more stable model from the beginning.

5.2 Experiment 2

Both methods iterated over 660 transactions. As in Experiment 1, FADO raises more alarms than the threshold method in the beginning, when the model is inadequate. Figure 5.7 shows three additional peaks (A, B, C) not present in figure 5.6. Peaks A and B include two alarms and peak C includes three alarms.

Figure 5.6: s visualized on a time line.


Figure 5.8: Visualization of get and post at peak A, B and C

Figure 5.8 shows the 5-minute running averages of get and post from 10 minutes before to 10 minutes after peaks A, B and C, with post scaled down by a factor of 10. The red line indicates the threshold at 150. In all three plots all values are below the thresholds (post reaches 149.5 at C). Depending on the epsilon one chooses, FADO can thus detect when the transactions deviate from the model even to a small degree.

Figure 5.9: Transactions over time considered non faulty by FADO but faulty by the threshold method

As in Experiment 1, r increases less once the model has adjusted itself. In Experiment 2 all faulty transactions detected by the threshold method were also considered faulty by FADO, see equation 5.1.

q_{660} = \sum_{i=1}^{660} m_i = 0 \quad (5.1)

Figure 5.10: A visualization of how the model evolved from w_0 to w_t.

Figure 5.10 shows how the model evolved over 48 alarms; y_g and y_p grow continuously with each alarm. The plot indicates that no alarms were raised as a consequence of decreasing y_g and y_p values; the alarms raised were triggered only by increases in y_g and y_p, which is why the plot in figure 5.10 goes from bottom left to top right.

Chapter 6

Discussion

This report has explored how online learning can be used to monitor HTTP requests. The experiments showed that FADO, an online fault detection algorithm, can detect when response times for HTTP requests fluctuate. Only two features were used in the final experiments, and the only feature transformation applied was scaling. The two features g and p were continuous variables. When using features with a finite number of possible values, either numerical or categorical, one should consider how they should be interpreted by FADO. For example, categorical labels indicating unwanted events could be transformed to high real values, and labels indicating normal events to low real values. One must consider the importance of detecting certain events when choosing these values: if an unwanted but not critical event is represented by a high value, then a critical event should be represented by an even higher value.
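One hypothetical encoding along these lines (the labels and values are invented, not taken from the FasIT logs):

```python
# Severity ordering: more critical events sit further from a normal model,
# so a deviation in this feature contributes more to the FADO distance.
SEVERITY = {"ok": 0.0, "timeout": 100.0, "deadlock": 300.0}

def encode(label: str) -> float:
    """Map a categorical event label to a real-valued feature for FADO."""
    return SEVERITY[label]

print(encode("deadlock") > encode("timeout") > encode("ok"))  # → True
```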

For web applications it is crucial to keep response times low; overly long response times affect the quality of the service. Today Decerno uses two static upper thresholds to monitor response times. Since FADO detects when transactions deviate too much from the model, it can also detect when response times drop. This can be both beneficial and troublesome. If some part of the web application is not working as intended and requests are not handled appropriately, response times near zero can result. False alarms can also occur simply because the server load is lower, which leads to faster response times. These alarms end once FADO's model has been updated, but when the response times rise again, new alarms may be raised depending on the size of the increase and the chosen epsilon. It is therefore important to understand why alarms are raised.

In this report, a transaction is a block of requests from a time interval. Longer intervals make FADO's decision less dependent on individual requests. In this thesis two different intervals were used. In the first experiment, transactions are blocks of requests spanning ten seconds. In the second experiment, a block is created every minute containing all requests from the past five minutes; the latter is the interval Decerno uses today. The first method resulted in 6 times more transactions and raised considerably more alarms. In both experiments FADO raised more alarms, especially in the beginning when the model was not yet adequate. In a real scenario, one would probably initialize the model based on experience; fewer alarms would then be raised in the beginning, since those stem from an inadequate model.
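The second aggregation scheme can be sketched as a sliding window: once per minute, all requests from the past five minutes are averaged per HTTP method to form a transaction. The function and variable names below are illustrative, not taken from the thesis code, and the request tuples are an assumed encoding.

```python
from collections import deque

def sliding_transactions(requests, window=300, step=60):
    """requests: time-sorted list of (timestamp_sec, method, response_time_ms).
    Yields (t, mean_get_ms, mean_post_ms) once per `step` seconds,
    averaging over the past `window` seconds."""
    if not requests:
        return
    buf = deque()
    i = 0
    t = requests[0][0] + step
    end = requests[-1][0]
    while t <= end + step:
        # add requests that arrived up to time t
        while i < len(requests) and requests[i][0] <= t:
            buf.append(requests[i])
            i += 1
        # drop requests older than the window
        while buf and buf[0][0] < t - window:
            buf.popleft()
        gets = [r for (_, m, r) in buf if m == "GET"]
        posts = [r for (_, m, r) in buf if m == "POST"]
        yield (t,
               sum(gets) / len(gets) if gets else 0.0,
               sum(posts) / len(posts) if posts else 0.0)
        t += step
```

With `window=10, step=10` the same function reproduces the ten-second blocks of the first experiment, which is one way to see why that setting produces roughly six times as many transactions.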

Because FADO does not classify faults, it is important for operators to investigate the requests further when alarms are raised in order to draw inferences. It is also important to find a suitable epsilon and a suitable interval, as both strongly affect the detection rate and the number of alarms raised.


Conclusion

This report has shown that online learning can be used to monitor a web application and detect faulty HTTP requests. Since the goal has been to monitor the state of the web application as a whole, requests have been aggregated into blocks which are evaluated together, reducing the impact of each individual request.

FADO uses a decision rule based on a transaction's distance relative to a model. In the experiments FADO started with an uninitialized model where all feature values were set to zero, which makes FADO raise additional alarms in the beginning. Once the model has stabilized, fewer alarms are raised. In the experiments a relatively high epsilon, the maximum distance between a non-faulty transaction and the model, was used. Using a lower epsilon would increase the detection capacity but would also flag too many transactions as faulty.

In this report the transactions have been visualized in a two-dimensional space with y_g on the horizontal axis and y_p on the vertical axis. The decision rule FADO uses can be visualized as a circle centered at the point (w_g, w_p) with radius epsilon. Transactions that fall inside the circle are considered non-faulty. When FADO detects a transaction outside the circle, an alarm is raised and the model's coordinates are moved toward the position of the faulty transaction in the two-dimensional space. The threshold method in this report can instead be visualized as a square: transactions inside the square are considered non-faulty and transactions outside it as faulty.

The benefit of using FADO is that one can detect changes: the smaller the epsilon, the smaller the changes in response times that can be detected. Not only can one detect when response times increase, but also when they drop. An advantage of using an adaptive model such as FADO is that problems that would go undetected with static thresholds can be detected.


7.1 Further Work

In this report only two features have been used to compose a transaction. For further research it would be interesting to include more features. The parameter gamma (γ) in FADO was also fixed to 1 in this report. As gamma decides how much each faulty transaction should influence the new model, it would be interesting to experiment with various values for gamma. It would also be interesting to experiment with different epsilons and see how small an epsilon one could use without raising too many false alarms.
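Such a sweep over epsilon (and gamma) could be run offline over recorded transactions. The toy detector below is an assumption for illustration only: in particular, the update rule, which moves the model a fraction gamma toward each flagged transaction, is not claimed to be FADO's exact update, and the data is synthetic.

```python
import math

def count_alarms(transactions, epsilon, gamma=1.0, w=(0.0, 0.0)):
    """Count alarms raised by a toy distance-based detector with an
    assumed update rule: on each alarm, move the model a fraction
    gamma toward the flagged transaction."""
    alarms = 0
    for y in transactions:
        if math.dist(y, w) > epsilon:
            alarms += 1
            w = (w[0] + gamma * (y[0] - w[0]),
                 w[1] + gamma * (y[1] - w[1]))
    return alarms

# Synthetic transactions hovering around (100, 120) with small jitter.
data = [(100.0 + (i % 7), 120.0 + (i % 5)) for i in range(500)]
for eps in (5.0, 20.0, 80.0):
    print(eps, count_alarms(data, eps))
```

Even on this toy data the pattern from the experiments appears: the smallest epsilon raises the most alarms, and every run raises at least one alarm at the start because the model is initialized at the origin.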

Because FADO is an online fault detection algorithm, the next natural step for this project would be to implement a real-time solution. Depending on the traffic load, one must make sure that the implementation can process all requests in real time or near real time; if not, the results would be skewed.


References

[1] N. Cesa-Bianchi and G. Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.

[2] J. J. Davis and A. J. Clark. Data preprocessing for anomaly based network intrusion detection: A review. Computers & Security, 30(6):353–375, 2011.

[3] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, and T. Berners-Lee. Hypertext transfer protocol – HTTP/1.1, 1999.

[4] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2009.

[5] R. Isermann. Fault-Diagnosis Systems: An Introduction from Fault Detection to Fault Tolerance. Springer Science & Business Media, 2006.

[6] V. Paxson. Bro: A system for detecting network intruders in real-time. Computer Networks, 31(23):2435–2463, 1999.

[7] K. Pelckmans. FADO: A Deterministic Detection/Learning Algorithm. Technical report, Department of Information Technology, Uppsala University.

[8] M. Roesch et al. Snort: Lightweight intrusion detection for networks. In LISA, volume 99, pages 229–238, 1999.

[9] W. Xu, L. Huang, and M. I. Jordan. Experience mining Google's production console logs. In SLAML, 2010.
