
DEGREE PROJECT IN ELECTRICAL ENGINEERING, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2016

Knowledge discovery and machine learning for capacity optimization of Automatic Milking Rotary System

TIAN XIE


Knowledge discovery and machine learning for capacity optimization of Automatic Milking Rotary System

TIAN XIE

Master in System on Chip Design
Date: December 2016
Supervisor: Peter Mellgren (DeLaval), Saikat Chatterjee (KTH)
Examiner: Saikat Chatterjee


Abstract

Dairy farming, as one part of agriculture, has thousands of years of history. The increasing demand for dairy products and the rapid development of technology bring tremendous changes to dairy farming. Starting from hand milking, dairy farming has gone through vacuum bucket milking, pipeline milking, and now parlour milking. The automatic and technical milking systems provide farmers with high-efficiency milking, effective herd management and, above all, booming income.

DeLaval Automatic Milking Rotary (AMRTM) is the world's leading automatic milking rotary system. It presents an ultimate combination of technology and machinery which brings significant benefits to dairy farming. The AMRTM technical milking capacity is 90 cows per hour. However, constrained by farm management, the cows' condition and the system configuration, the actual capacity is lower than the technical value. In this thesis, an optimization system is designed to analyze and improve AMRTM performance. The research focuses on cow behavior and AMRTM robot timeout. Through applying knowledge discovery in databases (KDD), building a machine learning cow behavior prediction system and developing modeling methods for system simulation, optimizing solutions are proposed and validated.


Sammanfattning

Dairy farming is part of agriculture's thousands of years of history. Increasing demand for dairy products, together with the rapid development of technology, has brought enormous changes to dairy production. Dairy farming initially began with hand milking; milking methods have since developed through different techniques such as vacuum bucket milking and pipeline milking, up to today's milking rotary. The automatic and technical milking systems now provide farmers with highly efficient milking, effective herd management and, above all, thriving incomes.

DeLaval Automatic Milking Rotary (AMRTM) is the world's leading automatic rotary milking system. It presents an ultimate combination of technology and machinery which brings dairy farming significant benefits. The technical milking capacity of the DeLaval Automatic Milking Rotary is 90 cows per hour. It is constrained by farm management, the condition of the cows and the handling of the system, which makes the actual capacity lower than the technical one. This thesis investigates how an optimization system can analyze and improve the performance of the DeLaval Automatic Milking Rotary by focusing on cow behaviour and robot timeouts. By applying knowledge discovery in databases (KDD), building a machine learning system that predicts cow behaviour, and developing modeling methods for system simulation, optimization solutions are proposed and validated.


Acknowledgements

This thesis is the result of eight months of work. I would like to thank everyone who supported and helped me along the way toward my master's degree.

First and foremost, I am grateful for the guidance and support from my DeLaval supervisor Peter Mellgren. His visionary thinking and patience led me in the right direction and gave me the opportunity to realize my ideas. I would also like to thank my KTH supervisor Saikat Chatterjee, who provided me with significant opinions and thoughtful advice.

The DeLaval AMRTM department gave me a creative and positive working environment, and I really loved every day of it. I thank Thomas Olsson for providing me with a comprehensive database, Arto Rajala and Fredrik Kange for letting me join the badminton club, and all my colleagues for their help in improving my thesis work.

These two years of study at KTH have taught me a lot and given me countless memories. I wish all my friends a good future.


Contents

1 Introduction
   1.1 Background
   1.2 AMRTM Capacity between ideal and reality
   1.3 Purpose
   1.4 Goals
   1.5 Methodology
   1.6 Outline

2 Theoretic Background
   2.1 AMRTM general description
   2.2 Database

3 Methods
   3.1 Data analysis
   3.2 Machine learning classification
      3.2.1 Binary classification
      3.2.2 Decision tree
      3.2.3 Support vector machine
      3.2.4 Extreme learning machine (ELM)
   3.3 AMRTM system simulation

4 AMRTM optimization system design
   4.1 KDD process
   4.2 Bad cow definition
   4.3 Robot timeout analysis
   4.4 Machine learning classification
      4.4.1 Creating database
      4.4.2 Decision tree
      4.4.3 Support vector machine
      4.4.4 Extreme learning machine
   4.5 AMRTM system simulation

5 Result
   5.1 Bad cow definition
      5.1.1 Single-variable definition
      5.1.2 Multi-variable definition
   5.2 Machine learning prediction
      5.2.1 Decision tree
      5.2.2 Support vector machine
      5.2.3 Extreme learning machine
   5.3 Robot Timeout
   5.4 AMRTM system simulation

6 Discussion
   6.1 Bad cow selection
      6.1.1 Bad cow classification with single variable
      6.1.2 Bad cow classification with multi-variable
   6.2 Machine learning classification
   6.3 Comparison on different optimizing levels
   6.4 Robot timeout

7 Conclusion and future work
   7.1 Conclusion
   7.2 Future work

Bibliography

A Simulation time and capacity


Chapter 1

Introduction

1.1 Background

DeLaval is a world leader in the dairy farming industry, providing integrated milking solutions designed to improve dairy farmers' production, animal welfare and overall quality of life [1]. DeLaval provides solutions for customers in more than 100 countries, including milking systems, cooling and feeding systems, housing systems and farm management support systems. The company has over a century of history since Gustaf de Laval founded it in 1883. Nowadays, DeLaval is a company with more than 4600 employees and achieves net annual sales of 1 billion EUR.

DeLaval AMRTM is the world's first automatic milking rotary system. It is designed to accelerate customers' transition from milking management to general farm management. AMRTM provides customers with more efficient labour usage, better cow health condition and higher milk quality at a lower milk harvesting cost. The sophisticated system is intended for loose housing or grassland farms with more than 300 lactating cows [2], in both voluntary and batch milking.

The AMRTM bidirectional herringbone rotary platform has 24 bails. With five hydraulic functional robots equipped (TPM1, TPM2, ACA1, ACA2 and TSM, introduced in chapter 2), the automatic milking process can be handled simultaneously for the modern 24-hour operational dairy farm. Every entering cow has a unique electronic identification which contains information such as teat positions for robot camera recognition. During milking operation, cow behavior, milk quality and platform performance are monitored and recorded in the DelProTM database. Robot arms with specific effectors are responsible for cleaning teats, attaching milking cups and protecting teats against bacteria. After the attachment, the milking process starts immediately. The qualified milk is stored in the milk tank through vacuum pumps. After a milked cow leaves the platform, the milk cups and floor are automatically flushed and prepared for the next cow.

1.2 AMRTM Capacity between ideal and reality

The capacity is an intuitive measurement of AMRTM performance and the feature of most concern to customers. It is defined as the number of cows that have been milked in one hour (cows/hour) or the number of eligible rotations (rotations during which at least one of the robots operated) in one hour (rotations/hour). The theoretical capacity of 90 cows/hour is calculated by assuming that each rotation is finished in 40 seconds:

Capacity = 1 hour / rotation duration = 3600 s / 40 s = 90 cows/hour    (1.1)

The actual capacity is restricted by various factors. Previous work shows that the robot success rate and the cow traffic waiting time, both mentioned in the AMR instruction book [2], have a high impact on capacity. The robot success rate (92-98%) is determined by the equipment maintenance condition (especially the teat location camera system) and the unique features of each cow (age, udder shape, teat locations and its adaptability to the AMR system). Traffic waiting occurs when cows stand in a queue for entering or stray on the way to the platform (a reasonable variation is 10-15% of the actual capacity). It can be reduced by efficient farm traffic solutions and by the learning ability of the cows. An example of the actual capacity calculation is given below:

Assume 2.4 milkings/cow/24 hours, 10% traffic waiting time, and a 95% robot success rate. The theoretical system capacity is 1620 milkings/18 hours (90 cows/hour over the maximum operating time of 18 hours within 24 hours).

Robot success rate (95%):

1620 × 0.95 = 1539    (1.2)

Cow traffic waiting time (−10%):

1539 × (100% − 10%) = 1385.1    (1.3)

2.4 milkings/cow/24 hours:

1385.1 / 2.4 = 577.125, i.e. 577 cows/24 hours    (1.4)

Capacity per hour:

1385.1 / 18 = 76.95 cows/hour    (1.5)
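The same bookkeeping can be written as a few lines of code. This is a minimal sketch of my own (the function name and default values are simply the example assumptions above), not part of the thesis:

def actual_capacity(theoretical_cph=90, operating_hours=18,
                    success_rate=0.95, traffic_waiting=0.10,
                    milkings_per_cow_per_day=2.4):
    # Follows equations (1.2)-(1.5): theoretical milkings, robot success,
    # traffic waiting, then the per-cow and per-hour figures.
    theoretical_milkings = theoretical_cph * operating_hours        # 1620
    after_success = theoretical_milkings * success_rate             # 1539
    after_traffic = after_success * (1.0 - traffic_waiting)         # 1385.1
    cows_per_day = after_traffic / milkings_per_cow_per_day         # 577.125
    milkings_per_hour = after_traffic / operating_hours             # 76.95
    return milkings_per_hour, cows_per_day

print(actual_capacity())  # (76.95, 577.125)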

As the calculation shows, the actual capacity is reasonably lower than the theoretical value and varies with the specific farm conditions. Observation on real farms and data mining show that: if one robot is late, the entire system has to suspend until it finishes; the milk cup attachment robot (ACA) has the right to extend its operation time in order to achieve a higher successful attachment rate; naughty cows which kick off the milk cups or milk slowly need more time, even a second round, to complete milking; moreover, human-involved operation suspensions also cost extra time. How to analyze and optimize the capacity of AMRTM based on the current data is the primary challenge.

1.3 Purpose

The purpose of this thesis is to analyze and improve the actual capacity of AMRTM. Sorting cows based on their milking behaviour is one approach; investigating the timeout setting on robot operation is another approach to optimize capacity. The AMRTM system simulation will be designed to examine the proposed solutions by recreating a model of the milking sequence and procedure. Meanwhile, it should maintain the system operating features as close to the actual ones as possible. Efficient theoretical analysis and simulation can verify the possible solutions and provide the best support for future farm testing.

1.4 Goals

The project has four phases: literature study, examining current data, creating algorithms and simulation for optimization, and testing in actual farms. For each phase, specific goals are listed:

1. Literature study:

• Understanding AMRTM system operation.
• Researching supervised machine learning algorithms.

2. Examining current data:

• Analyzing cow behavior on the AMRTM platform.
• Defining the bad cow.
• Analyzing robot timeout.

3. Creating algorithms and simulation for optimization:

• Building a machine learning cow behavior prediction system.
• Designing the AMRTM system simulation to examine optimizing solutions.
• Comparing machine learning algorithms and discussing different optimizing solutions.

4. Testing in real farms. Limited by the time constraint, testing in actual farms could not be carried out in this thesis project.

1.5 Methodology

This thesis follows the quantitative research method. The phenomena, empirical hypotheses and positivist assumptions are verified through statistical data analysis and mathematical modeling. The measurement data is the most important part of quantitative research since it connects the empirical observations with the experimental expression.

Statistical methods will be applied to analyze, interpret, organize and present data [3]. Through the statistical results, we can find the internal relations between different variables and exclude interference, in order to create better sorting and optimizing algorithms.


System modeling and simulation, as an effective method for estimating the final product's behavior without costly modification, is suitable for testing different optimizations, machine learning prediction results and robot timeouts. Modeling is the abstraction of the actual system procedure and constraints; simulation is the implementation of a designed model.

1.6 Outline

The remainder of this master thesis report is structured as follows:

Chapter 2 provides a comprehensive study on AMRTM system operation.

Chapter 3 describes the data analysis, machine learning and simulation methods applied in this project.

Chapter 4 details the implementation of the AMRTM optimization system design.

Chapter 5 presents the experimental results. Chapter 6 discusses the obtained results. Chapter 7 concludes the thesis and outlines future work.


Chapter 2

Theoretic Background

2.1 AMRTM general description

A 24-bail AMRTM platform is shown in Figure 2.1. The internal rotary parlour is divided into two areas by functionality. In area A, robot activities use 5 bails and the entrance/exit area occupies 4 bails. The remaining 15 bails in area B are available for milking and manual operation. For instance, if a cow kicks off the milk cup or is set to manual milking, the milker can attach the cups by hand in area B.

Figure 2.1: AMRTM overview

The 'adventure' for a cow begins at the lower-left corner. When it enters the platform, DeLaval DelProTM Farm Manager starts to synchronize its information. Then it is moved to TPM1 for teat cleaning and milking preparation. During each rotation, the five robots operate simultaneously. After the slowest robot finishes its task, the platform rotates one bail clockwise. Milking starts as soon as the ACA2 robot attaches the milk cups and stops when the milk yield reaches the expected amount or time is out. The last step before exiting the platform at the lower-right corner is teat spraying, in order to inhibit bacteria. The cow's behaviour, milk quality and time usage in each milking turn are saved in the 'CowMilking' database.

The hydraulic functional robot is shown in Figure 2.2. Each robot consists of the same robot arm structure (top graph in Figure 2.2) and a different function-specific effector (A in the top graph in Figure 2.2).

The end effector of the teat preparation robot (TPM) has a 3D camera (A in the bottom left graph in Figure 2.2) for finding the teats and a teat cleaning cup (B in the bottom left graph in Figure 2.2) for cleaning one teat at a time. Two TPM robots are installed in a 24-bail parlour. Each cleans two parallel teats during one procedure.

The cup attachment robot (CAM or ACA) has a double magnet gripper (B in the bottom middle graph in Figure 2.2), which is used to fetch two teat cups at a time from the milking point controller and attach them to the teats. A teat cup is retracted individually when milking is finished or a kick-off happens. Two ACA robots are installed in a 24-bail parlour. Each attaches two parallel teats during one procedure. By configuration, ACA2 can extend its operating time to attach ACA1's unfinished teats.

The teat spray robot (TSM) is equipped with two spray nozzles inside the nose cone (B in the bottom right graph in Figure 2.2). It is used to protect the teats against bacteria until the next milking.

Figure 2.2: AMRTM milking robot

The AMRTM operating process, named Piece Of Cake (POC), contains the platform rotation duration, the robot functional duration, and the control operation duration. As shown in Figure 2.3, a new POC starts before the platform has rotated into position. The time gap between the POC start time and the robot active time is 'RobotsNotReadyDuration'. At the end of the robot operation, the platform is ready again for a new rotation. The last finished robot determines the robot functional duration, called 'SlowestRobotDuration'. The control signal consists of milking wait, unknown wait, and OC wait. By definition, one AMRTM operating process (POC) equals:

POC = RobotsNotReadyDuration + SlowestRobotDuration + MilkingWait + UnknownWait + OCWait

The waiting part remains the same during simulation. 'SlowestRobotDuration', as the main part of the POC, is influenced by the milking sequence and the robot performance. The optimization methods will be applied to minimize 'SlowestRobotDuration'.

Figure 2.3: AMRTM procedure

2.2 Database

The AMRTM system saves all the crucial data generated during the milking process and transmits it to a cloud service. The database, which consists of system operating time stamps and the cows' milking information, has been converted into the CowMilking and POC databases.

In the CowMilking database, each row includes all the information on one cow's milking procedure at a time. The CowMilking database contains the cow's on-platform information, for instance group, unique identity number, incomplete teats, kickoff teats, process Id, TPM Id & result & duration, ACA Id & result & duration, and milk yield.

The POC (Piece Of Cake) database arranges data based on all the information for one AMRTM platform rotation. The POC database contains the rotation start and end time stamps, among other rotation-level information.


Chapter 3

Methods

3.1 Data analysis

A bad cow was defined as one whose behaviour on the AMRTM platform negatively influenced the system capacity. Knowledge Discovery in Databases (KDD) was adopted to find the patterns of cow behaviour. The basic problem addressed by the KDD process is one of mapping low-level data (which are typically too voluminous to understand and digest easily) into other forms that might be more compact, abstract and useful [5]. The general KDD process contains data preparation, pattern searching, knowledge evaluation, and refinement. The knowledge obtained at the end of the process fully depends on the user's purposes.

Figure 3.1: KDD process

According to the actual application, the KDD process implementation is shown in Figure 3.1. The first step was understanding the project scope and prior knowledge and selecting the target database for knowledge discovery. The second was pre-processing the selected database, which included formatting data, cleaning unrelated information, and removing missing and incomplete data. The third was extracting project-oriented features to reduce the complexity and distraction in the database. The fourth was using data mining methods to find patterns and rules based on the specific requirements. The most common data mining algorithms can be discussed in two groups: statistics, neighbourhoods and clustering (classical techniques); trees, networks and rules (next generation techniques). The fifth was evaluating the obtained patterns by returning to steps one to four; this can be repeated multiple times to interpret the patterns and generate the knowledge from the database. The sixth was applying the knowledge in other implementations and collating it for knowledge management.

3.2 Machine learning classification

3.2.1 Binary classification

Binary classification is a method for classifying target objects into two categories based on classification rules. It has been widely used in machine learning problems such as spam e-mail detection (spam/ham) [6], disease diagnosis (benign/malignant) [7], and general procedure control (pass/fail) [8]. An object database which contains multiple features and has been binary labeled is used to train a supervised machine learning model. By feeding new data into the well-trained model, a predicted label for the new object is generated at the model output.

A well-constructed classification method can divide objects into their proper classes. However, misclassification is inevitable and determines the overall performance, so how to analyze the obtained result is an integral part of classification. The confusion matrix is widely used to present the performance of a classification algorithm, specifically in (supervised) machine learning and statistics. As shown in Table 3.1, a general confusion matrix consists of four types of instances: true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN). Each row represents the true condition and each column represents the predicted condition. Correct predictions (TP, TN) are the values on the diagonal of the matrix and wrong predictions (FP, FN) are located outside the diagonal.

Table 3.1: Confusion matrix

General performance measurement equations which are derived from the confusion matrix are listed below:

True positive rate (TPR) or recall is the true positives over the actual positives:

TPR = TP/P = TP/(TP + FN)    (3.1)

True negative rate (TNR) or specificity (SPC) is the true negatives over the actual negatives:

TNR = TN/N = TN/(TN + FP)    (3.2)

Positive predictive value (PPV) or precision is the true positives over the predicted positives:

PPV = TP/(TP + FP)    (3.3)

Negative predictive value (NPV) is the true negatives over the predicted negatives:

NPV = TN/(TN + FN)    (3.4)

False positive rate (FPR) or fall-out is the false positives over the actual negatives (1 − SPC).

False discovery rate (FDR) is the false positives over the predicted positives (1 − PPV).

False negative rate (FNR) or miss rate is the false negatives over the actual positives (1 − TPR).

Accuracy (ACC), as a general performance estimate, is calculated as the correct predictions over all instances:

ACC = (TP + TN)/(P + N)    (3.5)

More terms such as the null error rate (the correct rate if predicting the majority class only), the receiver operating characteristic (visualizing the performance by plotting the true positive rate against the false positive rate) and the F1 score are adopted based on the application requirements.
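As a small, self-contained illustration (my own sketch, not thesis code; the counts are invented), the metrics of equations (3.1)-(3.5) can be computed directly from the four confusion-matrix counts:

def confusion_metrics(tp, fp, tn, fn):
    # Actual positives P and actual negatives N.
    p, n = tp + fn, tn + fp
    return {
        "TPR (recall)":      tp / p,               # eq. (3.1)
        "TNR (specificity)": tn / n,               # eq. (3.2)
        "PPV (precision)":   tp / (tp + fp),       # eq. (3.3)
        "NPV":               tn / (tn + fn),       # eq. (3.4)
        "ACC":               (tp + tn) / (p + n),  # eq. (3.5)
    }

# Example: 120 correctly predicted bad cows, 30 good cows flagged as bad,
# 400 correctly predicted good cows and 25 missed bad cows.
print(confusion_metrics(tp=120, fp=30, tn=400, fn=25))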

3.2.2 Decision tree

A decision tree is a classifier expressed as a recursive partition of the instance space [9]. Like a real tree, a decision tree has one root (root node), trunks (internal nodes) and leaves (leaf nodes). An example decision tree for mammal classification is shown in Figure 3.2. Body temperature is the root node, which does not have any incoming edges. It splits the tree into warm-blooded and cold-blooded vertebrates as child nodes. Since all cold-blooded vertebrates are non-mammals, the right child is a leaf node, which has no outgoing edges. Inside the warm-blooded category, gives birth, an internal node with both incoming and outgoing edges, divides mammals from non-mammals. Both child nodes of gives birth are leaf nodes.

Figure 3.2: A decision tree for the mammal classification problem[10]


Typical decision trees such as CART [11], ID3 [12] and C4.5 [13] use Hunt's algorithm. Considering the feature expression, the attributes for constructing a decision tree can be divided into four categories: binary, nominal, ordinal and continuous attributes. In this thesis, continuous attributes, which split a node by comparing the feature X with a condition C (X > C or X < C), are applied. In order to obtain the best split, an impurity measurement on the child nodes is carried out. At a node t, the impurity is defined as i(t) = F[P(1|t), . . . , P(N|t)], where P(n|t) is the proportion of instances in node t belonging to class n, n = 1, . . . , N. Commonly used impurity measures include:

Entropy(t) = -\sum_{n=1}^{N} P(n|t)\,\log_2 P(n|t)    (3.6)

Gini(t) = 1 - \sum_{n=1}^{N} [P(n|t)]^2    (3.7)

In this thesis, CART for classification was implemented. The decrease in impurity for a split of node t is defined as:

\Delta i(t) = i(t) - P_L\, i(t_L) - P_R\, i(t_R)    (3.8)

where P_L and P_R are the proportions of instances in t going to the left and right child nodes of t. The best split for node t is therefore the one that maximizes \Delta i(t).
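To make the split selection concrete, here is a minimal numpy sketch of my own (not the thesis implementation; the toy feature values and labels are invented) that scans thresholds on one continuous feature and picks the one maximizing the impurity decrease of equation (3.8) with the Gini measure of equation (3.7):

import numpy as np

def gini(labels):
    # Gini impurity of a label vector, eq. (3.7).
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_threshold(x, y):
    # Return the threshold C maximizing the impurity decrease, eq. (3.8).
    best_gain, best_c = 0.0, None
    parent = gini(y)
    for c in np.unique(x):
        left, right = y[x <= c], y[x > c]
        if len(left) == 0 or len(right) == 0:
            continue
        p_l, p_r = len(left) / len(y), len(right) / len(y)
        gain = parent - p_l * gini(left) - p_r * gini(right)
        if gain > best_gain:
            best_gain, best_c = gain, c
    return best_c, best_gain

# Toy example: average ACA2 duration in seconds vs. good (0) / bad (1) label.
x = np.array([22, 25, 31, 38, 42, 47, 55])
y = np.array([0, 0, 0, 0, 1, 1, 1])
print(best_threshold(x, y))  # splits between 38 s and 42 s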

Two widely used optimizing methods for CART are a minimum number of points and cross-validation. A fully grown decision tree might suffer from the overfitting problem. One solution is setting a minimum number of points in the child nodes when splitting; in this way, over-specific splits can be eliminated. Another approach is adopting cross-validation on the training data to obtain the minimum misclassification error.

Compared to other algorithms like CRUISE [14] and GUIDE [15], CART does not need the features to be selected in advance, which means it will identify the most significant variables and eliminate non-significant ones [16][17].

3.2.3 Support vector machine

The support vector machine (SVM) is a supervised learning algorithm used for regression and classification problems. Given a training dataset, each instance X_i has N features with a corresponding label y_i ∈ {+1, −1} for binary classification. In SVM, a hyperplane is used to separate the instances into their classes. The support vectors are the instances closest to the hyperplane. The intention of SVM is to find a hyperplane which maximizes the margin between the support vectors and the hyperplane.

The hyperplane can be described as:

w \cdot x - b = 0    (3.9)

where w is the normal to the hyperplane. For a linearly separable binary classification problem, the training dataset can be formulated as:

w \cdot x_i - b \geq +1 \quad \text{for } y_i = +1
w \cdot x_i - b \leq -1 \quad \text{for } y_i = -1

which combine to:

y_i (w \cdot x_i - b) - 1 \geq 0    (3.10)


The margin can then be related to the two parallel planes w \cdot x_i - b = +1 and w \cdot x_i - b = -1; the distance from the separating hyperplane to each of them is 1/\|w\| by vector geometry. Maximizing the margin is equivalent to minimizing \|w\|, or, for quadratic programming (QP) optimization, minimizing \tfrac{1}{2}\|w\|^2. The objective function can be formulated as:

\min \tfrac{1}{2}\|w\|^2 \quad \text{s.t. } y_i (w \cdot x_i - b) - 1 \geq 0    (3.11)

For non-linearly separable classification, soft-margin slack variables \xi_i with a penalty parameter C [18] are introduced into the objective function:

\min \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \xi_i \quad \text{s.t. } y_i (w \cdot x_i - b) \geq 1 - \xi_i, \ \xi_i \geq 0    (3.12)

A kernel function [19] transforms the features into a higher-dimensional space which is more suitable for the non-linearly separable problem. By introducing the kernel function K(x), the classification constraint is formulated as:

y_i (w \cdot K(x_i) - b) - 1 \geq 0    (3.13)

Commonly used kernel functions are:

• linear: K(x_i, x_j) = x_i^T x_j
• polynomial: K(x_i, x_j) = (\beta x_i^T x_j + r)^d
• radial basis function (RBF): K(x_i, x_j) = \exp(-\beta \|x_i - x_j\|^2)
• sigmoid: K(x_i, x_j) = \tanh(\beta x_i^T x_j)

3.2.4 Extreme learning machine (ELM)

The learning speed of feedforward neural networks is constrained by slow gradient-based learning algorithms and by the fact that all parameters need to be tuned iteratively. Huang and Babri [20] proved that a single-hidden-layer feedforward neural network (SLFN) can learn N distinct observations with at most N hidden neurons and almost any nonlinear activation function. In the paper by Huang, Chen and Siew [21], SLFNs with randomly chosen hidden nodes and analytically calculated output weights are shown to be universal approximators for various activation functions. Since the input weights and biases can be randomly assigned, the efficiency of the SLFN is improved compared to the approaches [22][23][24] which need to tune the parameters of new hidden neurons.


The upper bound on the required number of hidden nodes is the number of distinct training samples, that is, \tilde{N} \leq N. The output function of the SLFN is:

o_j = \sum_{i=1}^{\tilde{N}} \beta_i\, g(W_i \cdot X_j + b_i), \quad j = 1, \ldots, N    (3.14)

where W_i and b_i are the weight vector and the bias connecting the input nodes to the i-th hidden node. Considering the output node as linear, \beta_i is the weight connecting the i-th hidden node to the output nodes, and g(x) is the activation function. There exist \beta_i, W_i, b_i such that:

\sum_{i=1}^{\tilde{N}} \beta_i\, g(W_i \cdot X_j + b_i) = t_j, \quad j = 1, \ldots, N    (3.15)

where t_j is the target output. Compactly, (3.15) can be written as:

H\beta = T    (3.16)

where

H = \begin{bmatrix} g(W_1 \cdot X_1 + b_1) & \cdots & g(W_{\tilde{N}} \cdot X_1 + b_{\tilde{N}}) \\ \vdots & \ddots & \vdots \\ g(W_1 \cdot X_N + b_1) & \cdots & g(W_{\tilde{N}} \cdot X_N + b_{\tilde{N}}) \end{bmatrix}    (3.17)

\beta = \begin{bmatrix} \beta_1^T \\ \vdots \\ \beta_{\tilde{N}}^T \end{bmatrix}, \quad T = \begin{bmatrix} t_1^T \\ \vdots \\ t_N^T \end{bmatrix}    (3.18)

In most cases, the amount of training data is larger than the number of hidden nodes. Training an SLFN is then equivalent to finding a least-squares solution \hat{\beta} of the linear system H\beta = T [25]. According to the theorem in [25], the smallest-norm least-squares solution is:

\hat{\beta} = H^{\dagger} T    (3.19)

where H^{\dagger} is the Moore-Penrose generalized inverse of the matrix H.
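A minimal numpy sketch of equations (3.14)-(3.19), written by me for illustration (the function names and the choice of a sigmoid activation are assumptions); the thesis itself used the Matlab ELM model referenced in section 4.4.4:

import numpy as np

def elm_train(X, T, n_hidden, rng=None):
    # Randomly assign input weights W and biases b, then solve the output
    # weights beta with the Moore-Penrose pseudo-inverse, eq. (3.19).
    rng = rng or np.random.default_rng(0)
    W = rng.uniform(-1, 1, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1, 1, size=n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))   # hidden-layer output matrix, eq. (3.17)
    beta = np.linalg.pinv(H) @ T             # beta = H† T
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

# Toy usage: 2 features, binary targets encoded as 0/1.
X = np.array([[0.2, 0.8], [0.9, 0.1], [0.4, 0.6], [0.7, 0.3]])
T = np.array([[1.0], [0.0], [1.0], [0.0]])
W, b, beta = elm_train(X, T, n_hidden=10)
print((elm_predict(X, W, b, beta) > 0.5).astype(int).ravel())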

3.3 AMRTM system simulation

The AMRTM system simulation builds a model for examining possible optimizing solutions. The simulation follows the real AMRTM system operation process and contains the important features, including the cow entrance sequence, the robots working in parallel, batch milking sessions and the actual POC duration structure.


The idea of the AMRTM system simulation is to rearrange the cow entrance sequence. Classified bad cows are split from the good ones and milked together at the end of each milking session. A simplified model was built to verify that rearranging the cow entrance sequence can save time. First, the Ovesholm (OHM) milking information in March was selected as the simulation target. The bad cows were defined as those whose average ACA2 function duration was larger than 40 seconds (the classification work is presented in chapter 4). As shown in Table 3.2, in order to simplify the demonstration, the real robot durations for good and bad cows were replaced by the average performance of the good and bad groups.

Table 3.2: Cow’s robot average performance

Second, the entrance sequence was assumed to follow a pattern in which good and bad cows entered the system alternately (such as 'GBGBGBGBGBG. . . ', where 'G' denotes a good cow and 'B' a bad cow). The milking process was recreated and is shown in Table 3.3. At POC 1, the first cow (G1) enters the system and is rotated to the teat preparation module (TPM1) by the platform. Then a bad cow (B1) enters the system and the platform rotates it to TPM1, while the first cow (G1) is rotated to TPM2. The last cow (G4) enters the system at POC 7 and the sample milking process stops at POC 10 when the last cow (G4) finishes teat attachment at ACA2. All the bad cow positions and the slowest durations caused by bad cows are marked in red. At the end of the table, the total simulation time is calculated by summing up all the slowest durations.


Then the entrance sequence was rearranged as shown in Table 3.4. Bad cows (B1, B2 and B3) are milked after all the good cows have entered the system. We can observe that the bad-cow-dominated slowest durations (red fonts) are reduced from 7 POCs to 6 POCs and that the simulation duration is 21 seconds shorter than in Table 3.3.

Table 3.4: Rearranged entrance sequence AMRTM system simulation
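The toy experiment of Tables 3.3-3.4 can be reproduced with a short script. The sketch below is my own illustration: it models only four robot positions (TSM is omitted), the per-robot durations are invented placeholders rather than the values of Table 3.2, and a cow entering at one POC reaches ACA2 three rotations later, as described above.

ROBOTS = ["TPM1", "TPM2", "ACA1", "ACA2"]
GOOD = {"TPM1": 25, "TPM2": 25, "ACA1": 30, "ACA2": 30}   # seconds (assumed)
BAD  = {"TPM1": 25, "TPM2": 25, "ACA1": 50, "ACA2": 55}   # bad cows slow at ACA1/ACA2

def total_time(sequence):
    # sequence is a string such as "GBGBGBG"; the cow entering at POC k
    # occupies robot r at POC k + ROBOTS.index(r).
    n_poc = len(sequence) + len(ROBOTS) - 1
    total = 0
    for poc in range(n_poc):
        durations = []
        for offset, robot in enumerate(ROBOTS):
            cow_idx = poc - offset
            if 0 <= cow_idx < len(sequence):
                profile = GOOD if sequence[cow_idx] == "G" else BAD
                durations.append(profile[robot])
        total += max(durations)      # the slowest robot decides each rotation
    return total

print(total_time("GBGBGBG"), total_time("GGGGBBB"))  # 425 vs. 385 s with these numbers

With these assumed durations the sorted sequence is 40 seconds faster, because the slow ACA1 and ACA2 operations of adjacent bad cows overlap within the same rotations.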


Chapter 4

AMRTM optimization system design

The intention of the entire project is to increase the actual capacity of AMRTM with as high a successful milking rate as possible, by investigating the cow milking sequence and the robot timeout data. As shown in Figure 4.1, the start of the project was to examine the current data and extract valuable information using the KDD process. The basic understanding of the database and the formatting work were done during data preparation. According to the cows' milking behavior, bad cows were defined and verified through statistical methods. Then a machine learning prediction system was adopted to classify and predict cow behavior in different farms based on the single-variable bad cow definition. Meanwhile, the analysis of the robot timeout setting was carried out. All the possible solutions were tested through the AMRTM system simulation. The output contains the simulation time, capacity, bad cow report and robot timeout report. The detailed experimental procedure is illustrated in the sections below.

Figure 4.1: System overview

4.1 KDD process

As illustrated in Figure 4.2, the POC and CowMilking databases are handled separately to obtain the refined databases and the knowledge for machine learning and system simulation.

After the CowMilking KDD process, a refined CowMilking database was created in order to simplify the database and increase processing speed; single- and multi-variable bad cow definitions were created; and an ML database was generated for training and testing the machine learning algorithms with the bad cow definition.


After the POC KDD process, a refined POC database, robot timeout information and session information were obtained. The knowledge on robot timeout was used for the analysis, and the session information was used for creating the system simulation.

Figure 4.2: Data preparation

4.2 Bad cow definition

A bad cow was defined as one whose behavior on AMRTM had a negative impact on the capacity. Through the KDD process and expert consultation, the main effect features, ordered from high to low impact, were 'ACA1 & ACA2 durations', 'kickoff teats', 'incomplete milking teats', 'TPM result teats' and 'ACA result teats'. As explained in the theoretic background, 'ACA1 & ACA2 durations' were decisive factors for the 'slowest robot duration', which is the key component of the POC duration. The remaining parameters also contributed to the bad cow definition: 'TPM result teats' and 'ACA result teats' had a more direct effect on the robot success rate, while 'kickoff teats' and 'incomplete milking teats' reflected the cow's on-platform behavior.

In order to understand the general cow behavior, the milking data was analyzed monthly. All the milking turns for each individual cow were summed up and the average numerical range was calculated for each feature. The distribution of a cow's robot durations reflects its performance over a certain period of time more accurately. The relationship among mean, median and skewness was introduced to refine the bad cow definition.

Based on the distribution of average ACA2 durations and the initial robot timeout setting, the single-variable bad cow classification boundary was set where the average ACA2 duration was 40 seconds.

For multi-variable bad cow definition, the classification boundary needed to fulfil all the conditions, which were:


• Kickoff teats possibility > 50%
• Incomplete milking teats possibility > 50%
• TPM success possibility < 50%
• ACA success possibility < 50%
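For illustration only, the two definitions can be written as simple predicates over a cow's monthly averages. This is my own sketch with hypothetical field names; following the text above, the multi-variable rule requires all listed conditions to be fulfilled.

def is_bad_single(cow):
    # Single-variable definition: average ACA2 duration above 40 seconds.
    return cow["avg_aca2_duration_s"] > 40

def is_bad_multi(cow):
    # Multi-variable definition: all listed conditions fulfilled.
    return (cow["kickoff_prob"] > 0.5
            and cow["incomplete_prob"] > 0.5
            and cow["tpm_success_prob"] < 0.5
            and cow["aca_success_prob"] < 0.5)

# Hypothetical monthly record for one cow.
cow = {"avg_aca2_duration_s": 46.0, "kickoff_prob": 0.6, "incomplete_prob": 0.55,
       "tpm_success_prob": 0.4, "aca_success_prob": 0.45}
print(is_bad_single(cow), is_bad_multi(cow))  # True True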

4.3 Robot timeout analysis

Initially, the TPM robots were programmed to terminate their operation after a certain amount of time even if the task was not finished, whereas the ACA2 duration was assigned depending on the ACA1 task completion status. Generally, the TPM interrupt threshold was 30 seconds, the ACA1 interrupt time was 45 seconds, and ACA2 could extend up to 90 seconds.

Two important robot timeout evaluation criteria were the duration and the success rate. Theoretically, a higher success rate is achieved by increasing the robot operating duration; by doing so, the rotation has to wait longer until all the robots have stopped.

Considering that the average ACA duration was longer than the TPM's, the investigation of the ACA2 duration had two approaches. On the one hand, by increasing the ACA operating duration, the attachment success rate could improve, which means ACA2 could keep the chance of finishing ACA1's remaining task. This idea needs actual testing on the farm to obtain data, which is not included in this thesis. On the other hand, limiting the maximum duration, especially on ACA2, also contributes to time saving and mechanical maintenance. Nevertheless, there is a trade-off between shortening the duration and maintaining the success rate. The detailed discussion is presented in the result and discussion chapters.

4.4 Machine learning classification

4.4.1 Creating database

The first step of the machine learning classification was collecting data and selecting relevant features. The data set was subdivided into a training set, a validation set, and a test set. The training set was used for training the specific machine learning model. The validation set was used for evaluating the generalization error of the selected model. The test set was the target data which had to be classified by the well-trained model with the optimal parameters. For an insufficient data set, cross-validation and bootstrapping could be adopted to increase data diversity.

The training and validation data were collected from three farms (OHM, FIN and LAP). They contained 781 bad cows and 6407 good cows. In order to build an unbiased dataset, the number of good cows was cut down to match the number of bad cows. The final data set contains 750 bad cows and 750 good cows (training set: 600 bad cows and 600 good cows; validation set: 150 bad cows and 150 good cows). The OHM milking data in May was selected as the test set. It contained 30 bad cows and 524 good cows.
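A sketch of the balancing and splitting step described above (my own illustration; the array names and random feature matrices are placeholders for the real farm data):

import numpy as np

rng = np.random.default_rng(0)

def build_balanced_sets(X_bad, X_good, n_per_class=750, n_train=600):
    # Undersample the majority (good) class to match the bad-cow count,
    # then split each class into training and validation parts.
    bad = X_bad[rng.permutation(len(X_bad))[:n_per_class]]
    good = X_good[rng.permutation(len(X_good))[:n_per_class]]
    X_train = np.vstack([bad[:n_train], good[:n_train]])
    y_train = np.array([1] * n_train + [0] * n_train)          # 1 = bad cow
    X_val = np.vstack([bad[n_train:], good[n_train:]])
    n_val = n_per_class - n_train
    y_val = np.array([1] * n_val + [0] * n_val)
    return X_train, y_train, X_val, y_val

# Toy usage with random matrices standing in for the 781 bad / 6407 good cows.
X_bad, X_good = rng.normal(size=(781, 6)), rng.normal(size=(6407, 6))
X_train, y_train, X_val, y_val = build_balanced_sets(X_bad, X_good)
print(X_train.shape, X_val.shape)  # (1200, 6) (300, 6)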

4.4.2 Decision tree

The decision tree was grown by selecting the split with the maximum information gain from all possible binary splits at each node and repeating this for the two child nodes. Overfitting is a general problem for a fully grown decision tree, since some child nodes might contain few observations. In order to reduce the generalization error, a constraint on the minimum number of observations in a leaf node ('MinLeafSize') was set as an optimizing splitting criterion. The resubstitution error and the confusion matrix were used to examine the performance of the decision tree.

Figure 4.3: Decision tree implementation

As shown in Figure 4.3, 'DT Train' contained the training and validation process: the decision tree was trained while restricting 'MinLeafSize' and then validated. Considering that a real farm normally has a biased herd dominated by good cows (>90% of the herd) and that the main purpose of the prediction is the bad cows, I created a criterion for selecting the best model with well-trained parameters. To be selected as the best model, the validation result must fulfil the conditions that the accuracy of both the good cow prediction (PPV) and the bad cow prediction (NPV) is over 80% and that the number of correctly predicted bad cows is maximal. Then the actual farm data was fed into the best model to predict the cow behavior in the next month.
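One possible way to reproduce this selection loop (my own sketch using scikit-learn; the thesis used a CART implementation, and the leaf-size grid here is an assumption):

from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix

def select_best_tree(X_train, y_train, X_val, y_val, leaf_sizes=range(5, 301, 5)):
    # Labels: 1 = bad cow, 0 = good cow (an assumption of this sketch).
    best = None
    for leaf in leaf_sizes:
        model = DecisionTreeClassifier(min_samples_leaf=leaf, random_state=0)
        model.fit(X_train, y_train)
        tn, fp, fn, tp = confusion_matrix(y_val, model.predict(X_val)).ravel()
        bad_precision = tp / (tp + fp) if tp + fp else 0.0
        good_precision = tn / (tn + fn) if tn + fn else 0.0
        # Criterion of Sec. 4.4.2: both predictive values above 80%, then
        # maximize the number of correctly predicted bad cows.
        if bad_precision > 0.8 and good_precision > 0.8 and (best is None or tp > best[0]):
            best = (tp, leaf, model)
    return best  # (correct bad cows, chosen leaf size, fitted model) or None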

4.4.3 Support vector machine


Figure 4.4: SVM implementation

As shown in Figure 4.4, the support vector machine was trained by tuning the polynomial degree and by a grid search for the penalty parameter C, and then validated. The best model was obtained with the same criteria as for the decision tree. Then the actual farm data was tested to get the cow behavior prediction for the next month.
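A compact scikit-learn equivalent of this step, reusing the arrays from the sketch in section 4.4.1 (my own illustration; the parameter grid below is an assumption, not the thesis' actual search range):

from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Grid search over the soft-margin penalty C for a 2nd-degree polynomial kernel.
param_grid = {"C": [0.1, 1, 10, 100, 1000]}
search = GridSearchCV(SVC(kernel="poly", degree=2), param_grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_)
y_val_pred = search.best_estimator_.predict(X_val)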

4.4.4 Extreme learning machine

The extreme learning machine implementation was based on the Matlab model provided by Nanyang Technological University [29]. The input features were normalized to the range between -1 and 1. In this thesis, the effects of the activation function and of the number of hidden neurons on the single-hidden-layer ELM cow behavior prediction were examined. Three activation functions (tribas, radbas and sig) were chosen and compared over the same range of hidden neurons (1-800).

Figure 4.5: ELM implementation

As shown in Figure 4.5, the extreme learning machine was trained by tuning the activation function and the number of hidden neurons, and then validated. The best model was obtained with the same criteria as for the decision tree. Then the actual farm data was tested to get the cow behavior prediction for the next month.
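A sketch of that sweep, assuming the elm_train/elm_predict helpers from the section 3.2.4 sketch and showing only a sigmoid activation; the 20 random trials per setting follow the procedure described in section 5.2.3:

import numpy as np

def sweep_hidden_neurons(X_train, y_train, X_val, y_val, sizes=range(10, 801, 10)):
    results = {}
    for n_hidden in sizes:
        accs = []
        for trial in range(20):                     # 20 random initializations
            rng = np.random.default_rng(trial)
            W, b, beta = elm_train(X_train, y_train.reshape(-1, 1), n_hidden, rng)
            pred = (elm_predict(X_val, W, b, beta) > 0.5).astype(int).ravel()
            accs.append((pred == y_val).mean())
        results[n_hidden] = (float(np.mean(accs)), float(np.max(accs)))
    return results  # average and best validation accuracy per hidden-layer size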

4.5 AMRTM system simulation

The AMRTM system simulation recreated the milking sequence and the POC procedure so that the system operating features remained as close to the actual ones as possible.

The simulation implementation is shown in Figure 4.6. The input databases were 'CowMilking data', 'Session information', and 'Constant time'.

'CowMilking data' included all the milking records in a specific month, arranged by the milking sequence. Each record was labeled based on the cow classification (good or bad). 'Session information', which contained each session's last milking record, was used for splitting the milking records on session scale. Two milking sequences were adopted: original (following the actual milking sequence in each session) and sorted (bad cows separated from good cows and rearranged to the end of each session). 'Constant time' contained the waiting times which should stay constant, since changing the milking sequence does not influence them.

Following the selected milking sequence, one cow enters the platform during each rotation. A newly entered cow is assigned to TPM1 directly and moved to TPM2 at the next rotation. When the last cow in one session has entered the platform, the entrance positions are set to empty until that cow has left the ACA2 bail. By doing so, a new session starts without influence from the last session.

The simulation essentially reproduced each POC procedure by calculating the slowest robot duration and combining it with the constant time. After running the program, a report in '.txt' format was generated which contains the farm information, simulation duration, bad cow frequency and yield loss on session scale.
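As a small illustration of the sequence rearrangement on session scale (my own sketch; the record format is an assumption), the sorted sequence keeps the original order within each group but moves every bad cow to the end of its session:

def sort_session(records):
    # records: list of (cow_id, is_bad) tuples in the original milking order.
    good = [r for r in records if not r[1]]
    bad = [r for r in records if r[1]]
    return good + bad

# Example: one session with bad cows interleaved.
session = [("c1", False), ("c2", True), ("c3", False), ("c4", True)]
print([cow for cow, _ in sort_session(session)])  # ['c1', 'c3', 'c2', 'c4']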

Figure 4.6: AMRTM system simulation


Figure 4.7: Optimization on sorting bad cows

The functionality of the ACA2 robot causes an extendable duration. Theoretically, it sacrifices time to achieve a higher attachment rate. An optimization of the entire ACA2 duration aims to simulate the trade-off between the successful attachment rate and the duration. As shown in Figure 4.8, the ACA2 robot terminates its operation when the duration reaches the threshold time.


Chapter 5

Result

5.1 Bad cow definition

5.1.1 Single-variable definition

According to the single-variable definition (average ACA2 duration larger than 40 seconds), the cow information in OHM, FIN, and LAP is presented in Table 5.1. The first column shows the source of the milking data, named by 'farm name' and 'month'. Then the number of milked cows, the number of classified bad cows and the proportion of bad cows are listed. The bad cow proportions in OHM and FIN were around 5% of the herd size. LAP had a larger number of bad cows, around 15%.

Table 5.1: Single variable bad cow definition

5.1.2 Multi-variable definition

The multi-variable bad cow definition is a more comprehensive model, since not only the ACA2 duration but also the kicked-off teats, TPM & ACA successfully attached teats, and incomplete teats were applied to define a bad cow. The classification result is shown in Table 5.2: more bad cows were classified compared to the single-variable definition. Averaging three months of data, the bad cows were 12.5% for the OHM farm, 25.5% for the FIN farm and 23.92% for the LAP farm.


Table 5.2: Multi-variable bad cow definition

5.2 Machine learning prediction

5.2.1 Decision tree

According to the decision tree implementation, the relation between the resubstitution error and the minimum number of observations in a leaf is presented in Figure 5.1. The error rate increases linearly below 100 minimum observations in the leaf node and becomes stable around 0.08. When the minimum observation in a leaf node exceeds 750, the error rate reaches 0.45 instantaneously. It is evident that 750 observations in the leaf node is a threshold value for the resubstitution error.

Figure 5.1: Resubstitution error with min-leaf size variation

As the minimum number of observations in the leaf node increased, PPV dropped from 95% to 90%, but NPV had the opposite trend, rising from 60% to 80%, which meant the overfitting had been limited. Then both predictive values stabilized at 94% for PPV and 77% for NPV. When the minimum observation in leaf size went beyond 750, PPV reached 100% and NPV declined to 0%, which meant the model was underfitting. This explains the sharp increase in the resubstitution error in Figure 5.1.

Figure 5.2: DT classification accuracy with min-leaf size variation

5.2.2 Support vector machine

In this thesis, a support vector machine with a polynomial kernel was implemented. First, the effect of the polynomial kernel degree on the classification accuracy was examined. As shown in Table 5.3, the best prediction results for the 1st, 2nd and 3rd degree polynomial kernels with default parameters are presented. The 2nd degree polynomial kernel had an outstanding performance compared to the other two degrees. On the contrary, the 3rd degree polynomial kernel seemed unable to learn the prediction rules from the selected features.


Figure 5.3: Grid search for soft margin penalty parameter C.

5.2.3 Extreme learning machine

Three activation functions (radbas, sig and tribas) were tested in the extreme learning machine implementation. PPV and NPV for the three activation functions had the same trend when increasing the number of hidden neurons. Considering that the ELM algorithm randomly generates the input biases and weights, 20 trials for each number of hidden neurons were run in the training phase. As shown in Figure 5.4, the average and best predicted records are presented for both PPV and NPV. By observation, PPV generally had better prediction accuracy than NPV. Both PPV and NPV decreased when the number of hidden neurons went beyond 300. The downward trend of NPV was more noticeable, around 20%, compared to 5% for PPV.

Figure 5.4: ELM

5.3 Robot Timeout


Table 5.4: ACA2 duration analysis

5.4 AMRTM system simulation

The simulation results are presented in two aspects: system operating time and capacity. The simulation database contained three months' (March, April, and May) milking records and system operating time stamps for four farms (OHM, OTT, FIN and LAP). Based on the optimizing implementation, enabling or disabling TPM while milking bad cows divides the simulation into two branches. Within each branch, the optimization levels included: bad cow sorting; bad cows manually milked in 20 seconds; bad cows manually milked in 10 seconds; and bad cows manually milked in 6 seconds.


Figure 5.5: Actual AMRTM simulation time

The actual simulation capacities for the four farms at the different optimizing levels (original, sorting, ACA 20, ACA 10, ACA 6) with TPM enabled/disabled over three consecutive months are presented in Figure 5.6. The capacity was calculated by dividing the number of milking records by the simulation time. Since the number of milking records was constant, a decrease in simulation time increases the capacity.


Chapter 6

Discussion

6.1 Bad cow selection

In order to sort cows and optimize the system performance, we need to select bad cows properly. In this discussion, we look at the different variants of the bad cow definition and at how the composition of bad cows varies over time. Two selection methods (single-variable & multi-variable) are compared.

6.1.1 Bad cow classification with single variable

The bad cow definition based on a single variable (average ACA2 duration > 40 s) was applied to the AMRTM system simulation. The number of bad cows was in a reasonable range (5%-10% of the herd) in the OHM, OTT and FIN farms. However, the proportion of bad cows in LAP reached around 15% of the herd (114 bad cows out of 816 cows in March). The influence of the bad cow proportion on the optimization methods is explained in 6.3.

The variation of the bad cows in the four farms is shown in Figure 6.1 to illustrate the bad cow relation over two consecutive months. March is the first sampling month, so there is no bad cow composition for it. The following months contain three classes of bad cows: new bad cows that appeared in this month (blue bar); the same bad cows that appeared both this month and last month (red bar); and the bad cows which performed well last month (green bar).

From the statistical analysis, half of the bad cows still performed badly in the next month. Around 20% of the bad cows were newly introduced cows each month. The prediction of last month's good cows that turn into bad cows in the next month plays a significant role in sorting cows and optimizing system capacity, since those cows make up 30% of the bad cows. OTT, as a test farm, contained fewer cows and underwent more system tests and updates, which influenced its bad cow variation and system performance.


Figure 6.1: Bad cow variation in four farms

6.1.2 Bad cow classification with multi-variable

The multi-variable bad cow definition provides a comprehensive understanding of bad performance. Instead of only focusing on a cow's ACA2 duration, the kick-off teat probability, incomplete teat probability, and robot result teats are taken into consideration. However, the multi-variable defined bad cows occupied 15% of the farm herd, and even 20% in the LAP farm. Meanwhile, the bad behaviors other than the average ACA2 duration did not have a direct influence on the system operating time, which means the defined bad behavior did not influence the robot performance. In this case, milking the multi-variable defined bad cows together was difficult to implement (large amount) and might deliver a smaller capacity gain than the single-variable definition.

6.2 Machine learning classification

The analysis of the machine learning classification focused on the prediction accuracy. Three machine learning algorithms (decision tree, support vector machine and extreme learning machine) were trained with the same training data; then the test data (OHM cows' behavior in May) was fed to the well-trained models, which output the best predictions of the cows' behavior in the next month. According to the machine learning implementation methods in chapter 4, the best classification results are shown in Table 6.1. Because the herd scale varied each month, the prediction was only based on the cows present in the test month, which means the cows newly entering in the next month were beyond consideration. The behavior in the last month was used as a reference prediction to compare with the machine learning predictions.


Table 6.1: The best machine learning predictions and behaviour in last month

However, the misclassification of actually good cows limited the performance of the machine learning classification on the test data (OHM milking records in May).

Table 6.2: The best machine learning model and last month model performance results

Considering that the split method of the decision tree is similar to the criteria of the bad cow definition, it was selected to predict cow behavior. However, after optimizing the minimum number of observations in the leaf node, the general prediction only used the average ACA2 duration as split criterion; the prediction effect of the other features was discarded. The best decision tree (CART) prediction used the average ACA2 duration and the average ACA2 result teats.

In order to examine all the selected features and get the optimal prediction, the support vector machine was adopted. With the 2nd degree polynomial kernel and an optimized C parameter, a better prediction was generated. However, the execution time of the SVM was longer than that of the decision tree, especially when implementing the grid search method.

The wish to combine speed and accuracy propelled me to find a more suitable algorithm for predicting bad cows. The extreme learning machine algorithm assumes that the parameters (input biases and weights) of a neural network can be randomly assigned and do not need to be well tuned. This impressive feature leads to an extremely fast learning speed, and the prediction result is comparable with DT and SVM.

6.3 Comparison on different optimizing levels

The comparison contained four optimizing levels: sorting bad cows and milking them at the end of each session; manually milking the sorted bad cows in 20 seconds; manually milking the sorted bad cows in 10 seconds; and manually milking the sorted bad cows in 6 seconds. Capacity and saved time, as the measurements of the different optimizing levels, are analyzed in this section.

The capacities for the different farms are shown in Table 6.3. The original and sorting levels did not change the robot durations, so the increased capacity came from rearranging the milking sequence. At the manual bad cow milking levels, enabling TPM prevented the capacity from increasing, since the TPM operating duration was longer than 20 seconds; disabling TPM, however, kept the rising trend, which means the milker needed to finish both cleaning and attaching. At the sorting level, the capacity increased only by around 1 ∼ 3 cows/hour. Considering that the bad cows made up 5% of the herd (10 ∼ 30 cows), manual milking was proposed. Based on the milker's proficiency, the manual milking time was set at 20, 10 and 6 seconds. With TPM enabled, the capacity increased by 3 ∼ 6 cows/hour; with TPM disabled we gained 4 ∼ 11 cows/hour. It is important to notice that the number of bad cows reached 15% of the LAP herd, which caused the capacity to increase by 8 cows/hour with TPM enabled and 14 cows/hour with TPM disabled.


The optimized saving time shows how much time could be saved by implementing the optimizing methods. In Table 6.4, enabled TPM and disabled TPM have the same trend as in the optimized capacity table. By sorting bad cows, we could save 2 ∼ 12 minutes. Depending on the number of bad cows, implementing manual milking saved 14 ∼ 26 minutes with TPM enabled and 20 ∼ 39 minutes with TPM disabled. Analyzing an example, OHM (on average 30 bad cows out of 550 cows) saved 25 minutes with TPM enabled and a maximum of 39.2 minutes by manual milking in 6 seconds. Because of the number of bad cows in LAP, enabling TPM could save 72 minutes and disabling TPM could save over 100 minutes. Considering the actual conditions, the bad cow classification rule in LAP needs to be modified from satisfying only the ACA2 operation time larger than 40 seconds to both meeting the ACA2 rule and staying within 5% of the herd scale.

Table 6.4: The session saved time of different level optimization

6.4 Robot timeout


Chapter 7

Conclusion and future work

7.1 Conclusion

The bad cow defined by the single variable is used to classify the herd. In this case, the optimization is maximized by only considering the direct influence factor (ACA2 duration) on the slowest robot duration. The multi-variable bad cow definition provides a more comprehensive cow behavior assessment for the farm. Due to the large proportion and the indirect effect on capacity, the multi-variable definition is not suitable for classifying bad cows. According to the experiments, the ideal proportion of bad cows should be around 5% ∼ 10% of the herd size. A larger proportion of bad cows will disturb the milking sequence and aggravate the labour. Sorting eliminates the bad cows' influence on the good ones; however, the bad cows still need to be milked automatically, and the experimental results in different farms show a very limited gain in capacity. In this case, manually milking bad cows at different optimization levels is proposed. With TPM enabled during manual milking, the capacity increased by 3 ∼ 6 cows/hour and 14 ∼ 30 minutes are saved per session. Even more, the capacity increases by 4 ∼ 11 cows/hour and the saved time per session reaches 20 ∼ 39 minutes with TPM disabled. For the robot timeout analysis, setting a maximum operating threshold for ACA2 saves a relatively small amount of time; it can be applied as an associated method combined with the bad cow manual milking method to further optimize the capacity, because the unexpected ACA2 durations in the good cow sequence will be limited.

The fundamental condition to achieve the optimization results above is the assumption that all the bad cows are classified correctly. Taking the farm's implementation feasibility into consideration, the prediction of a cow's behavior in the next month based on its behavior in the current month is applied. After building the data sets for training, validation, and testing, three machine learning algorithms (DT, SVM and ELM) are implemented and optimized to obtain the best prediction result. According to the current work, the bad cow classification results of the machine learning algorithms are better than the control group (defining next month's behavior by using the current month's behavior directly). However, the machine learning algorithms perform poorly at predicting good cows, which limits their overall performance below that of the control group.


7.2 Future work

The AMRTM system simulation proved the optimizing ability of classifying bad cows, sorting and manual milking. Future work based on the simulation has two main directions: collecting new milking data and testing the simulation procedure in real farms. Since human-involved procedures (manually attaching milk cups and setting cows with bad robot performance to manual milking) affect the authenticity of the bad cow classification, new milking data should be collected with fewer human factors. By implementing the different optimization levels in actual farms, the feedback on actual capacity, cow behavior, and potential problems could be utilized to improve the simulation system.


Appendix A

Simulation time and capacity


Table A.2: Simulation time (Disable TPM). Columns: Farm Info, Number of Session, Original Duration (Hour).


Appendix B

ACA2 Robot Timeout Analysis


Table B.1: ACA2 Robot Timeout Analysis

OHM 0301-0401 (Total data: 22148)
ACA2 Duration (Sec)   Successful rate (%)   Success / Total   Saved Times (H)
>90                   2                     3/150             0.14
>80                   7.66                  21/274            0.714
>70                   16.29                 58/356            1.59
>60                   23.15                 97/419            2.65
>50                   23.35                 199/852           4.23
>40                   43.34                 615/1419          7.23
<=40                  98.05                 20325/20729

OHM 0401-0501 (Total data: 18619)
ACA2 Duration (Sec)   Successful rate (%)   Success / Total   Saved Times (H)
>90                   0.78                  1/128             0.117
>80                   9.45                  21/222            0.6
>70                   17.9                  53/296            1.32
>60                   26.83                 99/369            2.22
>50                   24.36                 202/829           3.67
>40                   44.11                 592/1342          6.54
<=40                  98.4                  17008/17277

OHM 0501-0601 (Total data: 18243)
ACA2 Duration (Sec)   Successful rate (%)   Success / Total   Saved Times (H)
>90                   1.48                  2/135             0.12
>80                   10.6                  26/244            0.637
>70                   29.8                  62/331            1.45
>60                   27.5                  114/414           2.48
>50                   22.9                  218/951           4.15
>40                   42.4                  662/1561          7.47
<=40                  98.4                  16415/16682

OTT 0301-0401 (Total data: 10508)
ACA2 Duration (Sec)   Successful rate (%)   Success / Total   Saved Times (H)
>90                   0                     10358             0
>80                   5.12                  17/332            0.313
>70                   25.12                 148/589           1.58
>60                   43.3                  352/813           3.46
>50                   61.7                  757/1227          6.18
>40                   67.6                  1649/2438         10.82
<=40                  98.9                  7981/8070

OTT 0401-0501 (Total data: 11084)
ACA2 Duration (Sec)   Successful rate (%)   Success / Total   Saved Times (H)
>90                   0                     10978             0
>80                   5.12                  8/189             0.189
>70                   25.12                 77/346            0.9125
>60                   37.8                  169/447           1.98
>50                   54.74                 346/632           3.43
>40                   58.19                 810/1392          5.93
<=40                  99.6                  9653/9692

OTT 0501-0601 (Total data: 10912)
ACA2 Duration (Sec)   Successful rate (%)   Success / Total   Saved Times (H)
>90                   0                     14531             0
>80                   3.05                  5/164             0.156
>70                   8.06                  59/326            0.806
>60                   34.2                  144/421           1.83
>50                   50.9                  298/586           3.2
>40                   52.8                  707/1339          5.56
<=40                  98.6                  9438/9573

GAL 0301-0401 (Total data: 25307)
ACA2 Duration (Sec)   Successful rate (%)   Success / Total   Saved Times (H)
>90                   0.77                  1/130             0.1
>80                   10.84                 23/295            0.69
>70                   20.76                 87/419            1.68
>60                   27.6                  174/630           3.05
>50                   47.44                 418/881           5.07
>40                   67                    1079/1543         8.24
<=40                  99.95                 23754/23764

GAL 0401-0501 (Total data: 24350)


TRITA-EE 2016:205, ISSN 1653-5146
