
Can methods of machine learning be used to better predict lactation curves for bovines?

Andreas Östling

Supervisor: Måns Thulin

Department of Statistics

Uppsala University


Abstract

A random forest is compared to an OLS model for predicting lactation curves for cows. Both methods were estimated and tested using data from the period 2015-01 to 2015-09. On test data, random forests outperform OLS for modelling lactation curves, decreasing MSE by approximately 26%. The data is provided by the Swedish University of Agricultural Sciences (Sveriges Lantbruksuniversitet) and includes 75 558 milking events from 320 cows. The date of the milking, the time of day when the milking occurred, and which cow was milked were found to be important variables for accurate predictions.


Contents

1 Introduction
2 Data
3 Method
  3.1 OLS Model
  3.2 Decision Tree
  3.3 Regression Tree
  3.4 Growing a Regression Tree
  3.5 Bootstrap Aggregation (Bagging)
  3.6 Random Forest
    3.6.1 Effect of Tree depth
    3.6.2 Effect of Forest size
4 Results
  4.1 Descriptive Statistics
  4.2 Variable evaluation
  4.3 Model evaluation
5 Conclusions
6 Appendix I: Tables
7 Appendix II: R Code

1 Introduction

Agriculture has always had a close relationship with statistical analysis, whether the goal is to produce the maximum amount of wheat on a field or to find the correct treatment for sick animals. This thesis aims to better understand and predict lactation curves for cows. How much milk a cow produces depends on many factors, such as feed, time of year and how long it has been since the cow last calved. With both profit and animal health in mind, it is useful to be able to predict these curves in order to maximize utility, even out production and provide better care for the animals.

Previous work has used mixed linear models to estimate the relationships between milk production and the available variables. The aim of this thesis is to answer the question: Can methods of machine learning be used to better predict lactation curves for bovines?

The field of machine learning is vast, and random forests were chosen as the tool to answer this question. Random forests have many advantages, the most prominent being that they are easy to implement and that it is much easier to understand what the method does in each step than for "black-box" methods such as neural networks.

2 Data

The data consists of 75 558 milking events spread over nine months, starting January 1, 2015 and ending October 1, 2015. For each event several variables were recorded, including but not limited to start time, animal ID number, duration of the milking event, amount of milk and milk flow. Not all of these variables are relevant for the particular application of this thesis, but they can be interesting for other applications. Errors have also been recorded for each event. Events where an error relating to milk yield occurred are treated as missing data and excluded from the analysis. The data has been split into training and test sets: the first seven months are used as training data and the last two as test data. The data is provided by the Swedish University of Agricultural Sciences (Sveriges Lantbruksuniversitet).
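As a minimal sketch, assuming a combined data frame of milking events with a Date-valued column Datum (the column name used in the appendix code), such a date-based split can be written as:

# Date-based train/test split: first 7 months for training, last 2 for testing.
# `events` is a placeholder for a combined data frame of milking events with a
# Date-valued column `Datum`, as in the appendix code.
cutoff <- as.Date("2015-08-01")
train  <- events[events$Datum <  cutoff, ]
test   <- events[events$Datum >= cutoff, ]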

3 Method

3.1 OLS Model

Lilja and Keteris Eckerstedt (2016, p. 10) describe the variables and the most commonly used model for lactation curves as follows:

\[
\begin{aligned}
y_{ij} ={}& \alpha_1(\mathrm{Date}_i) + \alpha_2(\mathrm{Milking}_i) + \beta_1 \mathrm{DIM}_i + \beta_2 \mathrm{DIM}_i^2 + \beta_3 \mathrm{DIM}_i^3 + \beta_4 \frac{1}{\mathrm{DIM}_i} \\
&+ \beta_5(\mathrm{Milking}_i)\,\mathrm{DIM}_i + \beta_6(\mathrm{Milking}_i)\,\mathrm{DIM}_i^2 + \beta_7(\mathrm{Milking}_i)\,\mathrm{DIM}_i^3 \\
&+ \beta_9(\mathrm{Milking}_i)\,\frac{1}{\mathrm{DIM}_i} + \alpha(\mathrm{Cow}_i) + \epsilon_i
\end{aligned}
\tag{3.1.1}
\]

y_ij: Observed milk yield (kg)
Cow_i: Identification number for the specific cow
Date_i: Date of milking
DIM_i: Days in milk (number of days since calving)
Milking_i: Classification of the milking according to time of day (AM or PM)
ε_i: Residual (kg)
i = observed cow
j = teat (left front, right front, left rear, right rear)

"The variable days in milk (DIM) is defined as the number of days since latest calving. DIM is not included in the original data but is calculated as the difference in days between most recent calving date and the date for the milking event. DIM follows a characteristic curve depending on time since calving, which is why different functions of this variable also are included in the model. Milking is a fixed effect which allows for different levels for milk yield as well as allowing for different slopes depending on time of day when the milking procedure occurred. The variable date is included to control for day-specific occurrences. The effect of the specific cow is modelled as a random effect."
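Although the thesis refers to this as the OLS model, the cow enters as a random effect, so a mixed-model fit is the natural implementation. The sketch below shows how a model of this form could be fitted in R; it is not the code used by Lilja and Keteris Eckerstedt (2016), and it assumes the lme4 package and the column names used in the appendix code (Datum, MilkingAm, DIM, DIM2, DIM3, DIMdel, Djurnr), with the left front teat yield VF.4 as the response.

# Sketch of a model of the form (3.1.1) for one teat, assuming the lme4 package
# and the column names from the appendix code. Date and milking time enter as
# fixed effects, the individual cow as a random intercept.
library(lme4)

fit_lf <- lmer(
  VF.4 ~ factor(Datum) + MilkingAm +
    DIM + DIM2 + DIM3 + DIMdel +
    MilkingAm:DIM + MilkingAm:DIM2 + MilkingAm:DIM3 + MilkingAm:DIMdel +
    (1 | Djurnr),
  data = train
)
summary(fit_lf)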

3.2 Decision Tree

A decision tree is a series of nodes, each asking a yes-or-no question about one of the features in the dataset. If the answer is yes, one branch is followed; if the answer is no, the other branch is followed. This is repeated until an end point, called a leaf, is reached. An example is shown in figure 1. Here a company wants to know whether a person in their store is likely to buy a computer. The first variable the tree looks at is age. If the person is between 18 and 25, it is likely that they will buy the computer. If the person is not in this age range, we keep asking questions until a leaf node is reached.

Figure 1: Example of a decision tree. Is a customer likely to buy a computer from a company or not?

Decision trees are a very good off-the-shelf procedure for data mining, with relatively fast computation times and possibilities for interpretation (as long as the trees are small). Trees also incorporate both numerical and categorical predictor variables (features) seamlessly and handle missing values very well. Solutions are also invariant under strictly monotone (increasing or decreasing) transformations of the inputs, which is a very useful property for estimators of any kind. Many other machine learning methods, such as nearest-neighbour methods or support vector machines, do not have this invariance property. Linear regression, the most common method of approximating relationships in data, is only invariant under linear transformations. Feature selection is also part of the procedure for building a tree, which makes trees highly insensitive to irrelevant features/variables (Hastie, Tibshirani, and Friedman 2009, p. 352).

All of these properties together are what make decision trees such a good learner. There is one downside to decision trees: individually they are not very accurate, because they have high variance. This can be remedied by boosting as well as by bootstrap aggregation; the latter is explained in more detail in section 3.5.
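As an illustration of the kind of tree shown in figure 1, the sketch below fits a small classification tree with the rpart package on a made-up customer dataset; the data and variable names are purely hypothetical.

# Illustrative only: a small classification tree on made-up customer data, in
# the spirit of the "will this person buy a computer?" example in figure 1.
library(rpart)

set.seed(1)
customers <- data.frame(
  age     = sample(18:70, 200, replace = TRUE),
  student = factor(sample(c("yes", "no"), 200, replace = TRUE)),
  income  = factor(sample(c("low", "medium", "high"), 200, replace = TRUE))
)
# Hypothetical behaviour: young people and students tend to buy
customers$buys <- factor(ifelse(customers$age <= 25 | customers$student == "yes",
                                "yes", "no"))

tree <- rpart(buys ~ age + student + income, data = customers, method = "class")
print(tree)   # text representation of the yes/no splits down to the leaves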

3.3 Regression Tree

The next step, once we understand the basics of a decision tree, is to move on to regression trees. Regression trees are used to model the relationship between input variables and a numerical outcome variable, in other words to create a regression function. Once a leaf is reached a model is fitted, and once all of the leaves are computed the tree is complete and can be used to predict the value of the outcome variable for new observations. There is a choice of which model to fit in each leaf; one could imagine either a linear or a quadratic model. The most common way, however, is to calculate c_k from equation 3.4.1, which amounts to taking the average of all observations in each of the two new regions after a split. This most common method is visualized in figure 2. In the left panel a simple tree (commonly known as a stump) is fitted to some training data. We see that the split is made somewhere between 0.8 and 1.0. A constant is fitted in each region, and the combination of the left and right regions is now our estimate of the regression function. The interpretation is that if the value on the x-axis of a new observation lies to the left of the split, then the predicted value on the y-axis is around −45. The right panel shows a tree where another split has been made: the first split is at the same point, and each of the two regions is in turn split into two new regions. Now our regression model consists of four horizontal lines. The tree to the right looks like a much better estimate of what the data look like.

Figure 2: Example of a regression tree with depth 1 (left) and depth 2 (right)
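A small sketch of this idea, fitting depth-1 and depth-2 regression trees with rpart on simulated one-dimensional data (the data here is made up and is not the data behind figure 2):

# Illustrative only: regression trees of depth 1 (a stump) and depth 2 fitted
# to simulated step-shaped data, in the spirit of figure 2.
library(rpart)

set.seed(2)
x <- runif(200, 0, 2)
y <- ifelse(x < 0.45, -50, ifelse(x < 0.9, -40,
     ifelse(x < 1.5, 5, 15))) + rnorm(200, sd = 3)
d <- data.frame(x = x, y = y)

stump  <- rpart(y ~ x, data = d, control = rpart.control(maxdepth = 1))
deeper <- rpart(y ~ x, data = d, control = rpart.control(maxdepth = 2, cp = 0.001))

# Each leaf predicts the mean of the training observations that fall into it
predict(stump,  data.frame(x = c(0.5, 1.5)))
predict(deeper, data.frame(x = c(0.5, 1.5)))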

3.4 Growing a Regression Tree

The following description is based on section 9.2 of The Elements of Statistical Learning (Hastie, Tibshirani, and Friedman 2009).

A regression tree is grown from data consisting of p inputs and a response for each of N observations,

\[
(x_i, y_i), \quad i = 1, 2, \ldots, N, \quad \text{with } x_i = (x_{i1}, x_{i2}, \ldots, x_{ip}).
\]

The goal is to split the data into K regions and to fit a constant to the observations in each region. The fitted function is then

\[
f(x) = \sum_{k=1}^{K} c_k \, I(x \in R_k), \tag{3.4.1}
\]

where I(x ∈ R_k) is the indicator function, equal to 1 when x is within the region R_k and zero otherwise, R_1, R_2, ..., R_K are the regions, and c_k minimizes the sum of squares Σ(y_i − f(x_i))² in region R_k. This means that the estimator for this region is

\[
\hat{c}_k = \mathrm{ave}(y_i \mid x_i \in R_k), \tag{3.4.2}
\]

which is the mean of y within the region R_k.

Finding the best subsets R_1, R_2, ..., R_K all at once is very computationally intensive, borderline infeasible. The best we can do is to adopt a greedy method to partition the data. This is done by splitting one node at a time and growing the tree that way. Starting with a splitting variable j and split point s, we define the half-planes

\[
R_1(j, s) = \{X \mid X_j \le s\} \quad \text{and} \quad R_2(j, s) = \{X \mid X_j > s\} \tag{3.4.3}
\]

and then seek the splitting variable j and split point s that solve

\[
\min_{j,s} \Big[ \min_{c_1} \sum_{x_i \in R_1(j,s)} (y_i - c_1)^2 + \min_{c_2} \sum_{x_i \in R_2(j,s)} (y_i - c_2)^2 \Big]. \tag{3.4.4}
\]

For any choice of j and s, the inner minimization is solved by

\[
\hat{c}_1 = \mathrm{ave}(y_i \mid x_i \in R_1(j, s)) \quad \text{and} \quad \hat{c}_2 = \mathrm{ave}(y_i \mid x_i \in R_2(j, s)). \tag{3.4.5}
\]

In other words, we make the split which minimizes the sum of the squared errors for a constant fit in each region. This is computed for every variable and every point x_i until the best pair (j, s) is found. Each split is relatively fast to compute.

This process is then performed recursively for each node until the entire tree is built. The process continues until some stopping criterion is reached. The default in the commonly used R package randomForest (Cutler and Wiener 2015) is to keep splitting as long as each of the resulting regions contains at least 5 observations. There are other stopping methods, but they will not be discussed here.
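A minimal sketch of the split search in equations (3.4.3)-(3.4.5), written in base R for illustration only; it is far slower than the compiled code used by the randomForest package.

# Exhaustive search for the best split (j, s) of a single node: for every
# variable j and candidate split point s, fit a constant (the mean) in each of
# the two resulting regions and keep the pair with the smallest sum of squares.
best_split <- function(X, y) {
  best <- list(j = NA, s = NA, sse = Inf)
  for (j in seq_len(ncol(X))) {
    for (s in unique(X[, j])) {
      left  <- y[X[, j] <= s]
      right <- y[X[, j] >  s]
      if (length(left) == 0 || length(right) == 0) next
      sse <- sum((left - mean(left))^2) + sum((right - mean(right))^2)
      if (sse < best$sse) best <- list(j = j, s = s, sse = sse)
    }
  }
  best
}

# Example on simulated data: the best split should be on column 1, near 0.5
set.seed(3)
X <- matrix(runif(200), ncol = 2)
y <- ifelse(X[, 1] < 0.5, -45, 10) + rnorm(100, sd = 2)
best_split(X, y)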

3.5 Bootstrap Aggregation (Bagging)

Before introducing the concept of bagging we first have to learn about bootstrapping. Suppose we start out with a training set Z = (z_1, z_2, ..., z_N), where z_i = (x_i, y_i), and fit a model to this set. We now want to find the accuracy of the model fit. Bootstrapping is a method to achieve this when there is no out-of-sample data to test on. The bootstrapping process is to randomly sample observations with replacement from the training set Z and refit the proposed model to each of these samples, creating a confidence interval for the model from the bootstrap sampling distribution. The most common choice is to sample as many observations as in the original dataset; it is also possible to sample fewer (undersampling) or more (oversampling). Once we have our bootstrap samples we want to reduce the variance of our estimator as much as possible. The average of B independent, identically distributed random variables, each with variance σ², has variance

\[
\frac{1}{B}\sigma^2 . \tag{3.5.1}
\]

If the variables are identically distributed but not necessarily independent, with pairwise correlation ρ, then the variance is

\[
\rho\sigma^2 + \frac{1-\rho}{B}\sigma^2 . \tag{3.5.2}
\]

This expression simplifies to

\[
\rho\sigma^2 \tag{3.5.3}
\]

when B grows large. In other words, the variance will not approach zero as the number of bootstrap samples increases, as long as the bootstrap samples are correlated. A remedy for this problem follows in the next section, where we discuss random forests, a variant of bagging in which the correlation between the trees is reduced by injecting additional randomness.
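For completeness, (3.5.2) follows from expanding the variance of the average of B identically distributed variables X_1, ..., X_B, each with variance σ² and pairwise correlation ρ:

\[
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} X_b\right)
= \frac{1}{B^2}\left(\sum_{b=1}^{B}\operatorname{Var}(X_b) + \sum_{b \neq b'} \operatorname{Cov}(X_b, X_{b'})\right)
= \frac{B\sigma^2 + B(B-1)\rho\sigma^2}{B^2}
= \rho\sigma^2 + \frac{1-\rho}{B}\sigma^2 .
\]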

3.6 Random Forest

The idea behind random forest regression is to grow many uncorrelated regression trees and average them. The random forest algorithm (Hastie, Tibshirani, and Friedman 2009, p. 588) is shown in Algorithm 1.

Algorithm 1 Random Forest for Regression

1. For b = 1 to B:

   (a) Draw a bootstrap sample Z of size N from the training data.

   (b) Grow a random-forest tree T_b on the bootstrapped data by recursively repeating the following steps for each terminal node of the tree, until the minimum node size n_min is reached:

      i. Select m variables at random from the p variables.
      ii. Pick the best variable/split point among the m.
      iii. Split the node into two daughter nodes.

2. Output the ensemble of trees {T_b}, b = 1, ..., B.

To make a prediction at a new point x:
\[
\hat{f}^{B}_{\mathrm{rf}}(x) = \frac{1}{B}\sum_{b=1}^{B} T_b(x).
\]

This algorithm reduces correlation between trees not only by sampling observations (bootstrap sampling) but also by sampling which variables are considered at each split of a node, known as feature sampling. The default value for random forest regression is m = ⌊p/3⌋. Using a smaller value of m reduces the correlation between trees and thus, by equation 3.5.2, the variance of the average (Hastie, Tibshirani, and Friedman 2009, p. 589).
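A minimal sketch of such a fit with the randomForest package, writing out the regression default m = ⌊p/3⌋ explicitly; the formula here uses only the main effects, a simplification of the full formula in the appendix code.

# Sketch: random forest regression for one teat. mtry = floor(p/3) is the
# package default for regression, written out here for clarity; the data frame
# `train` and the column names follow the appendix code.
library(randomForest)

predictors <- c("Datum", "MilkingAm", "DIM", "DIM2", "DIM3", "DIMdel", "Djurnr")
m <- floor(length(predictors) / 3)   # = 2 for these 7 predictors

forest_sketch <- randomForest(
  VF.4 ~ Datum + MilkingAm + DIM + DIM2 + DIM3 + DIMdel + Djurnr,
  data       = train,
  mtry       = m,
  ntree      = 500,       # the package default forest size
  importance = TRUE
)
forest_sketch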


3.6.1 Effect of Tree depth

Tree depth is the number of levels, or splits, before a leaf is reached. As the depth of the tree increases, the function described in equation 3.4.1 becomes more and more fine-grained, since each of the K regions becomes thinner and thinner. This leads to a closer fit to the training data for each tree. Trees that are too shallow have problems with underfitting: the tree cannot capture the variation in the data if only a few splits are made. Trees that are too deep have the opposite problem, called overfitting: an overfitted tree will not perform as well on the test data as it did on the training data. The default in the randomForest package is to allow splits as long as the resulting nodes contain at least 5 observations, with no limit on the maximum number of nodes. Both of these parameters can be tuned for higher accuracy or lower computation time.

3.6.2 Effect of Forest size

Increasing the forest size means averaging more and more trees together, which lowers the variance of the estimator. Since adding more trees reduces variance without affecting bias, one cannot average too many trees. More trees do, however, increase computation time, and the added benefit of calculating more trees diminishes with forest size. It is useful to look at the OOB (out-of-bag) error estimate and to stop adding trees once the error stabilizes.
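A sketch of how both settings can be examined with the randomForest package; the parameter values below are illustrative, and the data frame and columns are those of the appendix code.

# Shallower trees via a larger minimum node size and a cap on terminal nodes
# (illustrative values), plus the OOB error as a function of forest size.
library(randomForest)

shallow_forest <- randomForest(
  VF.4 ~ Datum + MilkingAm + DIM + DIM2 + DIM3 + DIMdel + Djurnr,
  data     = train,
  ntree    = 200,
  nodesize = 50,    # larger terminal nodes -> shallower trees
  maxnodes = 32     # hard cap on the number of terminal nodes
)

# OOB mean squared error after 1, 2, ..., ntree trees; once the curve has
# flattened, adding further trees gives little benefit.
plot(shallow_forest$mse, type = "l",
     xlab = "Number of trees", ylab = "OOB MSE")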

4 Results

4.1 Descriptive Statistics

The average milk yield per day is plotted in figure 3 (a)-(d). The lactation curves for the four teats look fairly similar; the rear teats seem to release more milk than the front ones, but this is the only obvious difference. The lactation curves are very steep in the beginning, which is reasonable since the cow needs to provide the calf with milk. The curve continues to climb until around the 50-day mark, after which it slowly declines. After around 400 days all of the graphs (a)-(d) show some erratic behaviour, which can be explained by looking at graph (e): there are not many milking events past this point, which makes the average less stable than before and causes the sudden jumps in the curve.

4.2 Variable evaluation

Figure 3: Milk yield of each teat as well as the distribution of days in milk.

In figure 4 we see graphs visualizing variable importance for the random forest regression model. The graphs showing %IncMSE display the percent increase in mean squared error when the values of the variable in question are randomly permuted in the out-of-bag data, the error being calculated as the out-of-bag error of the ensemble of trees. Increased node purity is a measure of how much worse the splits would be if the variable in question were excluded; node purity for regression is measured by the residual sum of squares (RSS). These measures are similar, but %IncMSE is generally the preferred measure since it is easier to interpret. Increased node purity is the measure the method itself uses when choosing splits.
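A sketch of how these measures are extracted with the randomForest package, using the forestVF object fitted in the appendix code:

# type = 1 gives %IncMSE (permutation importance), type = 2 gives the increase
# in node purity; varImpPlot shows both measures in one plot.
importance(forestVF, type = 1)
importance(forestVF, type = 2)
varImpPlot(forestVF)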

The most important variables across all teats are Milking (whether the cow was milked in the morning or afternoon), Date (the date of the milking event) and Cow (the animal number). Table 1 displays the %IncMSE values for each variable. The table for increased node purity can be found in Appendix I.

Figure 5 displays the MSE of the forest as the number of trees increases. The graphs show that the error seems to reach a stable level and that a forest size of somewhere around 100-200 trees might be enough to achieve close to maximum accuracy without spending unnecessary computational power on calculating more trees. If the graph has not leveled out at the far right of the x-axis, it is advisable to increase the number of trees when calculating the forest.

Figure 4: Percent increase in MSE (%IncMSE) and increase in node purity for each variable, by teat.

4.3 Model evaluation

As we can see in both table 2 and table 3, the random forest method outperforms the standard OLS method, with an average decrease of 26% in MSE and 14% in RMSE. Both of these are quite substantial decreases in model error. The MSE and RMSE values for the OLS model are taken from Lilja and Keteris Eckerstedt (2016). It is further worth noting that the OLS model was fitted on months 4-9 and tested on months 10-11, while the random forest regression model was trained on months 1-7 and tested on months 8-9. The data for months 10-11 was unfortunately lost at the time of writing this thesis, so a more direct comparison can therefore not be made.
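The random forest test errors in the tables are computed as in the appendix code; condensed for one teat, the computation is:

# Test-set MSE and RMSE for the left front teat, mirroring the appendix code.
pred <- predict(forestVF, test)
mse  <- mean((pred - test$VF.4)^2)
c(MSE = mse, RMSE = sqrt(mse))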

Table 1: Percent increase in mean squared error (%IncMSE) for each variable and each teat.

                      Left Forward   Right Forward   Left Rear   Right Rear
Milking                     299.64          389.16      398.20       428.49
Date                        277.87          270.47      221.94       259.64
Cow                         254.27          256.13      220.19       253.87
1/DIM                        42.59           37.84       37.21        36.02
DIM2                         41.05           37.53       38.50        34.59
DIM                          40.69           37.15       39.43        35.45
DIM3                         40.14           36.33       38.34        32.33

Figure 5: Number of trees versus the absolute error of the forest

Table 2: MSE for randomForest regression versus the OLS model

                            LF          RF          LR          RR     Average
randomForest          1.192044    1.104153    1.803447    1.648640    1.437071
OLS                   1.550466    1.517615    2.336136    2.424136    1.957088
Absolute Difference  -0.358422   -0.413462   -0.532689   -0.775496   -0.520017
Percent Decrease        23.12%      27.24%      22.80%      31.99%      26.29%

Table 3: RMSE for randomForest regression versus the OLS model

                            LF          RF          LR          RR     Average
randomForest          1.091808    1.050787    1.342925    1.283994    1.192379
OLS                   1.245177    1.231915    1.527966    1.555905    1.390241
Absolute Difference  -0.153369   -0.181128   -0.185041   -0.271911   -0.197862

5 Conclusions

Random forests seem to outperform OLS when it comes to predicting lactation curves. The average decrease in MSE when using random forests instead of OLS is 26.3%. The data used to estimate the OLS and random forest models is not exactly the same, but a large portion of the data overlaps.

The most important variables for determining milk yield are whether the cow was milked in the morning or the afternoon, the date of the milking (likely explained by seasonal variation or other day-to-day variation), and which cow was being milked. Each of these variables has a %IncMSE value of over 200%.

Default values were used when performing the random forest analysis. In future work, a good starting point for tuning would be to use B = 100 to B = 200 trees, starting with the smaller value in the interest of computation time and increasing the number of trees if the OOB error does not seem stable. More variables could also be included, since random forests have a powerful variable selection feature built in; including seemingly irrelevant variables should not have a noticeable effect on prediction performance.

For a more direct comparison to Lilja and Keteris Eckerstedt (2016), the OLS regression could be rerun on the same dataset as the random forest to obtain more comparable results. Finally, Bayesian methods could be used to tune the tree depth, the number of trees and the number of features of the random forest, making the process more automated.


References

Cutler, Adele, Leo Breiman (Fortran original), Andy Liaw, and Matthew Wiener (R port) (2015). randomForest: Breiman and Cutler's Random Forests for Classification and Regression. R package. URL: https://cran.r-project.org/web/packages/randomForest/index.html.

Hastie, Trevor, Robert Tibshirani, and Jerome Friedman (2009). The Elements of Statistical Learning. Springer Series in Statistics. New York, NY: Springer. ISBN: 978-0-387-84857-0, 978-0-387-84858-7. DOI: 10.1007/978-0-387-84858-7. URL: http://link.springer.com/10.1007/978-0-387-84858-7.

Lilja, Mathias and Ilse Keteris Eckerstedt (2016). The influence of teat wash failure on milk yield in dairy cows. URL: http://www.diva-portal.org/smash/record.jsf?pid=

6 Appendix I: Tables

Table 4: Variable importance (increase in node purity, IncNodePurity) for each teat.

                      Left Forward   Right Forward   Left Rear   Right Rear
Milking                    8461.28         9333.18    17910.72     17689.89
Date                      14930.69        15001.80    25491.96     25298.57
Cow                       32481.43        31605.40    54801.64     50121.69
1/DIM                      8469.49         9026.62    10285.82     15199.59
DIM2                       7330.99         8596.52    12253.45     12114.23
DIM                        6900.88         7915.62    13049.42     12641.00
DIM3                       7285.31         9187.04    12789.96     12109.29

7 Appendix II: R Code

# install.packages("plyr")
library("plyr")

set.seed(1111)

# Data manipulation: importing two different datasets, one with variables for
# the milking process and the other with background variables
ARM01 <- read.csv("AllaMjAMR201501.csv", skip = 1, sep = ";", dec = ",",
                  stringsAsFactors = FALSE, na.strings = c("", "NA"))
lakt <- read.csv("All_laktations.csv", sep = ",", stringsAsFactors = FALSE)

# Converting strings to dates (hours:minutes included to differentiate between
# different events on the same day)
ARM01$StarttidTime <- as.POSIXct(as.character(ARM01$Starttid),
                                 format = "%Y-%m-%d %H:%M")
lakt$Calving.Date <- as.Date.character(lakt$Calving.Date)
ARM01$Datum <- as.Date.character(ARM01$Starttid)

# Merging the datasets by animal ID
merged.data <- merge(ARM01, lakt, by.x = "Djurnr", by.y = "Animal.Id",
                     all.x = TRUE)

# Fixing problem with several rows for the same milking event depending on the
# number of calves for the specific cow
merged.data <- subset(merged.data, Datum > Calving.Date)
# Keep the row with the most recent calving date for each milking event
merged.data <- ddply(merged.data, .(Djurnr, StarttidTime),
                     function(d) d[which.max(d$Calving.Date), ])

# Create a new variable (days in milk)
merged.data$DIM <- as.numeric(merged.data$Datum - merged.data$Calving.Date)
# Then create functions of this variable
merged.data$DIM2 <- (merged.data$DIM)^2
merged.data$DIM3 <- (merged.data$DIM)^3
merged.data$DIMdel <- 1/(merged.data$DIM)

# Creating dummy for milking AM or PM (AM = TRUE, PM = FALSE)
merged.data$Hour <- format(merged.data$StarttidTime, "%H")
merged.data$Hour <- as.numeric(merged.data$Hour)
merged.data$MilkingAm <- merged.data$Hour < 12

# Merging into final dataset
dok01 <- merged.data

# Creating the columns for the teatwash dummies
dok01$dummyVF <- 0
dok01$dummyHF <- 0
dok01$dummyVB <- 0
dok01$dummyHB <- 0

dok01 <- subset(dok01, !(is.na(VF.4) | is.na(HF.4) | is.na(VB.4) | is.na(HB.4)))

# Once data manipulation for all months is complete, run the following code to
# create training and testing datasets

# Same column names in all datasets
names <- colnames(dok04)
colnames(dok01) <- names
colnames(dok02) <- names
colnames(dok03) <- names

colnames(dok05) <- names
colnames(dok06) <- names
colnames(dok07) <- names
colnames(dok08) <- names
colnames(dok09) <- names

# Combining the different months into one dataset
detach(package:plyr)
library(dplyr)

train <- rbind(dok01, dok02, dok03, dok04, dok05, dok06, dok07)

# Excluding all observations where something (except for teatwash) went bad
train <- train[is.na(train$Avspark), ]
train <- train[is.na(train$Ofullstndig), ]
train <- train[is.na(train$Spenar.hittas.ej), ]

# Several errors for the same milking event are possible; collapse duplicate
# rows for the same cow, date and milking time into one row
train <- train %>% arrange(Djurnr, Datum, MilkingAm)
i <- 2
while (i <= dim(train)[1]) {
  if (train$Djurnr[i] == train$Djurnr[i - 1] &
      train$Datum[i] == train$Datum[i - 1] &
      train$MilkingAm[i] == train$MilkingAm[i - 1]) {
    train$dummyVF[i - 1] <- max(train$dummyVF[i], train$dummyVF[i - 1])
    train$dummyHF[i - 1] <- max(train$dummyHF[i], train$dummyHF[i - 1])
    train$dummyVB[i - 1] <- max(train$dummyVB[i], train$dummyVB[i - 1])
    train$dummyHB[i - 1] <- max(train$dummyHB[i], train$dummyHB[i - 1])
    train <- train[-i, ]
  } else {
    i <- i + 1
  }
}
train$Djurnr <- as.numeric(train$Djurnr)

test <- rbind(dok08, dok09)

# Excluding all observations where something (except for teatwash) went bad
test <- test[is.na(test$Avspark), ]
test <- test[is.na(test$Ofullstndig), ]
test <- test[is.na(test$Spenar.hittas.ej), ]

# Several errors for the same milking event are possible; collapse duplicate
# rows as for the training data
test <- test %>% arrange(Djurnr, Datum, MilkingAm)
i <- 2
while (i <= dim(test)[1]) {
  if (test$Djurnr[i] == test$Djurnr[i - 1] &
      test$Datum[i] == test$Datum[i - 1] &
      test$MilkingAm[i] == test$MilkingAm[i - 1]) {
    test$dummyVF[i - 1] <- max(test$dummyVF[i], test$dummyVF[i - 1])
    test$dummyHF[i - 1] <- max(test$dummyHF[i], test$dummyHF[i - 1])
    test$dummyVB[i - 1] <- max(test$dummyVB[i], test$dummyVB[i - 1])
    test$dummyHB[i - 1] <- max(test$dummyHB[i], test$dummyHB[i - 1])
    test <- test[-i, ]
  } else {
    i <- i + 1
  }
}
test$Djurnr <- as.numeric(test$Djurnr)

############################## test dataset manipulation complete

# Once all data manipulation is complete, run the following to perform random
# forest regression as well as some descriptive statistics

# install.packages("randomForest")
library(randomForest)
library(gplots)
set.seed(1111)

# Run random forest regression on each of the teats
forestVF <- randomForest(VF.4 ~ Datum + MilkingAm + DIM + DIM2 + DIM3 +
                           DIMdel + MilkingAm * DIM + MilkingAm * DIM2 +
                           MilkingAm * DIM3 + MilkingAm * DIMdel + Djurnr,
                         data = train, importance = TRUE, keep.forest = TRUE)

forestHF <- randomForest(HF.4 ~ Datum + MilkingAm + DIM + DIM2 + DIM3 +
                           DIMdel + MilkingAm * DIM + MilkingAm * DIM2 +
                           MilkingAm * DIM3 + MilkingAm * DIMdel + Djurnr,
                         data = train, importance = TRUE, keep.forest = TRUE)

forestVB <- randomForest(VB.4 ~ Datum + MilkingAm + DIM + DIM2 + DIM3 +
                           DIMdel + MilkingAm * DIM + MilkingAm * DIM2 +
                           MilkingAm * DIM3 + MilkingAm * DIMdel + Djurnr,
                         data = train, importance = TRUE, keep.forest = TRUE)

forestHB <- randomForest(HB.4 ~ Datum + MilkingAm + DIM + DIM2 + DIM3 +
                           DIMdel + MilkingAm * DIM + MilkingAm * DIM2 +
                           MilkingAm * DIM3 + MilkingAm * DIMdel + Djurnr,
                         data = train, importance = TRUE, keep.forest = TRUE)

# Number of trees versus error
plot(forestVF)
plot(forestVB)
plot(forestHF)
plot(forestHB)

# MSE:
mean((predict(forestVF, test) - test$VF.4)^2)
mean((predict(forestVB, test) - test$VB.4)^2)
mean((predict(forestHF, test) - test$HF.4)^2)
mean((predict(forestHB, test) - test$HB.4)^2)

# RMSE:
sqrt(mean((predict(forestVF, test) - test$VF.4)^2))
sqrt(mean((predict(forestVB, test) - test$VB.4)^2))
sqrt(mean((predict(forestHF, test) - test$HF.4)^2))
sqrt(mean((predict(forestHB, test) - test$HB.4)^2))

# Variable importance
importance(forestVF)
importance(forestHF)
importance(forestVB)

importance(forestHB)

# Importance plots
varImpPlot(forestVF)
varImpPlot(forestHF)
varImpPlot(forestVB)
varImpPlot(forestHB)

# Plot of milk yield against days in milk (one teat)
plot(test[, "DIM"], test[, "VF.4"])
total <- rbind(train, test)

# Histogram of days in milk to see the distribution
hist(total[, "DIM"], breaks = 20, xlab = "Days in milk",
     main = "Histogram of Days in milk")

# Plot the means for all 4 teats
plotmeans(total[, "VF.4"] ~ total[, "DIM"], xlab = "Days in milk",
          bars = FALSE, n.label = FALSE, ylim = c(0, 6),
          col = "white", ccol = "blue", ylab = "Left Forward",
          main = "Means of left forward teat milk yield per day after calving")
plotmeans(total[, "HF.4"] ~ total[, "DIM"], bars = FALSE,
          n.label = FALSE, ylim = c(0, 6), col = "white",
          ccol = "blue", xlab = "Days in milk", ylab = "Right Forward",
          main = "Means of right forward teat milk yield per day after calving")
plotmeans(total[, "VB.4"] ~ total[, "DIM"], bars = FALSE,
          n.label = FALSE, ylim = c(0, 6), col = "white",
          ccol = "blue", xlab = "Days in milk", ylab = "Left Rear",
          main = "Means of left rear teat milk yield per day after calving")
plotmeans(total[, "HB.4"] ~ total[, "DIM"], bars = FALSE,
          n.label = FALSE, ylim = c(0, 6), col = "white",
          ccol = "blue", xlab = "Days in milk", ylab = "Right Rear",
          main = "Means of right rear milk yield per day after calving")

# xlab and ylab did not work for the plots above; print another one just to be
# able to copy the labels
plotmeans(HB.4 ~ DIM, data = total, ylim = c(0, 6),
          col = "white", ccol = "blue", xlab = "Days in milk",
          ylab = "Mean Yield",
          main = "Means of right rear teat milk yield per day after calving")
# End of code
