
Analysis on how to estimate the number of holes a drill rig has completed based on its activity

Elias Elfving

Computer Science and Engineering, bachelor's level 2021


Acknowledgements

I want to give my acknowledgements to those who have supported me during the period I have been working on this thesis. Firstly I want to thank my supervisor Johan Carlson, who has been instrumental both in getting this work off the ground and in keeping it running smoothly.

Secondly I want to thank those at Mobilaris who made this project a possibility and gave me the chance to tackle it. Lastly I want to thank my two wonderful parents, who have since day one supported me to the fullest in all my endeavours, including this three year journey I have made in higher education. I also want to thank all of my wonderful friends who have made many of the tough times I have faced the past years much easier.

Once again I thank you all from the bottom of my heart!


Abstract

Industrial processes have for a long time been growing more and more automated, and the mining industry is no different. When excavating during mining operations, special drill rigs are used to drill holes in the rock walls, to be used either for explosives or for bolts that support the structure.

The study aimed to find out whether it was possible to create an algorithm that would use the drill rig's telemetry data to estimate the number of holes it had created over a specific time period. The main approach was to see whether machine learning could be used for the problem, or whether some other method could be theorised. Without the groundwork needed to create a proper machine learning algorithm, a basic statistical approach was used to solve the problem; however, since there were no actual reports containing the number of holes a rig drilled, the final solution is highly conjectural.


Contents

1 Introduction
  1.1 Background
  1.2 Motivation
  1.3 Problem definition
  1.4 Research questions
  1.5 Delimitations

2 Tools
  2.1 MATLAB
  2.2 Python
    2.2.1 Numpy
    2.2.2 Matplotlib

3 Theory
  3.1 Machine Learning
    3.1.1 Supervised vs Unsupervised machine learning
  3.2 Drill rig information
  3.3 Data
    3.3.1 JSON structure
    3.3.2 Variable descriptions
    3.3.3 Data of interest

4 Method
  4.1 Extraction of data
    4.1.1 Visualisation
  4.2 Algorithm
    4.2.1 Selection of bounds
  4.3 Configuration

5 Results
  5.1 Main results
  5.2 Other results

6 Discussion
  6.1 Results
    6.1.1 Estimation analysis
  6.2 Missing groundwork
    6.2.1 Answer sheet
    6.2.2 Detailed database specification
    6.2.3 Drill rig and work flow information
  6.3 Other possible solutions
    6.3.1 Using inactivity
    6.3.2 Machine learning
  6.4 Ethical Aspects

7 Conclusions and future work


1 Introduction

The work presented in this report is a student thesis created for the Computer Science department at Luleå University of Technology. It was made in collaboration with the company Mobilaris, which provides IT solutions for mining operations.

1.1 Background

Mining is an old and hard process, and it can be carried out in a myriad of ways, but one tried and true method is called drilling and blasting. It is a technique used in mining operations to excavate underground. The procedure starts with holes being drilled into the tunnel walls; these holes are then filled with explosives to be detonated. When detonated, the rock collapses, revealing a deeper part of the tunnel. The rubble is then transported out of the tunnel and reinforcements need to be put in place. This is done by drilling rock bolts into the walls to provide better structural integrity.

Industrial processes like mining have also long incorporated computers to help with production, and it stands to reason that most large scale industrial machines have some form of computer within them. These vary in purpose: some help with controlling the machine, some are for communication, and some collect data about what the machine is doing, for example reports of engine activity. This data can be very broad and large scale depending on the use case; it could be used for debugging, manual analysis, or to overlook the general activity of the machine. However, while the logs can be broad, they still do not account for everything.

This is the case for the mining company involved with this thesis and the drill rigs used in its excavating process. These rigs are not able to count the total number of holes they create, so the machine operators must report this statistic manually themselves. With that in mind, the logging of this statistic is prone to human error and may therefore not reflect the real amount, which is the main focus of this report.

1.2 Motivation

The main motivation for this work is of course to eliminate the human factor in logging the statistics. For example, an operator may choose to report how many holes they have drilled at the end of their shift. This is prone to error since people forget easily, and therefore the actual reports may contain small errors. This is why it would be helpful to remove the cloud of doubt regarding the data. Solving the problem could also lay the groundwork for similar implementations being made in the future in other parts of the mining process.


1.3 Problem definition

The problem in question posed by Mobilaris can be summarised as follows:

"Would it be possible to implement a machine learning algorithm to analyse the time series data of the drills' activity, the goal being to get a result of how many holes were drilled in a specific period of time?"

As a more general outline, the report is not solely concerned with machine learning but also with a discussion of general statistical approaches that could be employed to solve the problem.

The available data comes from a time series database in which the machine logs its process, and in which a lot of different metrics can be found. As a guideline given by the sponsor of the thesis, the main metrics to look at would be the drill activity and the cooling pump activity.

In accordance with this, some sub-problems can be defined to help the process of solving this problem:

• How to extract the data and present it in a meaningful way?

• Looking at the main metrics, what patterns could be found that indicate when a hole has been drilled?

• What method should be used to produce the result?

– Is statistical analysis sufficient?

– Is there a simple script or program capable of the work?

– Is machine learning really needed?

• If machine learning is needed:

– Is there enough ground work to provide a machine learning solution to analyse this type of time series problem?

– Is there enough time to actually produce such a solution?

The second bullet point is related to something that requires further discussion of the problem. If drill activity is the main metric that would form the patterns that explain when a hole is created, drill inactivity would be just as important.

Considering this, the schedule of the workers will also need to be taken into account when analysing the data, since it produces large periods of inactivity.

The schedule could be theorised by analysing a graph of the machine's activity, but a general outline was given by the company. The day starts with a period of drilling at around 05:30. Lunch is variable but is usually 11:00-12:00, and a round of blasting begins close to 14:50 and lasts for about an hour. Then a shift change is done, dinner is eaten around 19:00, and the usual work continues until 00:30 when blasting happens again. The mine is then emptied until the morning.


1.4 Research questions

• Is it possible to estimate the number of holes drilled based on the data given?

• Is machine learning needed?

1.5 Delimitations

The report focuses on only one mine and the data it has collected, because it was this specific mining company that approached Mobilaris with the idea for this project.

The log entries found in the accessed database have no real descriptions. This means their purpose had to be guessed.

The researcher is relatively inexperienced with machine learning and has mainly just participated in an online course given by Stanford.

The 10 weeks of time given for this project were not enough to create a machine learning solution, especially since 3-4 weeks were used for writing the thesis.


2 Tools

In this section, details on the tools used are given.

2.1 MATLAB

MATLAB is a programming platform created for the use of engineers and scientists alike. The platform implements its own programming language, which is matrix-based; this gives it a very mathematically natural expression format. The main functions of MATLAB are analyzing data and creating algorithms, models and applications.[1]

For this project the platform was used mostly to visualise data by plotting different functions on a graph in a way to make it possible to analyse.

2.2 Python

Python is a high-level scripting language supporting object orientation and many other programming paradigms. It is dynamically typed and very easily written, omitting many usual language constructs such as curly braces, which makes it very easy to read compared to other choices. The language supports the use of modules and packages, which makes it very extensible and lightweight.[2] The language has grown a lot in popularity over the years due to how fast it is to write but also the huge community support of packages aimed especially at data scientists. This has made it a go-to choice for machine learning and statistical analysis, which is the main factor in it being chosen for this project.

Aside from the standard library, the following packages were used.

2.2.1 Numpy

Numpy is a package that allows the user to do scientific computing within Python. Like MATLAB, it allows Python to act like a matrix-based language.[3] Its main use in this project was for calculations on vectors, primarily histograms.

2.2.2 Matplotlib

The Matplotlib package's goal is to provide functions for developing graphs and plots of many different data sets. In this project it was used to create plots of histograms; this task was originally done in MATLAB, but Matplotlib allows it to be more automatable, which made it the natural choice.


3 Theory

3.1 Machine Learning

Machine learning is a sub-field of studies related to artificial intelligence. The field has shown a significant growth of interest in the past years but, despite that, a clear cut definition has yet to be coined.[4] One definition, given by the AI pioneer A. L. Samuel in 1959, could be paraphrased as: "Machine learning is a field of study that gives computers the ability to learn without being explicitly programmed".[5] While this definition is still very good, it is a bit too broad for more detailed explanations. Therefore another was created in 1997 by researcher Tom Mitchell, which states: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." [6]

With this in mind, one could describe it as a way to train algorithms to find patterns in massive amounts of data and make different kinds of predictions depending on that input. The predictions themselves become more accurate and trustworthy the better the algorithm is.[7] The four main steps of doing machine learning are as follows:

1. Select training set

The training set is the data chosen as input for the learning algorithm. This set can come in many forms, but in general it can be classified into two types: labeled data sets and unlabeled data sets. Labeled data is data where the answer to the input is also known, which the machine uses to learn. This is often called supervised learning. Unlabeled data is missing the answer, so the machine does not have an answer sheet to learn from. This is usually called unsupervised learning. [7]

2. Choose learning algorithm

The algorithm chosen will depend on the application, but mainly on what training data exists, the main differentiator being whether it is labeled or not. For labeled data, common algorithms are:

• Regression algorithms

• Classification algorithms

• Decision trees

• Instance-based algorithms

On the opposite side, unlabeled data training algorithms commonly seen are:

• Clustering Algorithms

• Association Algorithms


3. Train algorithm to create the model

The training process is iterative: one step involves running the algorithm with some set biases and weights, then comparing the output with real data of what it should have predicted. This way an accuracy metric can be calculated; doing this with many different weights and biases, the model that is chosen is the one with the highest correctness.

4. Using and improving said model

The last step is simply to use the model chosen during the training step and let it improve over time.

3.1.1 Supervised vs Unsupervised machine learning

The two main approaches to machine learning can be put into two groups: supervised and unsupervised. The deciding factor in which method to use is the nature of the training set, in other words whether it is unlabeled or labeled.[8]

For supervised approaches the data is labeled. The data sets are designed to supervise algorithms so they learn how to solve some problem, mainly classifying data or predicting its output. Classification problems can be described as the data having a rigid structure, with the goal being to separate data points from each other. One example is teaching an algorithm how to correctly guess what a picture contains, an apple versus an orange perhaps. Regression problems are related to understanding the relationships between different variables and what output they can create together. One such area would be sales predictions of houses dependent on things like size, location etc.[8]

Unsupervised learning approaches use unlabeled data, and this approach has three main tasks it can perform: clustering, association and dimensionality reduction.

Clustering is a technique that groups data together based on similarities in certain features. Figure 1 shows colored data points sharing some unknown features and the clusters a clustering algorithm could have sorted them into. Clustering is often used for things like creating collections of news articles that cover the same topic.


Figure 1: Picture showing clustering with different colored points

Association is the method that finds different relationships between variables using a preset of rules; this method is often used for things like recommendation systems. Dimensionality reduction is a way to decrease the number of input variables or features a given data set has. The aim is to reduce the number of inputs to a manageable size, often enough that the data can be displayed on a 2D or 3D graph.[8]
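To make the clustering idea concrete, the following is a minimal, illustrative sketch (not part of the thesis work) that groups synthetic 2-D points in the spirit of figure 1, assuming scikit-learn is available:

# Illustrative only: minimal k-means clustering of synthetic 2-D points,
# in the spirit of figure 1. Not part of the thesis work; assumes scikit-learn.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# three synthetic groups of points around different centres
points = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.5, size=(50, 2)),
    rng.normal(loc=(5.0, 5.0), scale=0.5, size=(50, 2)),
    rng.normal(loc=(0.0, 5.0), scale=0.5, size=(50, 2)),
])

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(points)
print(labels[:10])  # cluster index assigned to each of the first ten points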

3.2 Drill rig information

The drill rigs that are used in the operating mine have two drill arms. Figure 2 shows an example of how these drill rigs look.

Figure 2: Picture of a drill rig

The drill arms have a variable speed they can use, ranging between 1.3 and 2.5 m/min. The holes that need to be drilled are usually 2.7 m deep. This means that on average one hole takes about 85 seconds, given that the average drill speed would be 1.9 m/min. The slow end would therefore be 125 seconds and the fast end 65 seconds. Another average can be calculated by averaging the slow and fast ends, which results in 95 seconds per hole. Furthermore, it was said by people working at the mine that requested this work from the sponsor that these holes are drilled mostly without interruption. Unfortunately there is no more detailed information about the drills themselves; it is known, however, that they average around 140 holes per day.
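For reference, the arithmetic above can be reproduced directly from the hole depth and the stated drill speeds; the snippet below is only a restatement of those numbers:

# Restates the arithmetic of section 3.2: 2.7 m holes at 1.3-2.5 m/min.
hole_depth_m = 2.7
speeds_m_per_min = {"fast": 2.5, "average": 1.9, "slow": 1.3}

for name, speed in speeds_m_per_min.items():
    seconds_per_hole = hole_depth_m / speed * 60
    print(f"{name}: {seconds_per_hole:.0f} s per hole")   # ~65, ~85, ~125 s

# averaging the slow and fast extremes gives the alternative figure of ~95 s
print((2.7 / 1.3 * 60 + 2.7 / 2.5 * 60) / 2)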

The drill rig is also equipped with 2 separate cooling pumps that each cool one drill arm. No detailed information about them is known.

3.3 Data

The data that was given by the sponsor of this thesis comes from an Influx database. InfluxDB is software used to create time series databases, which means data is stored with regard to the time it is created; this results in time series databases often becoming very large, with many gigabytes of data. They are often used in applications where the user wants a clear picture of how certain data or parameters have changed over time. Applications that deal with weather forecasting, measurements in industrial equipment etc. are great examples of where they can be used. How the data is structured can vary, but when retrieved from an Influx database the data is returned in a JSON format.

3.3.1 JSON structure

The structure of the JSON file can be grouped into four main arrays of data, as shown in figure 11 within the appendices. First is the results array, which contains all the data for the request made; then there is an array called series, which contains the name and information of the specific data period and also the values. The columns array describes the different values found in each entry of the database, and the actual entries are found in the values array.

In the databases used for the project there were a total of 34 series arrays, each containing around 10 000 entries. Moreover, each entry in the database was composed of 11 components:

"columns" : [
    "time", "create time", "cycle time", "machine name",
    "machine node", "title", "type", "unit", "value",
    "value type", "vendor" ]

Of these only 5 were of interest: "time", "machine name", "machine node", "title" and "value".
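As an illustration of how such a response could be reduced to the five columns of interest, the following is a minimal sketch; it assumes the structure shown in figure 11 and uses the column names as they are rendered in this document, which may not match the real database exactly:

import json

# Minimal sketch: pull the five columns of interest out of one Influx JSON
# response. Assumes the structure shown in figure 11; the column names below
# are as rendered in this document and may differ from the actual database.
WANTED = ["time", "machine name", "machine node", "title", "value"]

def extract_entries(response_text):
    doc = json.loads(response_text)
    rows = []
    for result in doc["results"]:
        for series in result.get("series", []):
            indices = [series["columns"].index(name) for name in WANTED]
            for values in series["values"]:
                rows.append({name: values[i] for name, i in zip(WANTED, indices)})
    return rows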


3.3.2 Variable descriptions

• ”Time”

– The timestamp of when the event happened within the machine

• ”Machine name”

– The name of the machine, in this case the MAC address

• ”Machine node”

– A specific node in the machine, if a machine has two drill arms this variable will distinguish them

• ”Title”

– The name of the event that happened

• ”Value”

– Value of the measurement made by the machine

3.3.3 Data of interest

As per the guideline of the sponsor, the data of interest would be the drill activity and the pump activity of the rig. The entries for these are measured in seconds and their names are "EV DRILL STATE" and "EV PUMP STATE" respectively.

4 Method

4.1 Extraction of data

To start visualising the data, all of the relevant entries were extracted from one series, and the entries were separated into individual objects identified by the first 8 numbers of each machine's MAC address. Afterwards, several time series matrices are created with 5 columns: time in seconds (x), drill 1 (y0), drill 2 (y1), cooling pump 1 (y2) and cooling pump 2 (y3). The size of the matrix depends on the time span when a machine is active. This length of time was calculated by taking the first entry of a machine's log and using its timestamp as a starting point. The stop point was found by going through all entries and finding the largest end time for an event, meaning the largest value of time created + duration of event. A matrix is then created for that specific machine with the dimensions (end time − start time) × 5. The matrices can therefore become very large, since the time column is in seconds and the time spans cover several weeks if not months. The activity of each machine node is represented as binary data within the matrix, 0 for off and 1 for on. These matrices serve 2 purposes: first to make it possible to visualise the data in a meaningful way, and second to lay the groundwork for making calculations on it. Table 1 is a randomized example matrix that spans 10 seconds.


Table 1: Randomized example time series matrix spanning 10 seconds

x  (y0)  (y1)  (y2)  (y3)
0   0     1     1     1
1   0     0     1     1
2   1     1     0     1
3   1     1     1     1
4   1     1     1     1
5   1     1     1     1
6   1     1     0     1
7   1     0     1     0
8   1     1     1     1
9   1     1     0     1
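A minimal sketch of how such a matrix could be built is shown below; the event tuple format (start second, duration, column index) is an assumption made for illustration and does not match the real log fields exactly:

import numpy as np

# Sketch of the time series matrix from section 4.1: one row per second with
# columns x, y0..y3 and activity stored as 0/1. `events` is assumed to be a
# list of (start_second, duration_seconds, column_index) tuples for one machine;
# the real log fields differ, so this is an illustration of the idea only.
def build_activity_matrix(events):
    start = min(s for s, _, _ in events)
    end = max(s + d for s, d, _ in events)
    matrix = np.zeros((end - start, 5), dtype=int)
    matrix[:, 0] = np.arange(end - start)        # x: seconds since the first entry
    for s, d, col in events:                     # col: 1..4 for y0..y3
        matrix[s - start:s - start + d, col] = 1
    return matrix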

4.1.1 Visualisation

To visualise the data, the matrices created for each machine were loaded into MATLAB, where the y values were plotted against the x vector in the matrix. This was done to get a general picture of the activity of the drills and pumps, to better understand how they work with each other and whether any apparent patterns are visible. In figure 3, a graph showing both drill and pump activity gives one important insight: the pump seems to always be on during the operating hours of the drill, which means that the pump's activity is in essence non-impactful to any pattern that would define when one hole has been finished. The result is that the pump's activity can be completely omitted from the analysis, which was decided to be a good approach after discussing it with the sponsor.
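The plots themselves were made in MATLAB; a roughly equivalent sketch in Python with Matplotlib, assuming the matrix layout from section 4.1, could look like this:

import matplotlib.pyplot as plt

# Rough Matplotlib equivalent of the MATLAB plots described above, assuming a
# matrix with columns [x, y0, y1, y2, y3] as built in section 4.1.
def plot_activity(matrix, labels=("drill 1", "drill 2", "pump 1", "pump 2")):
    x = matrix[:, 0]
    fig, axes = plt.subplots(4, 1, sharex=True, figsize=(10, 6))
    for i, ax in enumerate(axes):
        ax.step(x, matrix[:, i + 1], where="post")   # binary on/off signal
        ax.set_ylabel(labels[i])
        ax.set_ylim(-0.1, 1.1)
    axes[-1].set_xlabel("time (s)")
    plt.show()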


With the pump activity regarded as not important, the next important question is whether there are any distinguishable patterns in the drills' activity. Figure 4 displays the first 16 hours of one machine, containing both drill arms' activity over the period. The first feature of the graph that sticks out is how closely related the operation of both drill arms is; this was mostly consistent across the entire series and across the different machines. Graphs showing 1 series' worth of data for all machines can be found in figures 12, 13 and 14 of the appendices.

Figure 4: Graph displaying both drill arms activity across 1 day

Whenever the drills are not working in sync is a discrepancy that could not be explained. In the end it was regarded as an expendable detail for the end goal of the solution, mainly because each arm would provide slightly different results anyway. With this in mind, going forward the activity of only one arm was analysed at a time, so as to simplify the process. Looking at only one drill, and also at smaller periods of time to get a higher resolution of the activity, it ended up being mostly what was expected. There were a lot of longer bursts of activity along the graph, often grouped quite tightly together. Figure 5 is a segment of activity that displays this quite nicely.

Each burst of activity averages around 2 minutes, which coincides with what is known about how the drills operate. It was therefore concluded that each of the peaks in the graph could be assumed to be the result of one hole being drilled.


Figure 5: Segment displaying normal activity

The normality in figure 5 was not found across the entire series. Sudden interruptions and smaller bursts of activity are present in certain areas of the graph. A pattern was found in these disruptions: the small bursts often preceded or followed longer bursts of activity that were closer to the proposed average. One conclusion made when looking at these segments is that the smaller segments could confidently be grouped together with the longer period as one hole being made. Figure 6 is an example of this phenomenon: the disruptions are colored in, and peaks with the same color are theorised to belong together. When analysing all the data for similar segments, it was found that most small periods ended up being around the same length, which is a fact used for the solution.


4.2 Algorithm

The algorithm created to estimate how many holes one drill rig created over a span of time ended up not including any machine learning. The reason for this is covered in detail in the discussion section. The solution is based on the fact that smaller bursts of activity can be grouped together with larger ones. The algorithm creates a histogram of all the active drill period lengths and then removes all values not confined within specific bounds. The remaining size of the histogram is then the rough estimate of how many holes were made.

It works on the principle that every small period of activity is grouped together with one larger period. Instead of trying to combine smaller periods with larger ones, which would require a more advanced implementation, the small periods are simply removed; this should provide similar results, since removing the excess fat should leave the same amount of meat. Figure 7 is a generalised version of the algorithm.

BEGIN
1. load database log
2. extract relevant database entries: separate them by MAC address and machine node
3. create time series matrix from extracted DB entries
4. create histogram from matrices
5. set bounds of accepted values (lower, upper)
6. remove all entries not within bounds from histograms created in step 4
7. calculate size of each histogram
8. results in 7 are organized and saved as output
END

Figure 7: Generalised version of algorithm
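The following is a minimal Python sketch of steps 4-7 of the algorithm, offered as one possible interpretation rather than the original implementation; it assumes a 0/1 activity vector for one drill arm, as built in section 4.1:

import numpy as np

# One possible reading of steps 4-7: collect the lengths of contiguous active
# periods from one drill arm's 0/1 activity vector, drop lengths outside the
# bounds, and count what remains as the hole estimate.
def estimate_holes(activity, lower_bound, upper_bound):
    padded = np.concatenate(([0], activity, [0]))
    changes = np.flatnonzero(np.diff(padded))    # indices where the signal flips
    starts, ends = changes[::2], changes[1::2]
    lengths = ends - starts                      # the "histogram" of period lengths
    kept = lengths[(lengths >= lower_bound) & (lengths <= upper_bound)]
    return len(kept), kept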

4.2.1 Selection of bounds

The reliability of the algorithm is strictly dependent on what bounds are chosen when removing values from the histograms, especially the lower bound. To get an idea of a good lower bound, the removal rate was analysed by calculating the number of values removed with a lower bound of 1, 2, 3, ..., 200. Every value was entered into an array that could be plotted on a graph with the lower bound as x; this was done over the entire data set.


Figure 8: Removal rate when increasing the lower bound of the algorithm

Figure 8 displays the removal rate of the algorithm when increasing the lower bound. A pattern is immediately apparent: the growth is extremely large from 0 to 20 and plateaus somewhere between 60 and 100, depending on which machine is looked at. Looking back at the information given about how the drills operate, this seems to be in line: if most holes should take between 65 and 125 seconds to create, the speed of growth should increase in that area, which it does. A good lower bound should therefore be somewhere around 60, which seems reasonable since it is close to the fastest time a hole could theoretically take to be made. To get a broader result, one could run the algorithm with a lower bound of 30-40 just to get an upper estimate.
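A sketch of the sweep behind figure 8, reusing the estimate_holes sketch above, could look as follows; the function name and plotting details are illustrative only:

# Sketch of the sweep behind figure 8: how many periods are removed as the
# lower bound grows from 1 to 200 s, with the upper bound held fixed.
def removal_rate(activity, upper_bound=600, max_lower=200):
    total, _ = estimate_holes(activity, 0, upper_bound)
    return [total - estimate_holes(activity, lb, upper_bound)[0]
            for lb in range(1, max_lower + 1)]

# e.g. plt.plot(range(1, 201), removal_rate(activity)) reproduces a curve like figure 8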

4.3 Configuration

To acquire results, some considerations were made as to what configurations were going to be used. The choice of upper bound was the easiest; this variable was found to be mostly arbitrary, so a value of 600 was chosen since no hole should take longer than 10 minutes. For the lower bound, 4 different values were chosen: 0, 20, 30 and 60. The choice of 0 comes down to how it provides a nice development curve to analyse. 20 was chosen due to it being a clear drop-off point in the growth rate of figure 8. Then 30 and 60 were chosen as they were determined to give reasonable upper and lower bounds on the resulting output.
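Expressed as data for the sketches above, the chosen configuration is simply:

# The bound configurations of section 4.3, usable with the sketches above.
UPPER_BOUND = 600                # no hole is expected to take longer than 10 minutes
LOWER_BOUNDS = [0, 20, 30, 60]   # 0 as a baseline; 20, 30 and 60 from figure 8

# e.g. results = {lb: estimate_holes(activity, lb, UPPER_BOUND)[0] for lb in LOWER_BOUNDS}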


5 Results

5.1 Main results

The main results of the thesis came from running the theorised algorithm and configurations over the entire data set given by the sponsor. The entire data set came out to be about 210 days' worth. Tables 2, 3, 4 and 5 show the results of each configuration for all machines. The machines are denoted by M and D stands for each of their individual drill arms. The statistics are calculated by running methods across the resulting array created by removing all out-of-bounds entries from the original histogram of the time series. The average length is the mean time per estimated hole; the median and standard deviation also relate to time per hole.

Table 2: Machine stats when lower bound equals 0

Machine M1D1 M1D2 M2D1 M2D2 M3D1 M3D2

Days 211 211 110 110 203 203

Number of holes 25053 25192 10905 10741 19561 18914

Holes per day 118 119 99 97 96 93

Average Length(sec) 92 94 102 103 101 103

Median(sec) 107 106 114 112 110 112

Standard dev.(sec) 68 72 86 86 86 84

Table 3: Machine stats when lower bound equals 20

Machine M1D1 M1D2 M2D1 M2D2 M3D1 M3D2

Days 211 211 110 110 203 203

Number of holes 17407 17167 7042 7067 12608 12759

Holes per day 82 81 64 64 62 63

Average Length(sec) 130 135 156 154 154 150

Median(sec) 135 141 159 153 160 153

Standard dev.(sec) 44 49 58 61 60 60

Table 4: Machine stats when lower bound equals 30

Machine M1D1 M1D2 M2D1 M2D2 M3D1 M3D2

Days 211 211 110 110 203 203

Number of holes 17031 16751 6898 6907 12238 12426

Holes per day 80 79 63 63 60 61

Average Length(sec) 133 137 158 157 158 153

Median(sec) 137 142 160 155 162 155

Standard dev.(sec) 42 46 55 58 57 57


Table 5: Machine stats when lower bound equals 60

Machine M1D1 M1D2 M2D1 M2D2 M3D1 M3D2

Days 211 211 110 110 203 203

Number of holes 16166 15769 6533 6530 11527 11695

Holes per day 76 74 59 59 56 57

Average Length(sec) 137 143 165 163 165 160

Median(sec) 139 145 165 159 166 160

Standard dev.(sec) 37 41 49 53 51 51
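The per-hole statistics reported in tables 2-5 (mean, median and standard deviation of the kept period lengths) can be computed with NumPy; the sketch below assumes the kept lengths returned by the estimate_holes sketch in section 4.2:

import numpy as np

# Per-arm statistics as reported in tables 2-5, computed from the kept period
# lengths returned by the estimate_holes sketch; `days` is the span of the data.
def machine_stats(kept_lengths, days):
    return {
        "number of holes": len(kept_lengths),
        "holes per day": len(kept_lengths) // days,
        "average length (sec)": float(np.mean(kept_lengths)),
        "median (sec)": float(np.median(kept_lengths)),
        "standard dev. (sec)": float(np.std(kept_lengths)),
    }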

Figure 9 shows the decreasing number of estimated holes when increasing the lower bound; the graph includes all the machines and their drill arms.

Figure 9: Graph showing the decrease in estimated value when lower bound increases

5.2 Other results

To get some more results to compare with, calculations were made to estimate the number of holes using a set time per hole and the drill's total time of activity. Table 6 shows the number of holes calculated by taking the total active time of each drill arm and dividing it by a set value. The values are 65, 85, 95 and 125 seconds, which were the averages calculated in section 3.2.


Table 6: Number of holes based on average hole time lengths

Machine   M1D1    M1D2    M2D1    M2D2    M3D1    M3D2
65s       35 581  36 291  17 161  17 067  30 510  29 941
85s       27 209  27 752  13 123  13 051  23 332  22 896
95s       24 345  24 830  11 741  11 677  20 876  20 486
125s      18 502  18 871   8 924   8 875  15 865  15 569
LB = 60   16 166  15 769   6 533   6 530  11 527  11 695
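The comparison estimate in table 6 amounts to dividing each arm's total active time by an assumed time per hole; a minimal sketch of that calculation (names illustrative) is:

# Sketch of the comparison estimate behind table 6: total active drill time
# divided by an assumed number of seconds per hole (65, 85, 95 or 125 s).
def holes_from_active_time(activity, seconds_per_hole):
    total_active_seconds = int(sum(activity))   # activity is the 0/1 vector of one arm
    return total_active_seconds // seconds_per_hole

# e.g. {t: holes_from_active_time(activity, t) for t in (65, 85, 95, 125)}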

6 Discussion

6.1 Results

Here a discussion is made about the results provided in the previous section, focusing mainly on the logic in them.

6.1.1 Estimation analysis

Looking at the results, it is clearly visible that there is a large drop between 0 and 20 as lower bounds. This lends credence to the theory that simply removing smaller values from the histogram gives a more realistic estimate. Looking at table 5, it is also visible that the standard deviation is quite large, which makes the result not very trustworthy since it should be more normalised. Another factor to take into account is the average number of holes per day each drill finishes: if the estimation is correct, then for a lower bound of 0 one machine makes between 200-240 holes per day. This is well above the average of 140 mentioned in section 3.2. One can therefore conclude that a lower bound of 0 is not acceptable for getting a decent result, which was of course the expected outcome.

When analysing tables 3 and 4, they are unsurprisingly very similar. The difference in the estimated number of holes lies between 2 and 3%. This means that if one were to choose between them for an upper estimate, the margin of error would not be that large. For both of them, the average holes per day is basically the same: summing both drill arms shows that all machines make about 125 to 160 holes per day, which is much better since 140 is within that range. The standard deviation is also lower, but this is to be expected since the data is being forcibly fitted into more of a bell curve.

For a lower bound of 60 the estimate has decreased by 5-6%, which was a bit less than expected. This means that for every increase of ten in the lower bound between 30 and 60, the estimate on average only dropped 1.6-2%. So while a difference of about 900 holes for M1D1 between 30 and 60 may seem large, it does not signify anything major. The error in the holes-per-day average would at most be 6, which should not be considered fatal.

However, when the results in table 6 are cross-analysed with the previous ones, it is directly apparent that the results given by the algorithm are much lower than what should be the lower bound. If all holes were to take 125 s across the entire data set, the number of holes calculated by dividing the active time by that duration should not be lower than the lowest estimated result of the algorithm. This is cause for concern, since the solution made for this project may not be satisfactory, but there are some things to consider first. If every hole takes 125 seconds for M1D1, that would be a total of 18502 holes over its 211-day span of activity. That is 87 holes per day for each arm, which would be almost 180 each day; going any lower on the time-per-hole average would result in a much higher per-day frequency, deviating a lot from the guideline of 140. With this in mind, there seems to be information that is either false or missing from the guidelines. When calculating the result via an average based on drill speed and hole depth, the results should fall more in line with the per-day basis than they do now. This leads to the belief that there could be 3 pieces of information which are incorrect:

1. It is possible that the given drill speeds are wrong

2. The depth per hole might be wrong; it could vary more than currently believed

3. The holes per day average is wrong

If only one of these were to be slightly wrong, it would make any estimation very hard; if multiple are wrong, the odds of getting anything right become minuscule. However, the result itself is not bad: no matter the outcome, a lot is now known about how this work could continue.

6.2 Missing groundwork

With the available time running out and the work having to be finished, it is now very clear what was missing from the launch of the project. When the sponsor approached with the proposal, the solution was supposed to lie within machine learning, but with a lot of groundwork missing the project pivoted to a more generalised solution. Why this was the case is discussed in more detail in section 6.3.2. A lot of information was missing or misleading for this thesis; therefore much work went into looking at activity graphs of the drills and drawing conclusions from just that. There are a few points regarding this problem which would now be considered crucial:

1. Answer sheet

2. Detailed database/log specification

3. Drill rig and work flow information

These three are the main topics regarding the insufficient groundwork.


6.2.1 Answer sheet

The first point of discussion is the lack of actual result data to look at: to estimate a result for an activity with a new method, one would need actual results from the old method. As long as the old method gave a good enough answer, this would be adequate for use in cross-analysis. In this case the result would be how many holes were created over a period of time, and over what period. Having worked on the project for about 5 weeks now, it became abundantly clear quite early that it would be very hard to get a good result. The final solution provided in this thesis is uncomplicated at best and may be completely wrong at worst. This is of course because there simply is not enough to work with and compare against; the conclusion is that it is not possible to know at this time.

Something else related to this is the absence of a detailed log specifying the activity of a drill and exactly when one hole was finished. As mentioned, one large portion of the work went into analysing activity graphs, most of it to find behavioral patterns and try to figure out when a drill had made one hole. Having an actual log containing this data would grant a lot of insight into how a more advanced solution might work. Figure 10 showcases how one of these logs could look when plotted on a graph, the yellow bars noting whenever one hole was finished.

Figure 10: Example of detailed log

Having this type of information would have reduced unnecessary work time, but it would also have lent itself to a better solution being made. This type of log would also have been very useful if one were to approach this again with machine learning.


6.2.2 Detailed database specification

In section 3.3 an explanation of the database and its structure was given to outline how it worked. This information was only about the underlying building blocks; real detail was not given, the reason being that there was no detailed specification. Descriptions of entries in the database were limited to knowing what they should be; nothing was said about the actual entries. There were a lot of different reported activities that sounded promising for analysis, for example:

• cumulativeWorkHours

• cumulativeIdleHours

• cumulativeDrillHours

• cumulativeDrillMeters

All of these look like activities that could be used in calculations for the result, but there is no specification of what exactly they imply and when they are reported. Because of this they were pushed to the side, since no information could really be extracted from them.

Then there is the problem of not knowing which machines are of interest. There are several rigs that report with EV DRILL ACTIVITY, but no separation is made as to what type of machine it is. From a picture of the rig of interest it was clear that it was equipped with two drill arms. Looking at the log, it was found that many of the machines which reported drill activities always reported with the same node, in this case 1. Others reported the same activity but used two values for the machine node column. From this it was concluded that the machines which reported with two node values were the ones of interest, based on knowledge of how the drill looks and on the assumption that the machine node value corresponded to each drill arm. The rest of the entries were simply ignored and thought to be of a completely different machine type.

A few weeks of work and analysis had to be discarded because of an initial misunderstanding of how these logs worked. The conclusion made here came pretty late, which of course hindered the development greatly and which in hindsight could have been prevented.

6.2.3 Drill rig and work flow information

As mentioned previously, this was key information that should have been more concrete. Information about the drill rig and general use case statistics should have been better prepared. It is very unlikely that the information about the drill speeds is wrong, but saying all holes are 2.7 m deep seems wrong, since that would mean the longest a hole takes to be finished is 125 seconds. Even if the drill speed is constant, there is nothing saying that an operator cannot make short pauses in the process of letting the drill sink into the stone wall, leaving it on during this period. If for every 15 seconds of drilling a 5 second pause was made, the average time per hole would increase by 32 seconds if the drill operates at its slowest speed. If that were the case, the estimate given by the algorithm is suddenly not far off.

6.3 Other possible solutions

Here a quick discussion is made of some other possible angles from which this problem might have been approached.

6.3.1 Using inactivity

The current algorithm relies solely on looking at the active state of a specific drill and making estimates from that data. However, the inactivity should match the activity in some ways, since it is a binary function: for every activity there will always be inactivity. So it was theorised that one might measure whenever the drill was not operating and somehow use that information to predict when one hole was finished. A test was done by running the algorithm in reverse, meaning histograms were created for when the machine was not active, using a set of different values for the upper and lower bound. The results were not great and had huge variance. Lower bounds of 10, 20 and 30 were chosen; in order, the estimated results for one drill were 20785, 17816 and 12702.

A value of 20 looked like it was the closest, at least to what was estimated by looking at the activity. The reason this approach was later passed over was mainly that it was not as clear how to find good averages for the bounds. When looking at the time series graphs, there were many scenarios where similar-length pauses followed small bursts of activity as followed longer ones. Without more information about the habits of the operators using the machines, it was determined that this method would lead nowhere, and it was thus put aside. Having to take the schedule of the workers into account also became a factor that was slightly hard to work around, especially since the data set was acquired during the Covid-19 pandemic, which made the schedule more unpredictable.

6.3.2 Machine learning

The main proposal for this project was to investigate whether it could be solved with some type of machine learning method. From the start it did sound possible, since the problem itself is not complicated and it showed signs of having the correct building blocks for such a solution. The first approach was to see if one could create a classification algorithm of sorts. The computer would look at the activity of a drill and, with that input, classify segments of that activity as finished holes. So if one drill was active for 90 s and then 10 s later was active for another 50 s, it could use previous experience to determine whether that was one hole or not. An approach like this would only work if a good training set existed from the start, for example an entire log like the one in figure 10 mentioned in section 6.2.1.

The second approach would be creating a clustering algorithm that operates on time series data. Quite early in the research, the one huge problem faced was the lack of relevant research on the topic. Most machine learning techniques that work with time series are developed for prediction analysis; clustering, however, is less researched. Aileen Nielsen, a data analyst, previously said in a keynote on time series analysis, "Time series clustering is surprisingly difficult".[9] This is most likely the main reason why research was hard to find. With this being the case, and the researcher being relatively inexperienced with machine learning, it was obvious that such a method was not feasible within the amount of time given for the project. Furthermore, what research has been done focuses mainly on finding specific patterns in data, that is, comparing two segments of a time series and determining if they belong in the same cluster based on appearance. It was thus determined that clustering would not work, and it was after this that the method brought forth in this thesis started taking shape.

6.4 Ethical Aspects

Regarding some of the ethical implications this work may bring forward, there are some main points. The first one is the aspect of monitoring the operators with the help of a solution like this. The mining company in question could continue to ask for reports from the workers but also use this algorithm or a similar solution, compare the algorithm's results with whatever reports they receive, and use that as a strong hand to keep the workers in line. With this in mind, gathering new information regarding the specifics of how the drill is used and when holes are finished might prove harder, since the operators may realize these implications. While this is not likely to be the use case of the algorithm, it is still something to take into account.

One could also explore how this furthers the agenda of automating jobs, in other words computers doing more and humans less. With the development of more advanced robotics and artificial intelligence, more jobs are being done by programs rather than people. Ethically this is concerning, because at some point there may not exist work for those without higher education. While this project is not directly concerned with this development, it could still be viewed as furthering the idea that computers are more capable than humans.


7 Conclusions and future work

The conclusion of this report is that the solution provided here could come close to giving good results if the given information about drill speeds and hole depths is correct. This of course hinges a lot on that specific detail, but it is also dependent on other factors, like operator habits, which are currently unknown. It is also unclear if the algorithm currently gives results for the correct machines, which could of course be fixed by specifying the machine MAC-ids that are of actual interest. In the end, machine learning was not used for the thesis because the groundwork was lacking; the researcher's inexperience also played a part in this conclusion.

If the work were to continue, the proper way to go about it would probably be starting from scratch, taking into account some of the things brought up here. One could also keep the solution of this report and use it in the future to see how correct it is. So going forward, the first step would be making an extensive collection of information in the mine. This would include gathering details about the work flow and the habits of operators, and collecting a detailed log specifying at which points in time a specific rig and drill arm has drilled one hole. The log can be used to get a total which can be compared to whatever this algorithm predicts, and it can also be used to develop more advanced methods to solve the problem. One of those methods could be machine learning; the technique that would work best is most likely a supervised algorithm, more specifically a classifying one.

Overall the work was a bit rocky due to the aforementioned circumstances of lacking information and also inexperience, but in that sense it was also surprisingly smooth. The main conclusion is that there are most likely a lot of methods that could solve this problem; it will depend on what information is known beforehand. With all this said, if the work continues in the future it has the potential to become a smarter solution, and it will be up to the researcher to see how good a solution can be created.


References

[1] MathWorks. What is MATLAB? https://se.mathworks.com/discovery/what-is-matlab.html. Accessed: 29/04/2021.

[2] Python Software Foundation. What is Python? Executive summary. https://www.python.org/doc/essays/blurb/. Accessed: 29/04/2021.

[3] Python Software Foundation. Numpy, about us. https://numpy.org/about/. Accessed: 29/04/2021.

[4] Reza Forghani, Peter Savadjiev, Avishek Chatterjee, Nikesh Muthukrishnan, Caroline Reinhold, and Behzad Forghani. Radiomics and artificial intelligence for biomarker and prediction model development in oncology. Computational and Structural Biotechnology Journal, 17:995–1008, 2019. ISSN 2001-0370. URL https://www.sciencedirect.com/science/article/pii/S2001037019301382.

[5] A. L. Samuel. Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3(3):210–229, 1959. ISSN 0018-8646.

[6] Thomas M. Mitchell. Machine Learning. McGraw-Hill, Inc., USA, 1 edition, 1997. ISBN 0070428077.

[7] IBM Cloud Education. What is machine learning? https://www.ibm.com/se-en/cloud/learn/machine-learning. Accessed: 28/04/2021.

[8] IBM Cloud Education. Supervised vs unsupervised learning. https://www.ibm.com/cloud/blog/supervised-vs-unsupervised-learning. Accessed: 28/04/2021.

[9] Aileen Nielsen. Modern time series analysis. https://www.youtube.com/watch?v=v5ijNXvlC5A&t=6787s.


Appendices

{
    "results": [
        {
            "statement_id": 0,
            "series": [
                {
                    "name": "cpu_load_short",
                    "columns": [
                        "time",
                        "value"
                    ],
                    "values": [
                        [
                            "2015-01-29T21:55:43.702900257Z",
                            2
                        ],
                        [
                            "2015-01-29T21:55:43.702900257Z",
                            0.55
                        ]
                    ]
                }
            ]
        }
    ]
}

Figure 11: Example of database structure, taken from InfluxDB's documentation website


Figure 12: Graph displaying both drill arms activity

Figure 13: Graph displaying both drill arms activity


Figure 14: Graph displaying both drill arms activity
