
http://www.diva-portal.org

This is the published version of a paper published in Procedia CIRP.

Citation for the original published paper (version of record):

Gandhi, K., Schmidt, B., Ng, A.H.C. (2018). Towards data mining based decision support in manufacturing maintenance. Procedia CIRP, 72: 261–265.

https://doi.org/10.1016/j.procir.2018.03.076

Access to the published version may require subscription.

N.B. When citing this work, cite the original published paper.

Permanent link to this version:

http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-15901


Procedia CIRP 72 (2018) 261–265

2212-8271 © 2018 The Authors. Published by Elsevier B.V.

Peer-review under responsibility of the scientific committee of the 51st CIRP Conference on Manufacturing Systems.

10.1016/j.procir.2018.03.076


51st CIRP Conference on Manufacturing Systems

Towards data mining based decision support in manufacturing maintenance

Kanika Gandhi*, Bernard Schmidt, Amos H.C. Ng

School of Engineering Science, Högskolan i Skövde, Skövde 54128, Sweden

* Corresponding author. Tel.: +46-732684505. E-mail address: Kanika.gandhi@his.se

Abstract

The current work presents a decision support system architecture for evaluating the features that represent a component's health status, in order to predict maintenance actions and the remaining useful life of the component. The evaluation is performed through pattern analysis of past and current measurements of the components under study. Data mining visualization tools help in creating the most suitable patterns and in learning insights from them. Estimates such as feature split values or the measurement frequency of the component are obtained through classification methods in data mining. This paper presents how the quantitative results generated from data mining can be used to support the decision making of domain experts.


Keywords: Maintenance; Decision Support System; Data Mining; Classification Methods; Knowledge Extraction

1. Introduction

Operationally sound manufacturing processes are central to a manufacturing company's success: an efficient manufacturing process ensures reliable, high-quality product output. Maintenance actions are carried out on each component according to its health status in order to reduce unexpected machine breakdowns. Traditionally, the degradation process is characterized through extensive experiments and verification, which is not always feasible. Data-driven tools from data mining (DM) help determine the degradation condition (health status) of operational components so that predictive maintenance can be performed. DM techniques require sufficient historical (offline) data and build upon finding variables that contain information about component behaviour and health status. Not all variables are important or informative for understanding the health status of the equipment; a few that show non-random behaviour may carry information about the degradation trend. To select such features (variables), statistical trend analysis of the equipment operation data can provide better visualization of the patterns in the equipment's features. The feature data and patterns reflect the degradation of the equipment and hence play an important role in learning its current operating condition and predicting its failure.

In this paper, a decision support system architecture (DSSA) is presented to evaluate the health status of critical components. The evaluated health status of a component supports the decision maker in choosing the correct maintenance action.

Often, a scheduled maintenance action is deferred because of an indication that the component is still in good health. A computer-mediated DSS assists in decision making by presenting and interpreting the various possible alternatives. Here, results from DM processes can be coupled into a decision support system (DSS) by providing a methodology to compare the many available DM tools and to select the suitable ones according to the requirements of the various stages in the DSS.

The rest of the paper is organized as follows. Section 2 gives a brief literature review. Section 3, the main section, explains the proposed DSSA in detail, including data pre-processing and transformation; DM tools for data reduction; knowledge extraction and visualization; and the inference engine. Section 4 presents an application example, and Section 5 concludes the paper and outlines future work.


2. Literature Review

The integration of data mining and decision support systems using equipment health (deterioration condition) has gained importance in research. A large number of maintenance policies based on health status have been investigated, developed, and successfully applied, using the current health status [1, 2] and taking the future equipment health state into account when making maintenance decisions [3]. [4] proposes an architecture for efficient Predictive Maintenance (PM) based on a real-time estimate of the future state of the components; the architecture is built on supervision and prognosis tools. [5] proposes a data mining approach using a machine learning technique called anomaly detection (AD); this method employs classification techniques to discriminate between defect examples. [6] presents an integrated system-health-management-oriented maintenance decision support methodology and framework for a multi-state system based on data mining. [7] describes a Decision Support System (DSS) enabling the early identification of problems occurring on manufacturing lines and suggesting related recovery actions, together with the potential economic and environmental repercussions of their adoption. [8] proposes an outline of the architectural design and conceptual framework for a Smart Maintenance Decision Support System (SMDSS) capable of providing end users with recommendations to improve asset lifecycles.

Compared with these works, whose main focus is the early detection of different types of defects and anomalies, the DSSA in the current study proposes an integration of multi-source data, combining and comparing several DM techniques to find the important features indicating health status and, finally, to make the decision on the appropriate maintenance action.

3. Decision Support System Architecture

In a dynamic business environment, decision making for one division of a company is based on knowledge gathered from all connected divisions, which is only possible when data from all divisions are collected and integrated into a common database. A traditional DSS processes information from data sets from one or two sources. A computer-mediated DSS can process data from various sources and integrate the data and results on a common platform, so that decision making can be faster and more accurate. Fig. 1 shows the DSSA, explaining the architectural steps towards the final objective of obtaining the health status of the components so that the correct maintenance action can be performed.

The architecture proposed in the current paper includes an algorithm that forms its core. The algorithm is coded in Matlab R2016a and produces the results for the subsequent stages of the architecture. At the data integration stage, a validator is also connected to the core algorithm; it checks the dimensions, units, and correctness of the measured data points. The flow of the core algorithm is shown in Fig. 1. The metadata of the core algorithm consists of the feature extraction and classification methods and their corresponding parameters.

Fig. 1. Decision Support System Architecture

The first stage collects data from the machine and integrates it as input to data mining. The current paper describes the case of a critical component (the ball screw) in CNC machine tools at an automotive company. The machine tools generate processing data and production data; the maintenance data, technically named a work order, is saved by the responsible engineer. All data types are stored in different locations. Here, analytical processing is an efficient method to access all data sites for multidimensional analysis and decision support; it uses data mining tools to visualize and generate informative patterns from the stored data. Data integration expands the dimensionality of the dataset, adds more variety, and changes its structure. The ultimate objective of data integration is to produce a standard data structure that makes data mining applications easier. In this case, data integration is performed on the maintenance data and the processing data, where the measured process performance of the component (ball screw) is fitted to its total lifetime, which is obtained from the work orders.
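As a minimal sketch of this integration step (in Python rather than the study's Matlab implementation), the example below aligns a hypothetical processing measurement with the ball screw lifetime derived from a work order. All field names, dates, and values are illustrative assumptions, not the paper's actual schema.

```python
from datetime import date

# Illustrative sketch only: align each processing measurement with the
# component's lifetime fraction derived from maintenance work orders.
# Field names (machine_id, installed, replaced, measured) are assumed.

work_orders = [
    {"machine_id": "M1", "installed": date(2015, 1, 1), "replaced": date(2015, 1, 11)},
]
measurements = [
    {"machine_id": "M1", "measured": date(2015, 1, 6), "friction": 0.42},
]

def integrate(work_orders, measurements):
    """Join measurements to work orders and attach a lifetime fraction."""
    orders = {wo["machine_id"]: wo for wo in work_orders}
    rows = []
    for m in measurements:
        wo = orders.get(m["machine_id"])
        if wo is None:
            continue  # a validator would flag unmatched measurements
        total_life = (wo["replaced"] - wo["installed"]).days
        age = (m["measured"] - wo["installed"]).days
        rows.append({**m, "life_fraction": age / total_life})
    return rows

rows = integrate(work_orders, measurements)
print(rows[0]["life_fraction"])  # 0.5: halfway through its 10-day life
```

The lifetime fraction plays the role of fitting each measurement into the component's total lifetime, as described above.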

3.1. Data Pre-processing & Transformation

Data pre-processing is the foundation for consistent and correct output from DM techniques and data analysis; incorrect data integration leads to unreliable decision support. A large part of the DM process consists of data preparation and improvement of data quality. Hence, to increase the precision of DM output, data pre-processing should not be neglected.

As discussed in Section 3, the common database centre holds data from multiple sites, and individual sites hold data from all measured components, which introduces noise; in such cases, the component-specific data may be noisier still. Data cleaning and pre-processing usually mean preparing the data, checking its integrity and consistency, and smoothing out and eliminating the noise. There are a number of data pre-processing methods, such as data cleaning, data integration, data transformation, and data reduction. Data integration is performed at the initial stage to bring all the information onto a common platform. Data cleaning and data transformation have to be customized to the requirements of the DM technique, where component data are transformed into a specific format more appropriate for mining. This comprises constructing new variables, changing the structure of current attributes, or normalization to better understand the characteristics of the data. Sometimes the data reduction process is also considered a DM method.

In the current study, the data from maintenance and processing are not in a form ready for use. The data are transformed so that all features recorded during ball screw measurement can be understood. The transformed data contain 504 features and 300 measurement instances, which include both historically replaced and non-replaced ball screws. Further, the instances are aligned with the total lifetime of the ball screw at the date of measurement.
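One such transformation, z-score normalization of a single feature column, can be sketched as follows; the values are invented, not the study's data.

```python
# Invented values, not the study's measurements: z-score normalization
# of one feature column, one of the transformations mentioned above.
def zscore(values):
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]

feature = [10.0, 12.0, 14.0]
normalized = zscore(feature)
print([round(v, 2) for v in normalized])  # [-1.22, 0.0, 1.22]
```

After normalization, features measured on very different scales become directly comparable, which is one motivation for this step before mining.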

3.2. Data Reduction

Data reduction methods are applied to find a condensed representation of the data that is reduced in volume yet maintains the integrity of the base data; mining on the reduced data should be more efficient while producing analytical output similar to that obtained from the base data. Common methods include data compression, numerosity reduction, and feature extraction. Continuing from Section 3.1, the 300 measurement instances comprise historically replaced (failed) and non-replaced (not failed) ball screws. For the current study, the historically replaced ball screw instances have been considered; these instances are more relevant to the solution process, as one can visualize the highly volatile behaviour of the ball screw near the end of its life (just before replacement).
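The reduction step described above, keeping only the historically replaced instances, amounts to a simple filter; the `replaced` flag and the values below are assumed for illustration.

```python
# Sketch of the numerosity reduction step: keep only instances of
# historically replaced (failed) ball screws. The "replaced" flag and
# feature values are assumptions, not the study's data.
instances = [
    {"id": 1, "replaced": True,  "features": [0.1, 0.9]},
    {"id": 2, "replaced": False, "features": [0.2, 0.3]},
    {"id": 3, "replaced": True,  "features": [0.4, 0.8]},
]

reduced = [inst for inst in instances if inst["replaced"]]
print([inst["id"] for inst in reduced])  # [1, 3]
```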

3.3. Feature extraction

Like Sections 3.1 and 3.2, feature extraction is part of data pre-processing; however, Sections 3.1 and 3.2 describe pre-processing steps that precede the actual core algorithm of the DSSA, whereas Section 3.3 is part of the DSSA core algorithm. Here, trends are extracted from the data variables (features) and can be used to develop an offline model that later classifies new online data. The classification process is more efficient when distinguishing features are extracted from the variables' trends. Different approaches have been proposed to extract features, such as mean, variance, multi-exponential functions, curve fitting, the discrete wavelet transform, and the discrete Fourier transform [9]; these fall into one of the following approaches: filter, wrapper, and embedded [10]. The filter approach uses characteristics of the features to extract some important ones and exclude others. The wrapper approach uses a particular learning algorithm to evaluate and decide which features should be extracted. The embedded approach incorporates feature extraction as part of the training process, such as decision trees that decide at each stage which feature has the best ability to discriminate [11].

In the current study, a two-window comparison approach, which falls under the filter approach, has been used to extract important features from the 504 features. The data of the component features are divided into three windows: initial life, middle life, and end of life (towards failure) (Fig. 2). In the two-window comparison approach, feature data from initial life (window 1) and end of life (window 3) are compared in order to extract the features that show some change in trend. The three-window division serves as a classification of the data for each feature; further in the study, this classification helps in evaluating the features with the best ability to discriminate in each class.
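The two-window comparison can be sketched as follows; the feature names, values, and the 50% change threshold are illustrative assumptions, not the study's actual criteria.

```python
# Hedged sketch of the two-window comparison: a feature is kept when its
# mean shifts markedly between initial life (window 1) and end of life
# (window 3). The 50% relative-change threshold is an assumption.
def two_window_select(features, threshold=0.5):
    """features: dict name -> (window1_values, window3_values)."""
    selected = []
    for name, (w1, w3) in features.items():
        m1 = sum(w1) / len(w1)
        m3 = sum(w3) / len(w3)
        if m1 != 0 and abs(m3 - m1) / abs(m1) > threshold:
            selected.append(name)
    return selected

features = {
    "vibration_rms": ([1.0, 1.1, 0.9], [2.5, 2.7, 2.6]),  # strong trend
    "temperature":   ([40.0, 41.0],    [41.0, 42.0]),      # nearly flat
}
print(two_window_select(features))  # ['vibration_rms']
```

A feature with a near-constant mean across the windows carries little degradation information and is discarded by this filter.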

Fig. 2. Ball screw movement

3.4. Knowledge Extraction and Visualization

A correct indication of the component's health status can lead to an appropriate maintenance action. The embedded approach described in Section 3.3 is well suited to mining further knowledge, in terms of health status, from the extracted features; it can be used to analyze hidden relationships among different features.

The architecture designed in the current study is used for discovering paths from massive amounts of measured data. The paths are retrieved from the decision trees and lead to one of the health-status windows shown in Fig. 2. The results at the knowledge extraction stage come from the core algorithm of the DSSA, which covers all the processes from Sections 3.3–3.6. In the core algorithm of the proposed DSSA, classification methods for path mining, such as decision tree (DT) and random forest (RF), report mean classification accuracies. These classification methods use a learning algorithm to find the model that best fits the variable set to the pre-defined classes in the input data.
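The notion of a root-to-leaf path leading to a health-status window can be sketched on a hand-built tree; the tree structure, feature names, and split values below are assumptions for illustration, not the study's learned model.

```python
# Illustrative path mining on a hand-built tree (nested dicts). Each
# root-to-leaf path yields a rule ending in one of the health-status
# windows of Fig. 2. All names and split values are assumed.
tree = {
    "feature": "vibration_rms", "split": 2.0,
    "left":  {"leaf": "window 1 (initial life)"},
    "right": {
        "feature": "friction", "split": 0.6,
        "left":  {"leaf": "window 2 (middle life)"},
        "right": {"leaf": "window 3 (end of life)"},
    },
}

def extract_paths(node, conditions=()):
    """Return (conditions, window) for every root-to-leaf path."""
    if "leaf" in node:
        return [(list(conditions), node["leaf"])]
    f, s = node["feature"], node["split"]
    return (extract_paths(node["left"],  conditions + (f"{f} <= {s}",))
          + extract_paths(node["right"], conditions + (f"{f} > {s}",)))

for conds, window in extract_paths(tree):
    print(" AND ".join(conds), "->", window)
```

Each printed rule corresponds to one path; the split values along the path act as feature thresholds, and the window name at the leaf is the predicted health status.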

3.5. Decision tree and Random forest

DT is a powerful method for classification and prediction problems. It is based on partitioning the attribute space using an iterative procedure of binary partitions, providing a highly interpretable model [12-14]. DT is a non-parametric approach, hence no prior understanding of the probability distribution is required. Being computationally inexpensive, it can quickly construct models even for large training sets. It provides an expressive representation for learning discrete-valued functions. DT is robust to noise and is not adversely affected by redundant variables.
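A single binary partition of this kind can be sketched as a Gini-based split search over one continuous feature; the data and the midpoint-threshold convention below are assumptions for illustration.

```python
# Sketch of a Gini-based split search for one continuous feature, with
# candidate thresholds at midpoints between sorted values (a common
# convention, assumed here). Data values and labels are invented.
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Return (threshold, weighted child impurity) of the best split."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2
        left  = [l for v, l in pairs if v <= thr]
        right = [l for v, l in pairs if v > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (thr, score)
    return best

values = [0.2, 0.4, 1.8, 2.0]      # e.g. a wear-related feature
labels = ["w1", "w1", "w3", "w3"]  # health-status windows
thr, score = best_split(values, labels)
print(round(thr, 2), score)  # 1.1 0.0 -- a perfect split
```

The chosen threshold is exactly the kind of feature split value that the paths in Section 3.4 carry.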

In the current study, the features' data are continuous in nature, so the Gini index is used to split each feature value. To evaluate the paths' accuracy, a number of decision trees were developed by randomly selecting training and testing sets. This approach is not efficient, as the accuracy of a tree and its corresponding paths varies greatly with the selection of training and testing data. Therefore, RF models can be more suited to evaluating the accuracy of the trees and their paths.

suited to evaluate the tree’s and path’s accuracy. A skeleton decision tree algorithm is explained in [15].
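As a concrete illustration of path mining with a Gini-based decision tree, the sketch below trains a small tree on synthetic data and prints every root-to-leaf path together with its split (threshold) values and the window it leads to. The features, windows and the use of scikit-learn are assumptions for the example, not the study's actual BBM features.

```python
# Sketch only: features and health-status windows are illustrative placeholders.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))                      # two synthetic features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)      # two synthetic health windows

tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
tree.fit(X, y)

def paths(t, node=0, trail=()):
    """Recursively collect every root-to-leaf path as (condition..., window)."""
    left, right = t.tree_.children_left[node], t.tree_.children_right[node]
    if left == -1:                                  # leaf: majority class = window
        window = int(np.argmax(t.tree_.value[node]))
        return [trail + (f"window {window}",)]
    f, thr = t.tree_.feature[node], t.tree_.threshold[node]
    return (paths(t, left,  trail + (f"f{f} <= {thr:.2f}",)) +
            paths(t, right, trail + (f"f{f} >  {thr:.2f}",)))

for p in paths(tree):
    print(" and ".join(p[:-1]), "->", p[-1])
```

Each printed line is one mined path; the split values along a path are exactly the feature thresholds discussed above.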

The RF model is a tree-type classifier, created by grouping classification and regression trees (CART), and was introduced by Breiman [16]. Each tree classifier is called a class predictor. An RF is formed from a large number of trees, each built on a sub-dataset. The distinct training data for each tree are drawn by bagging, i.e. random sampling with replacement, which improves stability and classification accuracy, reduces variance and helps to avoid over-fitting.

The final decision is made by summing the votes of the class predictors and choosing the class with the largest number of votes. RF performance is measured by the out-of-bag error (oob-error), calculated as the average error rate of the weak learners [17, 18].

In the current study a 10-fold RF model is run, with 1000 trees in each fold. The accuracy values of the folds are compared and the best fold is selected. The selected fold then yields the paths and their corresponding accuracies, providing the extracted knowledge as paths consisting of features, their split values and the window to which each path belongs. The features appearing in a path are the important features, and the split values become the threshold values of those features. The window number indicated by a path gives the health status, which domain experts then use to perform appropriate maintenance actions.
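A minimal sketch of this fold-selection step, assuming scikit-learn's RandomForestClassifier: several forests ("folds") are fitted with different random seeds, their out-of-bag scores are compared, and the best-scoring forest is kept for path extraction. The data are synthetic, and 200 trees per fold are used here for brevity where the study uses 1000.

```python
# Illustrative sketch: synthetic data stands in for the BBM feature set.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 4))
y = (X[:, 0] - X[:, 2] > 0).astype(int)             # two synthetic health windows

folds = []
for seed in range(10):                              # 10 folds, as in the study
    rf = RandomForestClassifier(
        n_estimators=200,                           # study uses 1000 trees per fold
        oob_score=True,                             # oob-error = 1 - oob_score_
        bootstrap=True,                             # bagging with replacement
        random_state=seed,
        n_jobs=-1,
    ).fit(X, y)
    folds.append(rf)

best = max(folds, key=lambda rf: rf.oob_score_)     # lowest oob-error wins
print(f"best fold oob-error: {1 - best.oob_score_:.3f}")
```

The trees of `best` can then be traversed, as in the DT example, to read off the important paths and their threshold values.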

3.6. Visualization

The extracted knowledge is of little value if it is not interpreted well. Visualization is one way to make the results understandable: as in Fig. 2, trends in the features (variables) of the ball screw can then be readily seen. The key results to visualize are the important decision trees, the important paths and the important variables within those paths (Fig. 3). At-a-glance views of which classes are close, which classes differ, which clusters exist within classes and which cases are easy, hard or unusual can accelerate perception, provide insight and control, and harness this flood of valuable data into a competitive advantage for the DSS.

Fig. 3. Decision tree visualization

Fig. 3 shows all the paths in one decision tree together with their health-status indicators; one important path, with its split values, is highlighted as an example. This type of visualization is easy to interpret and helps in understanding the relationships among many variables for future predictions.
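A lightweight, code-only alternative to a graphical tree view like Fig. 3 is scikit-learn's text rendering of a fitted tree, which lists every split and leaf as an indented outline. The two feature names below are invented for illustration.

```python
# Text-mode tree visualization; feature names are illustrative placeholders.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
y = (X[:, 0] > 0.3).astype(int)

tree = DecisionTreeClassifier(criterion="gini", max_depth=2, random_state=0).fit(X, y)
# Each indented branch of the printout is one path; the numbers are split values.
print(export_text(tree, feature_names=["circularity", "backlash"]))
```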

3.7. Inference Engine

As an important part of the DSS, the inference engine mainly draws on the important paths and the decision-tree results. The DSSA, together with the data mining methods proposed in this paper, is based on the following ideas:

• The aim of the DSSA is the non-trivial process of discovering new knowledge in a huge database. The main difficulty for the support system is therefore to train the data mining models in the architecture; inference then assists in validating those models.

• The DSSA can extract knowledge at a deeper level. More specifically, discovering further relations from the existing variables and relations helps to mine more useful results for future predictions. The last stage of the DSSA (Fig. 1), which calls for analysis for maintenance prediction, makes this requirement for more in-depth knowledge extraction explicit.

• Because the data themselves may exhibit characteristics such as uncertainty, non-monotonicity and incompleteness, the DSS process may also be complex, with multiple solutions. In this situation, however, multiple solutions can benefit the domain expert, as they provide more alternatives to pursue.

• The DSSA should be novel, potentially beneficial, operative, effective and understandable to users.

From the above description it can be seen that the DSSA is, in essence, a machine learning process. Its aim is to obtain deeper knowledge, and its learning resources are the database and the results of the data mining models in the DSSA.

4. Application Example

The prediction of the remaining useful life (RUL) of a machine or component is affected by numerous sources relating to the machine process and operating conditions. These sources must be interpreted in order to provide meaningful decision support. The prediction of RUL and maintenance actions depends on the collected data, the feature calculations, the operating conditions, the threshold setting and the prediction algorithms.

The proposed DSSA assists in learning the relationships among the collected data, the feature calculations and the threshold setting, based on the features' measured data, which represent the operating condition of the ball screw. The main concept of the DSSA is to anticipate the manifestation of a possible failure by classifying operational data into health-status windows, together with threshold values of the features, so that proactive maintenance actions can be performed. The current DSSA is therefore strongly related to the prediction of RUL and maintenance actions.


The classification of operational data into health-status windows is performed on the basis of the life of a ball screw and ball-bar measurement (BBM) data. The features measured during BBM represent the operational condition of the ball screw. The core algorithm in the DSSA discovers the important paths, the features included in those paths, their split (threshold) values and the health status that each path indicates. From the DSSA results, a prediction can be made for each health-status window, which reflects the type of operational program run on the machine and thus connects directly to predicting the RUL and the maintenance actions of the component.
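The decision step described above can be sketched as a simple rule lookup: a new BBM feature vector is pushed through the mined threshold rules, lands in a health-status window, and the window is mapped to a maintenance action. All feature names, thresholds and actions below are invented placeholders, not values from the study.

```python
# Hypothetical mined paths: lists of (feature, threshold, direction) -> window.
MINED_PATHS = [
    ([("backlash", 0.12, "<="), ("circularity", 0.05, "<=")], "window 1"),
    ([("backlash", 0.12, "<="), ("circularity", 0.05, ">")],  "window 2"),
    ([("backlash", 0.12, ">")],                               "window 3"),
]
# Hypothetical mapping from health-status window to maintenance action.
ACTIONS = {"window 1": "no action",
           "window 2": "inspect",
           "window 3": "replace ball screw"}

def classify(features):
    """Return (window, action) for the first mined path whose rules all hold."""
    for rules, window in MINED_PATHS:
        if all(features[f] <= t if d == "<=" else features[f] > t
               for f, t, d in rules):
            return window, ACTIONS[window]
    raise ValueError("no path matched")

print(classify({"backlash": 0.20, "circularity": 0.01}))  # -> window 3
```

In practice the rules and actions would come from the best RF fold and from the domain experts, respectively.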

5. Conclusion

In this paper a DSSA based on DM is proposed. The architecture integrates several DM techniques to extract important information in the form of important features and important paths in decision trees. The core algorithm of the architecture builds on the selection of interesting features, using a two-window comparison approach, from the offline input measurements. It constructs representative features that can explain the health status of the component. The extracted features are then used in classification methods to evaluate the accuracy of the classes with respect to the paths retrieved from the classification methods. As examples, decision tree (DT) and random forest (RF) classification methods are proposed. RF is considered better than DT because it reduces the risk of overfitting and gives a more reliable accuracy level for an individual tree and path. An application example shows how the proposed DSSA can be applied to predict the RUL and maintenance actions of a component. The DSSA also handles data dimensions and units at data-integration time, and applies a validator before feeding data into the core algorithm.

Future work will compare all applicable classification methods according to the behaviour of the data, and likewise compare feature extraction methods. Combining all of these process methodologies can yield a generic architecture, which can also be presented through a graphical user interface.

Acknowledgement

The presented research activities have received funding from the Knowledge Foundation (KKS), Volvo Cars Corporation (VCC), Eurofins and Autokaross i Floby under the research project Efficient Equipment Engineering (E3) at the University of Skövde.

References

[1] Huynh KT, Castro IT, Barros A, Bérenguer C. On the use of mean residual life as a condition index for condition-based maintenance decision-making. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2014; 44(7): 877–93.

[2] Do P, Voisin A, Levrat E, Iung B. A proactive condition-based maintenance strategy with both perfect and imperfect maintenance actions. Reliability Engineering & System Safety, 2015; 133: 22–32.

[3] Nguyen KA, Do P, Grall A. Multi-level predictive maintenance for multi-component systems. Reliability Engineering & System Safety, 2015; 144: 83–94.

[4] Traore M, Chammas A, Duviella E. Supervision and prognosis architecture based on dynamical classification method for the predictive maintenance of dynamical evolving systems. Reliability Engineering & System Safety, 2015; 136: 120–131.

[5] Purarjomandlangrudi A, Ghapanchi AH, Esmalifalak M. A data mining approach for fault diagnosis: An application of anomaly detection algorithm. Measurement, 2014; 55: 343–352.

[6] Xu J, Sun K, Xu L. Integrated system health management-oriented maintenance decision-making for multi-state system based on data mining. International Journal of Systems Science, 2016; 47(13): 3287–3301.

[7] Confalonieri M, Barni A, Valente A, Cinus M, Pedrazzoli P. An AI based decision support system for preventive maintenance and production optimization in energy intensive manufacturing plants. IEEE International Conference on Engineering, Technology and Innovation / International Technology Management Conference (ICE/ITMC), 2015; 1–8.

[8] Bumblauskas D, Gemmill D, Igou A, Anzengruber J. Smart Maintenance Decision Support Systems (SMDSS) based on corporate big data analytics. Expert Systems With Applications, 2017; doi:10.1016/j.eswa.2017.08.025.

[9] Trincavelli M, Coradeschi S, Loutfi A. Odour classification system for continuous monitoring applications. Sensors and Actuators B: Chemical, 2009; 139(2): 265–273.

[10] Liu H, Motoda H. Computational Methods of Feature Selection. Chapman & Hall/CRC Data Mining and Knowledge Discovery Series, 2008.

[11] Mosallam A, Medjaher K, Zerhouni N. Data-driven prognostic method based on Bayesian approaches for direct remaining useful life prediction. Journal of Intelligent Manufacturing, 2016; 27: 1037–1048.

[12] Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York: Springer; 2009.

[13] Mitchell T. Machine Learning. New York: McGraw-Hill; 1997.

[14] Witten IH, Frank E. Data Mining: Practical Machine Learning Tools and Techniques. Boston: Morgan Kaufmann; 2005.

[15] Tan PN, Steinbach M, Kumar V. Introduction to Data Mining. Pearson Addison-Wesley; 2006.

[16] Guenther F, Fritsch S. Neuralnet: training of neural networks. R Journal, 2012; 2: 30–38.

[17] Cabrera D, Sancho F, Sánchez RV, Zurita G, Cerrada M, et al. Fault diagnosis of spur gearbox based on random forest and wavelet packet decomposition. Frontiers of Mechanical Engineering, 2015; 10: 277–286.

[18] Yang BS, Di X, Han T. Random forests classifier for machine fault diagnosis. Journal of Mechanical Science and Technology, 2008; 22: 1716–1725.
