
Novel variable influence on projection (VIP) methods in OPLS, O2PLS, and OnPLS models for single- and multi-block variable selection


Academic year: 2022


Novel variable influence on projection (VIP) methods in OPLS, O2PLS, and OnPLS models for single- and multi-block variable selection

VIPOPLS, VIPO2PLS, and MB-VIOP methods

Beatriz Galindo-Prieto

Academic dissertation

To be presented for public defence, with the due permission of the Vice-Chancellor of Umeå University, for the degree of Doctor of Philosophy, at the Department of Chemistry, Umeå University, KBC building, KB.E3.01, on Wednesday 15 February at 13:00. The thesis will be defended in English.

Faculty opponent: Prof. Olav Kvalheim

Department of Chemistry, University of Bergen, Norway

Department of Chemistry, Umeå University

Umeå, 2017


Organization: Umeå University
Document type: Doctoral thesis
Date of publication: 15 February 2017
Author: Beatriz Galindo-Prieto
Language: English
ISBN: 978-91-7601-620-6
Number of pages: 103 + 4 papers
Title: Novel variable influence on projection (VIP) methods in OPLS, O2PLS, and OnPLS models for single- and multi-block variable selection – VIPOPLS, VIPO2PLS, and MB-VIOP methods.

Abstract

Multivariate and multiblock data analysis provides useful methodologies for analyzing large data sets in chemistry, biology, psychology, economics, sensory science, and industrial processes; among these methodologies, partial least squares (PLS) and orthogonal projections to latent structures (OPLS®) have become popular. Owing to increasingly computerized instrumentation, a data set can consist of thousands of input variables that contain latent information valuable for research and industrial purposes. When a large number of data sets (blocks) are analyzed simultaneously, the number of variables, and of underlying connections between them, grows considerably; at this point, reducing the number of variables while keeping high interpretability becomes a much-needed strategy.

The main direction of research in this thesis is the development of a variable selection method, based on variable influence on projection (VIP), in order to improve the model interpretability of OnPLS models in multiblock data analysis. This new method is called multiblock variable influence on orthogonal projections (MB-VIOP), and its novelty lies in the fact that it is the first multiblock variable selection method for OnPLS models.

Several milestones needed to be reached in order to successfully create MB-VIOP. The first milestone was the development of a single-block variable selection method able to handle orthogonal latent variables in OPLS models, i.e. VIP for OPLS (denoted as VIPOPLS, or OPLS-VIP in Paper I), which proved to increase the interpretability of PLS and OPLS models and was afterwards successfully extended to multivariate time series analysis (MTSA) aimed at process control (Paper II). The second milestone was to develop the first multiblock VIP approach for the enhancement of O2PLS® models, i.e. VIPO2PLS for two-block multivariate data analysis (Paper III). Finally, the third milestone, and the main goal of this thesis, was the development of the MB-VIOP algorithm for improving OnPLS model interpretability when analyzing a large number of data sets simultaneously (Paper IV).

The results of this thesis and its enclosed papers showed that the VIPOPLS, VIPO2PLS, and MB-VIOP methods successfully identify the most relevant variables for model interpretation in PLS, OPLS, O2PLS, and OnPLS models. In addition, predictability, robustness, dimensionality reduction, and other variable selection purposes can potentially be improved or achieved by using these methods.
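
For orientation only, the classical VIP score for an ordinary single-response PLS model, the quantity on which the methods above are based, can be sketched as follows. This is not the VIPOPLS, VIPO2PLS, or MB-VIOP algorithm of the thesis, none of which is specified in this abstract. The function names `pls1_nipals` and `vip_scores`, the NIPALS fitting routine, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def pls1_nipals(X, y, n_components):
    """Fit a single-response PLS model with NIPALS (illustrative helper).

    Returns unit-length weights W (p x A), scores T (n x A) and y-loadings q (A).
    """
    X = X - X.mean(axis=0)          # column-centre X (creates a copy)
    y = y - y.mean()                # centre y (creates a copy)
    n, p = X.shape
    W = np.zeros((p, n_components))
    T = np.zeros((n, n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)      # weight vector of unit length
        t = X @ w                   # score vector
        tt = t @ t
        p_load = X.T @ t / tt       # X-loading, used only for deflation
        q[a] = (y @ t) / tt         # y-loading
        X = X - np.outer(t, p_load) # deflate X
        y = y - q[a] * t            # deflate y
        W[:, a], T[:, a] = w, t
    return W, T, q

def vip_scores(W, T, q):
    """Classical PLS VIP: sqrt(p * sum_a SS_a * w_ja^2 / sum_a SS_a)."""
    p = W.shape[0]
    ss = q**2 * np.sum(T**2, axis=0)              # y-variation explained per component
    w_sq = (W / np.linalg.norm(W, axis=0))**2     # squared normalised weights
    return np.sqrt(p * (w_sq @ ss) / ss.sum())

# Simulated data (hypothetical): only the first two variables drive y.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 6))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + 0.1 * rng.normal(size=50)

W, T, q = pls1_nipals(X_demo, y_demo, n_components=2)
vip = vip_scores(W, T, q)
print(np.round(vip, 2))
```

Because the weights are normalised, the squared VIP values average to one across variables, which is why a threshold of VIP > 1 is a common rule of thumb for flagging influential variables.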

Keywords

Variable influence on projection, VIP, MB-VIOP, orthogonal projections to latent structures, OPLS, O2PLS, OnPLS, variable selection, variable importance in multiblock regression.
