Novel variable influence on projection (VIP) methods in OPLS, O2PLS, and OnPLS models for single- and multi-block variable selection
VIPOPLS, VIPO2PLS, and MB-VIOP methods
Beatriz Galindo-Prieto
Academic dissertation
To be presented for public defence, with due permission of the Vice-Chancellor of Umeå University, for the degree of Doctor of Philosophy, at the Department of Chemistry, Umeå University, KBC-huset, KB.E3.01, on Wednesday 15 February at 13:00. The thesis will be defended in English.
Faculty opponent: Prof. Olav Kvalheim
Department of Chemistry, University of Bergen, Norway
Department of Chemistry, Umeå University
Umeå, 2017
Organization: Umeå University
Document type: Doctoral thesis
Date of publication: 15 February 2017
Author: Beatriz Galindo-Prieto
Language: English
ISBN: 978-91-7601-620-6
Number of pages: 103 + 4 papers
Title: Novel variable influence on projection (VIP) methods in OPLS, O2PLS, and OnPLS models for single- and multi-block variable selection – VIPOPLS, VIPO2PLS, and MB-VIOP methods.
Abstract
Multivariate and multiblock data analysis provides useful methodologies for analyzing large data sets in chemistry, biology, psychology, economics, sensory science, and industrial processes; among these methodologies, partial least squares (PLS) and orthogonal projections to latent structures (OPLS®) have become popular. Owing to increasingly computerized instrumentation, a data set can consist of thousands of input variables that contain latent information valuable for research and industrial purposes. When a large number of data sets (blocks) are analyzed simultaneously, the number of variables and of underlying connections between them grows considerably; at this point, reducing the number of variables while keeping high interpretability becomes a much-needed strategy.
The main direction of research in this thesis is the development of a variable selection method, based on variable influence on projection (VIP), in order to improve the model interpretability of OnPLS models in multiblock data analysis. This new method is called multiblock variable influence on orthogonal projections (MB-VIOP), and its novelty lies in the fact that it is the first multiblock variable selection method for OnPLS models.
Several milestones needed to be reached in order to successfully create MB-VIOP. The first milestone was the development of a single-block variable selection method able to handle orthogonal latent variables in OPLS models, i.e. VIP for OPLS (denoted as VIPOPLS or OPLS-VIP in Paper I), which proved to increase the interpretability of PLS and OPLS models and was afterwards successfully extended to multivariate time series analysis (MTSA) aimed at process control (Paper II). The second milestone was the development of the first multiblock VIP approach for enhancement of O2PLS® models, i.e. VIPO2PLS for two-block multivariate data analysis (Paper III). Finally, the third milestone, and the main goal of this thesis, was the development of the MB-VIOP algorithm for improving OnPLS model interpretability when analyzing a large number of data sets simultaneously (Paper IV).
The results of this thesis and its enclosed papers showed that the VIPOPLS, VIPO2PLS, and MB-VIOP methods successfully assess which variables are most relevant for model interpretation in PLS, OPLS, O2PLS, and OnPLS models. In addition, these methods can potentially improve predictability and robustness, and can serve dimensionality reduction and other variable selection purposes.
Keywords
Variable influence on projection, VIP, MB-VIOP, orthogonal projections to latent structures, OPLS, O2PLS, OnPLS, variable selection, variable importance in multiblock regression.