The PLS regression module provides a powerful computational tool for analysing a pair of multidimensional variables, where linear relationships are expected both within each multidimensional variable and between the two variables. This computationally intensive method makes it possible to explain and predict one group of variables using the other group. PLS regression has found many applications in quality planning and management in manufacturing, in the design and optimization of product characteristics during new product development, in marketing studies, in the evaluation of experiments, and in clinical trials. Examples include modeling the relationship between process parameters in production and product quality parameters, or between chemical composition and physical and biological characteristics. Typical questions from technological practice that PLS can often answer include:
Does the purity of the raw material have any effect on the strength of the product?
What happens if the temperature is increased in the process?
Can we increase the stability of the product by reducing the speed or rotation?
Which process parameters affect product strength the most?
How should the process parameters be set to achieve the desired product characteristics?
What caused the decrease in the parameter?
In what way, and by how much, do subsequent production batches differ?
How to improve the stability / quality?
How to increase the strength / value / competitiveness?
Which input parameters are crucial for the quality?
Which process parameters are crucial for the quality?
Data and parameters
The PLS regression module needs two data tables: a matrix X with p columns and a matrix Y with q columns, selected in the dialog box items Matrix X and Matrix Y. The matrix columns must contain numeric data only, and the number of rows must be the same for X and Y. Each matrix must contain at least two columns, and columns of matrix X must not appear in matrix Y. The limiting dimension k can be set by the user. If the Dimension box is not checked, the maximum dimension k = min(p, q) is used. It is recommended to run PLS in the maximal dimension first; an appropriate value of k can then be determined from the scree plot (see paragraph Graphical output below) and the computation repeated with the new k.
If the Connect Biplot checkbox is checked, consecutive points of the biplot are connected in the order of the data in the spreadsheet. This can help to follow a possible trajectory of the process.
If the X-Prediction checkbox is checked, it is necessary to choose the same number of columns as in the field Independent variable X. These variables are used to compute the predicted values of the dependent variable. The X for prediction must have the same number of columns as the independent variable matrix X, but may have a different number of rows (at least 2 rows).
Dialog box for PLS regression
X (n x p) | Y (n x q) | X for prediction (n1 x p)
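The input rules above can be checked before a PLS run. The following is a minimal sketch in plain Python, assuming the matrices are given as lists of rows; the function name `check_pls_inputs` is hypothetical and is not part of the module. It does not check that X and Y share no columns, because that requires column names.

```python
def check_pls_inputs(X, Y, k=None, X_pred=None):
    """Validate PLS inputs (lists of rows) and return the dimension k to use.

    Hypothetical helper that mirrors the rules stated in the text;
    it is not the module's own code.
    """
    n, p, q = len(X), len(X[0]), len(Y[0])
    if len(Y) != n:
        raise ValueError("X and Y must have the same number of rows")
    if p < 2 or q < 2:
        raise ValueError("each matrix must contain at least two columns")
    if any(len(row) != p for row in X) or any(len(row) != q for row in Y):
        raise ValueError("all rows of a matrix must have the same length")
    for row in list(X) + list(Y):
        for value in row:
            # bool is a subclass of int in Python, so exclude it explicitly
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise ValueError("matrices must contain numeric data only")

    k_max = min(p, q)
    if k is None:
        # corresponds to the Dimension box left unchecked
        k = k_max
    elif not 1 <= k <= k_max:
        raise ValueError("k must be between 1 and min(p, q)")

    if X_pred is not None:
        if len(X_pred) < 2:
            raise ValueError("X for prediction needs at least 2 rows")
        if any(len(row) != p for row in X_pred):
            raise ValueError("X for prediction must have p columns")
    return k


# Illustrative data only: n = 3 rows, p = 3, q = 2
X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.5]]
Y = [[1.0, 0.5], [2.0, 1.5], [3.0, 2.0]]
print(check_pls_inputs(X, Y))  # uses the default k = min(p, q)
```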
Protocol
Input data
No of rows | Number of valid rows.
No of columns | Number of columns of the X and Y matrices.
Columns | Column names of both input matrices.
Chosen dimension | Dimension k of the PLS model chosen in the input dialog window. The dimension must be less than or equal to min(p, q). The scree plot may be used as an aid in selecting a suitable k if required.
PLS - coefficients, B | Diagonal elements of the matrix B.
Explained sum of squares | Table of the residual sum of squares for increasing model dimension, i = 1, …, k; these values are used to construct the scree plot.
No of components | Number of components (dimensions) used for the sum of squares.
RSS | Residual sum of squares; for 0 components, the RSS is the total sum of squares without a model.
Percent | RSS as a percentage of the total sum of squares (%RSS).
Explained % | 100 - %RSS.
Loadings X, P | Loadings matrix P.
Loadings Y, Q | Loadings matrix Q.
Regression coefficients, A | Matrix of regression coefficients a_ij, formally similar to those in the separate classical multiple linear regression models Y = XA, or y_j = Σ_i a_ij x_i. The coefficient values generally differ from the classical coefficients: because they are based on regression on orthogonal components, they are biased, shrunken (with lower standard deviations) and more stable.
Prediction | Predicted values for the data selected in the field X-Prediction in the PLS dialog box. This part of the output is generated only if the checkbox is checked.
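As a rough illustration of how the Regression coefficients and Prediction outputs relate, the sketch below applies the relation Y = XA literally to new rows. It assumes no intercept and no centering or scaling step, because the text does not say how the module preprocesses data. The function name `predict` and the numbers in A and X_pred are illustrative only.

```python
def predict(X_pred, A):
    """Return Y_hat = X_pred A for X_pred (n1 x p) and A (p x q), as lists of rows."""
    p, q = len(A), len(A[0])
    Y_hat = []
    for row in X_pred:
        if len(row) != p:
            raise ValueError("X for prediction must have as many columns as A has rows")
        # y_j = sum over i of a_ij * x_i, for each dependent variable j
        Y_hat.append([sum(row[i] * A[i][j] for i in range(p)) for j in range(q)])
    return Y_hat


# Illustrative coefficient matrix (p = 3, q = 2) and two rows for prediction
A = [[0.5, -1.0],
     [2.0, 0.0],
     [0.0, 1.0]]
X_pred = [[1.0, 2.0, 3.0],
          [0.0, 1.0, -1.0]]
print(predict(X_pred, A))  # [[4.5, 2.0], [2.0, -1.0]]
```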
Graphical output
Biplot of the matrices X and Y in one plot, and of X and Y separately. A biplot is a projection of multidimensional data onto a plane (the best one in the least-squares sense). Points represent rows; rays correspond to columns of the original data. To identify data rows, you can use the point labels selected in the dialog. Rays that lie close together are likely to belong to mutually correlated variables. Points located in the direction of a ray tend to have larger values of the respective variable. Be aware that, because of the drastic reduction in the number of dimensions, particularly for larger p and q, the plot gives only a global picture of the structure and of possible links and relationships in the data. If the Connect Biplot checkbox was checked, the points are connected in the order of the data in the spreadsheet (chronological order, if the rows are sorted by time), which sometimes makes it possible to spot trends in a time series or a non-stationary process.
Plot of agreement between the columns of T and U. This plot shows the overall success of the PLS model fit. The closer the points are to the line, the more successful the PLS model is.
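The agreement shown in the T and U plot can also be summarized numerically. The sketch below computes the Pearson correlation between corresponding columns of T and U, assuming both are available as lists of rows. Using correlation as the measure is an assumption made for illustration, not something the module is stated to report, and the function names are hypothetical.

```python
import math


def column(M, j):
    """Return column j of a matrix stored as a list of rows."""
    return [row[j] for row in M]


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    s_aa = sum((x - mean_a) ** 2 for x in a)
    s_bb = sum((y - mean_b) ** 2 for y in b)
    return s_ab / math.sqrt(s_aa * s_bb)


def score_agreement(T, U):
    """Correlation between column j of T and column j of U, for every j."""
    return [pearson(column(T, j), column(U, j)) for j in range(len(T[0]))]


# Illustrative scores only: the first pair of columns agrees perfectly,
# the second pair is perfectly anti-correlated.
T = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0]]
U = [[2.0, 1.0], [4.0, 0.0], [6.0, 1.0], [8.0, 0.0]]
print(score_agreement(T, U))
```

Values close to 1 correspond to points lying close to a rising line in the plot.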
Scree plot: the effectiveness of the model, expressed as the reduction of the unexplained (residual) sum of squares as a function of the number of factors included (columns of the matrices T and U).
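The Percent and Explained % values listed in the protocol follow directly from the RSS values, which are also the values used for the scree plot. A minimal sketch, assuming the RSS values are given as a list whose first entry is the RSS for 0 components (the total sum of squares); the function name `explained_table` and the numbers are illustrative and are not output of the module.

```python
def explained_table(rss):
    """Build rows (components, RSS, Percent, Explained %) from a list of RSS values.

    rss[0] is the RSS for 0 components, i.e. the total sum of squares.
    """
    total = rss[0]
    rows = []
    for components, value in enumerate(rss):
        percent = 100.0 * value / total          # RSS as % of the total
        rows.append((components, value, percent, 100.0 - percent))
    return rows


# Illustrative RSS values for 0, 1, 2 and 3 components
rss = [200.0, 50.0, 20.0, 18.0]
for components, value, percent, explained in explained_table(rss):
    print(components, value, percent, explained)
```

In this made-up example the third component adds only one percentage point of explained variation, which is the kind of flattening in the scree plot that suggests choosing a smaller k.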
This plot shows the agreement between a dependent variable and the model prediction. The closer the points are to the line, the better the fit. The plot is created for each dependent variable; some variables can be predicted better than others. If the plot shows no visible trend, a suitable model for this variable was probably not found and the model is not able to predict the dependent variable. If the Validation checkbox was selected, the validation (test) points in the plot are marked in red.
The Validation plot is used to assess the quality of prediction for the validation data. Isolated, very remote points may represent outlying measurements. The Y-axis shows the Euclidean distances of the data from the model.
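The text does not specify how the distance from the model is computed. One plausible reading, used here purely as an assumption, is the Euclidean distance between each observed row of Y and the corresponding predicted row. The sketch below follows that reading; the function name `distances_from_model` and the data are illustrative.

```python
import math


def distances_from_model(Y_obs, Y_hat):
    """Euclidean distance between each observed row and its predicted row.

    Assumption: 'distance from the model' means the distance between
    observed and predicted Y; the module may define it differently.
    """
    return [
        math.sqrt(sum((o - h) ** 2 for o, h in zip(row_obs, row_hat)))
        for row_obs, row_hat in zip(Y_obs, Y_hat)
    ]


# Illustrative validation data: the second row lies far from its prediction
Y_obs = [[1.0, 2.0], [4.0, 6.0]]
Y_hat = [[1.0, 2.0], [1.0, 2.0]]
print(distances_from_model(Y_obs, Y_hat))  # [0.0, 5.0]
```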