In partial least squares regression, what is the difference between the regression coefficients and the loadings for each independent variable in each component? Specifically, I understand that in every component, each of the independent variables has a corresponding loading. Does each variable also have a regression coefficient? What is the relationship between the loading vector and the coefficients?
1 Answer
Assume your independent variable matrix is $m\times n$, i.e., you have $m$ observations and $n$ variables.
For each PLS component (a.k.a. latent variable), you get a loading vector ($n \times 1$), so for $h$ components the loading matrix ($P$) is $n \times h$. These loadings are calculated for interpretation and for algorithmic purposes, but they are not used directly for prediction.
On the other hand, the SIMPLS algorithm (I believe the most popular PLS flavor) also involves the calculation of a weight matrix ($W$), of the same size as the loading matrix. This weight matrix $W$, which produces mutually orthogonal scores, is used to calculate the $X$ scores ($T$):
$T = X\cdot W$
which are then multiplied by the $Y$ loadings ($Q$) for prediction:
$\hat{Y} = T \cdot Q'$
Therefore, the regression coefficients ($\hat{B}$, which is $n\times 1$ for a single dependent variable) that can be used to predict $Y$ directly from $X$ are calculated as:
$\hat{B} = W \cdot Q'$
All in all, one obtains a loading vector for each component, whereas for different numbers of components one obtains regression coefficient vectors of the same size ($n \times 1$) but with different values.
As far as I know, a similar logic applies to other PLS algorithms too.
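For a concrete check of these relationships, here is a minimal sketch using scikit-learn's PLSRegression. The data, sizes, and variable names are made up for illustration, and scikit-learn uses a NIPALS-type algorithm rather than SIMPLS; however, with scale=False its x_rotations_ attribute plays the role of $W$ above (it maps the centered $X$ onto the scores $T$), y_loadings_ plays the role of $Q$, and x_loadings_ is $P$.

```python
# Minimal sketch (illustrative data; attribute names are scikit-learn's, not the answer's):
# x_rotations_ ~ W in the answer, y_loadings_ ~ Q, x_loadings_ ~ P.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
m, n, h = 50, 6, 3                               # m observations, n variables, h components
X = rng.normal(size=(m, n))
y = X @ rng.normal(size=n) + rng.normal(scale=0.1, size=m)

pls = PLSRegression(n_components=h, scale=False).fit(X, y.reshape(-1, 1))

W = pls.x_rotations_                             # n x h: projects centered X onto the scores T
Q = pls.y_loadings_                              # 1 x h: Y loadings
P = pls.x_loadings_                              # n x h: X loadings (interpretation, not prediction)
B = W @ Q.T                                      # n x 1: regression coefficients, B_hat = W * Q'
print(W.shape, Q.shape, P.shape, B.shape)        # (6, 3) (1, 3) (6, 3) (6, 1)

Xc = X - X.mean(axis=0)                          # PLSRegression mean-centers internally
T = Xc @ W                                       # m x h scores, T = X * W
yhat = T @ Q.T + y.mean()                        # Y_hat = T * Q' (intercept from centering)

print(np.allclose(yhat, Xc @ B + y.mean()))      # True: same prediction via B_hat
print(np.allclose(yhat.ravel(), pls.predict(X).ravel()))  # True: matches the library
```

The last two checks confirm that $\hat{B} = W \cdot Q'$ reproduces the model's own predictions (up to the intercept that comes from mean-centering), which is exactly the relationship stated above.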
- Thanks for the reply. But I'm still not clear if a regression coefficient is generated for every independent variable, i.e., all $m$ of them, or only a subset. – user2450223 May 07 '19 at 10:06
- Yes, a coefficient for each variable is obtained. But not $m$ of them; $n$ of them, since $n$ is the number of variables. – gunakkoc May 07 '19 at 10:30
- I suggest re-reading the answer while looking at the matrix/vector sizes carefully; it should help clear things up. – gunakkoc May 07 '19 at 10:32
- Thanks for the detailed reply. Just one more detail: in your example, what is the dimension of the matrix $Q$? And $\hat{Y}$? I'm asking because I have output from a PLS regression and I have a set of regression coefficients as well as loadings. I get that the coefficients are $\hat{B}$ and the loading matrix $P$ is $n \times h$. – user2450223 May 07 '19 at 12:30
- I guess $Q$ should be $h \times 1$. Also, what is the difference between $Q$ and $Q'$? – user2450223 May 07 '19 at 12:35
- $Q$ is $1 \times h$. $Q'$ is $Q$ transposed. So $Q'$ is $h \times 1$. – gunakkoc May 07 '19 at 13:55
- Thanks again very much. Intuitively, what information is expressed by the loadings versus the regression coefficients? Can the regression coefficients be taken as a measure of the variable's relative importance? – user2450223 May 07 '19 at 14:54
- Let us continue this discussion in chat. – user2450223 May 07 '19 at 16:15
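To consolidate the dimension bookkeeping from the comments above (single dependent variable, $m$ observations, $n$ variables, $h$ components), the sizes line up as:

$\hat{Y} = X \cdot W \cdot Q'$, with sizes $(m \times 1) = (m \times n)(n \times h)(h \times 1)$

$\hat{B} = W \cdot Q'$, with sizes $(n \times 1) = (n \times h)(h \times 1)$

so there is exactly one regression coefficient per independent variable, i.e. $n$ of them, regardless of the number of components $h$.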