Why is the weight vector in PLS constrained to be of unit length?

Question

In the SIMPLS formulation of partial least squares (PLS) regression, the weight vectors are constrained to have unit length,

$$r_a^Tr_a = 1,$$ where $a$ indexes the latent components (from $1$ to $A$). This is from the original definition by de Jong (1993).

As the plsgenomics package says in the help for the pls.regression function:

In the original definition of SIMPLS by de Jong (1993), the weight vectors have length 1. If the weight vectors are standardized to have length 1, they satisfy a simple optimality criterion (de Jong, 1993).

I read de Jong's paper and he says:

For the development of the theory and algorithm of SIMPLS it was convenient to choose normalized weight vectors $r_a$. This choice, however, is in no way essential.

My question is this. Can anyone give me an intuition for what effect constraining the length of the weight vector has on the algorithm?

References:

de Jong, S. SIMPLS: an Alternative Approach To Partial Least-Squares Regression. Chemom. Intell. Lab. Syst. 18, 251–263 (1993).

Take something simpler than PLS, e.g. PCA. Do you know why eigenvectors in PCA are constrained to unit length? — amoeba, Mar 24 '15 at 23:29
Thank you for your helpful comment. I don't know why in this simpler case either. — Stefan Avey, Mar 25 '15 at 12:34
Then see here: http://stats.stackexchange.com/questions/117695. — amoeba, Mar 25 '15 at 12:35
Great @amoeba. If you post this link as an answer I will accept it (or if you want to close this as duplicated that's ok too). Thanks! — Stefan Avey, Mar 25 '15 at 15:20
@amoeba, I tried to upvote but I'm so new that I can't (you need 15 reputation first). Probably shouldn't have closed as a duplicate - my fault. You'll get my upvote once I can ;) — Stefan Avey, Mar 27 '15 at 00:36

score 4 · Accepted Answer · edited Apr 13 '17 at 12:44

It is instructive to consider the simpler case of PCA first.

Given a data matrix $\mathbf X$, PCA finds a direction $\mathbf w$ such that the variance of the projection is maximized: $$\mathbf w = \mathrm{argmax}\; \mathrm{Var}(\mathbf X \mathbf w).$$ To be able to search for the optimal direction, one needs to fix the length of $\mathbf w$; otherwise the variance can be made arbitrarily large by increasing its length, because $\mathrm{Var}(\mathbf X \cdot c\mathbf w) = c^2\, \mathrm{Var}(\mathbf X \mathbf w)$. Constraining the length of $\mathbf w$ to any positive number $\alpha$ would work fine, but it is particularly convenient to choose $\|\mathbf w\|=1$, because only then does $\mathbf X \mathbf w$ give the projections of the data onto the direction of $\mathbf w$.
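To make the scaling argument concrete, here is a minimal sketch using only the Python standard library. The toy data, the helper `project`, and the particular vector `w` are made up for illustration and are not part of PCA itself; the point is only that multiplying $\mathbf w$ by $c$ multiplies the variance of $\mathbf X \mathbf w$ by $c^2$.

```python
import random
import statistics

def project(X, w):
    """Return the vector X w for a matrix X stored as a list of rows."""
    return [sum(x_j * w_j for x_j, w_j in zip(row, w)) for row in X]

random.seed(0)
# Toy data (an assumption for illustration): 200 samples, 2 features.
X = [[random.gauss(0, 2), random.gauss(0, 1)] for _ in range(200)]

w = [0.6, 0.8]  # unit length, since 0.36 + 0.64 = 1
var_unit = statistics.variance(project(X, w))

# Scaling w by c scales Var(Xw) by c**2, so without a length
# constraint the "maximum" variance is unbounded.
ratios = {}
for c in (2, 10, 100):
    w_scaled = [c * w_j for w_j in w]
    ratios[c] = statistics.variance(project(X, w_scaled)) / var_unit
    print(c, ratios[c])
```

Each printed ratio should equal $c^2$ up to floating-point rounding, whatever the data are.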

See this thread for more details: Why is the eigenvector in PCA taken to be unit norm?

Now turning to PLS, we will see that the situation is exactly the same. Given a data matrix $\mathbf X$ and a response variable $\mathbf y$, PLS looks for a direction $\mathbf w$ such that the covariance between $\mathbf y$ and the projection of $\mathbf X$ onto $\mathbf w$ is maximized: $$\mathbf w = \mathrm{argmax}\; \mathrm{Cov}(\mathbf X \mathbf w, \mathbf y).$$ Again, for this formulation to make sense, one needs to fix the length of $\mathbf w$, because $\mathrm{Cov}(\mathbf X \cdot c\mathbf w, \mathbf y) = c\, \mathrm{Cov}(\mathbf X \mathbf w, \mathbf y)$ grows without bound as $c$ increases, and it is convenient to fix it to $1$.
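As a rough numerical check of the PLS case, the sketch below (standard-library Python, with made-up toy data and hypothetical helper names `center`, `cov`, `unit` and `cov_along`) assumes a single response and centered data. Under that assumption, once the length is fixed to $1$ the maximizer is $\mathbf w \propto \mathbf X^\top \mathbf y$ by the Cauchy-Schwarz inequality, and no other unit-length direction gives a larger covariance.

```python
import math
import random

def center(v):
    """Subtract the mean from a list of numbers."""
    m = sum(v) / len(v)
    return [x - m for x in v]

def cov(u, v):
    """Sample covariance of two equal-length lists."""
    u, v = center(u), center(v)
    return sum(a * b for a, b in zip(u, v)) / (len(u) - 1)

def unit(w):
    """Rescale w to unit Euclidean length."""
    norm = math.sqrt(sum(w_j * w_j for w_j in w))
    return [w_j / norm for w_j in w]

random.seed(1)
n, p = 100, 3
# Toy data (an assumption for illustration): y depends on the first two columns.
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [2 * row[0] - row[1] + random.gauss(0, 0.5) for row in X]

# With centered columns, Cov(Xw, y) = w^T X^T y / (n - 1),
# so the unit-length maximizer is X^T y rescaled to length 1.
cols = [center([row[j] for row in X]) for j in range(p)]
yc = center(y)
w_star = unit([sum(a * b for a, b in zip(col, yc)) for col in cols])

def cov_along(w):
    """Covariance between the scores X w and the response y."""
    scores = [sum(x_j * w_j for x_j, w_j in zip(row, w)) for row in X]
    return cov(scores, y)

best = cov_along(w_star)
# Compare against 1000 random unit-length directions.
others = [cov_along(unit([random.gauss(0, 1) for _ in range(p)]))
          for _ in range(1000)]
print(best, max(others))
```

The first printed number should never be smaller than the second. Dropping the call to `unit` would make the comparison meaningless, since any direction could then "win" simply by being longer.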
