I was trying to understand what the score variable was in MATLAB. The PCA documentation says:
Principal component scores are the representations of X in the principal component space. Rows of score correspond to observations, and columns correspond to components.
What I find confusing is the following:
scores are the representations of X in the principal component space.
since I am not sure what that means precisely. For me (at least from an auto-encoding perspective) the representation of the data $X_N \in \mathbb{R}^{D \times N}$ in the principal component space would be the projection of all the data set points $X_N$ (where the data set points are the columns) onto the column space of $U$, the eigenvectors of the covariance matrix $C_N = \frac{1}{N} \sum^{N}_{n=1} (x^{(n)} - \bar{x}) ({x^{(n)}} - \bar{x})^T = \frac{1}{N} (X-\bar{X})(X - \bar{X})^{T}$. Therefore, score should be the best linear combination of the principal components $U$.
For a single data vector $x^{(i)}$ one can notice the following:
$$ a^{(i)} = \left( \begin{array}{c} u^T_1 x^{(i)}\\ \vdots \\ u^T_k x^{(i)}\\ \vdots \\ u^T_K x^{(i)} \end{array} \right)=U^Tx^{(i)}$$
produces the coefficients of the projections onto each principal component. Thus, each component $a^{(i)}_k$ tells you how much the data point $x^{(i)}$ projects onto the direction of eigenvector $u_k$. One can then reconstruct a single data point as follows:
$$\tilde{x}^{(i)} = \sum^{K}_{k=1} a^{(i)}_k u_k = U a^{(i)} = U U^T x^{(i)} $$
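As a quick sanity check of the two formulas above, here is a small NumPy sketch (not MATLAB, but the linear algebra is identical); with all $K = D$ eigenvectors kept, $U$ is square and orthogonal, so the reconstruction is exact:

```python
import numpy as np

rng = np.random.default_rng(0)
D, N = 3, 5
X = rng.random((D, N))           # columns are the data points x^(i)

# eigenvectors U of the covariance matrix, via SVD of the centered data
x_mean = X.mean(axis=1, keepdims=True)
U, _, _ = np.linalg.svd(X - x_mean)

x = X[:, 0]                      # one data vector x^(i)
a = U.T @ x                      # coefficients a^(i)_k = u_k^T x^(i)
x_rec = U @ a                    # reconstruction U U^T x^(i)

print(np.allclose(x_rec, x))     # True: U U^T = I when U is square
```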
From the above it's not too hard to see that the following equation reconstructs the whole data matrix $X_N$:
$$ \tilde{X}_N = U U^T X_N$$
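The same identity restricted to the top $K < D$ eigenvectors gives the usual lossy PCA reconstruction; a NumPy sketch (here `U_K`, a name of my choosing, denotes the first $K$ columns of $U$) shows that $U_K U_K^T$ really is a projector:

```python
import numpy as np

rng = np.random.default_rng(1)
D, N, K = 3, 5, 2
X = rng.random((D, N))
x_mean = X.mean(axis=1, keepdims=True)
Xc = X - x_mean                      # centered data

U, _, _ = np.linalg.svd(Xc)
U_K = U[:, :K]                       # top-K principal directions

# rank-K reconstruction of the data
X_tilde = U_K @ (U_K.T @ Xc) + x_mean

# P = U_K U_K^T is a genuine projector: P @ P == P
P = U_K @ U_K.T
print(np.allclose(P @ P, P))         # True
```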
Therefore, to understand what the variable score actually represents, it occurred to me to compare it with the above equation. Thus, I wrote the following script that does exactly that:
D = 3;
N = 5;
X = rand(D, N);
%% process data
x_mean = mean(X, 2); % mean of the data: x_mean = (1/N) * sum_i x^(i)
X_centered = X - repmat(x_mean, [1, N]);
%% PCA
[coeff, score, latent, ~, ~, mu] = pca(X'); % coeff should correspond to U
[U, S, V] = svd(X_centered);                % U = eigenvectors of the covariance (up to sign)
%% Reconstruct data
X_tilde_U = U * U' * X
X_tilde_coeff = coeff * coeff' * X
score % unfortunately not the same as either matrix above
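For anyone without the Statistics Toolbox, here is a NumPy translation of the script above (I emulate `pca` by an SVD of the centered data, which is what it does internally; component signs may differ from MATLAB's). It already hints at the mismatch: score does not even have the same shape as the reconstruction.

```python
import numpy as np

rng = np.random.default_rng(2)
D, N = 3, 5
X = rng.random((D, N))                 # columns are observations, as in the script

# emulate [coeff, score, latent, ~, ~, mu] = pca(X')
mu = X.mean(axis=1)                    # pca centers the data first
Xc = X - mu[:, None]
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
coeff = U                              # principal components (up to sign)
score = Xc.T @ coeff                   # N x D, rows correspond to observations

X_tilde = U @ U.T @ X                  # the "reconstruction" from the question

print(score.shape)                     # (5, 3): not comparable to the D x N X_tilde
```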
Unfortunately, I discovered that score was not the same as $\tilde{X}_N$. What is it though? Thus, the points that I wanted to address were:
- What does score actually represent? What is a mathematical and intuitive explanation of what it is?
- If I want to use PCA as the tool to reconstruct vectors (or say images) as in a linear auto-encoder (aka PCA), should I use the variable score, or should I use what I understand as a reconstruction $ \tilde{X}_N = U U^T X_N$?
After doing some more digging in that documentation I found that one can make what I call a reconstruction with the following code:
X_tilde_score = ( score * coeff' + repmat(mu, [N,1]) )';
Which translates in equations to:
$$ \tilde{X} = (\mathrm{score} \, U^T + \bar{X}^T)^T$$
where $\bar{X} \in \mathbb{R}^{D \times N}$ is the horizontal concatenation of $N$ copies of the mean vector $\bar{x} = \frac{1}{N} \sum^N_{i=1} x^{(i)}$.
After some rearranging one can get:
$$ \mathrm{score}^T = U^T (\tilde{X} - \bar{X}) = U^T(X - \bar{X})$$
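This rearranged identity, and the reconstruction formula it came from, are easy to confirm numerically. A NumPy sketch (score is emulated the way `pca(X')` computes it, up to component signs):

```python
import numpy as np

rng = np.random.default_rng(3)
D, N = 3, 5
X = rng.random((D, N))

mu = X.mean(axis=1, keepdims=True)     # the mean vector x_bar (D x 1)
Xc = X - mu                            # X - X_bar
U, _, _ = np.linalg.svd(Xc)            # principal directions as columns of U

score = Xc.T @ U                       # emulates pca(X')'s score (up to signs)

# score^T = U^T (X - X_bar): coordinates of the centered data in the PC basis
print(np.allclose(score.T, U.T @ Xc))  # True

# and ( score * coeff' + repmat(mu,[N,1]) )' recovers X exactly
X_rec = (score @ U.T + mu.T).T
print(np.allclose(X_rec, X))           # True
```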
which seems a little weird to me, because that is not what I would have called "representations of X in the principal component space". It doesn't even seem to be a projection, since it does not obey $P^2 = P$ ($U^T U^T$ doesn't make sense, as it's rectangular). So I was wondering: what were the developers thinking when they defined score? Why would returning such a thing be better than $\tilde{X}$? Is there something about PCA I don't know or don't understand, and hence why I miss the purpose of score? Why is it meaningful to define score that way? (I don't think it's "wrong" or a bad definition; I genuinely want to understand the motivation behind it.)
If it helps to understand my perspective (and why I might be asking something that seems obvious to others): I mostly come from a Machine Learning, Linear Algebra and Computer Science background. In particular, I find auto-encoders interesting right now.
Comments:

…US, or equivalently as XV (actually, if PCA is performed on the covariance matrix rather than the scatter matrix, then sqrt(n)*U*S will stand in place of U*S, where n is the number of rows). What you are computing in your example is not this. – ttnphns Mar 20 '16 at 08:17

sqrt(n)*U is what is usually called standardized PC scores. – ttnphns Mar 20 '16 at 08:29
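The comment's claim can be verified directly: with the SVD of the centered data matrix with observations as rows, $X_c = U S V^T$, the scores are $US = X_c V$. A NumPy sketch (variable names like `scores_US` are mine):

```python
import numpy as np

rng = np.random.default_rng(4)
N, D = 5, 3
Xr = rng.random((N, D))               # rows are observations (pca's convention)

Xc = Xr - Xr.mean(axis=0)             # center each column
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

scores_US = U * S                     # U @ diag(S), i.e. U S
scores_XV = Xc @ Vt.T                 # equivalently X V

print(np.allclose(scores_US, scores_XV))   # True
```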