I was reading *The Elements of Statistical Learning* and came across the section that interprets ridge regression through the singular value decomposition (SVD) of the design matrix $X$. Specifically, I found the following:
$X=UDV^{T}$, where matrix $U$ is $N\times p$, $V$ is a $p\times p$ orthogonal matrix, and $D$ is a $p\times p$ diagonal matrix.
I am confused because, according to Wikipedia, an orthogonal matrix has to be square, and in this case $U$ is not square. My current guess is that $U$ merely has orthonormal columns, so that $U^{T}U=I$ but $UU^{T}\ne I$. This seems to make sense because the book also gives
$X \hat{\beta}=X(X^{T}X)^{-1}X^{T}Y=UU^{T}Y$, and $UU^{T}Y$ need not equal $Y$ (it is the projection of $Y$ onto the column space of $X$).
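If it helps, here is a small numerical check I put together (a toy example of my own, with random data; the variable names are mine, not from the book):

% Toy check: with the "thin" U, U'*U = I but U*U' is only a projection,
% and the least-squares fit X*betaHat equals U*U'*y.
X = [ones(10,1) randn(10,1)];    % N = 10, p = 2
y = randn(10,1);
[Ufull, S, V] = svd(X);          % full SVD: Ufull is 10 x 10
U = Ufull(:, 1:2);               % keep the first p columns -> thin U (10 x 2)
norm(U'*U - eye(2))              % ~ 0: columns of U are orthonormal
norm(U*U' - eye(10))             % clearly nonzero: U*U' is not the identity
betaHat = (X'*X) \ (X'*y);       % ordinary least-squares coefficients
norm(X*betaHat - U*U'*y)         % ~ 0: matches X*betaHat = U*U'*Y from the book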
So my question becomes: are there two versions of the SVD I can compute? One in which both $U$ and $V$ are square orthogonal matrices, and another like the one above? Or is there something wrong with my argument?
Any guidance is appreciated.
Update after receiving an initial answer:
After reading @BabakP's answer, I thought it would be a good idea to test this in software, so I tried the `svd` function in MATLAB. It returns a square $U$ of dimension $N\times N$, a diagonal $S$ of dimension $N\times p$, and a square $V$ of dimension $p\times p$. Example below:
>> A = [ones(10,1) randn(10,1)];
>> [U,S,V] = svd(A);
>> size(U)
ans =
    10    10
>> size(S)
ans =
    10     2
>> size(V)
ans =
     2     2
So does this mean that R and MATLAB give two different versions of the SVD?
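For what it's worth, after writing the update I noticed that MATLAB's `svd` also accepts an `'econ'` argument. A minimal sketch of what I mean, assuming I am reading the documentation correctly (please correct me if not):

% svd(A) returns the full decomposition, while svd(A,'econ') returns the
% economy-size ("thin") one whose U is N x p, matching the book's convention.
A = [ones(10,1) randn(10,1)];
[U, S, V]    = svd(A);           % full:    U is 10 x 10, S is 10 x 2, V is 2 x 2
[Ue, Se, Ve] = svd(A, 'econ');   % economy: Ue is 10 x 2,  Se is 2 x 2, Ve is 2 x 2
size(Ue)                         % ans = 10  2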
