
I have a very large covariance matrix of returns (around 10000x10000), constructed from a sample of 1000 observations on 10000 variables. My goal is to perform a (well-behaved) Cholesky decomposition of this matrix. However, as expected, this matrix is near-singular with very small (< 10^-10) eigenvalues (around 5K-6K out of 10K).

I tried a naive approach: doing an eigenvalue decomposition and flooring all eigenvalues that are less than 10^-10 at 10^-10 (in other words, if eigenvalue < 10^-10, then set eigenvalue = 10^-10), then reconstructing the matrix with the modified eigenvalue matrix. However, although the Cholesky decomposition then runs, the result is very unstable.
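
In R, the approach looks roughly like this (a minimal sketch; C stands for the covariance matrix and the variable names are only illustrative):

ed   <- eigen(C, symmetric = TRUE)                  # spectral decomposition of the covariance matrix
vals <- pmax(ed$values, 1e-10)                      # floor eigenvalues below 10^-10 at 10^-10
C2   <- ed$vectors %*% diag(vals) %*% t(ed$vectors) # reconstruct with the modified spectrum
C2   <- (C2 + t(C2)) / 2                            # re-symmetrize against rounding error
U    <- chol(C2)                                    # upper-triangular Cholesky factor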

What is the best way to handle this PD approximation for large matrices?

SRKX
acmh
  • When you say "However, as expected, this matrix is near-singular with very small (< 10^-10) eigenvalues (around 5K-6K out of 10K)" -- did you use the sample covariance matrix to estimate the covariance matrix? If so, perhaps seek a different way to estimate it, to ensure the estimated covariance matrix will be invertible. (It's well known that the sample covariance matrix performs badly when its dimension is large.) – Kian Dec 04 '13 at 16:38
  • Thanks fushsialatitude. I used both the sample covariance matrix and the exponentially weighted moving average covariance matrix, and had the same problem. Note that since there are more risk factors (10K) than observations (1K), there are a lot of linear dependencies. – acmh Dec 04 '13 at 16:44
  • IMHO, trying to estimate about 50 million parameters from only 1000 observations does not seem very wise. First of all, if there are linear dependencies, the matrix will be singular. If you fiddle around with the spectrum (by your method or by shrinkage), this property will definitely be lost. You should try to reduce the dimension here. There are numerous ways to do this, and all of them have their weaknesses. For a start, you could pool those returns with similar properties into asset classes, and the problem becomes more feasible... – vanguard2k Dec 05 '13 at 11:32
  • As you see from my answer, I agree with @vanguard2k: there are too many variables and too few observations. If you use a weighted estimator, then you effectively reduce the number of observations, which does not improve the situation. The sample covariance matrix is singular if there are more variables than observations (even if it is just one more). In your case it is a factor of 10. – Richi Wa Dec 05 '13 at 15:24
  • What exactly do you mean by "PD approximation"? – Richi Wa Dec 05 '13 at 15:25
  • Thanks a lot Richard and vanguard2k. PD approximation refers to approximating an NPD (non-positive definite) matrix with a nearest PD matrix. I was able to get reasonable results using the spectrum-modification method when I was testing with a smaller set of risk factors (100) with 1000 observations. However, I ran into all these problems when I tried testing larger sets of 5000-10000 risk factors. – acmh Dec 05 '13 at 15:40
  • @Richard: What do you mean by "For a start, you could pool those returns with similar properties into asset classes"? – acmh Dec 05 '13 at 15:55
  • @acmh I said that. Basically, I suggested using some kind of dimension-reduction technique. Instead of including all of those highly correlated returns, you could take, say, 1000 of them and use the returns of a corresponding index or sector instead. Probably the deviations from the index returns would be about equal to the estimation error. Roughly speaking, if you want more stable results, there is only one way: try to reduce the estimation error by accepting some bias. See also Richard's shrinkage suggestion. That's also what you did by fiddling around with the eigenvalues. – vanguard2k Dec 05 '13 at 16:20
  • Thank you vanguard2k, I appreciate your response. I do not have any index returns. My risk factors are mostly returns of futures contracts with different maturities, belonging to different asset groups and sectors such as rates and commodities. – acmh Dec 05 '13 at 16:37
  • Naturally, you have extremely high correlation between contracts that belong to the same product (some particular rate or commodity) and have close expirations. – acmh Dec 05 '13 at 16:43
  • If you're working with futures contracts, perhaps you should begin by reformulating the problem. For instance, you could start with a parsimonious model of the futures curve (akin to the Nelson-Siegel yield curve decomposition) and do that for each period. This way you could reproduce any futures curve with just a few variables. You'd then construct the covariance matrix using only the time series of the factors from all the futures curves (see the sketch after these comments). – John Dec 05 '13 at 16:56
  • Ok, now we are getting close... :-) I couldn't agree more with what @John said. Keep in mind that parametrizing the curves and modelling the parameters will effectively result in a dimension reduction. If you wanted to use your Cholesky decomposition to simulate random paths, for example, one realization gives you all the changed curves. Using those, you can reprice ALL your thousands of futures contracts (from your curves) and thus get the price changes. There can be other issues though: you will see them when you get there ;-) – vanguard2k Dec 06 '13 at 07:16
  • Any suggestions on particular parametrizations? Can we scale those parameters (or their returns) assuming that they are our risk factors, instead of the actual prices/returns? – acmh Dec 06 '13 at 15:34
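
To illustrate the curve-parametrization idea from the comments, here is a minimal sketch of a Nelson-Siegel-style fit (the inputs 'maturities' and 'curve' are hypothetical, and the decay parameter lambda is held fixed so that the fit reduces to a linear regression):

# Fit one futures/yield curve on one date with three factors (level, slope, curvature).
ns_fit <- function(maturities, curve, lambda = 1.5) {
  x1 <- (1 - exp(-maturities / lambda)) / (maturities / lambda)  # slope loading
  x2 <- x1 - exp(-maturities / lambda)                           # curvature loading
  coef(lm(curve ~ x1 + x2))                                      # level, slope, curvature estimates
}

Fitting this date by date gives a t x 3 factor time series per curve; the covariance matrix is then estimated on these factor series instead of on thousands of individual contracts.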

4 Answers


Your issue is demonstrated in R, with an interesting solution, in Equity Risk Model Using PCA. Another useful link is the NCM implementation by Nick Higham written for Matlab (by Higham himself). There is also a good discussion of shrinkage and other aspects of Correlation Adjustment.
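
On the R side, the Matrix package provides nearPD(), which is in the spirit of Higham's nearest-correlation-matrix algorithm; a minimal sketch (C stands for the questioner's covariance matrix, and for a 10000x10000 input this will be slow, since the algorithm repeatedly eigendecomposes the matrix):

library(Matrix)
res  <- nearPD(C, corr = FALSE)  # nearest positive definite matrix (Frobenius norm)
C_pd <- as.matrix(res$mat)       # the repaired covariance matrix
U    <- chol(C_pd)               # Cholesky now succeeds, though it may remain ill-conditioned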

user12348

Let $t$ be the number of days (time periods), and let $p$ be the number of assets. You have $t=1000$ and $p=10000$. For any given dataset, the sample covariance matrix $\mathbf{C}$ is assumed to accurately represent the population covariance matrix $\boldsymbol{\Sigma}$; however, as $p \rightarrow t$, or if $p > t$ (as in your case), the eigenvalues become unreliable and some of them become zero, so positive definiteness is lost. With high-dimensional datasets becoming more common, there is greater potential for the number of dimensions to approach the sample size ($p \rightarrow t$), leading to biased eigenvalues of $\mathbf{C}$ and of the sample correlation matrix $\mathbf{R}$. Certainly, there will be $p-t$ zero eigenvalues whenever $p>t$ and one zero eigenvalue whenever $p=t$.
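
A small simulation (not your data, just randomly generated returns) makes this visible:

set.seed(1)
t <- 50; p <- 200                                                # more variables than observations
X <- matrix(rnorm(t * p), nrow = t)                              # simulated return matrix
ev <- eigen(cor(X), symmetric = TRUE, only.values = TRUE)$values
sum(ev < 1e-10)   # at least p - t zero eigenvalues (here p - t + 1, since cor() also demeans)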

You can use a singular value decomposition (SVD), which extracts the singular values; for a symmetric positive semi-definite matrix such as $\mathbf{R}$ these coincide with the eigenvalues, including the (numerically) zero ones. If $\mathbf{X}$ is your return matrix ($t$ rows, $p$ columns), then use the R syntax below to compare the eigenvalues ("eigvals") from the eigendecomposition against the singular values ("s$d") from the SVD:

R <- cor(X)                           # sample correlation matrix of the returns
p <- ncol(X)                          # number of assets
t <- nrow(X)                          # number of observations
lambdae <- eigen(R)                   # eigendecomposition of R
eigvals <- as.vector(lambdae$values)  # eigenvalues in decreasing order
E <- as.matrix(lambdae$vectors)       # corresponding eigenvectors
s <- svd(R)                           # singular value decomposition of R
s$d                                   # singular values; compare with eigvals

Last, if you are going to do anything further with your data, you might perform dimension reduction by representing your data with the eigenvectors of the dimensions that have non-zero eigenvalues, since those dimensions are uncorrelated. You could also use PCA after extracting the singular values and ignore the zero eigenvalues. The loadings on the principal components then represent the correlations between the original 10000 assets and the reduced set of orthogonal (uncorrelated) dimensions.
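
Continuing with the objects defined in the code above, a minimal sketch of that dimension-reduction step (the names k, scores and loadings are only illustrative, and the 1e-10 cutoff is a choice):

k        <- sum(eigvals > 1e-10)              # number of non-degenerate dimensions
E_k      <- E[, 1:k]                          # retained eigenvectors of R
scores   <- scale(X) %*% E_k                  # t x k principal component scores, uncorrelated
loadings <- E_k %*% diag(sqrt(eigvals[1:k]))  # correlations between the assets and the components
chol(cov(scores))                             # Cholesky in the reduced k x k space is well behaved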


From LEP's answer:

there will be p−t zero eigenvalues whenever p>t and one zero eigenvalue whenever p=t.

This is the main reason: in exact arithmetic, your covariance matrix will have p−t eigenvalues that are exactly 0. With floating-point computer arithmetic you'll instead get lots of eigenvalues around machine precision, usually about 10^-15. So there should not be just the 5K-6K near-zero eigenvalues you observe, but about 9K.
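
Whether an eigenvalue counts as "numerically zero" is best judged relative to the largest eigenvalue rather than against a fixed cutoff such as 10^-10; a small sketch (the tolerance below is a common rule of thumb, not a universal constant; C stands for your covariance matrix):

ev  <- eigen(C, symmetric = TRUE, only.values = TRUE)$values  # all eigenvalues of the covariance matrix
tol <- max(ev) * length(ev) * .Machine$double.eps             # relative rank tolerance
sum(ev < tol)                                                 # count of numerically zero eigenvalues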

You also might want to have a look at Computing the nearest correlation matrix—a problem from finance by Nicholas J. Higham.

Marco Breitig

I think that your problem can be solved by using another estimator for your covariance matrix. A so-called shrinkage estimator leads to a covariance matrix that is non-singular. Then a Cholesky decomposition should work (maybe there is even a short-cut in the shrinkage world; I will check later on).

The R package corpcor contains functions to perform shrinkage estimation. More information can be found on the webpage of the developer.
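
A minimal sketch with corpcor (X is the t x p return matrix from the question; by default the shrinkage intensity is estimated analytically from the data, and the resulting estimate is positive definite):

library(corpcor)
C_shrink <- cov.shrink(X)       # Schaefer-Strimmer shrinkage covariance estimate
U <- chol(as.matrix(C_shrink))  # the Cholesky decomposition now goes through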

Richi Wa