For learning purposes I have been using a subset of the MNIST data--just the training set and just the digits 3 and 8. I want to try linear regression for doing this classification. (Yes I realise that logistic regression is more appropriate). In R syntax (this is not an R specific question), I do:
b<-qr.solve(cbind(1,x),y)
qr.solve() solves systems of equations via the QR decomposition
x is the data matrix, one row per handwritten digit, each col is one pixel (28x28)
the cbind() adds a column of 1s onto x so that I can fit an intercept
y is the true identity of each digit, 0=3, 1=8
b is the vector of fitted coefficients
The problem is that qr.solve() fails, giving the error that x is singular. I guess that this means the columns of x are linearly dependent. This can be caused multiple ways. I have removed duplicate columns, zero columns, and columns that contain a single value. After doing this, qr.solve() still chokes and says x is singular.
How can I determine what columns are causing trouble? How to modify x so that linear dependence is gone? (I realise that lasso and ridge regression would be good options here to regularise x [down-weight linearly dependent cols].) One thing that works is to add noise to x, but that seems fishy to me.
qr(x[,-i])$rankfor i=1 to ncols. Find the rank of x when col 1 is missing, when col 2 is missing, etc. If the rank is not affected by deleting that column, then that column is not linearly independent. Unfortunately this will be incredibly slow for a machine learning set-up like mine where x has 784 cols and 8414 rows. (I tried it) – Bill Simpson Jun 21 '17 at 14:28iyou check whetherx[, 1:i]is singular, proceeding fromi=1onward. Whenx[, 1:i]first becomes singular, you may drop any single one of the columns indexed between1andi. AlthoughRdoesn't seem to have a built-in procedure to sequentially develop a QR decomposition, efficient methods to do so exist: see https://stats.stackexchange.com/questions/6920/ for a discussion of efficient addition of rows; there are comparable algorithms to add columns. I processed a dataset your size with brute force in 100 seconds. – whuber Jun 21 '17 at 16:10