I am performing quantile regressions in R using the package quantreg. My dataset includes 12,328 observations ranging from 0.12 to 330. The timepoints for my data are not exactly continuous; all data fall into one of a few dozen bins ranging from 73 to 397.

When I fit a linear regression to this data with the lm() function, I was able to use polynomial terms up to degree 4, for example:

lm(Y~poly(X,3,raw=TRUE),data=mydata)

However, with the package quantreg and the rq() command, I cannot use any polynomials. A simple regression works just fine:

rq(Y~X,data=mydata,tau=.15)

But as soon as I get into polynomials, no dice. When I enter this:

rq(Y~poly(X,2,raw=TRUE),data=mydata,tau=.15)

I get the following error message:

Error in rq.fit.br(x, y, tau = tau, ...) : Singular design matrix

I've read up on singular matrices, and I think there might be two reasons for this: (1) I only have one variable on each axis, or (2) my data are binned/the Y variable isn't truly continuous.

Can anyone tell me why I'm getting this error?

PS - This is how the graph looks:

[image: plot of the data]

Charcha

  • Did you get an answer to this? It seems to be caused by the large number of repeated values: https://stat.ethz.ch/pipermail/r-help//2013-April/351935.html – Mohit Verma Sep 14 '15 at 17:16

2 Answers

I believe the design matrix is coming up as singular for your second reason: the data are binned. Repeated observations (multiple, sometimes identical, responses for a single x value) increase the chance of singularity.

I had the same error message with a similarly structured dataset: multiple observations for each x value, some of them identical. I got around it by 'jittering' the data, that is, adding a very small amount of random noise to the response values using rnorm(). This meant that although there were still multiple observations for each x value, there were no identical repeats, and rq() worked. As long as the noise you add is small relative to the spread of the response, it won't noticeably affect the coefficient and SE estimates from rq().
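
A minimal sketch of that jittering step, under stated assumptions: the data below are simulated stand-ins for the asker's binned X and tied Y (not the real dataset), and the noise scale of 1e-3 times sd(Y) is an arbitrary choice you would tune for your own data. The rq() call only runs if quantreg is installed, and whether jittering removes the error in any particular dataset is not guaranteed.

```r
# Simulated stand-in data (assumption): binned X and a rounded Y with many ties.
set.seed(1)
n <- 2000
X <- sample(seq(73, 397, by = 12), n, replace = TRUE)
Y <- round(pmax(0.12, 50 + 0.2 * X + rnorm(n, sd = 20)), 0)
mydata <- data.frame(X = X, Y = Y)

# Add a small amount of Gaussian noise to the response to break exact ties.
# The scale factor 1e-3 is an assumed value, not a recommendation.
noise_sd <- 1e-3 * sd(mydata$Y)
mydata$Y_jit <- mydata$Y + rnorm(nrow(mydata), mean = 0, sd = noise_sd)

# Fit the quadratic quantile regression on the jittered response.
if (requireNamespace("quantreg", quietly = TRUE)) {
  fit <- quantreg::rq(Y_jit ~ poly(X, 2, raw = TRUE), data = mydata, tau = 0.15)
  print(coef(fit))
}
```

Refitting with a few different seeds and comparing the coefficients is a quick way to check that the added noise is not driving the estimates.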

Justina Pinch

An alternative to the rnorm() approach proposed by Jack Ballard is to use jitter() from the base package.
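
A minimal sketch of the jitter() variant, again on simulated stand-in data rather than the asker's dataset. The amount of 0.01 is an assumed value; jitter() adds uniform noise in [-amount, amount], so it bounds the perturbation explicitly. The rq() call only runs if quantreg is installed.

```r
# Simulated stand-in data (assumption): binned X and a rounded Y with many ties.
set.seed(2)
n <- 2000
X <- sample(seq(73, 397, by = 12), n, replace = TRUE)
Y <- round(pmax(0.12, 50 + 0.2 * X + rnorm(n, sd = 20)), 0)
mydata <- data.frame(X = X, Y = Y)

# jitter() adds uniform noise in [-amount, amount] to each value.
# amount = 0.01 is an assumed value, small relative to the range of Y.
mydata$Y_jit <- jitter(mydata$Y, amount = 0.01)

# Fit the quadratic quantile regression on the jittered response.
if (requireNamespace("quantreg", quietly = TRUE)) {
  fit <- quantreg::rq(Y_jit ~ poly(X, 2, raw = TRUE), data = mydata, tau = 0.15)
  print(coef(fit))
}
```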

florian