1

I'm trying to perform quantile regression in R on a multivariate data set, where I want to visualize the 10% and 90% bounds. I also intend to calculate PICP - Prediction interval coverage probability according to

enter image description here

Where

enter image description here

LPIi and UPIi are respectively lower and upper bounds of the prediction interval constructed for ith sample. ti is the ith target value. I make predictions with another model - in the example I use a linear model, but my intention is also to include other models such as a GP.

Currently, I'm a bit stuck on the visualization part, as I'm not sure how to visualize the lines for the upper and lower bound, when I have multiple covariates.

I've tried to create a reproducible example below which highlights what I want to do. I'm using the quantreg package:

# Create a covariate
set.seed(123456)
x1 <- rnorm(1000, mean = 75, sd = 5)

Create a dummy encoding for the hours of the day and remove first column to avoid dummy trap.

dummies <- sample(1:24, 1000, replace = TRUE) dummies <- model.matrix(~factor(dummies))[,-1] colnames(dummies) <- paste0("hour", seq(2:24))

#Create a response variable set.seed(123456) y <- rnorm(1000, mean = 50 + x1 + dummies %% (10 1:23), sd = 10)

Combine into a data frame

df <- cbind.data.frame(y, x1, dummies)

Split into some train and test data

set.seed(123456) id <- sample(1000, 900) # 90-10 ratio train <- df[id, ] test <- df[-id,]

Load Quantile Regression package

library(quantreg)

Fit Quantile model for tau = 0.1 & tau = 0.9

QR_fit_1 <- rq(y ~., data = train, tau = 0.1) QR_fit_9 <- rq(y ~., data = train, tau = 0.9)

Results in: Error in rq.fit.br(x, y, tau = tau, ...) : Singular design matrix

Add small jitter according to https://stats.stackexchange.com/questions/78022/cause-of-singularity-in-matrix-for-quantile-regression

jittered_y <- jitter(df[, 1]) train_jit <- train test_jit <- test train_jit$y <- jittered_y[id] test_jit$y <- jittered_y[-id]

Fit Quantile models again - still issues

QR_fit_1 <- rq(y ~., data = train_jit, tau = 0.1) QR_fit_9 <- rq(y ~., data = test_jit, tau = 0.9)

From the documentation, I try the suggested approach of changing method on page 5

https://cran.r-project.org/web/packages/quantreg/vignettes/rq.pdf

QR_fit_1 <- rq(y ~., data = train, tau = 0.1, method ="fn") QR_fit_9 <- rq(y ~., data = train, tau = 0.9, method ="fn")

Visualize y with the upper & lower quantiles

plot(c(train$y), xlim = c(0, 1000)) points(QR_fit_1$fitted.values, col=2) points(QR_fit_9$fitted.values, col=4) abline(QR_fit_1, col=2) abline(QR_fit_9, col=4) axis(1, xlim = c(0, 1100))

Make predictions with another model, i.e linear model

fit_mod <- lm(y ~., data = train) y_pred <- predict(fit_mod, test)

Add predicted data points to graph

test_fit_1 <- predict(QR_fit_1, test) test_fit_9 <- predict(QR_fit_9, test) points(c(901:1000), y_pred, col = 3)

Compute PICP

???

Any suggestions on how I could do this would be greatly appreciated. As the ??? indicates, I'm a bit unsure of how to calculate the PICP, as I'm not sure how to obtain the lower and upper bounds.

OLGJ
  • 317
  • When you include covariates, you make the problem inherently multidimensional and not conducive to easy visualization. If you just wanted to do a plot of the mean instead of the quantiles, what would you do? (“I don’t know,” is an acceptable answer.) – Dave Apr 18 '23 at 03:01
  • I don't know =) – OLGJ Apr 18 '23 at 07:32

0 Answers0