
I have been reading about Quantile Regression and the Quantile Loss function, but I have to admit I am a bit lost as to how to implement it in practice. I would like to use it to compute the prediction errors of a Machine Learning algorithm (in my case Random Forest). I would like to do it manually, without relying on packages (I am using R), so that I can (hopefully) understand exactly what is going on.

I read that the Quantile Loss Function has this formulation (Meinshausen, 2006):

$$ \begin{equation} l_{\alpha}(y, q) = \begin{cases} \alpha\vert y - q\vert, & y \gt q \\ (1 - \alpha)\vert y - q\vert, & y \le q \end{cases} \end{equation} \tag{1}\label{1} $$

where $y$ is a given observation, $q$ is a given prediction, and $\alpha$ is the quantile level.

This way I know how to calculate the loss values $l$ given a chosen $\alpha$.
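For instance, eq. $\eqref{1}$ can be written as a small vectorised R function (the name `quantile_loss` is my own, just for illustration):

```r
# Quantile (pinball) loss of eq. (1):
# y = observation(s), q = prediction(s), alpha = quantile level.
quantile_loss <- function(y, q, alpha) {
  ifelse(y > q,
         alpha * abs(y - q),        # under-prediction: y > q
         (1 - alpha) * abs(y - q))  # over-prediction:  y <= q
}

quantile_loss(y = 3, q = 1, alpha = 0.8)  # y > q:  0.8 * 2 = 1.6
quantile_loss(y = 1, q = 3, alpha = 0.8)  # y <= q: 0.2 * 2 = 0.4
```

Note the asymmetry: for $\alpha = 0.8$, under-predicting costs four times as much as over-predicting, which is what pushes the minimiser towards the 80% quantile.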

Reading from this answer, I see one has then to

> add up each individual $l$ to get a loss $L$ for the whole model

like so:

$$ L_{\alpha}(y, q) = \sum_{i=1}^n l_{\alpha}(y_i, q_i) \tag{2}\label{2} $$
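In R, eq. $\eqref{2}$ is just the sum of the elementwise losses over all $n$ observations (again, the function name is my own):

```r
# Total quantile loss of eq. (2) for vectors of observations y
# and predictions q at quantile level alpha.
total_quantile_loss <- function(y, q, alpha) {
  sum(ifelse(y > q,
             alpha * abs(y - q),
             (1 - alpha) * abs(y - q)))
}

y <- c(1, 2, 3)
q <- c(2, 2, 2)
total_quantile_loss(y, q, alpha = 0.2)
# elementwise losses: 0.8*1, 0, 0.2*1 -> total 1.0
```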

I am pretty lost as to how to use this information for my implementation.

Should I first run the ML algorithm to obtain the predictions $q$, plug them into $\eqref{1}$, and sum as in $\eqref{2}$ to obtain $L$ (doing this twice, with two $\alpha$ values, e.g. 0.2 and 0.8)? If so, what should I do with $L$ then?

Meinshausen, Nicolai. "Quantile Regression Forests." Journal of Machine Learning Research 7 (2006): 983–999.

umbe1987
  • (1) I hope you plan on evaluating the loss with different predictions for your two $\alpha$ levels, since a useful 20% quantile prediction should be quite different from a good 80% quantile prediction. (2) To your actual question, you optimize the model (yielding different $q$s) to minimize the quantile loss - just like you would optimize the model to minimize any other loss, like the MSE. How you do that will depend on your model. – Stephan Kolassa Sep 14 '23 at 12:29
  • @StephanKolassa I am actually struggling to understand how to implement this for Random Forest (how to actually define a loss function for it). BTW, your comment helped me understand better what I am supposed to do/looking at. Thank you. – umbe1987 Sep 14 '23 at 14:23
  • I think I might have found a better way. It seems with predict.all = T I am able to get the predictions of all individual trees. This way I would be able to calculate the quantiles myself. I don't know if this would give the same results as performing Quantile Regression, but if it works I might switch to this method, although I was looking for something I could use with other algorithms as well. – umbe1987 Sep 14 '23 at 14:39
  • 1
    That approach will not do what you want. It will give you quantiles of predicted responses, but as long as you use the standard loss function, the predicted responses will be (different estimates of) conditional means. ... – Stephan Kolassa Sep 14 '23 at 15:02
  • 1
    ... As an illustration, simulate: use lots of iid N(0,1) variables as the outcome, and no useful predictors (e.g., possibly use some random ones). Your predictions will all try to estimate the mean of 0, and a (say) 90% quantile of that will be far away from a 90% of an N(0,1) distribution. There really is no way around either changing the loss function or making parametric assumptions. – Stephan Kolassa Sep 14 '23 at 15:02
  • @StephanKolassa thanks for clarifying. I guess I am left with understanding how to use a quantile loss function with random forest then. – umbe1987 Sep 14 '23 at 15:06
  • Have you taken a look at the quantregForest package for R for inspiration? – Stephan Kolassa Sep 14 '23 at 15:13
  • Sure, that is where I started :) I actually wanted to implement something similar myself, or else find a random forest package that lets me specify a custom loss function (that might be a better question after all; things are much clearer now thanks to your comments). BTW, I actually don't know which loss function is used by RF... – umbe1987 Sep 14 '23 at 15:16
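The simulation suggested in the comments can be sketched as follows (this assumes the `randomForest` package, whose `predict` method accepts `predict.all = TRUE`; it is an illustration of the objection, not a recommended estimator):

```r
# Sketch: iid N(0,1) outcome, no informative predictor. Each tree's
# prediction targets the conditional mean (~0), so quantiles of the
# per-tree predictions are NOT quantiles of the outcome distribution.
library(randomForest)

set.seed(1)
n <- 1000
y <- rnorm(n)
x <- data.frame(x1 = rnorm(n))  # useless predictor

rf   <- randomForest(x, y)
pred <- predict(rf, x, predict.all = TRUE)

# 90% quantile of the per-tree predictions for the first observation,
# versus the true 90% quantile of N(0,1):
quantile(pred$individual[1, ], 0.9)  # typically well below qnorm(0.9)
qnorm(0.9)                           # ~ 1.2816
```

The per-tree predictions scatter around 0 (the conditional mean), so their 90% quantile falls well short of `qnorm(0.9)` — which is Stephan Kolassa's point that per-tree quantiles are not a substitute for quantile regression.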
