I've done some reading here in the past, and my basic assumption is that for a generalized additive model (GAM) or a quantile regression (QR), the following is generally true:
- For a Gaussian distributed error term, the adjusted $R^2$ in GAMs functions similarly to its analogue in linear regression, and thus approximates the proportion of variance in the outcome explained by the predictors in the model.
- For the generalized case that extends to families like binomial, Poisson, etc. in GAMs, from what I gather, it is better to report the deviance explained, as the adjusted $R^2$ doesn't typically make sense in this setting.
- For QR, there seem to be alternative pseudo-$R^2$ values, such as $R^1$, which approximate the adjusted $R^2$ for a specific fitted quantile $\tau$. However, it appears that the creator of the quantreg package, Roger Koenker, does not have a high opinion of either $R^1$ or $R^2$, as noted here. (Both the deviance explained and $R^1$ are written out after this list for reference.)
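For reference, my understanding of the two measures mentioned above is that the deviance explained compares the model deviance to the null (intercept-only) deviance, while Koenker and Machado's $R^1$ does the analogous thing with the pinball (check) loss at quantile $\tau$:
$$ D^2 = 1 - \frac{D_{\text{model}}}{D_{\text{null}}}, \qquad R^1(\tau) = 1 - \frac{\sum_i \rho_\tau\big(y_i - \hat{y}_i(\tau)\big)}{\sum_i \rho_\tau\big(y_i - \hat{y}_{\text{null}}(\tau)\big)}, $$
where $\rho_\tau(z) = z\big(\tau - \mathbb{1}(z < 0)\big)$ and $\hat{y}_{\text{null}}(\tau)$ is the intercept-only fit.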
For a fitted quantile generalized additive model (QGAM), this complicates things slightly, as qgam estimates GAMs with a quantile-based approach (see here for details). The model output generally gives both the adjusted $R^2$ and the deviance explained, but none of the articles I have seen from the package creators mention whether these are meaningful for quantile fits or how to interpret them in the first place. For now, I plan on reporting both estimates from the qgam package and leaving others to interpret them, but it would be helpful to know what they actually measure, if anything at all, and which to use. My thinking is that because the fitting is non-parametric, perhaps the deviance explained is better, but I'm not sure mathematically how that works. It also appears that for some models I have fit in the past, the adjusted $R^2$ values can be tragically low, which I fear will unnecessarily bias reviewers against interpreting my models meaningfully.
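For context, both statistics appear directly in the printed model summary (a qgam fit inherits from mgcv's gam class, so this is just the usual summary.gam output); a minimal example:
#### Where the Two Statistics Appear ####
library(qgam)
library(MASS)

fit0 <- qgam(accel ~ s(times), data = mcycle, qu = .5)
summary(fit0) # prints "R-sq.(adj)" and "Deviance explained"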
Edit
In the comments it was suggested that perhaps $D^2$ may be useful. However, there are two issues with that. First, I don't know whether it captures nonlinear data well. Second, I'm not sure how to obtain it from a qgam fit. As an example of a fit using multiple quantiles, I provide the following below:
#### Load Libraries ####
library(qgam)
library(MASS)
#### Fit Multiple Quantile Model ####
fit <- mqgam(
  accel ~ s(times),
  data = mcycle,
  qu = c(.25, .5, .75)
)

#### Plot Pinball Loss ####
check.learnFast(fit$calibr)
This shows the pinball loss for each quantile of the QGAM, but I can't determine how that loss is calculated or how to obtain a $D^2$ statistic from it.
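If I'm reading the qgam docs right, qdo() applies a function to the fitted model for a single quantile of an mqgam object, so a hand-rolled sketch like the following (my own code, not the package's) should at least recover the raw pinball loss per quantile:
#### Sketch: Per-Quantile Pinball Loss from the mqgam Fit ####
y <- mcycle$accel
sapply(c(.25, .5, .75), function(q){
  res <- y - qdo(fit, q, fitted) # residuals at quantile q
  sum(ifelse(res >= 0, q * res, (q - 1) * res)) # pinball (check) loss
})
A per-quantile $D^2$ would then divide each of these by the corresponding intercept-only loss, as in Edit 3 below.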
Edit 2
The article from the authors of the package discusses the loss function in some sections, but it doesn't appear to be exactly the standard pinball loss, given that they use a Bayesian credible interval approach to estimating the splines that involves a "smoothed" version. See the quote below:
In particular, QGAMs are based on the pinball loss function (Koenker and Bassett 1978), rather than on a probabilistic model of the response distribution, and the absence of a likelihood function impedes direct application of the Bayes’ rule when updating the prior distribution on the regression coefficients given the observed responses. While this problem can be overcome by adopting the coherent Bayesian belief updating framework of Bissiri, Holmes, and Walker (2016), which effectively leads to the application of Bayes’ rule using a loss-based pseudo-likelihood, naïvely plugging such a pseudo-likelihood into standard Bayesian fitting methods can lead to inaccurate quantile estimates and inadequate coverage of the corresponding credible intervals as discussed, for instance, in Yang, Wang, and He (2016) and Sriram (2015). To avoid such issues, qgam implements the calibrated Bayesian methods of Fasiolo et al. (2021b), which explicitly aim at selecting the “learning rate” tuning parameter of the loss so as to achieve near-nominal frequentist coverage of the quantile credible intervals. Furthermore, qgam bases quantile regression on a smoothed version of the pinball loss, which enables the adoption of fast maximum a posteriori (MAP) and empirical Bayes methods to estimate the regression coefficients and select their prior variance hyper-parameters, respectively. The smoothness of the new loss is determined by minimizing the asymptotic mean squared error (MSE) of the estimated regression coefficients, approximated using a location-scale GAM.
In the paper they reference, they go into mathematical detail on the smoothed loss that is beyond my understanding of mathematical statistics (pp. 1404-1406), but they summarize it a bit here:
The results in this section proved that the pinball loss $(\lambda = 0)$ is not optimal, and that we should use the ELF loss with smoothness determined by $\tilde{h^*}$. This substitution of the smoothed loss greatly simplifies computation as it permits the use of smooth optimization methods for estimation.
So, at least on the surface, the loss appears to differ in some way, but how that affects something like $R^2$, $R^1$, or $D^2$ is beyond my understanding.
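For what it's worth, my (possibly oversimplified) reading of the paper is that the two losses compare roughly as follows, with the smoothed ELF loss recovering the pinball loss as $\lambda \to 0$; I have dropped the paper's scale parameter $\sigma$ for readability, so treat this as a sketch of the idea rather than their exact definition:
$$ \rho_\tau(z) = z\,\big(\tau - \mathbb{1}(z < 0)\big), \qquad \tilde{\rho}_{\tau,\lambda}(z) = (\tau - 1)\,z + \lambda \log\!\big(1 + e^{z/\lambda}\big). $$
If that reading is right, the smoothing mainly matters for estimation, and a pinball-loss-based ratio computed afterwards from the fitted values (as in Edit 3 below) should still be well defined.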
Edit 3
I find Dave's answer to be very useful. For my own sanity, I wanted to make sure my application of what he said makes sense, so I provide some code here which I believe takes what he said and applies it to the QGAM I added earlier (though here I just use a single quantile to keep things simple).
#### Load Libraries ####
library(qgam)
library(MASS)
library(quantreg)
#### Fit Single Quantile QGAM ####
fit <- qgam(
  accel ~ s(times),
  data = mcycle,
  qu = .25
)

#### Fit Quantile-Only QREG ####
fit.null <- rq(
  accel ~ 1,
  data = mcycle,
  tau = .25
)
#### Pinball Loss Function ####
pinball_loss <- function(y, yhat, tau = 0.5){
  if (length(y) != length(yhat)){
    stop("y and yhat must have equal lengths")
  }
  res <- y - yhat
  # tau * |residual| above the fitted quantile, (1 - tau) * |residual| below
  sum(ifelse(res >= 0, tau * res, (tau - 1) * res))
}
#### Save Y and Y Hats ####
y <- mcycle$accel
yhat.null <- fitted(fit.null)
yhat.fit <- fitted(fit)

#### Calculate Pinball Loss ####
mod.interest <- pinball_loss(y, yhat.fit, .25)
mod.null <- pinball_loss(y, yhat.null, .25)
mod.interest # much lower loss (more predictive power)
mod.null # high loss (less predictive power)

#### Get D2 Value ####
d2 <- 1 - (mod.interest / mod.null)
d2
It appears this gets exactly what I want, but I want to make sure that this is correct before accepting any answers.
As a side note, it appears you cannot estimate an intercept-only model in qgam. That probably doesn't matter here, but I found it an interesting difference between quantreg and qgam.
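One workaround, if I'm reasoning correctly: the intercept-only fit at quantile $\tau$ is just a constant minimizing the pinball loss, i.e. a sample $\tau$-quantile, so the quantreg baseline can be approximated by hand (up to how quantile() interpolates between order statistics):
#### Baseline Without quantreg (Sketch) ####
yhat.null2 <- rep(quantile(mcycle$accel, .25), nrow(mcycle))
pinball_loss(mcycle$accel, yhat.null2, .25) # should be close to mod.null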

Comments
$$ R^2 = 1 - \dfrac{\text{Pinball loss of model of interest}}{\text{Pinball loss of baseline model}} $$
My main concerns beyond what I stated are somewhat specific to what you said: in a multiple-predictor case, I am not sure how to calculate the $D^2$ statistic, particularly while accounting for things like spline complexity (I'm not sure if that matters or not, just my thoughts). – Shawn Hemelstrand Oct 24 '23 at 03:28
quantreg or qrnn can calculate pinball loss for your model (and an intercept-only model to give the $D^2$ denominator), and there's always the option to write your own pinball loss function. I plan to summarize these comments in an answer I'll post tomorrow. – Dave Oct 25 '23 at 02:42
There is a pinLoss function within the qgam package, but it's not immediately clear how to use it with respect to models estimated by QGAMs. See pp. 18-19 of the package doc here: https://cran.r-project.org/web/packages/qgam/qgam.pdf – Shawn Hemelstrand Oct 25 '23 at 05:03
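Following up on that last comment, here is a minimal sketch of how I'd expect pinLoss to slot in, assuming (per pp. 18-19 of the manual) it takes the response, the fitted quantiles, and the target quantile; I haven't verified the argument order beyond that:
#### Sketch: D2 via qgam::pinLoss ####
loss.fit <- pinLoss(y, yhat.fit, qu = .25)
loss.null <- pinLoss(y, yhat.null, qu = .25)
1 - loss.fit / loss.null # should match d2 from Edit 3 if the conventions agree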