I am using OLS on log(outcome) to predict future outcomes. My question is which method of correcting the backtransformed predictions will least bias the modeled effect of a change in predictor values.
I know that transforming outcomes is fraught. An answer of “don’t transform” or “don’t use OLS” is often, or even usually, better. But such answers don’t address my question or solve my problem. For my application, both the predicted log(outcomes) and the backtransformed (linear-scale) predictions are used downstream in the larger model. I am also assessing, at both scales, whether predictions of withheld data are more accurate when log-scale predictions are backtransformed or when linear-scale predictions are log-transformed. I hope you will agree that the question is valid even if transformations in general, and OLS on log(outcomes) in particular, are very imperfect tools.
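For concreteness, here is a minimal R sketch of that two-scale holdout comparison; the data frame, split, and variable names (`train`, `test`, `y`, `x1`, `x2`) are hypothetical placeholders, not my actual data:

```r
# Hypothetical setup: a train/test split of a data frame with a positive outcome y.
fit_log <- lm(log(y) ~ x1 + x2, data = train)  # model the log outcome
fit_lin <- lm(y ~ x1 + x2, data = train)       # model the linear outcome

pred_log <- predict(fit_log, newdata = test)   # predictions on the log scale
pred_lin <- predict(fit_lin, newdata = test)   # predictions on the linear scale

rmse <- function(a, b) sqrt(mean((a - b)^2))

# Linear-scale accuracy: backtransformed log model vs. native linear model
rmse(exp(pred_log), test$y)
rmse(pred_lin, test$y)

# Log-scale accuracy: native log model vs. log-transformed linear model
# (linear-model predictions can be non-positive, so guard before taking logs)
ok <- pred_lin > 0
rmse(pred_log[ok], log(test$y[ok]))
rmse(log(pred_lin[ok]), log(test$y[ok]))
```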
Here are possible correction approaches (a short R sketch computing the constant factors follows the list); I would be interested to consider yet another if it is better:
- Duan 1983 (https://amstat.tandfonline.com/doi/abs/10.1080/01621459.1983.10478017?journalCode=uasa20#.Y-_SxuzMLdo): Backtransform each residual, average the values, and use that average as a multiplier. \begin{equation} k = \frac{1}{n}\sum_{i=1}^{n} e^{\epsilon_i} \end{equation} Duan smearing is described in Using Duan Smear factor on a two-part model and Bias correction of logarithmic transformations, among other questions.
- Use Duan’s approach, but apply it separately to binned groups of outcome values (https://arxiv.org/pdf/2208.12264.pdf). This approach seems most relevant if the errors from modeling the logged outcomes are still heteroscedastic, and not necessarily useful if the log transformation yields roughly normal errors on the log scale. But Duan asserts that the smearing factor is nonparametric, so modest deviation from normality shouldn’t invalidate either approach.
- Miller 1984 (https://www.jstor.org/stable/2683247): Multiply each prediction by the exponential of half the variance of the residuals. \begin{equation} k = e^{\frac{1}{2}\mathrm{Var}(\epsilon)} \end{equation} As @COOLSerdash points out (How to back-transform a log transformed regression model in R with bias correction), Newman 1993 (https://setac.onlinelibrary.wiley.com/doi/pdf/10.1002/etc.5620120618) recommends using Miller if the residuals are normally distributed, and Duan if they are not. Mine are close to normal, so that criterion does not yield an obvious preference.
- Match first moments: Multiply each prediction by a constant so that the mean of the backtransformed predictions of the source values equals the actual mean of the source values. Is that equivalent to the accepted answer at https://stats.stackexchange.com/questions/361618/how-to-back-transform-a-log-transformed-regression-model-in-r-with-bias-correction? And why, despite being simple, is it not commonly recommended?
\begin{equation} k = \frac{\sum_i y_i}{\sum_i e^{\widehat{\ln(y_i)}}} \end{equation}
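For concreteness, here is a minimal R sketch of the three constant-factor corrections, assuming a model already fit on the log scale; `train`, `test`, `y`, `x1`, and `x2` are the same hypothetical placeholders as above:

```r
# OLS on the log scale (hypothetical names, as above)
fit <- lm(log(y) ~ x1 + x2, data = train)

res      <- residuals(fit)
log_pred <- fitted(fit)                 # predicted log(y) on the training data

# Duan (1983) smearing: mean of the backtransformed residuals
k_duan <- mean(exp(res))

# Miller (1984) / lognormal correction: exp of half the residual variance
# (var(res) divides by n - 1; sigma(fit)^2 divides by n - p instead)
k_miller <- exp(0.5 * var(res))

# First-moment matching: make the mean corrected prediction equal mean(y)
k_moment <- sum(train$y) / sum(exp(log_pred))

# Corrected linear-scale predictions for withheld data, e.g. with Duan's factor
yhat <- k_duan * exp(predict(fit, newdata = test))
```

The binned variant in the second bullet applies the same smearing calculation within groups of observations rather than once overall.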
The outcome I am modeling ranges over 7 orders of magnitude and is almost exponentially distributed. The methods listed above yield constants that differ widely.
For the “best” method I have two criteria. First, the mean of the linear-scale predictions should be unbiased, which is what all the corrections aim to improve. Second, both the linear and log forms of the predicted outcomes should respond to variations in predictor values as realistically as the sample enables. A constant correction factor rescales the predicted levels, and with them the absolute (additive) linear-scale response to a predictor, though not the relative response. My math isn’t strong enough to state this generically, but here is an example. If the coefficient on a predictor x of ln(outcome) is 3, then a one-unit increase in x adds 3 to the predicted ln(outcome), and with no correction multiplies the backtransformed predicted outcome by e^3 ≈ 20. If the correction factor (k, above) is 2, the predicted level at every value of x doubles, so the absolute increase caused by the same change in x also doubles, while the multiplicative factor of e^3 is unchanged. I have no independent way to know which of those absolute responses is more accurate.
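In symbols, for a single predictor, write the corrected prediction as \begin{equation} \hat{y}_c(x) = k\,e^{\hat{\beta}_0 + \hat{\beta}_1 x}. \end{equation} Then \begin{equation} \frac{\hat{y}_c(x+1)}{\hat{y}_c(x)} = e^{\hat{\beta}_1}, \qquad \hat{y}_c(x+1) - \hat{y}_c(x) = k\,e^{\hat{\beta}_0 + \hat{\beta}_1 x}\left(e^{\hat{\beta}_1} - 1\right), \end{equation} so the ratio response does not depend on k, while the additive response scales linearly with it.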
So I hope the best choice can be determined theoretically, or at least that I can come to understand why it can’t. And if you know of a reference I have not found that describes the effect of the bias-correction method on the accuracy of predictor sensitivity, please point me to it.