Bias correction of logarithmic transformations

Question

I read that a bias is introduced when we transform a series with a logarithm and then apply the inverse function, but only in forecasts of the mean values.

I don't understand what it means exactly.

In the book "Introductory Time Series with R", there is a section about this very issue:

The bias in the means arises as a result of applying the inverse transform to a residual series. For example, if the time series are Gaussian white noise $w_{t}$, with mean zero and standard deviation σ, then the distribution of the inverse-transform (the anti-log) of the series is log-normal with mean $e^{σ ^{2}/2}$. This can be verified theoretically, or empirically by simulation as in the code below:

> set.seed(1)
> sigma <- 1
> w <- rnorm(1e+06, sd = sigma)
> mean(w)
[1] 4.69e-05
> mean(exp(w))
[1] 1.65
> exp(sigma^2/2)
[1] 1.65

The code above indicates that the mean of the anti-log of the Gaussian white noise and the expected mean from a log-normal distribution are equal. Hence, for a Gaussian white noise residual series, a correction factor of $e^{σ^{2}/2}$ should be applied to the forecasts of means.
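
As a small extension of the book's simulation (not taken from the book), the sketch below contrasts the mean and the median of the back-transformed noise. The variable names are illustrative. It suggests why the issue concerns forecasts of the mean specifically: the median of $e^{w_t}$ stays near $e^{0} = 1$, while its mean is near $e^{\sigma^{2}/2}$.

```r
# Sketch: mean vs. median of exp(w) for Gaussian white noise w.
set.seed(1)
sigma <- 1
w <- rnorm(1e6, sd = sigma)

naive_value  <- exp(mean(w))    # back-transform of the (near-zero) mean, close to 1
median_exp_w <- median(exp(w))  # median of the log-normal, also close to 1
mean_exp_w   <- mean(exp(w))    # mean of the log-normal, close to exp(sigma^2 / 2)

print(c(naive = naive_value, median = median_exp_w,
        mean = mean_exp_w, theory = exp(sigma^2 / 2)))
```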

In the same section, it says that an adjusted forecast $\{\hat{x}'_{t}\}$ with an empirical correction factor is:

$$\hat{x}'_{t} = e^{\hat{\log x_{t}}}\sum_{t=1}^{n}\frac{e^{z_{t}}}{n}$$

where $\hat{\log x_{t}}$ is the predicted series given by the log-regression model.

Does anyone know what this means? As far as I understand, when we apply a logarithm to a series and fit a (linear) model to the resulting values, we are doing the following:

$$\log x_{t} = \alpha t + \beta + z_{t}$$

where $z_{t}$ are the residuals. So, I would think that the reverse transformation is given by:

$$x_{t} = e^{\alpha t + \beta} e^{z_{t}}$$

However, what is a forecast of the mean, and how is it involved in this procedure? I'm guessing it's $\displaystyle \frac{e^{z_{t}}}{n}$ but I can't see a justification.

UPDATE:

Suppose we use the model described above to predict some values $p_{t}$. Since we used a logarithmic transformation, to revert it, we have to do this for all new $t$:

$$\exp(p_{t})$$

However, the book says that in order to apply the correction to each new predicted value, we need to do this:

$$p_{t}^{\text{corrected}} = \exp(p_{t})\, \frac {1}{n}\sum_{i=1}^{n} e^{\hat z_i}$$

Why? I thought that the mean of the forecast is biased but not its individual values.

Closely related: http://stats.stackexchange.com/questions/49595/puzzled-by-derivation-of-time-series-prediction-based-on-its-log/49597#49597. Please check your quotation: I don't believe the "correction factor" has been accurately transcribed (and it doesn't agree with your R code either). — whuber, Sep 10 '13 at 01:27
I think the code snippet I added only reflects the change from a residual series with mean 0 to one with mean $\exp(\sigma^2/2)$, which is the mean of the distribution obtained from such a transformation. There is an example that uses the correction factor as mean(exp(resid(AP.lm2))) where AP.lm2 is the model. — r_31415, Sep 10 '13 at 01:48
The correct mean is $\exp(\sigma^2/2)$, not $\exp(2\sigma^2)$, which appears twice in your quotations. — whuber, Sep 10 '13 at 01:49
Oh, you're absolutely right. It was a typo when I copied the quoted text. I will correct it in a moment. — r_31415, Sep 10 '13 at 01:55
I deleted my answer because it seems to be fully covered at @whuber's linked question and answer. — Glen_b, Sep 10 '13 at 02:40
That's ok. On the other hand, why use means in the correction factor in this particular example? In the linked question, there is an explicit reference to the predicted mean $\log(x_{t})$ and therefore it's reasonable to obtain a mean prediction $x_{t}$, but I can't see that here. — r_31415, Sep 10 '13 at 02:48
Re: UPDATE: Robert, almost all "forecasted values" are essentially mathematical expected values, i.e. mean values, even if only empirically and approximately calculated and with bias. The formula you wrote in the UPDATE is the same as the one you wrote initially. As I essentially showed in my answer (but perhaps didn't stress), the use of the term "correction" is misleading. There is no "correction" here, just the implementation of the empirical counterpart of the full theoretical relationship. A "correction" would emerge if one tried to correct for the bias of this empirical formula. — Alecos Papadopoulos, Sep 11 '13 at 00:50
@AlecosPapadopoulos I didn't know forecasts are expected values. Since this forecast is based on regression, how can I see that a regression is simply averaging values? In any case, assuming that we are dealing with averages, then yes, the formula is exactly the same as before. By the way, in your answer you said that this formula ignores various biases; can you describe a bit what you mean? — r_31415, Sep 11 '13 at 00:58
Regarding regression as expected value, you can look up my answer http://math.stackexchange.com/questions/482910/mean-response-in-linear-regression/482951#482951 — Alecos Papadopoulos, Sep 11 '13 at 01:09
Now I understand what you mean. The use of the predicted values in this way still seems like a subtle point, though, but thanks for all your help! — r_31415, Sep 11 '13 at 02:00

score 5 · Accepted Answer · answered Sep 10 '13 at 03:08

Your variable is defined as $$X_{t} = e^{\alpha t + \beta} e^{z_{t}} $$

Say you have a sample $S_n$ of $n$ observations of past values of the variable and you want to forecast period $n+1$.

Then $$E\Big (X_{n+1}\mid S_n \Big ) = E\Big (e^{\alpha (n+1) + \beta} e^{z_{n+1}}\mid S_n\Big) = E\Big (e^{\alpha (n+1) + \beta}\mid S_n\Big) E\left(e^{z_{n+1}}\right)$$ $$= E\Big (e^{\alpha (n+1) + \beta}\mid S_n\Big)e^{\sigma^2/2}$$

...since $z_t$ is Gaussian white noise.
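
For completeness, here is a sketch of the standard calculation behind $E\left(e^{z_{n+1}}\right) = e^{\sigma^2/2}$, assuming $z \sim N(0, \sigma^2)$ and completing the square in the exponent:

$$E\left(e^{z}\right) = \int_{-\infty}^{\infty} e^{z}\,\frac{1}{\sigma\sqrt{2\pi}}\,e^{-z^{2}/(2\sigma^{2})}\,dz = e^{\sigma^{2}/2}\int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(z-\sigma^{2})^{2}/(2\sigma^{2})}\,dz = e^{\sigma^{2}/2}$$

The last integral equals 1 because its integrand is the density of a $N(\sigma^2, \sigma^2)$ variable.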

The "adjusted forecast with an empirical correction factor" uses rather confusing, if not incorrect, notation and ignores various biases. It approximates the above by $$E\Big (e^{\alpha (n+1) + \beta}\mid S_n\Big) \approx e^{\hat \alpha (n+1) + \hat \beta} = e^{\hat{\log x_{n+1}}} $$ and $$ e^{\sigma^2/2} = E\left(e^{z_{n+1}}\right) \approx \frac {1}{n}\sum_{t=1}^{n} e^{\hat z_t}$$

and so $$ \widehat E\Big (X_{n+1}\mid S_n \Big )= e^{\hat{\log x_{n+1}}}\frac {1}{n}\sum_{t=1}^{n} e^{\hat z_t} $$
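
A minimal R sketch of this estimator on simulated data may help. It is an illustration under stated assumptions, not the book's code: the values of `alpha`, `beta`, `sigma` and `n`, and all variable names, are hypothetical choices made here.

```r
# Sketch on simulated data; alpha, beta, sigma and n are arbitrary illustrative values.
set.seed(1)
n <- 500; alpha <- 0.01; beta <- 1; sigma <- 0.5
idx <- 1:n
x <- exp(alpha * idx + beta + rnorm(n, sd = sigma))

fit <- lm(log(x) ~ idx)                      # log-regression model

# Empirical counterpart of E(exp(z)): average of the exponentiated residuals
correction <- mean(exp(resid(fit)))

new_dat <- data.frame(idx = n + 1)
naive_forecast    <- unname(exp(predict(fit, new_dat)))  # treats E(exp(z)) as 1
adjusted_forecast <- naive_forecast * correction         # estimate of E(X_{n+1} | S_n)

true_mean <- exp(alpha * (n + 1) + beta) * exp(sigma^2 / 2)
print(c(naive = naive_forecast, adjusted = adjusted_forecast,
        true_mean = true_mean, correction = correction,
        theory = exp(sigma^2 / 2)))
```

Because the fitted model has an intercept, the residuals average to zero, so by Jensen's inequality the empirical factor is at least 1 and the adjusted forecast is never below the naive one.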

Brilliant answer. So we are actually getting a mean of the forecast. I think that's not indicated in the "adjusted forecast". — r_31415, Sep 10 '13 at 03:39
I have one more question. In the book there is a prediction based on the model described above and it looks like this: AP.pred.ts <- exp(ts(predict(AP.lm2, new.dat), st = 1961, fr = 12)). Well, the correction you described in your answer is used like this: empirical.correction.factor <- mean(exp(resid(AP.lm2))); AP.pred.ts <- AP.pred.ts * empirical.correction.factor. The correction factor is an average, right? Why should we multiply each prediction value by it? — r_31415, Sep 11 '13 at 00:13
