Differences between calculating the relative change and taking the natural log to represent relative change in Stata

Question

In a (panel) regression with income as the dependent variable, I would like to estimate the effect of a treatment on the relative change in income. I found two mathematically equivalent ways to do this. Either

by calculating relative change = (post-treatment income - pre-treatment income) / pre-treatment income and then regressing it on treatment,
or by taking the natural logarithm of income, i.e. lninc = ln(income), then regressing it on treatment and, finally, calculating exp(Beta_T)-1.

However, the results are not the same! Here is a stylized example replicating the problem:

    clear
    set seed 111
    set obs 10000
    gen id = _n
    expand 2 // two observations per individual
    bysort id: gen t = _n // time
    bysort id (t): gen T = (_n==2) // treatment
    gen inc = rnormal(10+50000*T,1) // dependent variable
    assert inc > 0 // all values > 0

    bysort id (t): gen relinc = (inc - inc[_n-1])/inc[_n-1] // relative change
    replace relinc = 0 if t==1

    gen lninc = ln(inc)
    bysort id (t): gen lndiff = exp(lninc[2] - lninc[1])-1
    sum lndiff relinc if relinc != 0 // the relative changes using these two approaches are in fact the same

    xtset id t
    qui xtreg relinc T, fe
    margins, dydx(T) // 5061

    qui xtreg lninc T, fe
    margins, expression(exp(_b[T])-1) // 5035

On real data, the differences can be quite large and sometimes even the sign differs.

How come Stata comes to different conclusions here?

Since you are using different numbers to represent your data, you ought to be deeply surprised if Stata did give the same results: that would be grounds to suspect an error. What kinds of differences, then, do you want to draw our attention to? — whuber, Aug 13 '20 at 19:34
Could you be more specific? Different numbers of observations? Both regressions run on the same sample. Different numbers as in two different ways to calculate the relative change? As I show in the code snippet, the approaches are mathematically equivalent. — Ben, Aug 13 '20 at 20:11
On the contrary, they are not mathematically equivalent: the relative change is not the same as the logarithm, even though they will approximately agree for small changes. And exponentiating the parameter estimate simply is incorrect. — whuber, Aug 13 '20 at 20:14

dimitriy · Accepted Answer · 2020-08-14T02:13:13.740

The log difference is an approximation to the relative change that works for small changes and quickly degrades, as @whuber already pointed out in the comments. Your simulated change is enormous (mean income jumps from 10 to 50,010), so the discrepancy is no surprise. With a smaller change, things look much better, as I show below.
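A short derivation may help show where the gap comes from. The notation is mine, not the question's: r_i = (y_{i2} - y_{i1})/y_{i1} is individual i's relative change. With two periods and everyone treated in the second, each fixed-effects coefficient on T reduces to an average of within-person differences, so the two regressions average different things:

```latex
\begin{align*}
\hat\beta_{\text{rel}} &= \frac{1}{N}\sum_{i=1}^{N} r_i,
\qquad
\hat\beta_{\ln} = \frac{1}{N}\sum_{i=1}^{N} \ln(1+r_i) \\
\exp(\hat\beta_{\ln}) - 1 &= \Big(\prod_{i=1}^{N}(1+r_i)\Big)^{1/N} - 1
\;\le\; \frac{1}{N}\sum_{i=1}^{N} r_i = \hat\beta_{\text{rel}} \\
\ln(1+r) &= r - \tfrac{r^2}{2} + \tfrac{r^3}{3} - \dots \qquad (|r| < 1)
\end{align*}
```

The inequality is the arithmetic-geometric mean (Jensen) inequality, with equality only when every r_i is the same. So the per-observation values agree exactly, since exp of a log difference minus 1 is the relative change, but the average of the logs, once exponentiated, is a geometric mean and not the arithmetic mean that the relinc regression reports.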

He is also correct on the exponentiation. You can read this post by David Giles for details while I blushingly edit some old answers. I have implemented a less biased solution using nlcom. It assumes that once you log the outcome, the errors become normal.
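The logic behind the adjustment in the nlcom expression can be sketched as follows, assuming the estimator is approximately normal around the true coefficient. The symbols V and \hat V are my notation for its variance and the squared standard error:

```latex
\begin{align*}
\hat\beta \sim N(\beta, V)
\;&\Longrightarrow\;
E\big[\exp(\hat\beta)\big] = \exp\!\big(\beta + \tfrac{1}{2}V\big) > \exp(\beta) \\
\widehat{g} &= \exp\!\big(\hat\beta - \tfrac{1}{2}\hat V\big) - 1,
\qquad \hat V = \widehat{\mathrm{se}}(\hat\beta)^2
\end{align*}
```

This is what exp(_b[T] - 0.5*_se[T]^2)-1 computes. With a standard error as small as the one in this simulation the adjustment is tiny; it should matter more when the coefficient is estimated imprecisely.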

I also tweaked your code in a couple places to use time-series operators, since this is so much better than using relative position.
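As a minimal sketch of why the operators are safer, here is a toy panel (made-up data; the variable names rel_ts and rel_sub are mine) showing that D. and L. reproduce the subscript-based calculation once the data are xtset, without depending on the current sort order:

```stata
clear
set obs 3
gen id = _n
expand 2                          // two observations per individual
bysort id: gen t = _n             // time index 1, 2
gen inc = 10*id + t               // toy income, deterministic

xtset id t
gen rel_ts = D.inc / L.inc        // time-series operators follow the xtset time variable

bysort id (t): gen rel_sub = (inc - inc[_n-1]) / inc[_n-1]   // relies on the sort order

assert rel_ts == rel_sub if t == 2
assert missing(rel_ts) & missing(rel_sub) if t == 1
```

The operators also respect gaps in the time variable (L.inc is missing if the previous period is absent), whereas inc[_n-1] silently picks up whatever row happens to come before.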

. clear
. set seed 111
. set obs 10000
number of observations (_N) was 0, now 10,000
. gen id = _n
. expand 2 // two observations per individual
(10,000 observations created)
. bysort id: gen t = _n // time
. bysort id (t): gen T = (_n==2) // treatment
. gen inc = rnormal(10+.5*T,1) // dependent variable
. assert inc > 0 // all values > 0
. xtset id T
       panel variable:  id (strongly balanced)
        time variable:  T, 0 to 1
                delta:  1 unit
. gen relinc = D.inc/L.inc // relative change
(10,000 missing values generated)
. replace relinc = 0 if t==1
(10,000 real changes made)
. gen lninc = ln(inc)
. bysort id: gen lndiff = exp(D.lninc)-1 
(10,000 missing values generated)
. sum lndiff relinc if relinc != 0 // the relative changes using these two approaches are in fact the same
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      lndiff |     10,000    .0631367    .1494297  -.4465929   .9864048
      relinc |     10,000    .0631367    .1494297  -.4465929    .986405
. qui xtreg relinc T, fe
. margins, dydx(T) // 5061
Average marginal effects                        Number of obs     =     20,000
Model VCE    : Conventional
Expression   : Linear prediction, predict()
dy/dx w.r.t. : T

             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           T |   .0631367   .0014943    42.25   0.000     .0602079    .0660655

. xtreg lninc T, fe
Fixed-effects (within) regression               Number of obs     =     20,000
Group variable: id                              Number of groups  =     10,000
R-sq:                                           Obs per group:
     within  = 0.1196                                         min =          2
     between =      .                                         avg =        2.0
     overall = 0.0634                                         max =          2
                                                F(1,9999)         =    1357.76
corr(u_i, Xb)  = 0.0000                         Prob > F          =     0.0000

       lninc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           T |   .0514681   .0013968    36.85   0.000     .0487301    .0542061
       _cons |   2.295573   .0009877  2324.23   0.000     2.293637    2.297509
-------------+----------------------------------------------------------------
     sigma_u |  .07009358
     sigma_e |  .09876703
         rho |  .33495349   (fraction of variance due to u_i)

F test that all u_i=0: F(9999, 9999) = 1.01                  Prob > F = 0.3579
. nlcom (e_assuming_normal_errors:exp(_b[T] - 0.5*_se[T]^2)-1)
e_assuming~s:  exp(_b[T] - 0.5*_se[T]^2)-1

                   lninc |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------------------+----------------------------------------------------------------
e_assuming_normal_errors |   .0528146   .0014705    35.91   0.000     .0499323    .0556968

. xtreg inc T, fe
Fixed-effects (within) regression               Number of obs     =     20,000
Group variable: id                              Number of groups  =     10,000
R-sq:                                           Obs per group:
     within  = 0.1209                                         min =          2
     between =      .                                         avg =        2.0
     overall = 0.0641                                         max =          2
                                                F(1,9999)         =    1375.61
corr(u_i, Xb)  = 0.0000                         Prob > F          =     0.0000

         inc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           T |   .5231742   .0141059    37.09   0.000     .4955239    .5508245
       _cons |   9.980207   .0099743  1000.59   0.000     9.960655    9.999759
-------------+----------------------------------------------------------------
     sigma_u |  .70835751
     sigma_e |  .99743422
         rho |  .33526336   (fraction of variance due to u_i)

F test that all u_i=0: F(9999, 9999) = 1.01                  Prob > F = 0.3323
. margins, eydx(T)
Average marginal effects                        Number of obs     =     20,000
Model VCE    : Conventional
Expression   : Linear prediction, predict()
ey/dx w.r.t. : T

             |            Delta-method
             |      ey/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           T |   .0511156   .0013804    37.03   0.000       .04841    .0538212

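I also added a third way to approximate the relative effect: a fixed-effects regression of inc in levels followed by margins, eydx(T), which reports a semi-elasticity (the change in the log of the prediction per unit change in T).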
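As a rough check on what the eydx number means, the reported value can be reproduced by hand from the levels regression. This assumes margins uses the linear prediction _cons + _b[T]*T without the fixed effects, which is my reading of the "Linear prediction" label in the output, and treats T as continuous because it was entered without the i. prefix:

```latex
\begin{align*}
\frac{\partial \ln \hat y}{\partial T} &= \frac{\hat\beta}{\hat y},
\qquad \hat y = \hat\alpha + \hat\beta T \\
T = 0:&\quad \frac{0.5231742}{9.980207} \approx 0.05242 \\
T = 1:&\quad \frac{0.5231742}{9.980207 + 0.5231742} \approx 0.04981 \\
\tfrac{1}{2}\,(0.05242 + 0.04981) &\approx 0.0511
\end{align*}
```

Half of the observations have T = 0 and half have T = 1, so the simple average of the two ratios matches the .0511156 that margins reports.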

Finally, you may want to review some questions on re-transformation bias. This is something that comes up eventually with logged outcomes. I don't want you to have to learn this stuff on the street the hard way.
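The core of the re-transformation problem fits in two lines, assuming a log-linear model with an additive error that has conditional mean zero (the notation is mine):

```latex
\begin{align*}
\ln y = x\beta + \varepsilon
\;&\Longrightarrow\;
E[y \mid x] = \exp(x\beta)\, E[\exp(\varepsilon) \mid x] \;\ge\; \exp(x\beta) \\
\varepsilon \mid x \sim N(0, \sigma^2)
\;&\Longrightarrow\;
E[y \mid x] = \exp\!\big(x\beta + \tfrac{1}{2}\sigma^2\big)
\end{align*}
```

Exponentiating a prediction from the log model therefore understates the mean of the outcome in levels unless the error term is accounted for.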
