Using Stata (with a weighted survey design) I ran the following, where logwage is the log of wage. The log was taken because wage was not normally distributed. There is also information about the workers' demographics, such as race/ethnicity, gender, previous education, and whether or not they participated in a voluntary training (binary variable: yes = 1, no = 0).

svy: etregress logwage i.race gender, treat(training = i.education gender) 

Because the dependent variable is in logs, while the treatment and all the independent variables are NOT in log form, I'm not sure how to interpret the reported coefficients.

--------------------------------------------------------------------------------------------------
                                 |             Linearized
                                 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------------------------+----------------------------------------------------------------
logwage                          |
                            race |
                African American |   .3891554   .0031105    12.20   0.000     .2000000    .8474752
                 Asian American  |   .1487310   .0002843    04.11   0.000     .027113     .8765290
                                 |
                          gender |
                         female  |  -.0230411    .010445    -6.85   0.000    -.115341   -.0107295
                                 |
                      1.training |   .3703371   .0451778    10.61   0.000     .2018037    .4186134
---------------------------------+----------------------------------------------------------------
training                         |
                     i.education |
                      Highschool |  -.0715731   .0490565     1.28   0.098    -.1106579    .1291781
                         College |   .1271380   .0401052     3.95   0.003     .0329516    .2107563
                     Grad School |   .8522143   .0085337     8.99   0.000     .8271381    .9573284
                                 |
                          gender |
                          female |   .0127444   .0100058     5.33   0.041     .0100558    .0866312
                           _cons |  -1.260083   .0327235   -26.12   0.000    -1.531405   -1.098524
---------------------------------+----------------------------------------------------------------
                         /athrho |   .0051552    .031410     0.17   0.827    -.0722533    .0810246
                        /lnsigma |  -1.872551   .0166818   -73.50   0.000    -1.928624   -1.278064
---------------------------------+----------------------------------------------------------------
                             rho |   .0084120   .0421116                     -.0649947    .0888529
                           sigma |   .4000831   .0038170                      .1925127    .5067780
                          lambda |   .0012673   .0226365                     -.0324029     .016937
--------------------------------------------------------------------------------------------------

Like, what is the interpretation of the gender coefficient for the first and second entry?

Edit: My thinking is that the 'female' coefficient in the logwage component is interpreted in the same way as %Δy=100⋅β1⋅Δx, so being female results in a -2.30% change in wage. But it is not clear what 'female' in the 'training' section means. Is it also %Δy=100⋅β1⋅Δx, or not? And if it is a % change (i.e. a 1.27% change), is that for the training or for the wage, as in women being more likely to have the training?

iPlexipen
  • 211
  • This has come up before. See this, for example. You can use margins of an expression or nlcom to calculate SEs. Also, note that the rationale behind the log(y) transformation is not about the distribution of wage itself, but about the distribution of the errors conditional on x. – dimitriy Jul 28 '20 at 19:48
  • @Dimitriy perhaps we are talking past one another here... but what I'm asking is what is the interpretation of the female coefficient -.0230 in the logwage component and the interpretation of the female coefficient .0127 in the training component? Also, nlcom doesn't seem to work with survey data categorical variables. As for margin, the same problem occurs, are margins values that need exponentiation to be interpreted? Here is the best I am able to find: https://www.stata.com/stata-news/news34-2/spotlight/ – iPlexipen Jul 28 '20 at 20:00
  • As the link I shared tells you, the coefficient implies a change of 100*(exp(-.0230411)-1) = -2.28%, i.e. women earn about 2.28% less than men according to your model. The first stage probit coefficient is harder to interpret. The fact that it is positive and significant means that women are more likely to seek out training. To translate it into something more meaningful (like a change in pr(training)), you will need to calculate the marginal effect somehow. The formula is here. – dimitriy Jul 28 '20 at 20:26

1 Answer

Here's a replicable (but completely nonsensical) example where the outcome is the log of blood lead levels and the treatment is diabetes. We will interpret the female coefficient from both equations.

The treatment probit equation implies about a 0.8 percentage point increase in the probability of having diabetes for women relative to men, on average (.0078 on the [0,1] scale is roughly 8/10ths of a percentage point on the [0,100] scale). The outcome equation shows a 30.65% decrease in lead for females relative to males (ATE). This is called a semi-elasticity, and some care must be taken since female is a binary variable. We will use finite differences for both.
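To spell out the two finite differences, here is a rough sketch in my own notation (the symbols are assumptions for illustration, not taken from the Stata documentation): β_f is the female coefficient in the loglead equation, γ_f the female coefficient in the treatment probit, z_i'γ the rest of the probit index for observation i, Φ the standard normal CDF, and w_i the survey weight.

```latex
\begin{aligned}
\%\Delta\,\text{lead}
  &= 100\left(\frac{e^{x'\beta+\beta_f}}{e^{x'\beta}}-1\right)
   = 100\left(e^{\beta_f}-1\right)
   = 100\left(e^{-0.365953}-1\right) \approx -30.65, \\[4pt]
\Delta\Pr(\text{diabetes})
  &= \frac{\sum_i w_i\left[\Phi\!\left(z_i'\gamma+\gamma_f\right)-\Phi\!\left(z_i'\gamma\right)\right]}{\sum_i w_i}
   \approx 0.0078 .
\end{aligned}
```

The first line is why the shortcut 100⋅β is only safe for small coefficients: here it would give -36.6 rather than -30.65. The second line is an average over observations because the probit difference depends on each person's other covariates, so there is no single number to read off the coefficient table.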

We first calculate these estimates using margins and nlcom, which will not work with svy. Then we do it by hand using svy: mean to show that the point estimates agree.

Code is at the very bottom, code with output is below:

. webuse nhanes2f, clear

. svyset psuid [pweight=finalwgt], strata(stratid)

      pweight: finalwgt
          VCE: linearized
  Single unit: missing
     Strata 1: stratid
         SU 1: psuid
        FPC 1: <zero>

. svy: etregress loglead i.female i.diabetes, treat(diabetes = weight age height i.female) // coefl
(running etregress on estimation sample)

Survey: Linear regression with endogenous treatment
Estimator: maximum likelihood

Number of strata   =        31                 Number of obs     =       4,940
Number of PSUs     =        62                 Population size   =  56,316,764
                                               Design df         =          31
                                               F(   2,     30)   =      575.75
                                               Prob > F          =      0.0000

------------------------------------------------------------------------------
             |             Linearized
             |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
loglead      |
    1.female |   -.365953   .0106445   -34.38   0.000    -.3876626   -.3442434
  1.diabetes |   .2187191   .0579993     3.77   0.001     .1004288    .3370095
       _cons |   2.760332   .0180171   153.21   0.000     2.723586    2.797078
-------------+----------------------------------------------------------------
diabetes     |
      weight |   .0120452   .0025572     4.71   0.000     .0068297    .0172606
         age |   .0227368   .0029366     7.74   0.000     .0167476    .0287259
      height |  -.0143508   .0051924    -2.76   0.010    -.0249408   -.0037608
    1.female |   .1143353   .0862421     1.33   0.195    -.0615567    .2902273
       _cons |  -1.459728    .861842    -1.69   0.100    -3.217466    .2980107
-------------+----------------------------------------------------------------
     /athrho |  -.3346261   .0729646    -4.59   0.000    -.4834384   -.1858138
    /lnsigma |   -.973891   .0302057   -32.24   0.000    -1.035496    -.912286
-------------+----------------------------------------------------------------
         rho |  -.3226714   .0653678                      -.448993   -.1837044
       sigma |   .3776109    .011406                      .3550502    .4016051
      lambda |  -.1218442   .0253314                     -.1735079   -.0701805
------------------------------------------------------------------------------


. display "Percent Change ln(lead) = " 100*( exp(_b[loglead:1.female]) - 1)
Percent Change ln(lead) = -30.646461

. 
. /* (1) using commands that don't work with svy */
. margins, predict(ptrt) at(female=(0 1))

Predictive margins

Number of strata   =        31                 Number of obs     =       4,940
Number of PSUs     =        62                 Population size   =  56,316,764
Model VCE    : Linearized                      Design df         =          31

Expression   : Pr(diabetes), predict(ptrt)

1._at        : female          =           0

2._at        : female          =           1

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         _at |
          1  |   .0293652   .0037914     7.75   0.000     .0216325    .0370979
          2  |   .0371157   .0041162     9.02   0.000     .0287207    .0455106
------------------------------------------------------------------------------


. margins r.female, predict(ptrt)

Contrasts of predictive margins

Number of strata   =        31                 Number of obs     =       4,940
Number of PSUs     =        62                 Population size   =  56,316,764
Model VCE    : Linearized                      Design df         =          31

Expression   : Pr(diabetes), predict(ptrt)

------------------------------------------------
             |         df           F        P>F
-------------+----------------------------------
      female |          1        1.72     0.1989
      Design |         31
------------------------------------------------
Note: F statistics are adjusted for the survey design.

--------------------------------------------------------------
             |            Delta-method
             |   Contrast   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
      female |
   (1 vs 0)  |   .0077504   .0059037     -.0042902    .0197911
--------------------------------------------------------------


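. nlcom pct_eff:(100*(exp(_b[loglead:1.female])-1))

     pct_eff:  (100*(exp(_b[loglead:1.female])-1))

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     pct_eff |  -30.64646   .7382346   -41.51   0.000    -32.09337   -29.19955
------------------------------------------------------------------------------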


. 
. /* (2) Both AMEs by hand using predict */
. replace female = 1
(4,909 real changes made)

. predict d1, ptrt

. predict lny1, xb
(2 missing values generated)

. replace female = 0
(10,337 real changes made)

. predict d0, ptrt

. predict lny0, xb
(2 missing values generated)

. gen double diff_pr = d1-d0

. gen double diff_lny = lny1 - lny0
(2 missing values generated)

. 
. svy: mean d1 d0 diff_pr diff_lny
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =      31      Number of obs   =      10,335
Number of PSUs   =      62      Population size = 116,997,257
                                Design df       =          31

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
          d1 |   .0376153   .0005965      .0363988    .0388317
          d0 |   .0297683   .0004914      .0287661    .0307705
     diff_pr |    .007847   .0001054       .007632    .0080619
    diff_lny |   -.365953          .             .           .
--------------------------------------------------------------

. display "Average ln(lead) difference as a semi-elasticity = " (100*(exp(-.365953)-1))
Average ln(lead) difference as a semi-elasticity = -30.64646
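If nlcom is not an option, the standard error for the percent effect can be approximated by hand with the delta method, since the derivative of 100*(exp(b)-1) with respect to b is 100*exp(b). The lines below are only a sketch of that idea: the scalar names (b_fem, se_fem, pct_fem, se_pct_fem) are made up for illustration, and the interval uses normal critical values to mirror the z-based nlcom row above rather than the design degrees of freedom.

```stata
* Run immediately after -svy: etregress-. A later estimation command such as
* -svy: mean- replaces the stored results that _b[] and _se[] read from.
* All scalar names below are hypothetical.
scalar b_fem      = _b[loglead:1.female]
scalar se_fem     = _se[loglead:1.female]

* point estimate and delta-method standard error of 100*(exp(b)-1)
scalar pct_fem    = 100*(exp(b_fem) - 1)
scalar se_pct_fem = 100*exp(b_fem)*se_fem

display "pct_eff = " (pct_fem) "   delta-method SE = " (se_pct_fem)

* normal-based 95% interval, as nlcom reports
display "95% CI: " (pct_fem - invnormal(0.975)*se_pct_fem) " to " (pct_fem + invnormal(0.975)*se_pct_fem)
```

With the coefficient and standard error from the etregress table, this should land very close to the nlcom row above, up to rounding.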


Code:

cls
webuse nhanes2f, clear
svyset psuid [pweight=finalwgt], strata(stratid)
svy: etregress loglead i.female i.diabetes, treat(diabetes = weight age height i.female) // coefl
display "Percent Change ln(lead) = " 100*( exp(_b[loglead:1.female]) - 1)

/* (1) using commands that don't work with svy */
margins, predict(ptrt) at(female=(0 1))
margins r.female, predict(ptrt)
nlcom pct_eff:(100*(exp(_b[loglead:1.female])-1))

/* (2) Both AMEs by hand using predict */
replace female = 1
predict d1, ptrt
predict lny1, xb
replace female = 0
predict d0, ptrt
predict lny0, xb
gen double diff_pr = d1-d0
gen double diff_lny = lny1 - lny0

svy: mean d1 d0 diff_pr diff_lny
display "Average ln(lead) difference as a semi-elasticity = " (100*(exp(-.365953)-1))
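One thing to keep in mind with the by-hand version: svy: mean above averages over 10,335 observations, while etregress and margins used 4,940, and the replace commands leave female set to 0 for everyone. A possible variant, which I have not run against this output, flags the estimation sample, restores female afterwards, and averages only over that subpopulation. The variable names insample and female_obs are hypothetical.

```stata
webuse nhanes2f, clear
svyset psuid [pweight=finalwgt], strata(stratid)
svy: etregress loglead i.female i.diabetes, treat(diabetes = weight age height i.female)

* flag the observations etregress actually used
generate byte insample = e(sample)

* keep a copy of the observed values so they can be restored
clonevar female_obs = female

* counterfactual treatment probabilities with everyone female, then everyone male
replace female = 1
predict double d1, ptrt
replace female = 0
predict double d0, ptrt
replace female = female_obs

generate double diff_pr = d1 - d0

* average over the estimation sample only
svy, subpop(insample): mean d1 d0 diff_pr
```

I would expect the d1 and d0 means from this to sit closer to the margins results than the full-sample means do, but treat that as something to check rather than a guarantee.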

dimitriy
  • 35,430