
Regarding data, there are three binary variables: trt (0/1), fail (0/1), and female (0/1).

For sub-group analysis (male vs. female), I am running the usual 2x2 treatment-by-failure tables separately for males and females. These tables yield gender-specific failure ORs:

Male:   OR = 0.75 (0.46, 1.24), p = 0.26
Female: OR = 0.38 (0.16, 0.92), p = 0.03

So one of the sub-groups shows a significant treatment effect.

If the male and female univariate models and the interaction model are run as logit models, you can see that the interaction term is not significant (p = 0.184). If the constant is dropped, the interaction term becomes significant; however, that interaction estimate is contaminated by the omitted constant term. Several collaborators like the no-constant logit model because it yields a significant interaction term, in line with the female sub-group analysis. However, I believe it is erroneous to assume that the interaction p-value should be significant whenever one of the sub-groups has a significant treatment effect. I favor the interaction model with the constant term, since its interaction coefficient is the actual difference between the male and female treatment effects (the slopes from the univariate models).
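To make the arithmetic concrete, here is a quick check (Python used only as a calculator; the coefficient values are copied from the Stata output below, not re-estimated) that the interaction term in the model with the constant is exactly the female slope minus the male slope, while the no-constant model's "interaction" also absorbs the male intercept:

```python
import math

# Coefficients copied from the subgroup logit models shown below
male_trt, male_cons = -0.285727, 0.4192584
female_trt = -0.9751312

# Interaction model WITH the constant: trtfem is a pure slope difference
trtfem = female_trt - male_trt
print(round(trtfem, 7))  # -0.6894042, matching the trtfem row

# Interaction model WITHOUT the constant: the reported "trtfem" also
# absorbs the male intercept, so it is no longer a slope difference
print(round(trtfem - male_cons, 7))  # -1.1086626, matching the nocon trtfem row

# Subgroup odds ratios implied by the slopes
print(round(math.exp(male_trt), 2), round(math.exp(female_trt), 2))  # 0.75 0.38
```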

. logit fail trt if male==1

    fail |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]

-------------+----------------------------------------------------------------
         trt |   -.285727   .2535624    -1.13   0.260    -.7827001     .211246
       _cons |   .4192584   .1858278     2.26   0.024     .0550427    .7834742


. logit fail trt if female==1


    fail |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]

-------------+----------------------------------------------------------------
         trt |  -.9751312   .4522192    -2.16   0.031    -1.861464     -.088798
       _cons |   .7339692   .3511885     2.09   0.037     .0456524    1.422286


. logit fail trt female trtfem


    fail |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]

-------------+----------------------------------------------------------------
         trt |   -.285727   .2535624    -1.13   0.260    -.7827001     .211246   (male trt)
      female |   .3147107   .3973227     0.79   0.428    -.4640274    1.093449   (female const - male const)
      trtfem |  -.6894042   .5184554    -1.33   0.184    -1.705558    .3267498   (female trt - male trt --> interaction)
       _cons |   .4192584   .1858278     2.26   0.024     .0550427    .7834742   (male const)


. logit fail trt female trtfem, nocon


    fail |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]

-------------+----------------------------------------------------------------
         trt |   .1335314   .1725164     0.77   0.439    -.2045945    .4716573   (const + trt: 0.419 - 0.286)
      female |   .7339692   .3511885     2.09   0.037     .0456524    1.422286   (const + female: 0.419 + 0.315)
      trtfem |  -1.108663   .4840083    -2.29   0.022    -2.057302   -.1600237   (trtfem - const: -0.689 - 0.419)
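The parenthetical annotations can be verified with simple arithmetic; a quick check (Python used only as a calculator, coefficients copied from the two tables):

```python
# Coefficients from the interaction model WITH the constant
trt, female, trtfem, cons = -0.285727, 0.3147107, -0.6894042, 0.4192584

# The no-constant fit reports these combinations of the four
# with-constant coefficients, as annotated in the table above:
print(round(cons + trt, 7))     # 0.1335314  (the nocon trt row)
print(round(cons + female, 7))  # 0.7339691  (the nocon female row)
print(round(trtfem - cons, 7))  # -1.1086626 (the nocon trtfem row)
```

In other words, dropping the constant does not test a new hypothesis; it reshuffles the same four numbers and pushes the male intercept into the "interaction" row.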


  • Just because the effect for males is nonsignificant and the effect for females is significant doesn't mean that the treatment has a larger effect for females than for males. To draw that conclusion, one must accept the null that the true effect for males = 0 (odds ratio of 1). The interaction tests whether the log of .75 differs from the log of .38; it does not. You cannot conclude that there is a differential treatment effect. Removing the intercept does not make any sense; it needs to be in the model. – dbwilson Jul 14 '20 at 20:36
  • Interactions in non-linear models can be tricky. See this post for the continuous variable example. Is there any reason not to use robust regression here since the model is saturated? – dimitriy Jul 14 '20 at 22:24
  • I don't really follow the logic behind why the constant leads to bias. – dimitriy Jul 14 '20 at 22:25
  • @Dimitriy - the male-only model's intercept is subtracted from the true interaction value when the intercept is dropped from the interaction model. That is, it's not a true delta of slopes between males and females -- it's biased. But your point is well taken about non-linearity. –  Jul 14 '20 at 22:41
  • Agreed. Another way to think about it is that with no intercept, the intercept is forced to zero. In this model, that translates into forcing the logit to zero for the no-treatment, male condition (value of zero for both independent variables). A logit of zero is a failure rate of .5. Thus, without the intercept, you are fixing the failure rate for the male/no-treatment group at .5 no matter what its observed value is. – dbwilson Jul 14 '20 at 23:38

1 Answer


I find it really hard to think about coefficients on the index function scale and translate that to things that I ultimately care about, like probabilities or ORs. This is especially true for interactions. So I would do something like this, which leads to identical conclusions any way you do it (as long as the model stays fully saturated). Perhaps you will find it useful. I've omitted explanations since everything is so similar to the linear case, and I am just doing the equivalent comparison for the nonlinear logit models. I start with OLS, then combined logit, and then subsample logits.

. #delimit;
delimiter now ;
. sysuse auto, clear;
(1978 Automobile Data)

. gen high_mpg = mpg>22;

. gen high_price = price>6000;

. reg foreign i.high_mpg##i.high_price, robust;

Linear regression                               Number of obs     =         74
                                                F(3, 70)          =       8.78
                                                Prob > F          =     0.0001
                                                R-squared         =     0.2495
                                                Root MSE          =     .40711


                |               Robust
        foreign |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

--------------------+----------------------------------------------------------------
         1.high_mpg |   .4032258   .1272598     3.17   0.002     .1494142    .6570374
       1.high_price |   .1385199   .1190367     1.16   0.249    -.0988913    .3759312
                    |
high_mpg#high_price |
               1 1  |   .1948134   .2277168     0.86   0.395    -.2593534    .6489802
                    |
              _cons |   .0967742   .0545964     1.77   0.081    -.0121149    .2056633


. margins high_mpg#high_price;

Adjusted predictions                            Number of obs     =         74
Model VCE    : Robust

Expression : Linear prediction, predict()


                |            Delta-method
                |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]

--------------------+----------------------------------------------------------------
high_mpg#high_price |
               0 0  |   .0967742   .0545964     1.77   0.081    -.0121149    .2056633
               0 1  |   .2352941   .1057779     2.22   0.029     .0243267    .4462616
               1 0  |         .5   .1149534     4.35   0.000     .2707327    .7292673
               1 1  |   .8333333   .1564318     5.33   0.000      .52134     1.145327


. margins high_price, dydx(high_mpg);

Conditional marginal effects                    Number of obs     =         74
Model VCE    : Robust

Expression   : Linear prediction, predict()
dy/dx w.r.t. : 1.high_mpg


         |            Delta-method
         |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------
  0.high_mpg |  (base outcome)
-------------+----------------------------------------------------------------
  1.high_mpg |
  high_price |
          0  |   .4032258   .1272598     3.17   0.002     .1494142    .6570374
          1  |   .5980392   .1888382     3.17   0.002     .2214133    .9746652


Note: dy/dx for factor levels is the discrete change from the base level.

. margins r.high_price, dydx(high_mpg);

Contrasts of conditional marginal effects       Number of obs     =         74
Model VCE    : Robust

Expression   : Linear prediction, predict()
dy/dx w.r.t. : 1.high_mpg


         |         df           F        P>F

-------------+----------------------------------
 0b.high_mpg |
  high_price |  (not testable)
-------------+----------------------------------
  1.high_mpg |
  high_price |          1        0.73     0.3952
             |
 Denominator |         70



         |   Contrast Delta-method
         |      dy/dx   Std. Err.     [95% Conf. Interval]

-------------+------------------------------------------------
  0.high_mpg |  (base outcome)
-------------+------------------------------------------------
  1.high_mpg |
  high_price |
    (1 vs 0) |   .1948134   .2277168    -.2593534    .6489802


Note: dy/dx for factor levels is the discrete change from the base level.
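Since the model is saturated, every quantity above is just cell-proportion arithmetic. A quick check (Python used only as a calculator, probabilities copied from the `margins high_mpg#high_price` output above):

```python
# Cell means of foreign from the margins output
p00, p01, p10, p11 = 0.0967742, 0.2352941, 0.5, 0.8333333

# Risk difference of high_mpg within each price group (margins dydx)
rd_low = round(p10 - p00, 7)   # 0.4032258
rd_high = round(p11 - p01, 7)  # 0.5980392
print(rd_low, rd_high)

# Contrast of marginal effects: a difference-in-differences,
# equal to the OLS interaction coefficient
print(round(rd_high - rd_low, 7))  # 0.1948134
```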

. logit foreign i.high_mpg##i.high_price, nolog;

Logistic regression                             Number of obs     =         74
                                                LR chi2(3)        =      18.67
                                                Prob > chi2       =     0.0003
Log likelihood = -35.697459                     Pseudo R2         =     0.2073


        foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]

--------------------+----------------------------------------------------------------
         1.high_mpg |   2.233592   .7543524     2.96   0.003     .7550886    3.712096
       1.high_price |   1.054937   .8342486     1.26   0.206      -.58016    2.690034
                    |
high_mpg#high_price |
               1 1  |   .5545007   1.447747     0.38   0.702    -2.283031    3.392032
                    |
              _cons |  -2.233592   .6074929    -3.68   0.000    -3.424256   -1.042928


. margins high_mpg#high_price;

Adjusted predictions                            Number of obs     =         74
Model VCE    : OIM

Expression : Pr(foreign), predict()


                |            Delta-method
                |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]

--------------------+----------------------------------------------------------------
high_mpg#high_price |
               0 0  |   .0967742   .0531003     1.82   0.068    -.0073005    .2008489
               0 1  |   .2352941   .1028794     2.29   0.022     .0336543     .436934
               1 0  |         .5   .1118034     4.47   0.000     .2808694    .7191306
               1 1  |   .8333333   .1521452     5.48   0.000     .5351343    1.131532
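The logit model gives identical cell predictions: its coefficients map back to the same proportions through the inverse logit. A quick check (Python used only as a calculator, coefficients copied from the saturated logit above):

```python
import math

def invlogit(x):
    # inverse of the logit link: exp(x) / (1 + exp(x))
    return 1.0 / (1.0 + math.exp(-x))

# Coefficients from the saturated logit above
b_mpg, b_price, b_inter, cons = 2.233592, 1.054937, 0.5545007, -2.233592

for label, xb in [("0 0", cons),
                  ("0 1", cons + b_price),
                  ("1 0", cons + b_mpg),
                  ("1 1", cons + b_mpg + b_price + b_inter)]:
    print(label, round(invlogit(xb), 7))  # reproduces the margins cells
```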


. margins high_price, dydx(high_mpg);

Conditional marginal effects                    Number of obs     =         74
Model VCE    : OIM

Expression   : Pr(foreign), predict()
dy/dx w.r.t. : 1.high_mpg


         |            Delta-method
         |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]

-------------+----------------------------------------------------------------
  0.high_mpg |  (base outcome)
-------------+----------------------------------------------------------------
  1.high_mpg |
  high_price |
          0  |   .4032258   .1237725     3.26   0.001     .1606361    .6458155
          1  |   .5980392   .1836636     3.26   0.001     .2380652    .9580132


Note: dy/dx for factor levels is the discrete change from the base level.

. margins r.high_price, dydx(high_mpg);

Contrasts of conditional marginal effects       Number of obs     =         74
Model VCE    : OIM

Expression   : Pr(foreign), predict()
dy/dx w.r.t. : 1.high_mpg


         |         df        chi2     P>chi2

-------------+----------------------------------
 0b.high_mpg |
  high_price |  (omitted)
-------------+----------------------------------
  1.high_mpg |
  high_price |          1        0.77     0.3791



         |   Contrast Delta-method
         |      dy/dx   Std. Err.     [95% Conf. Interval]

-------------+------------------------------------------------
  0.high_mpg |  (base outcome)
-------------+------------------------------------------------
  1.high_mpg |
  high_price |
    (1 vs 0) |   .1948134   .2214768    -.2392731    .6288999


Note: dy/dx for factor levels is the discrete change from the base level.

. logit foreign i.high_mpg if high_price == 0, nolog;

Logistic regression                             Number of obs     =         51
                                                LR chi2(1)        =      10.46
                                                Prob > chi2       =     0.0012
Log likelihood = -23.718984                     Pseudo R2         =     0.1807


 foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]

-------------+----------------------------------------------------------------
  1.high_mpg |   2.233592   .7543524     2.96   0.003     .7550885    3.712096
       _cons |  -2.233592   .6074929    -3.68   0.000    -3.424256   -1.042928


. margins high_mpg;

Adjusted predictions                            Number of obs     =         51
Model VCE    : OIM

Expression : Pr(foreign), predict()


         |            Delta-method
         |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]

-------------+----------------------------------------------------------------
    high_mpg |
          0  |   .0967742   .0531003     1.82   0.068    -.0073005    .2008489
          1  |         .5   .1118034     4.47   0.000     .2808694    .7191306


. margins, dydx(high_mpg);

Conditional marginal effects                    Number of obs     =         51
Model VCE    : OIM

Expression   : Pr(foreign), predict()
dy/dx w.r.t. : 1.high_mpg


         |            Delta-method
         |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]

-------------+----------------------------------------------------------------
  1.high_mpg |   .4032258   .1237725     3.26   0.001     .1606361    .6458155

Note: dy/dx for factor levels is the discrete change from the base level.

. logit foreign i.high_mpg if high_price == 1, nolog;

Logistic regression                             Number of obs     =         23
                                                LR chi2(1)        =       6.83
                                                Prob > chi2       =     0.0090
Log likelihood = -11.978475                     Pseudo R2         =     0.2219


 foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]

-------------+----------------------------------------------------------------
  1.high_mpg |   2.788093   1.235687     2.26   0.024     .3661903    5.209995
       _cons |  -1.178655   .5717719    -2.06   0.039    -2.299307   -.0580027


. margins high_mpg;

Adjusted predictions                            Number of obs     =         23
Model VCE    : OIM

Expression : Pr(foreign), predict()


         |            Delta-method
         |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]

-------------+----------------------------------------------------------------
    high_mpg |
          0  |   .2352941   .1028794     2.29   0.022     .0336543     .436934
          1  |   .8333333   .1521452     5.48   0.000     .5351343    1.131532


. margins, dydx(high_mpg);

Conditional marginal effects                    Number of obs     =         23
Model VCE    : OIM

Expression   : Pr(foreign), predict()
dy/dx w.r.t. : 1.high_mpg


         |            Delta-method
         |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]

-------------+----------------------------------------------------------------
  1.high_mpg |   .5980392   .1836636     3.26   0.001     .2380652    .9580132

Note: dy/dx for factor levels is the discrete change from the base level.

I think there might be a way to use suest to combine the two subsample logit models and compare the cross-equation marginal effects, but I am not sure how to do that immediately.

You can also get OR results like this:

. logit foreign i.high_mpg##i.high_price, or nolog;

Logistic regression                             Number of obs     =         74
                                                LR chi2(3)        =      18.67
                                                Prob > chi2       =     0.0003
Log likelihood = -35.697459                     Pseudo R2         =     0.2073


        foreign | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]

--------------------+----------------------------------------------------------------
         1.high_mpg |   9.333333   7.040623     2.96   0.003       2.1278    40.93952
       1.high_price |   2.871795   2.395791     1.26   0.206     .5598088    14.73218
                    |
high_mpg#high_price |
               1 1  |   1.741071   2.520631     0.38   0.702     .1019747      29.7263
                    |
              _cons |   .1071429   .0650885    -3.68   0.000     .0325735    .3524213


Note: _cons estimates baseline odds.

All these models indicate that the effect of high MPG is not moderated by high price to a significant degree.
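As a final sanity check, in a saturated model the interaction OR is just the ratio of the two subsample ORs (Python used only as a calculator, slopes copied from the two subsample logits above):

```python
import math

# high_mpg slopes from the low-price and high-price subsample logits
b_low_price, b_high_price = 2.233592, 2.788093

or_low = math.exp(b_low_price)    # ~9.33, the 1.high_mpg OR in the pooled table
or_high = math.exp(b_high_price)  # ~16.25
print(round(or_high / or_low, 4))  # 1.7411, the interaction OR

# Same thing directly from the pooled interaction coefficient
print(round(math.exp(0.5545007), 4))  # 1.7411
```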

dimitriy