
Elsewhere, some have proposed a significance test for the equality of slope coefficients from models fitted to separate samples (rather than to the same set of data), using the equation:

$Z = \frac{\beta_1-\beta_2}{\sqrt{(SE\beta_1)^2+(SE\beta_2)^2}}$

This follows Clogg, C. C., Petkova, E., & Haritou, A. (1995). Statistical methods for comparing regression coefficients between models. American Journal of Sociology, 100(5), 1261-1293, and appears as equation 4 in Paternoster, R., Brame, R., Mazerolle, P., & Piquero, A. (1998). Using the correct statistical test for equality of regression coefficients. Criminology, 36(4), 859-866. Cohen, Cohen, West, and Aiken propose something quite similar in equation 2.8.6 of Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences (3rd ed.).
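For concreteness, here is the kind of computation I have in mind (a rough sketch only; fit1, fit2, sample1, and sample2 are hypothetical models and data frames for two independent samples sharing a predictor x):

# hypothetical: two OLS models fitted to two independent samples, same predictor x
fit1 <- lm(y ~ x, data = sample1)
fit2 <- lm(y ~ x, data = sample2)

b1  <- coef(fit1)["x"]
b2  <- coef(fit2)["x"]
se1 <- sqrt(vcov(fit1)["x", "x"])
se2 <- sqrt(vcov(fit2)["x", "x"])

z <- (b1 - b2) / sqrt(se1^2 + se2^2)   # the Z statistic above
2 * pnorm(-abs(z))                     # two-sided p-value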

Question in two parts.

Part 1. How does this approach apply in the context of logistic regression? Specifically, it seems that an outcome variable of 0s and 1s can't be Z-scored in the usual way to produce standardized slopes. If this approach won't work at all, is there one that is similarly 'easy'?

Part 2. In OLS, how does this approach apply if the predictor variables are coded 0/1? Would you still Z-transform them to get a standardized slope, or should they be left as is because they are already on the same scale? Are they on the same scale at all if the mean of one is not equal to the mean of the other? And what would you do if just one of the predictors is binary?

russellpierce

  • Since this is a test for equality of coefficients, it doesn't matter whether the predictors themselves are binary or continuous or something else. Similarly with the outcome variables... although in the case of logistic regression we are relying on asymptotics, whereas with OLS we either rely on asymptotics or make the assumption that the residuals are i.i.d. Normal (not the predictors!). – jbowman Apr 23 '19 at 22:19
  • This approach is invalid in general because it assumes the coefficient estimates are (at least approximately) uncorrelated. – whuber Apr 24 '19 at 02:16
  • I presume it's intended to be used for models fitted to separate samples rather than the same set of data – Glen_b Apr 24 '19 at 02:43
  • @whuber I'm curious, what approach is valid under more general circumstances? – russellpierce Apr 24 '19 at 03:25
  • The denominator is supposed to be the standard deviation of the difference of the estimates, $\hat\beta_1 - \hat\beta_2.$ That is given by $$\sqrt{\operatorname{Var}(\hat\beta_1) + \operatorname{Var}(\hat\beta_2) - 2\operatorname{Cov}(\hat\beta_1, \hat\beta_2)}.$$ The covariance term becomes appreciable when the estimates are correlated. – whuber Apr 24 '19 at 03:28
  • @Glen Thank you. I consulted the reference and found it does indeed refer to two independent samples. Thus the covariance is (by assumption) zero. Setting it to zero gives (obviously) the correct formula. Apparently, in one community a different formula had been prevalent and correcting that error was the point of the referenced paper. – whuber Apr 24 '19 at 03:35

1 Answer


Mostly answered in the comments, so see those! Here we illustrate two approaches; see also Test if two coefficients are statistically different in logistic regression? and search this site for the keywords "logistic" and "contrasts".

Testing equality of two regression coefficients (in logistic and in other regression models) is called testing a contrast. In your case, the coefficient vector is $(\beta_0, \beta_1, \beta_2)$ and you want to test the null hypothesis that $\beta_1=\beta_2$. Define the contrast vector $c=(0, 1, -1)$; the null then becomes $c^T \beta = c_1 \beta_0 + c_2 \beta_1 + c_3 \beta_2 = \beta_1 - \beta_2 = 0$. This can be tested with the Wald test, which is what is discussed (and corrected) in the comments. Another way is to estimate a reduced model in which $\beta_1=\beta_2$ is assumed from the start, and compare the two models with a likelihood-ratio test. For linear regression these two methods are equivalent; for logistic regression they are not. Below we show the two methods with some simulated data, in R. In this simulated case the results are similar, but not identical.

mod1 <- glm(Y ~ x1 + x2, family=binomial, data=mydata)
car::linearHypothesis(mod1, c(0, 1, -1))

Linear hypothesis test

Hypothesis:
x1 - x2 = 0

Model 1: restricted model
Model 2: Y ~ x1 + x2

  Res.Df Df  Chisq Pr(>Chisq)
1     98                     
2     97  1 0.7904      0.374

which is the Wald test. Then by model comparison:

mod0 <- glm(Y ~ I(x1 + x2), family=binomial, data=mydata)
anova(mod0, mod1, test="Chisq")
Analysis of Deviance Table

Model 1: Y ~ I(x1 + x2)
Model 2: Y ~ x1 + x2
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1        98     37.680                     
2        97     36.879  1  0.80085   0.3708
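
For completeness, the Wald chi-square reported by linearHypothesis can also be computed by hand from the estimated covariance matrix of the coefficients; written this way, the covariance term that whuber raises in the comments is explicit. A sketch using mod1 from above (output not shown; the result should match the Chisq value reported by linearHypothesis):

cvec <- c(0, 1, -1)                          # contrast for beta_1 - beta_2
b    <- coef(mod1)
V    <- vcov(mod1)                           # includes Cov(beta1_hat, beta2_hat)
se   <- sqrt(drop(t(cvec) %*% V %*% cvec))   # SE of the contrast, covariance term included
chi  <- (sum(cvec * b) / se)^2               # Wald chi-square with 1 df
pchisq(chi, df = 1, lower.tail = FALSE)      # p-value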

Finally, here is how the data used above were simulated:

set.seed(7*11*13)
N <- 100

# two correlated predictors: mean 10, sd 3, correlation 0.5
X <- MASS::mvrnorm(N, c(10, 10), 9*matrix(c(1, 0.5, 0.5, 1), 2, 2))
X <- cbind(rep(1, N), X)           # add intercept column

beta <- c(-6, 0.5, 0.7)            # true coefficients (intercept, x1, x2)
mu <- X %*% beta                   # linear predictor
p  <- exp(mu)/(1 + exp(mu))        # inverse logit

Y <- rbinom(N, 1, p)
mydata <- data.frame(Y, x1 = X[,2], x2 = X[,3])