
I am applying a Poisson regression with robust standard errors to model a binary response variable. What are the assumptions underlying this type of regression?

Is robust Poisson regression appropriate in the context of a cross-sectional study?

zhiheng yi
  • Assumptions for what? Unbiasedness? Consistency? Correctness of the SEs? Different qualities require different assumptions. See this answer in the context of linear regression. – Noah Feb 26 '24 at 18:27

1 Answer


One wouldn't ever realistically use a Poisson regression for a binary response. This seems much easier to model with a logistic regression instead, regardless of whether the data come from a cross-sectional or a longitudinal study.

Some assumptions for logistic regression include:

  • Independence of errors. Basically, you shouldn't have something like autocorrelation or repeated-measures data present, though even that can be handled by adjusting the model (for example, with a logistic GLMM).
  • No multicollinearity. If for example two variables are almost perfectly correlated, then things like your standard errors will be completely inaccurate. Again, there are ways around this, but something to keep an eye on.
  • Linearity in the logit. This is an assumption that I often see missed by people I have talked to in my field, but logistic regression also requires linearity, just with respect to the logit instead of a raw response like in OLS. If this is an issue, an alternative, like a logistic GAM, may be needed.
  • Enough events per predictor. A very rough rule of thumb is that you should have around 10-20 events (your binary responses) per covariate.
  • Reliability. You face attenuation bias if your predictors are unreliable, so check that this also isn't a serious problem (if for example your measures are a composite variable, say like a test of anxiety with multiple items).
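Two of the checks in the list above (events per predictor and multicollinearity) can be roughed out numerically. The following is only a minimal sketch on simulated data: the function names are hypothetical, "events" is taken here to mean the count of the rarer outcome class (an assumption about the rule of thumb), and multicollinearity is summarised by variance inflation factors, which is one common diagnostic rather than the only one.

```python
import numpy as np

def events_per_predictor(y, n_predictors):
    """Count of the rarer outcome class divided by the number of predictors.
    (Assumption: 'events' means the less frequent of the two classes.)"""
    y = np.asarray(y)
    events = min(int(y.sum()), int((1 - y).sum()))
    return events / n_predictors

def variance_inflation_factors(X):
    """VIF for each column of X (X has no intercept column).
    Each VIF is 1 / (1 - R^2) from regressing that column on the others."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = []
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - resid.var() / target.var()
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

# Simulated (hypothetical) data: x2 is nearly a copy of x1, x3 is unrelated.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = rng.binomial(1, 1 / (1 + np.exp(-(-1.0 + 0.5 * x1 + 0.5 * x3))))

epp = events_per_predictor(y, X.shape[1])
vifs = variance_inflation_factors(X)
print("events per predictor:", epp)
print("VIFs:", vifs)
```

Large VIFs for x1 and x2 flag the near-collinearity; the standard errors for those two coefficients would be correspondingly wide.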

A primer on logistic regression can be found here, and I greatly admire the book Practical Guide to Logistic Regression by Joseph Hilbe, which is available for free here.

  • There can be good reasons to use Poisson regression for a binary response. See Chen et al. (2018) for example. In the context of a randomized trial, for example, all QMLEs allow you to consistently estimate a treatment effect, regardless of whether the outcome is linear in the logit (log). Also, do highly correlated predictors actually yield inaccurate SEs or just accurately large ones? – Noah Feb 26 '24 at 18:33
  • I mean in the sense that the SEs are inflated, so the latter statement is my true meaning. I have never seen Poisson used in the way you describe (though I don't doubt it has been done before). I always figured logistic regression was more straightforward for binary data. I'll give that article a look. Thanks for the share. – Shawn Hemelstrand Feb 26 '24 at 18:39
  • @Noah How convincing does the Chen et al. (2018) paper seem to you? Here is their motivating example: "FeNO was categorized into four quartiles. Multivariable log-binomial and robust Poisson regression models were applied to estimate the risk ratio of having seven or more SABA canisters in each of the FeNO quartiles". This stuff has been described as poor practice 1,000 times on CV. – dipetkov Feb 26 '24 at 20:43
  • Also, it's not clear that the OP is interested in estimating relative risks which would be the reason to use modified Poisson regression in the first place. – dipetkov Feb 26 '24 at 20:53
  • @dipetkov Conditional relative risks correspond to parameters in a log linear model for a binary outcome, so it is perfectly reasonable to want a model that produces them. With rare outcomes, there are few problems with using a log linear model instead of a logistic model. The paper simply notes that there is some value in using the log link, and if you use the log link, it is okay to assume a Poisson likelihood rather than a binomial likelihood. The authors making poor modeling choices otherwise is unrelated to their argument or my view of the paper. – Noah Feb 26 '24 at 21:07
  • OP doesn't say what they want to do with the model. If it is to adjust for covariates in an RCT or PS weighted sample, then using QMLE, regardless of the link, yields a consistent estimator of the treatment effect when combined with g-computation. If it is to estimate the relative risk of one variable adjusting for others and the outcome is rare, a log linear model is fine, and Chen et al (2018) argues using a Poisson QMLE to fit that model is perfectly valid. If it is to construct a plausible model of the relationship between predictors and the outcome, no parametric model will be adequate. – Noah Feb 26 '24 at 21:11
  • @Noah The OP is trying to respond to comments from a reviewer (this is their 3rd question) and, to be honest, the reviewer doesn't seem to have made a helpful comment. This isn't about a RCT or a rare outcome. I asked about the Chen paper because I'm genuinely still trying to figure out the whole "odds ratio" vs "relative risk" modeling discussion, and was wondering whether this paper makes a solid argument. – dipetkov Feb 26 '24 at 21:31
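For readers following the comment thread, the robust (modified) Poisson idea can be sketched as a log-link Poisson working model fitted to a 0/1 outcome, with sandwich standard errors in place of the model-based ones. This is only an illustrative sketch on simulated data, not the procedure from Chen et al. (2018): the function name `robust_poisson` is hypothetical, the sandwich estimator is the plain HC0 form, and the design matrix is assumed to carry an intercept in its first column.

```python
import numpy as np

def robust_poisson(X, y, n_iter=50, tol=1e-10):
    """Fit a log-link Poisson working model by Newton's method.
    Returns coefficients, sandwich (HC0) SEs, and model-based SEs.
    Assumes column 0 of X is the intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())  # start at the overall risk
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (y - mu)
        info = X.T @ (X * mu[:, None])
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = np.exp(X @ beta)
    bread = np.linalg.inv(X.T @ (X * mu[:, None]))   # model-based covariance
    meat = X.T @ (X * ((y - mu) ** 2)[:, None])      # empirical score variance
    robust_cov = bread @ meat @ bread
    return beta, np.sqrt(np.diag(robust_cov)), np.sqrt(np.diag(bread))

# Simulated (hypothetical) data with a true risk ratio of exp(0.5).
rng = np.random.default_rng(1)
n = 20000
x = rng.binomial(1, 0.5, size=n)
risk = np.exp(-2.0 + 0.5 * x)
y = rng.binomial(1, risk)
X = np.column_stack([np.ones(n), x])

beta, robust_se, naive_se = robust_poisson(X, y)
print("estimated risk ratio:", np.exp(beta[1]))
print("robust SE:", robust_se[1], "model-based SE:", naive_se[1])
```

With a single binary predictor, the exponentiated slope reproduces the ratio of the two observed risks, and the sandwich standard error comes out smaller than the model-based one because a binary outcome has variance mu(1 - mu) rather than the Poisson variance mu.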