I'm conducting a population study analyzing a rare outcome (prevalence 2-3%) with a rare exposure (~225 exposures, 800 000+ non-exposed). This gives me 5-8 cases among the exposed. Data on 7-8 covariates are available among both exposed and controls.
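To make these numbers concrete, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that the exposed group has the same outcome risk as the general population and that outcomes are independent. The function name `exposed_case_summary` is my own and not from any library.

```python
def exposed_case_summary(n_exposed, prevalence):
    """Expected number of exposed cases and the chance of seeing none at all.

    Assumption (illustrative only): the exposure does not change the risk,
    so the case count among the exposed is Binomial(n_exposed, prevalence).
    """
    expected_cases = n_exposed * prevalence
    # Probability that not a single exposed individual has the outcome.
    p_zero_cases = (1.0 - prevalence) ** n_exposed
    return expected_cases, p_zero_cases


for prevalence in (0.02, 0.03):
    expected, p_zero = exposed_case_summary(225, prevalence)
    print(f"prevalence={prevalence:.2f}: expected exposed cases={expected:.2f}, "
          f"P(no exposed cases)={p_zero:.4f}")
```

Under these assumptions the expected count is only a handful of cases, which is in line with what I observe.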
My plan is to perform logistic regression, adjusted for covariates without interaction terms, to get an OR (with a confidence interval) for the outcome with respect to the exposure.
Does the low number of cases among the exposed pose a problem for the logistic regression, apart from possibly giving very wide confidence intervals? I'm thinking of problems such as non-convergence of the estimator, substantial bias in the estimated OR, or something else. If so, is it still sensible to perform the regression and interpret it with care, or should it not be performed at all in this setting?
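To give a feel for the interval width I am worried about, here is a minimal sketch of the crude (unadjusted) OR with a Wald interval on the log scale. This is not the adjusted regression I plan to run. The counts are illustrative round numbers in the range described in this question (6 cases among 225 exposed, 20 000 cases among 800 000 non-exposed), not my actual data, and `crude_odds_ratio` is a hypothetical helper name.

```python
import math


def crude_odds_ratio(exposed_cases, exposed_noncases,
                     unexposed_cases, unexposed_noncases, z=1.96):
    """Unadjusted odds ratio from a 2x2 table with a Wald CI on the log scale."""
    odds_ratio = (exposed_cases * unexposed_noncases) / (
        exposed_noncases * unexposed_cases)
    # Standard error of log(OR): square root of the summed reciprocal cell counts.
    se_log_or = math.sqrt(1 / exposed_cases + 1 / exposed_noncases
                          + 1 / unexposed_cases + 1 / unexposed_noncases)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Illustrative counts only (assumed, not the real data).
or_hat, ci_low, ci_high = crude_odds_ratio(6, 219, 20_000, 780_000)
print(f"crude OR = {or_hat:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

The standard error is dominated by the `1 / exposed_cases` term, so the precision is driven almost entirely by the few exposed cases, no matter how large the non-exposed group is.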
Is perfect separation always related to overfitting? I have 20 000 individuals with the rare outcome (but without exposure) and only 7-8 covariates (2-3 of them binary), so I thought overfitting wouldn't be a problem.
Does the rare exposure add to the problem of perfect separation, or does it depend only on how rare the outcome is?