In sufficiently large samples, rare events have a negligible influence on estimated coefficients. In smaller samples, however, they can affect the estimates noticeably.
Recall that Maximum Likelihood Estimation (MLE) in a Logit regression is only asymptotically unbiased: in finite samples bias is present, and it is most pronounced when the sample is small. The bias vanishes as the sample size $n$ grows, but for any fixed $n$ it becomes larger as events become increasingly rare.
In a Logit regression, the MLE estimator $\hat{\beta}_{MLE}$ maximises the log-likelihood
$$
\ell(\beta \mid y) = \sum_{y_t = 1} \log(p_t) + \sum_{y_t = 0} \log(1 - p_t)
$$
where $p_t = \frac{1}{1+\exp(-X_t\beta)}$. Maximising this objective pushes $p_t$ towards 1 for observations with $y_t = 1$ and towards 0 for observations with $y_t = 0$.
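This objective is easy to evaluate directly. A minimal sketch in Python (standard library only; the function name and toy data are my own, not from the text):

```python
import math

def log_likelihood(beta, X, y):
    """Logit log-likelihood: sum of log(p_t) over events (y_t = 1)
    plus sum of log(1 - p_t) over non-events (y_t = 0)."""
    ll = 0.0
    for x_t, y_t in zip(X, y):
        p_t = 1.0 / (1.0 + math.exp(-x_t * beta))  # p_t = 1 / (1 + exp(-X_t * beta))
        ll += math.log(p_t) if y_t == 1 else math.log(1.0 - p_t)
    return ll

# Sanity check: at beta = 0 every p_t equals 0.5,
# so the log-likelihood is n * log(0.5) regardless of y.
print(log_likelihood(0.0, [1.0, 2.0, 3.0], [1, 0, 1]))
```

Any candidate $\beta$ can be scored this way; the MLE is the value at which this function peaks.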
In large samples, the sample accurately reflects the population distribution and the estimator is consistent. In small samples, however, when $p_t$ is extremely small (rare events), the event count $\sum_t y_t$ follows a right-skewed distribution whose median lies below its mean, so more than half of all samples contain fewer $y_t = 1$ instances than expected. That is to say,
$$
\Pr\left(\sum_{t=1}^{n} y_t \leq n\,E(y_t)\right) > 0.5
$$
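When the $y_t$ are i.i.d. Bernoulli draws, $\sum_t y_t$ is Binomial$(n, p)$ and this inequality can be checked exactly. A small sketch (standard library Python; the function name and the illustrative values $n = 100$, $p = 0.03$ are my own):

```python
import math

def prob_fewer_than_expected(n, p):
    """Exact Pr(sum y_t <= n * E(y_t)) for i.i.d. y_t ~ Bernoulli(p),
    i.e. the Binomial(n, p) CDF evaluated at the mean n * p."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(math.floor(n * p) + 1))

# With rare events the count distribution is right-skewed, so its
# median sits below its mean and this probability exceeds 0.5.
print(prob_fewer_than_expected(100, 0.03))
```

Running this with $n = 100$ and $p = 0.03$ gives a probability well above one half, in line with the inequality.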
Consequently, in a typical small sample the MLE 'sees' relatively too many $y_t = 0$ observations and underestimates $p_t$. Since $p_t$ is increasing in $X_t\beta$, an underestimate of the event probabilities translates into an underestimate of the coefficients. In other words, $\hat{\beta}_{MLE} < \beta$.
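This downward bias can be seen in a Monte Carlo experiment. For an intercept-only Logit the MLE has the closed form $\hat{\beta} = \log\bigl(\bar{y}/(1-\bar{y})\bigr)$, which makes the simulation cheap. A sketch under my own assumptions (standard library Python; the parameter values $n = 60$, $p = 0.05$ and all names are mine; degenerate all-zero or all-one samples, where the MLE diverges, are dropped):

```python
import math
import random

def intercept_logit_mle(y):
    """Closed-form MLE for an intercept-only Logit: beta_hat = logit(ybar)."""
    ybar = sum(y) / len(y)
    return math.log(ybar / (1.0 - ybar))

def mean_mle_estimate(n=60, p=0.05, sims=20000, seed=1):
    """Average the intercept MLE over many simulated small samples
    with rare events (true intercept: logit(p))."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(sims):
        y = [1 if rng.random() < p else 0 for _ in range(n)]
        if 0 < sum(y) < n:  # skip samples where the MLE does not exist
            estimates.append(intercept_logit_mle(y))
    return sum(estimates) / len(estimates)

beta_true = math.log(0.05 / 0.95)  # true intercept, about -2.944
# The averaged estimate falls below beta_true, illustrating the downward bias.
print(mean_mle_estimate(), beta_true)
```

The average estimate lands visibly below the true intercept, illustrating that with small $n$ and rare events the Logit MLE is biased downward.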