There is indeed a bias with logistic regression and maximum likelihood estimation when the classes are not equally represented. Below is a demonstration with a small simulation (an example in the literature is here):
set.seed(1)
sim = function(n1 = 500, n2 = 20) {
  # data: class 0 centred at -1, class 1 centred at +1
  x1 = rnorm(n1, -1, 1)
  x2 = rnorm(n2, 1, 1)
  x = c(x1, x2)
  y = c(rep(0, n1), rep(1, n2))
  # model, plus correcting the intercept for the class imbalance / undersampling
  mod = glm(y ~ x, family = binomial())
  coefficients(mod) + c(log(n1/n2), 0)
}
sims = replicate(10000, sim())
layout(matrix(1:2))
hist(sims[1,], main = "intercept estimate should be 0")
hist(sims[2,], main = "slope estimate should be 2")
rowMeans(sims)
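As a side note on what the corrected coefficients target: with the data-generating process in this simulation, Bayes' rule gives the true log-odds
$$\log\frac{P(y=1\mid x)}{P(y=0\mid x)} = \log\frac{n_2}{n_1} + \frac{(x+1)^2-(x-1)^2}{2} = \log\frac{n_2}{n_1} + 2x$$
so the true intercept is $\log(n_2/n_1)$ and the true slope is 2. After adding $\log(n_1/n_2)$ to the fitted intercept, an unbiased estimator would centre the histograms at 0 and 2, and the simulation shows that the estimates deviate from those values.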

Possibly you could prove this mathematically. Intuitively, it is not surprising that maximum likelihood estimates, which are not designed to be unbiased, turn out to be biased. If the penalties for false positives and false negatives are different, then it might be good to add some bias.
In the case of a contingency table we will also get a bias
$$\begin{array}{c|cc}
 & x=0 & x=1 \\
\hline
y = 0 & 1-p_0 & 1-p_1 \\
y = 1 & p_0 & p_1
\end{array}$$
The bias does not occur in the estimates of $p_0$ and $p_1$ themselves; that is like estimating the parameters of two independent Bernoulli distributions, and those estimates are unbiased. However, the slope parameter of the curve that we draw through the points will be biased:
$$\hat{\beta}_{slope} = \log\left(\frac{\hat{p}_1}{1-\hat{p}_1}\right) - \log\left(\frac{\hat{p}_0}{1-\hat{p}_0}\right)$$
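Because the logit is a nonlinear transformation, the unbiasedness of $\hat{p}_0$ and $\hat{p}_1$ does not carry over to their log-odds. A minimal sketch of this (the cell probabilities and the per-cell sample size are illustrative choices, not values from the text above):

set.seed(1)
p0 = 0.3; p1 = 0.7   # illustrative true cell probabilities
n  = 50              # illustrative number of observations per x-value
sim_tab = replicate(10000, {
  p0_hat = mean(rbinom(n, 1, p0))
  p1_hat = mean(rbinom(n, 1, p1))
  c(p0_hat, p1_hat, qlogis(p1_hat) - qlogis(p0_hat))
})
rowMeans(sim_tab)          # p0_hat and p1_hat average close to 0.3 and 0.7
qlogis(p1) - qlogis(p0)    # true slope, compare with the third entry above

The averages of $\hat{p}_0$ and $\hat{p}_1$ match the true values, while the average slope estimate deviates slightly but systematically from the true slope; the deviation grows when the cell counts are small or the probabilities are close to 0 or 1.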
Also, it could be a form of regression attenuation (regression dilution): the bias of a regression slope towards zero when the x-values contain measurement error. When one of the categories is rare (odds far from 1), this attenuation will be stronger.
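For comparison, the classical attenuation effect in linear regression can be sketched like this (the slope, noise level and sample size are illustrative choices):

set.seed(1)
x_true = rnorm(10000)                    # latent predictor
y = 2 * x_true + rnorm(10000)            # true slope is 2
x_obs = x_true + rnorm(10000, sd = 1)    # observed predictor with measurement error
coef(lm(y ~ x_true))["x_true"]           # close to 2
coef(lm(y ~ x_obs))["x_obs"]             # attenuated towards 0, roughly 2 * 1/(1+1) = 1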
I imagine a stronger bias occurring when you have complex patterns and the model is better able to learn one class than the other.
Another type of bias that relates to this training difference is discussed here: Was Amazon's AI tool, more than human recruiters, biased against women?
Some recruiter tool might select the best men and women based on some criteria. But due to the underrepresentation of women, they might also be underrepresented among the candidates who score high on those criteria. Then you could have a situation as in the image below: even when, on average, women might more often be in the class of good candidates, it is mostly men that end up with a high class probability, because the model trained on men and became very good at separating them.
