I am currently working on a logistic regression problem with an imbalanced dataset. The total number of rows in my input is 51,220 (class_0 = 49,654, class_1 = 1,566). I use 3 predictors (1 continuous and 2 binary). I ran logistic regression with default parameters (glm in R), but the model's predictions were all below 0.5. I thought this might be because of the imbalanced dataset. So, I estimated class weights the way scikit-learn does (inversely proportional to class frequencies in the dataset) and incorporated them into the logistic regression model. This resulted in class_0_weight = 0.52 and class_1_weight = 16.35. After using weights, the model predicts values ranging from 0.02 to 0.97, which seems more reasonable.
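For reference, here is roughly how I did it; this is a minimal sketch, assuming a data frame df with a 0/1 response y and placeholder predictors x1, x2, x3 standing in for my actual variables:

```r
# Class weights computed as scikit-learn's "balanced" mode does:
# n_samples / (n_classes * n_samples_in_class)
n     <- nrow(df)
n_neg <- sum(df$y == 0)
n_pos <- sum(df$y == 1)
w0 <- n / (2 * n_neg)  # ~0.52 for my data
w1 <- n / (2 * n_pos)  # ~16.35 for my data
df$w <- ifelse(df$y == 1, w1, w0)

# Unweighted fit (default parameters)
fit_unweighted <- glm(y ~ x1 + x2 + x3, data = df, family = binomial)

# Weighted fit; glm warns about non-integer #successes with
# non-integer prior weights, but still fits the model
fit_weighted <- glm(y ~ x1 + x2 + x3, data = df, family = binomial,
                    weights = w)
```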
However, although the coefficient estimates didn't change much, their statistical significance changed drastically: all the coefficients became far more significant. For example, the p-value of one of the coefficients was 3.56e-21 before using class weights. With class weights in the model, it plummeted to an extremely low 1.4e-136. Such a big difference in the p-values (and, consequently, in the standard errors and confidence intervals) doesn't seem right to me.
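This is how I compared the two fits, continuing from the hypothetical fit objects in the sketch above:

```r
# Coefficient tables: estimates, standard errors, z values, p-values
summary(fit_unweighted)$coefficients
summary(fit_weighted)$coefficients

# Wald confidence intervals for the coefficients
confint.default(fit_unweighted)
confint.default(fit_weighted)
```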
Do you think there is a problem with the model, or with the way I calculated the class weights? Do you have any suggestions on how to address the class imbalance differently? Thank you!