Like many other claimed “classification” models, a logistic regression returns predictions on a continuum; by itself, it does no classification at all. The classification comes from a two-step pipeline: get continuous predictions from the logistic regression, then apply a decision rule to bucket those predictions into categories, typically a threshold above which the prediction is category $1$ and below which it is category $0$ (or $-1$, depending on how you’ve coded them). Thus, if you’re unhappy with the classification performance, there are two possible culprits: the logistic regression is doing a poor job of distinguishing between the categories, or the decision rule is inappropriate despite fine performance by the regression.
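As a minimal sketch of that pipeline (using scikit-learn and synthetic data; the variable names are illustrative, not from any particular analysis), the software’s “classifications” are just thresholded probabilities:

```python
# Minimal sketch: a "classifying" logistic regression is a probability
# model (stage 1) plus a thresholding decision rule (stage 2).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, random_state=0)
model = LogisticRegression().fit(X, y)

probs = model.predict_proba(X)[:, 1]   # stage 1: continuous predictions
labels = (probs >= 0.5).astype(int)    # stage 2: the decision rule

# scikit-learn's predict() is exactly this pipeline with a fixed 0.5 cut.
print(np.array_equal(labels, model.predict(X)))  # True (barring exact 0.5 ties)
```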
Unfortunately, much data science training treats this two-stage pipeline as a single stage, where the decision rule is fixed as assigning to the most probable category and all of the effort goes into the first stage. While this does not sound outrageous at first, in imbalanced problems it is entirely possible for the minority category never to be the more probable one, in which case such a decision rule leads to no predictions of the minority category at all. However, nothing forces you to use that decision rule. There are reasons to look at the probability outputs directly without bucketing them, but seeing how your classification metrics behave over a range of thresholds can help you feel better about how your model is performing (at least in terms of its ability to distinguish between the two categories). Maybe your precision, recall, and $F_1$ score are poor at the software-default decision rule, but how are they for other decision rules? You can answer this by looping over many thresholds, bucketing into categories according to each threshold, and plotting the performance by threshold, as in the sketch below.
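Here is that loop, again with scikit-learn and synthetic imbalanced data (the 90/10 class balance and the holdout split are assumptions of mine for illustration; swap in your own fitted model and holdout set):

```python
# Sweep the decision threshold after fitting a logistic regression and
# plot precision, recall, and F1 as functions of the threshold.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, f1_score
from sklearn.model_selection import train_test_split

# Imbalanced synthetic data: roughly 10% minority class.
X, y = make_classification(n_samples=5000, weights=[0.9], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Stage 1: fit the model and get the continuous predictions.
model = LogisticRegression().fit(X_train, y_train)
probs = model.predict_proba(X_test)[:, 1]  # predicted P(category 1)

# Stage 2: apply many candidate decision rules, not just 0.5.
thresholds = np.linspace(0.01, 0.99, 99)
precisions, recalls, f1s = [], [], []
for t in thresholds:
    preds = (probs >= t).astype(int)  # bucket the probabilities
    precisions.append(precision_score(y_test, preds, zero_division=0))
    recalls.append(recall_score(y_test, preds))
    f1s.append(f1_score(y_test, preds))

# Plot the classification performance by threshold.
plt.plot(thresholds, precisions, label="precision")
plt.plot(thresholds, recalls, label="recall")
plt.plot(thresholds, f1s, label="$F_1$")
plt.xlabel("decision threshold")
plt.ylabel("score")
plt.legend()
plt.show()
```

A plot like this will often show thresholds where the metrics are respectable even when the $0.5$ default looks hopeless on an imbalanced problem.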
This still misses out on the richness of the probability predictions and what you can do when they are good, but you will learn more about how well the model distinguishes between categories this way than you will by using the software-default decision rule. Further, this neither requires you to lie to the model about the data by running techniques like SMOTE or under/oversampling, nor requires you to fiddle with weighted loss functions whose effectiveness you cannot assess until after you’ve trained the model (though the computing time for a logistic regression is probably not so great as to make retraining unrealistic), since assessing the performance at many thresholds happens after fitting the model, not before.
I still encourage you to get out of the classification mindset and into one where the goal is to predict probabilities accurately, but seeing what happens for many possible second stages of the pipeline can reveal that your ability to classify is not as bad as you thought.
To give an idea of what the probabilities themselves can do, consider the problem of lending money: the lender does not just decide to lend or not to lend. The lender also sets an interest rate, and a person with a higher probability of defaulting gets a loan with a higher interest rate. This is basically how credit scores work: when you have a low credit score, your probability of default is assumed to be high, so your interest rate is high to compensate the lender for taking that risk.
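As a stylized, back-of-the-envelope illustration (the one-period setup, total loss on default, and the specific numbers are assumptions of mine, not how real lenders price loans): if a loan defaults with probability $p$ and the lender loses everything on default, breaking even against a risk-free rate $r_f$ requires

$$(1 - p)(1 + r) = 1 + r_f \quad\Longrightarrow\quad r = \frac{1 + r_f}{1 - p} - 1.$$

With $r_f = 0.03$, a borrower with $p = 0.02$ needs $r \approx 5.1\%$, while one with $p = 0.10$ needs $r \approx 14.4\%$. A thresholded lend/don’t-lend label carries none of this information; the probability does.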
LINKS OF POSSIBLE INTEREST
Predictions vs Decisions [especially Kolassa’s answer]
Are unbalanced datasets problematic, and (how) does oversampling (purport to) help?
Why is accuracy not the best measure for assessing classification models? [especially Kolassa’s answer]
Proper scoring rule when there is a decision to make (e.g. spam vs ham email) [especially Kolassa’s answer]
Regression on imbalanced and zero-inflated data. How to deal with less frequent values? [This is where I opine that an inability to predict unusual events is expected behavior unless you have some characteristic of those events that distinguishes them from "business as usual".]
How to calculate accuracy of a logistic regression? [Data Science Stack Exchange]
Harrell’s Blog: Damage Caused by Classification Accuracy and Other Discontinuous Improper Accuracy Scoring Rules
Harrell’s Blog: Classification vs. Prediction