
I am trying to learn a logistic regression classifier using glm on a dataset with fewer than 20 features but many samples. One of the features is a very strong predictor. As a result, the trained model predicts extreme probabilities of 1.0 and 0.0 on the majority of the test data. Although the model converges, I get repeated warnings about these extreme fitted probabilities.
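For concreteness, here is a minimal R sketch of the kind of fit that produces this warning; the data and variable names are invented, not the asker's actual setup:

```r
## Minimal sketch with simulated data (hypothetical names).
set.seed(1)
n  <- 1e5
x1 <- rnorm(n)                                # the very strong predictor
x2 <- rnorm(n)                                # a weaker predictor
y  <- rbinom(n, 1, plogis(8 * x1 + 0.3 * x2))

fit <- glm(y ~ x1 + x2, family = binomial)
## typically warns: "glm.fit: fitted probabilities numerically 0 or 1 occurred"

p <- predict(fit, type = "response")
mean(p > 0.999 | p < 0.001)                   # share of near-0/1 predictions
```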

Q1. Are the posterior probabilities/predictions still valid?

Q2. How should I reduce the effect of the strong predictor so that the contributions from other variables are taken into account during inference? Right now, they are completely overpowered by the one predictor.

  • You can check whether the predictions are valid by cross-validation, for example. But I didn't get your point about Q2: why do you need to reduce the effect of a good predictor? In most practical cases that would mean reducing classification quality. – Dmitry Laptev Jun 14 '12 at 13:31

1 Answer


Partially answered in comments:

You can check whether the predictions are valid by cross-validation, for example. But I didn't get your point about Q2: why do you need to reduce the effect of a good predictor? In most practical cases that would mean reducing classification quality. – Dmitry Laptev
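A minimal k-fold cross-validation sketch for checking the out-of-sample quality of the predicted probabilities, in base R; `dat` is an assumed data frame holding the 0/1 outcome `y` and the features:

```r
## Rough 10-fold cross-validation of the predicted probabilities.
k <- 10
folds <- sample(rep(1:k, length.out = nrow(dat)))
cv_pred <- numeric(nrow(dat))

for (i in 1:k) {
  fit_i <- glm(y ~ ., family = binomial, data = dat[folds != i, ])
  cv_pred[folds == i] <- predict(fit_i, newdata = dat[folds == i, ],
                                 type = "response")
}

## Out-of-sample error rate and log-loss (probabilities clipped away from 0/1)
p <- pmin(pmax(cv_pred, 1e-15), 1 - 1e-15)
mean((p > 0.5) != dat$y)
-mean(dat$y * log(p) + (1 - dat$y) * log(1 - p))
```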

You should still check whether the predictor is legitimate: could it be that it is almost a version of the $Y$ variable? Another possibility is (quasi-)separation, which isn't a problem per se if the strong predictor is correct, but may be a problem if the goal is inference and not (only) prediction, since it invalidates the usual approximations. See How to deal with perfect separation in logistic regression?
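Two quick diagnostics for these possibilities, sketched in base R under the assumption that `fit` is the fitted glm object and `x1` is the strong predictor (hypothetical names):

```r
## 1. Is x1 nearly a relabelled copy of y?  Bins whose rows are all 0 or
##    all 1 suggest the predictor (almost) determines the outcome.
table(cut(x1, breaks = 10), y)

## 2. Under (quasi-)separation the MLE drifts towards +/- infinity, which
##    shows up as very large coefficients and standard errors.
summary(fit)$coefficients
```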