
I have a problem with logistic regression: when I create a confusion matrix, it only shows the actual values in the first row, and the second row never contains any numbers. I fitted many models (one for each country, each analysing the occurrence of an event: 1 = it happens, 0 = it does not happen), and every logit model has this problem.

I guess I am doing something wrong. So far I have checked all the necessary assumptions; the only things I did not do were k-fold cross-validation and splitting the data into training and test sets. Could that be the reason? Could someone explain why?

Maria R
  • 1
    Related CV posts to help illustrate that logistic regression makes probabilistic predictions (probabilities between 0 and 1) and how to convert them to 0/1 labels: here, here and here. – dipetkov Apr 24 '22 at 10:41
  • 1
    Logistic regression is not meant to be used as a classifier, and the choice of accuracy measures should reflect that. See https://www.fharrell.com/post/mlconfusion/ and use proper continuous accuracy scores. Analysis of binary outcomes is all about estimating tendencies (probabilities) not about forced choice classification. – Frank Harrell Apr 24 '22 at 12:00

1 Answer


The inverse logit function produces continuous values strictly between 0 and 1, while a confusion matrix is based on predictions in $\{0, 1\}$.

So either you dichotomize the continuous predictions (which involves choosing a threshold), or you work with scoring measures like the log loss or AIC, which work directly with continuous predictions.
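
To make the two options concrete, here is a minimal sketch in plain Python. The outcomes `y_true`, the probabilities `p_hat`, and the helper names `confusion_matrix` and `log_loss` are made up for illustration; they are not taken from the question, and the thresholds are arbitrary choices, not recommendations.

```python
import math

# Hypothetical observed outcomes and predicted probabilities (illustrative only).
y_true = [0, 0, 1, 1, 0, 1]
p_hat = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]


def confusion_matrix(y_true, p_hat, threshold=0.5):
    """Dichotomize p_hat at `threshold` and count outcomes.

    Returns [[TN, FP], [FN, TP]]: rows are actual classes (0, 1),
    columns are predicted classes (0, 1).
    """
    counts = [[0, 0], [0, 0]]
    for y, p in zip(y_true, p_hat):
        y_pred = 1 if p >= threshold else 0
        counts[y][y_pred] += 1
    return counts


def log_loss(y_true, p_hat):
    """Mean negative Bernoulli log-likelihood; uses the probabilities directly."""
    total = sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for y, p in zip(y_true, p_hat)
    )
    return -total / len(y_true)


# Option 1: choose a threshold, then build the confusion matrix.
print(confusion_matrix(y_true, p_hat, threshold=0.5))
print(confusion_matrix(y_true, p_hat, threshold=0.3))

# A threshold that no predicted probability reaches leaves one predicted
# class completely empty.
print(confusion_matrix(y_true, p_hat, threshold=0.9))

# Option 2: score the continuous predictions without any threshold.
print(log_loss(y_true, p_hat))
```

If none of the predicted probabilities reaches the chosen threshold, as in the third call, every observation is assigned to class 0 and the cells for predicted class 1 stay at zero. That may be what produces a confusion matrix with an empty row or column, although this cannot be confirmed without seeing the original code.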

Michael M