
On this forum there are opposing opinions (1), (2) on the uses of logistic regression. Some say it is a classification model; others say it is a prediction model.

Therefore, the question that I have is:

Is Logistic Regression a classification or prediction model?

One could say that a prediction model is one from which you obtain an equation to predict values.

On the other hand, a classification model is one that, given a set and some rules from a technique, divides the set into two or more subsets. For example, for a set of molecules, applying the technique's rules separates the molecules into mutagenic and non-mutagenic ones.

In the case of logistic regression, you obtain an equation but the output is a binomial classification.

Therefore, the question is valid.

Richard Hardy
  • How do you define the "prediction model" and how does the classification model differ from it? – Tim Jun 30 '23 at 15:29
  • Linked in the comment of the first thread you linked to: https://stats.stackexchange.com/a/127044/121522 – mkt Jun 30 '23 at 15:31
  • @Tim The question was expanded – Another.Chemist Jun 30 '23 at 15:34
  • Your initial sentence misrepresents the links. There's nothing "opposite" about two different applications or uses of the same procedure. You could just as well say that people hold "opposite opinions" about the uses of a chair when one says it's for sitting, another says it's for decorating a room, a third says it's for standing on to reach higher, etc. – whuber Jun 30 '23 at 15:40
  • Classification is nothing else than point prediction, so this is a false dichotomy. – Richard Hardy Jul 01 '23 at 10:36
  • The answers given are good. I would say "yes" or "both" or "opinion-based". (The only thing I have strong feelings about is that it is a prediction algorithm, which is what I like to use it for; it may also be a classification algorithm.) – Ben Bolker Jul 01 '23 at 23:14
  • NB: the "opposite" of classification is regression (not prediction). Both are predictive models. – mirekphd Jul 03 '23 at 08:48

4 Answers


In the case of logistic regression, you obtain an equation but the output is a binomial classification.

This is a common misconception.

Explicitly, a logistic regression does no classification; instead, it returns predicted probabilities of event occurrence. However, machine learning terminology tends to call problems “classification” problems when the observed outcomes are categorical (e.g., dog vs cat), which is a major use case for logistic regression. Yes, this terminology is at odds with the standard English meaning of classification, because so many of the models used for these “classification” problems do no explicit classification. What you do with the predicted probability is separate from the logistic regression.
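A minimal sketch of that separation, assuming a model with hypothetical coefficients B0 = -3.0 and B1 = 1.5 for a single predictor x (illustrative numbers, not from any real fit). The logistic regression part only maps x to a probability; turning that probability into a class label is a separate decision rule whose threshold is a choice made afterwards.

```python
import math

# Hypothetical fitted coefficients (illustrative only, not from any real data).
B0, B1 = -3.0, 1.5


def predicted_probability(x: float) -> float:
    """The logistic regression itself: P(Y = 1 | x) = 1 / (1 + exp(-(B0 + B1 * x)))."""
    eta = B0 + B1 * x  # linear predictor, on the log-odds scale
    return 1.0 / (1.0 + math.exp(-eta))


def classify(p: float, threshold: float = 0.5) -> int:
    """A separate decision rule applied to the probability; not part of the model."""
    return 1 if p >= threshold else 0


if __name__ == "__main__":
    for x in (1.0, 2.0, 3.0):
        p = predicted_probability(x)
        # The same probability can get different labels under different thresholds.
        print(x, round(p, 3), classify(p, 0.5), classify(p, 0.9))
```

The model output is identical in both columns; only the downstream rule changes, which is why the threshold is better thought of as a decision (e.g. driven by the costs of different errors) than as part of the regression.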

I think this resolves the dispute.

The divergence in terminology works fine for me, as someone with experience in both statistics and machine learning. Where it seems to hurt people is when they are first learning and come to think that logistic regression models explicitly make classifications; software packages whose predict methods return the category with the largest predicted probability, rather than the predicted probability itself, do not help.
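To make this concrete, here is a sketch of what such a predict method does for a binary outcome, written in plain Python rather than any particular package's API (the helper names are hypothetical). It takes the argmax of the class probabilities, which for two classes amounts to a 0.5 cut-off, and so maps very different probabilities to the same label.

```python
def predict_proba(p_event: float) -> list[float]:
    """Hypothetical helper: class probabilities [P(Y=0), P(Y=1)] from P(Y=1)."""
    return [1.0 - p_event, p_event]


def predict_label(p_event: float) -> int:
    """Mimics a 'predict' method that returns the most probable class (argmax).

    max() keeps the first maximum, so an exact tie at 0.5 goes to class 0 here.
    """
    probs = predict_proba(p_event)
    return max(range(len(probs)), key=lambda k: probs[k])


if __name__ == "__main__":
    for p in (0.51, 0.99, 0.49, 0.01):
        # 0.51 and 0.99 both become label 1; 0.49 and 0.01 both become label 0.
        print(p, predict_proba(p), predict_label(p))
```

The label alone cannot tell a near coin-flip apart from a near-certain event, which is the information the predicted probability carries.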

Finally, logistic regression models do not have to be used for predictive modeling. All of the usual ideas from linear regression about parameter inference and causality can apply, and practitioners interested in these ideas might not be particularly interested in getting great predictive performance.

Dave

Assuming we're all clear on what logistic regression is in terms of its technical implementation, one can still use it for many things. You highlight two options:

  • Prediction/giving a probability for "1" vs. "0"
  • Classification/predicting a specific class (based on some cut-points on the probabilities from the previous bullet)

However, there are other things one can use it for:

  • Inference by interpreting regression coefficients (e.g. a 1-unit higher value of predictor A is associated with an increase of X in the log-odds of belonging to class 1)
  • Causal inference (exactly how that is done can depend a lot on the setting, but e.g. as a model in the background [potentially in combination with a model for treatment assignment] for calculating the effect of an intervention in terms of a difference in probabilities)
  • ...
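A minimal sketch of the two non-predictive uses, assuming a model that has already been fitted with hypothetical coefficients (B0, B_A, B_T are invented for illustration) for a continuous predictor a and a 0/1 treatment indicator t. The first function shows the coefficient reading: one unit more of a adds B_A to the log-odds, i.e. multiplies the odds by exp(B_A). The second is a standardization (g-computation style) summary that averages predicted probabilities with everyone set to treated versus untreated; reading it causally relies on assumptions such as no unmeasured confounding, which the code cannot check.

```python
import math

# Hypothetical fitted model: logit P(Y=1) = B0 + B_A * a + B_T * t
# (coefficients invented for illustration; a is continuous, t is 0/1 treatment).
B0, B_A, B_T = -1.0, 0.4, 0.7


def prob(a: float, t: int) -> float:
    """Predicted probability of class 1 under the hypothetical model."""
    eta = B0 + B_A * a + B_T * t
    return 1.0 / (1.0 + math.exp(-eta))


def odds(p: float) -> float:
    return p / (1.0 - p)


def odds_ratio_per_unit(a: float, t: int = 0) -> float:
    """Coefficient interpretation: odds at a+1 over odds at a equals exp(B_A)."""
    return odds(prob(a + 1.0, t)) / odds(prob(a, t))


def risk_difference(observed_a: list[float]) -> float:
    """Standardization sketch: mean P(Y=1) with everyone t=1 minus everyone t=0.

    Only a causal effect estimate under assumptions (e.g. no unmeasured
    confounding, correct model); here it is just arithmetic on the model.
    """
    n = len(observed_a)
    p_treated = sum(prob(a, 1) for a in observed_a) / n
    p_untreated = sum(prob(a, 0) for a in observed_a) / n
    return p_treated - p_untreated


if __name__ == "__main__":
    print(odds_ratio_per_unit(0.0), odds_ratio_per_unit(3.0), math.exp(B_A))
    print(risk_difference([0.0, 1.0, 2.0, 3.0]))
```

The odds ratio is the same wherever it is evaluated, whereas the probability difference depends on the distribution of a it is averaged over, which is why the two summaries answer different questions.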
Björn
  • I'd argue that 'what a thing is' and 'what it could be used to do' are distinct. I can use a screwdriver to stir paint, but if I say that 'what it is' is a paint-stirrer, I'm not really describing what the point of a screwdriver is. Classification via cut points is not part of logistic regression itself (it's a model for a conditional probability), but rather a use case, via something distinct from the original design purpose, tacked onto it later. – Glen_b Jul 01 '23 at 05:40
  • That is not an objection to what you say when you're clearly describing how it's used at that point (indeed, +1), but an attempt to encourage readers to avoid putting more weight on its use cases than is merited in figuring out 'what it is' – Glen_b Jul 01 '23 at 05:47

The definitions you mention are imprecise and wrong. Others have already commented on this, so I won't repeat what was said; I'll just add my three cents.

One can say that a prediction model is one that you obtain an equation to predict values.

In that case, every SVM or neural network is a prediction model, because each produces "equations to predict values". Moreover, the same applies to many other models, like the naive Bayes algorithm, and even something like a decision tree can be rewritten as a complicated equation. Under this definition, every machine learning model is a prediction model.
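As a small illustration of the decision-tree remark, here is a hypothetical depth-2 tree (thresholds and leaf values invented for the example) written both as nested if/else rules and as a single equation built from indicator functions. The two forms agree on every input, so the "equation" criterion does not separate trees from other models.

```python
def tree_if_else(x1: float, x2: float) -> float:
    """Hypothetical depth-2 decision tree written as nested rules."""
    if x1 <= 2.0:
        return 0.1 if x2 <= 5.0 else 0.7
    return 0.9


def indicator(condition: bool) -> float:
    """1[condition]: 1.0 if the condition holds, else 0.0."""
    return 1.0 if condition else 0.0


def tree_equation(x1: float, x2: float) -> float:
    """The same tree as one closed-form expression:

    f(x) = 0.1*1[x1<=2]*1[x2<=5] + 0.7*1[x1<=2]*1[x2>5] + 0.9*1[x1>2]
    """
    return (0.1 * indicator(x1 <= 2.0) * indicator(x2 <= 5.0)
            + 0.7 * indicator(x1 <= 2.0) * indicator(x2 > 5.0)
            + 0.9 * indicator(x1 > 2.0))


if __name__ == "__main__":
    for x1, x2 in [(1.0, 3.0), (1.0, 8.0), (4.0, 3.0), (2.0, 5.0)]:
        print(x1, x2, tree_if_else(x1, x2), tree_equation(x1, x2))
```

Each leaf contributes one product of indicators, and exactly one product is nonzero for any input, so the sum reproduces the tree's output.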

Tim

We need to acknowledge that "classification" in particular, but probably also "prediction model", are terms without an agreed precise definition, and different people, also in the literature, use them differently. So according to some usages of these terms it can only be one or the other, and according to others (with which I'd agree) it can be, and actually is, both. There is no reason why you cannot predict a class and call that both classification and prediction. However, even though this would be my usage, I acknowledge that people who use the terms differently are not "wrong"; they just use terminology whose use no generally accepted authority has prescribed. (Note that I have seen the term "classification" used as encompassing both supervised and unsupervised classification, but also as referring to each one of these exclusively.)

Whenever reading scientific material it is a good idea to keep an open mind for a use of terms that differs from what you have learnt or seen in other places. Good authors are aware of this and say explicitly what they mean when using potentially ambiguous terms, even if these are in widespread use and therefore in all likelihood "known" to the readers (but the readers may not know all uses).