
[figure omitted: scatter plot of labeled points with a separating line]

I have two primary questions on logistic regression from a geometric point of view.

  1. Why are points above the line labeled as positive and points below as negative?
  2. Why do we write the value of y as (w^T * x) / ||w||, which is the distance between the point and the line?
– F.C. Akhi

  • Are you sure this is for logistic regression? This looks more like SVM. – user2974951 Jan 07 '22 at 07:29
  • Yes, this is for logistic regression. I got concepts from Krish naik tutorial on logistic regression – F.C. Akhi Jan 07 '22 at 07:44
  • I don't see where this has anything to do with logistic regression. – Frank Harrell Jan 07 '22 at 13:19
  • Can it be that this plot refers to linear discriminant analysis (LDA)? In the case of two-class LDA, there is only one direction $\vec{w}$ and the decision is based on the projection onto this direction. Formally, a decision criterion $P(Y=1|x)>0.5$ in logistic regression can be written in the same way, but this is quite unusual, as pointed out by @Demetri-Pananos. – cdalitz Jan 07 '22 at 13:33

1 Answer


Why are points above the line labeled as positive and points below as negative?

In short, because above/below that line is the space where the model would predict positive/negative classes.

What you've likely plotted is the decision boundary for the learned model. The colors correspond to the predictions the model would make rather than the actual classes.

In classification problems, it is common to assign the positive class to points where the predicted probability is greater than 0.5. The decision boundary is the set of points $(x_1, x_2)$ where

$$ D(x_1, x_2) =\hat{\beta}_0 + \hat{\beta}_1x_1 + \hat{\beta}_2x_2 = 0 $$

Note that $D(x_1, x_2)$ operates on the log-odds scale. Hence, we assign the positive class to points where $D(x_1, x_2)>0$.
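The equivalence between "predicted probability above 0.5" and "$D>0$" follows from the inverse logit, $p = 1/(1+e^{-D})$, which equals 0.5 exactly when $D=0$. A quick check in R (plogis is base R's inverse logit):

    # plogis(D) = 1 / (1 + exp(-D)) is the inverse logit
    plogis(0)    # exactly 0.5: the point sits on the decision boundary
    plogis(2)    # greater than 0.5: positive class
    plogis(-2)   # less than 0.5: negative class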

We can create our own version of your picture in R and demonstrate this. Note that colors correspond to predicted class membership while shapes correspond to actual class membership. The decision boundary is not plotted explicitly, but you can infer where it lies. Note that there are different shapes within each color, indicating that the model makes some classification errors on the training set.

library(tidyverse)

# simulate data from a logistic model
n = 500
x1 = runif(n)
x2 = runif(n)
eta = 0.125 + x1 - x2            # linear predictor (log odds)
p = plogis(eta)                  # true probabilities
y = rbinom(n, 1, p)              # simulated labels
d = tibble(x1, x2, y)

# fit the model and classify at the 0.5 threshold
model = glm(y ~ ., data = d, family = binomial())
d$est_p = predict(model, type = 'response')
d$ypred = as.numeric(d$est_p > 0.5)

d %>%
  ggplot(aes(x1, x2, color = factor(ypred), shape = factor(y))) +
  geom_point() +
  theme(aspect.ratio = 1)

[figure omitted: scatter of x1 vs x2, colored by predicted class and shaped by actual class]

Why do we write the value of y as (w^T * x) / ||w||, which is the distance between the point and the line?

We don't write $y=w^T\mathbf{x}$, but the decision boundary can be written as a linear combination of the features. It is common to prepend a 1 to each row of the design matrix, so that $\mathbf{x} = (1, x_1, x_2)$. If we let $w = (\beta_0, \beta_1, \beta_2)$, then $D(x_1, x_2) = w^T \mathbf{x}$, and the positive class is assigned when $w^T\mathbf{x}>0$. I'm not sure where you've seen those equations, but I don't typically see $w$ written as a unit vector.
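For geometric intuition behind the $(w^T\mathbf{x})/\lVert w\rVert$ expression: if $w = (\beta_1, \beta_2)$ is taken as the normal vector of the boundary line (with the intercept handled separately), then $(w^T\mathbf{x} + \beta_0)/\lVert w\rVert$ is the signed Euclidean distance from $\mathbf{x}$ to the line, and its sign tells you which side of the line the point falls on. A small sketch with made-up coefficients (not from any fitted model):

    # boundary line: 1 + 2*x1 - 1*x2 = 0, i.e. b0 + w . x = 0
    b0 <- 1
    w  <- c(2, -1)                 # normal vector of the line
    x  <- c(1, 1)                  # an arbitrary test point

    signed_dist <- (sum(w * x) + b0) / sqrt(sum(w^2))
    signed_dist                    # positive, so x lies on the positive side

    # projecting x onto the line along w should give distance (essentially) 0
    x_proj <- x - signed_dist * w / sqrt(sum(w^2))
    (sum(w * x_proj) + b0) / sqrt(sum(w^2))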

  • Logistic regression is a probability model, not an arbitrary threshold decision device. – Frank Harrell Jan 07 '22 at 13:18
  • @FrankHarrell I agree, and if you look at my post history you will see I typically espouse that view, but people often treat it as a classifier, and so to answer the OP's question I'll have to work within that framework. – Demetri Pananos Jan 07 '22 at 14:03
  • I don't think it's appropriate to treat a probability model as a classifier. This will propagate misunderstandings IMHO. – Frank Harrell Jan 07 '22 at 16:49
  • @Frank-Harrell Almost every classification algorithm computes a probability (or a confidence value). For instance, the kNN classifier yields a (non-parametric) estimate of $P(\omega_i|x)$. Nevertheless, I would not think that treating a kNN classifier as a classifier "propagates misunderstandings". Logistic regression can be used as a linear classifier, although it uses an inverted model compared to the usual Bayesian approach to classification: logistic regression does not consider the response non-random and the predictors randomly dependent on the response, but vice versa. – cdalitz Jan 09 '22 at 11:00
  • The misunderstanding stems from improper wording. If you are estimating probabilities, this should be called a probability model, probability machine, or prediction. If the output is directly a forced categorical output, then classification is the proper terminology. Classification is an action, and a classifier is a method that classifies. Logistic regression is not a classification method. Logistic regression is a forward predictive model, and we are almost always interested in Y | X. This is one reason discriminant analysis, which uses X | Y, is largely obsolete. – Frank Harrell Jan 09 '22 at 12:59