
Let's consider logistic regression for binary classification, with labels 0 or 1. The loss function is $-y\log(x) - (1-y)\log(1-x)$, where $x$ is the predicted probability of label 1 and $y$ is the label. In sklearn, logistic regression only takes discrete labels. Why can't $y$ be a continuous value in $[0, 1]$? Theoretically, is there any mathematical problem if I label my samples with probabilities, e.g. 0.75 meaning a 75% chance of label 1 and a 25% chance of label 0?
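(For concreteness, the loss above is perfectly well defined for a fractional label: taking $y = 0.75$ and a predicted probability $x = 0.6$, for instance, gives $-0.75\log 0.6 - 0.25\log 0.4 \approx 0.61$ with natural logarithms.)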

The question is mostly inspired by the implementation of logistic regression in sklearn, which does not accept continuous labels; for continuous input, every distinct value is treated as a separate class: https://github.com/scikit-learn/scikit-learn/blob/0d378913b/sklearn/linear_model/_logistic.py#L1517.
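For reference, one workaround that sklearn does accept (a minimal sketch with hypothetical toy data) is to split each fractionally labeled sample into a label-1 copy weighted by $y$ and a label-0 copy weighted by $1-y$ via `sample_weight`; the weighted log-loss of the expanded problem equals the cross-entropy with fractional targets:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: fractional labels in [0, 1]
# (e.g. 0.75 = 75% chance of class 1).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y_frac = rng.uniform(0.0, 1.0, size=100)

# Duplicate every row: one copy labeled 1 with weight y, one copy
# labeled 0 with weight 1 - y. The weighted log-loss of this expanded
# problem is the same cross-entropy you would get from fractional targets.
X_dup = np.vstack([X, X])
y_dup = np.concatenate([np.ones(len(X)), np.zeros(len(X))])
w_dup = np.concatenate([y_frac, 1.0 - y_frac])

clf = LogisticRegression()  # note: the default L2 regularization still applies
clf.fit(X_dup, y_dup, sample_weight=w_dup)
print(clf.coef_, clf.intercept_)
```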

Sean
  • See https://stats.stackexchange.com/q/70054/17230 - yes, but there are some caveats w.r.t. inference about coefficient estimates. – Scortchi - Reinstate Monica May 11 '22 at 23:22
  • This is sometimes called fractional response regression, see https://stats.stackexchange.com/questions/466723/whats-the-difference-between-logistic-regression-and-fractional-response-model/467065#467065 – kjetil b halvorsen Sep 10 '22 at 21:31

1 Answer


Of course it can. You have your inputs $x_{1,k}, x_{2,k}, \dots, x_{n,k}$, and your output is $y_k = 1/\big(1+\exp(-(\beta_0+\beta_1 x_{1,k}+\beta_2 x_{2,k}+\dots+\beta_n x_{n,k}))\big)$, where $k$ indexes the different measurements and the $\beta_i$ are unknown parameters. You can now train with any $y_k \in [0,1]$, not only 0/1 labels. Newton iterations with the partial derivatives will find you the optimal values of the parameters $\beta_i$. Your objective function can be, for example, $J=\sum_k \big(y_k - 1/\big(1+\exp(-(\beta_0+\beta_1 x_{1,k}+\beta_2 x_{2,k}+\dots+\beta_n x_{n,k}))\big)\big)^2$.
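A minimal sketch of that recipe, assuming the squared-error objective $J$ above and substituting `scipy.optimize.minimize` (BFGS) for hand-rolled Newton iterations; the data here are hypothetical:

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    # Standard logistic function 1 / (1 + exp(-z)).
    return 1.0 / (1.0 + np.exp(-z))

def squared_loss(beta, X, y):
    # J = sum_k (y_k - sigmoid(beta_0 + beta_1*x_{1,k} + ... + beta_n*x_{n,k}))^2
    z = beta[0] + X @ beta[1:]
    return np.sum((y - sigmoid(z)) ** 2)

# Hypothetical toy data: continuous targets in [0, 1].
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = rng.uniform(0.0, 1.0, size=200)

beta_init = np.zeros(X.shape[1] + 1)  # intercept plus one weight per feature
result = minimize(squared_loss, beta_init, args=(X, y), method="BFGS")
print(result.x)  # fitted [beta_0, beta_1, ..., beta_n]
```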

  • I read here that if you use least-squares error loss with the logistic function, the loss is not convex and you might fall into a local optimum. Do you have any reference or proof that least-squares error works for logistic regression? – Sean Oct 27 '21 at 03:17
  • Local minima can be dangerous; maybe use momentum in your optimization? Sorry, I hadn't thought about this problem that much, so I'm not sure about convexity or whether there is a proof, but you bring up a valid point. Still, if local minima are a problem, people usually just use momentum to "swing" out of them. –  Oct 27 '21 at 16:31