
I've been using logistic regression for a specific problem, and the loss function the paper used is the following: $$ L(Y,\hat{Y})=\sum_{i=1}^{N} \log(1+\exp(-y_i\hat{y}_{i}))$$ Yesterday, I came across Andrew Ng's course (Stanford notes), in which he gives another loss function that he describes as intuitive: $$J(\theta)=\frac{-1}{N}\sum_{i=1}^{N}\left[y^{(i)}\log(h_\theta(x^{(i)}))+(1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right]$$ Now, I know there isn't only ONE loss function per model and that both could be used.

My question is more about what separates these two functions. Is there any advantage to working with one instead of the other? Are they equivalent in any way? Thanks!


2 Answers


With the sigmoid function in logistic regression, these two loss functions are exactly the same (up to the constant $\frac{1}{N}$ factor); the only difference is the label encoding:

  • $y_i\in\{-1,1\}$ is used in the first loss function;
  • $y_i\in\{0,1\}$ is used in the second loss function.

To see this, write $h_\theta(x^{(i)}) = \sigma(\hat{y}_i)$ with $\sigma(z)=\frac{1}{1+e^{-z}}$. For $y^{(i)}=1$ the per-sample cross-entropy term is $-\log\sigma(\hat{y}_i)=\log(1+e^{-\hat{y}_i})$, and for $y^{(i)}=0$ it is $-\log(1-\sigma(\hat{y}_i))=\log(1+e^{\hat{y}_i})$; the two cases combine into $\log(1+e^{-y_i\hat{y}_i})$ once the labels are recoded as $y_i\in\{-1,1\}$. Both loss functions can be derived by maximizing the likelihood function.
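
As a quick numerical check, here is a minimal sketch in NumPy (the variable names are mine, chosen for illustration) showing that the two forms give the same value on the same data once the labels are recoded:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
scores = rng.normal(size=5)        # raw model outputs, y_hat = theta^T x
y01 = rng.integers(0, 2, size=5)   # labels encoded as {0, 1}
ypm = 2 * y01 - 1                  # the same labels encoded as {-1, +1}

# First form: sum_i log(1 + exp(-y_i * y_hat_i)) with y_i in {-1, +1}
loss_margin = np.sum(np.log1p(np.exp(-ypm * scores)))

# Second form: cross-entropy with y_i in {0, 1} and h = sigmoid(y_hat)
# (the question's J(theta) additionally divides by N)
h = sigmoid(scores)
loss_xent = -np.sum(y01 * np.log(h) + (1 - y01) * np.log(1 - h))

print(loss_margin, loss_xent)  # equal up to floating-point error
```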


This is related to the choice of the labels, and each choice arguably has some advantages over the other. See here for more detailed information on the topic.