Questions tagged [loss-functions]

A function used to quantify the difference between observed data and predicted values according to a model. Minimization of loss functions is a way to estimate the parameters of the model.

Examples include:

  • The (Root) Mean Squared Error, $\sqrt{\frac{1}{n}\sum_{i}(y_i-\hat y_i)^2}$, used in ordinary least squares (OLS) regression
  • The Mean Absolute Error, $\frac{1}{n}\sum_{i}|y_i-\hat y_i|$, frequently used in forecasting
  • "Hinge"-type or pinball losses, i.e., linear losses where over- and underpredictions are weighted differently, used for quantile prediction
  • (Proper) scoring rules, used to compare predictive densities to actuals
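The first three losses in the list can be written out in a few lines; a minimal sketch with hypothetical data (the RMSE is simply the square root of the MSE):

```python
import numpy as np

# Hypothetical observations and predictions to illustrate the losses above
y     = np.array([1.0, 2.0, 3.0])
y_hat = np.array([1.5, 2.0, 2.0])

def mse(y, y_hat):
    """Mean squared error: average squared residual."""
    return np.mean((y - y_hat) ** 2)

def mae(y, y_hat):
    """Mean absolute error: average absolute residual."""
    return np.mean(np.abs(y - y_hat))

def pinball(y, y_hat, tau):
    """Pinball (quantile) loss: underpredictions weighted by tau,
    overpredictions by 1 - tau; tau = 0.5 recovers MAE / 2."""
    r = y - y_hat
    return np.mean(np.where(r >= 0, tau * r, (tau - 1) * r))
```

Minimizing the pinball loss at level tau yields the tau-quantile as the optimal constant predictor, which is why it appears in quantile regression.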
1164 questions
51 votes, 1 answer

What is the difference between a loss function and an error function?

Is the term "loss" synonymous with "error"? Is there a difference in definition? Also, what is the origin of the term "loss"? NB: The error function mentioned here is not to be confused with the normal error function $\operatorname{erf}$.
37 votes, 3 answers

Gradient of Hinge loss

I'm trying to implement basic gradient descent and I'm testing it with a hinge loss function i.e. $l_{\text{hinge}} = \max(0,1-y\ \boldsymbol{x}\cdot\boldsymbol{w})$. However, I'm confused about the gradient of the hinge loss. I'm under the…
brcs
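The confusion in this question usually comes from the kink at margin 1, where the hinge loss is not differentiable. A minimal sketch (hypothetical data) of the loss and one valid subgradient:

```python
import numpy as np

def hinge_loss(w, X, y):
    """Average hinge loss max(0, 1 - y_i * (x_i . w)); labels y_i in {-1, +1}."""
    margins = 1.0 - y * (X @ w)
    return np.mean(np.maximum(0.0, margins))

def hinge_subgradient(w, X, y):
    """One valid subgradient of the average hinge loss:
    -y_i * x_i for samples with margin violation (1 - y_i x_i.w > 0),
    zero otherwise; any convex combination is valid exactly at the kink."""
    margins = 1.0 - y * (X @ w)
    active = (margins > 0).astype(float)   # samples inside the margin
    return -(active * y) @ X / len(y)

# Toy check: at w = 0 every sample violates the margin
X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, -1.0])
w = np.zeros(2)
```

Plugging the subgradient into a plain gradient-descent loop gives subgradient descent, which is the standard way to optimize the hinge loss directly.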
21 votes, 1 answer

Choosing between loss functions for binary classification

I work in a problem domain where people often report ROC-AUC or AveP (average precision). However, I recently found papers that optimize Log Loss instead, while yet others report Hinge Loss. While I understand how these metrics are calculated, I am…
Josh
10 votes, 1 answer

The advantage of log-loss metric over f-score

I think both log-loss and the f-score can handle unbalanced data, and the f-score is normalized and more interpretable than log-loss. However, is there any advantage of log-loss over the f-score?
cyberic
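One concrete difference: the f-score only sees thresholded labels, while log-loss sees the probabilities. A sketch with hypothetical predictions, where two classifiers produce identical hard labels (hence identical f-score) but different log-loss:

```python
import numpy as np

def log_loss(y, p, eps=1e-15):
    """Binary log loss (cross-entropy); operates on predicted
    probabilities, not thresholded labels, so calibration matters."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

y       = np.array([1, 0, 1, 1])
p_sharp = np.array([0.9, 0.1, 0.8, 0.7])  # confident, well calibrated
p_blunt = np.array([0.6, 0.4, 0.6, 0.6])  # hedging, same labels at 0.5

# Same hard labels at threshold 0.5 -> same f-score; log loss differs
labels_sharp = (p_sharp >= 0.5).astype(int)
labels_blunt = (p_blunt >= 0.5).astype(int)
```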
9 votes, 1 answer

Multi categorical Dice loss?

What is the formulation of the Dice loss for multiple categories? I know this is the Dice loss for binary classes: $L_{Dice} = -\frac{2 \sum_i p_{ij} y_{ij}}{\sum_i p_{ij} + \sum_i y_{ij}}$
Char
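One common multi-class generalization, an assumption rather than the only convention, averages the per-class binary Dice term over classes; a NumPy sketch with one-hot targets and softmax probabilities:

```python
import numpy as np

def multiclass_dice_loss(p, y, eps=1e-7):
    """Macro-averaged Dice loss. p, y have shape [n_pixels, n_classes];
    y is one-hot, p holds per-class probabilities. Averages the binary
    Dice coefficient over classes and subtracts from 1."""
    intersect = (p * y).sum(axis=0)         # per-class sum_i p_ij y_ij
    denom = p.sum(axis=0) + y.sum(axis=0)   # per-class sum_i p_ij + sum_i y_ij
    dice_per_class = 2.0 * intersect / (denom + eps)
    return 1.0 - dice_per_class.mean()      # 0 = perfect, 1 = no overlap
```

Variants weight classes by frequency instead of averaging uniformly; the `eps` term guards against empty classes.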
8 votes, 1 answer

Loss function that penalizes wrong sign predictions

Consider the following loss function: $$\mathcal L (y, \hat y) = |y| \left[\log (1 + |y - \hat y|^2) \mathbf 1 _{\{y\hat y \geq 0\}} + |y - \hat y|^2\mathbf 1 _{\{y\hat y < 0\}} \right ]$$ The idea is to penalize predictions if their sign is…
vladkkkkk
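The quoted loss can be implemented directly from its definition; a vectorized sketch (the function name is mine):

```python
import numpy as np

def sign_aware_loss(y, y_hat):
    """The loss from the question: log-dampened squared error when the
    signs of y and y_hat agree (y * y_hat >= 0), full squared error when
    they disagree, everything scaled by |y|."""
    sq = np.abs(y - y_hat) ** 2
    same_sign = y * y_hat >= 0
    # log1p(sq) = log(1 + |y - y_hat|^2), numerically stable for small sq
    return np.abs(y) * np.where(same_sign, np.log1p(sq), sq)
```

Note that the log term grows far more slowly than the quadratic one, so sign disagreements dominate the penalty, which is exactly the stated intent.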
8 votes, 1 answer

Sign aware loss function

I want to create a regression model with the following properties: the prediction should be close to the target; target and prediction should have the same sign; small penalty if either target or prediction is close to 0; extra penalty if both are far from…
vladkkkkk
4 votes, 1 answer

Conditional Variance is the Best Predictor for Which Loss Function?

We know that in a prediction task of predicting $Y$ given $X$, $g(x) = E[Y|X=x]$ is the best predictor if the loss function is mean squared loss (albeit not the only one), $E[(y-g(x))^2]$. For which expected loss function (or functions), conditional…
3 votes, 2 answers

Minimizing the expected loss

I was wondering about the motivation behind the following definition of expected loss: $$E[L] = \sum_{k} \sum_{j} \int_{R_{j}} L_{kj} p(x, C_{k})dx$$ where $L_{kj}$ is the loss matrix, in which $j$ is the predicted class and $k$ the true class,…
r_31415
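A discrete worked example of that definition, with a hypothetical loss matrix and posterior: the Bayes-optimal decision at a point $x$ picks the class $j$ minimizing $\sum_k L_{kj}\, p(C_k \mid x)$, which need not be the most probable class when the loss is asymmetric:

```python
import numpy as np

# L[k, j] = cost of predicting class j when the true class is k
# (made-up numbers: missing class 1 is five times worse than a false alarm)
L = np.array([[0.0, 1.0],
              [5.0, 0.0]])

posterior = np.array([0.7, 0.3])   # p(C_k | x) at some point x

# Expected loss of each action j is sum_k L[k, j] * p(C_k | x)
expected_loss_per_action = posterior @ L   # [0.3*5, 0.7*1] = [1.5, 0.7]
best_action = int(np.argmin(expected_loss_per_action))
```

Here class 0 is more probable, yet the optimal prediction is class 1 because its expected loss is lower; with a 0-1 loss matrix the rule reduces to picking the most probable class.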
3 votes, 1 answer

Understanding contrastive loss, math VS implementation

I'm working on unsupervised learning techniques and I've been reading about the contrastive loss function. Specifically in this paper Momentum Contrast for Unsupervised Visual Representation Learning they describe the loss function mathematically…
Brian
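The loss in the MoCo paper is the InfoNCE form: a cross-entropy over similarities in which the positive key plays the role of the correct "class". A NumPy sketch (shapes and the temperature value are illustrative, not the paper's implementation):

```python
import numpy as np

def info_nce(q, k_pos, k_negs, tau=0.07):
    """InfoNCE loss for one query. q, k_pos have shape (d,);
    k_negs has shape (K, d). Logit 0 is the positive pair."""
    logits = np.concatenate(([q @ k_pos], k_negs @ q)) / tau
    logits -= logits.max()                  # numerical stability
    log_prob_pos = logits[0] - np.log(np.exp(logits).sum())
    return -log_prob_pos                    # cross-entropy with target 0
```

Framework implementations typically express the same thing as a softmax cross-entropy with label 0, which is why the code in such papers can look different from the math.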
2 votes, 1 answer

Minimizing absolute cost function error

I have two questions. 1. I know that in predictive analytics contests, when faced with yes/no problems, with the absolute cost function $f(x) = \frac{1}{n}\sum_{i=1}^n\lvert x_i-\hat x_i\rvert$ the best approach is to predict $1$ if $\hat x_i \geq 0.5$ or $0$…
2 votes, 1 answer

Loss surface visualisation/intuition

I'm trying to wrap my head around a loss surface in PyTorch. This is for work, not a homework assignment. Let's say we have a model: y = model(x), error = y - y_label. The simplest of loss functions, absolute error…
2 votes, 1 answer

Stability with L1 vs L2 norms

I've been looking over http://www.chioka.in/differences-between-l1-and-l2-as-loss-function-and-regularization/ and trying to get a deeper understanding of a stable vs unstable solution for L1 vs L2. Seemingly they show L1 generally has a larger…
gitness
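The stability contrast in the linked post is easiest to see with a constant predictor: the L2-optimal constant is the mean and the L1-optimal constant is the median, so a single outlier moves the L2 fit far more. A sketch with made-up numbers:

```python
import numpy as np

# Tight cluster of observations, then the same data with one outlier
data = np.array([1.0, 1.1, 0.9, 1.0, 1.05])
with_outlier = np.append(data, 100.0)

# L2 loss is minimized by the mean; L1 loss by the median
l2_fit_before, l2_fit_after = data.mean(), with_outlier.mean()
l1_fit_before, l1_fit_after = np.median(data), np.median(with_outlier)

l2_shift = l2_fit_after - l2_fit_before   # large: mean chases the outlier
l1_shift = l1_fit_after - l1_fit_before   # small: median barely moves
```

This per-point sensitivity is one common reading of "unstable": small changes in the data can produce large changes in the L2 solution, while the L1 solution is robust (at the cost of non-uniqueness at the kinks).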
2 votes, 1 answer

Should loss function be defined over output or parameters?

In machine learning loss is usually defined over the actual output and the predicted output $L(Y,\hat{Y}(X))$, while in statistics it's defined in the parameter space $L(\theta,\hat{\theta}(X))$. Why? I assume one reason is that we only assume…
2 votes, 1 answer

Risk function or expected loss function

I was going through the following text, and I could not find resources to understand it online. If anyone could explain it or point me to a resource, that would be helpful. To get the risk function corresponding to…
xyz