In machine learning loss is usually defined over the actual output and the predicted output $L(Y,\hat{Y}(X))$, while in statistics it's defined in the parameter space $L(\theta,\hat{\theta}(X))$. Why? I assume one reason is that statistics presumes a parametric model in this case, while the ML loss is more general and covers both parametric and non-parametric cases. Is there any other reason?
Well, even in ML the second formulation makes more sense, as weight decay is a thing. – shimao Mar 26 '19 at 14:37
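To illustrate the comment's point, here is a minimal sketch (in Python, with hypothetical names, assuming a squared-error data loss plus an L2 penalty) of how weight decay adds a parameter-space term to an otherwise output-space loss:

```python
import numpy as np

def objective(theta_hat, X, Y, lam=0.01):
    """Weight-decay objective: data loss plus a penalty on the parameters."""
    Y_hat = X @ theta_hat                   # prediction Y_hat(X, theta_hat)
    data_loss = np.mean((Y - Y_hat) ** 2)   # L(Y, Y_hat): lives in output space
    penalty = lam * np.sum(theta_hat ** 2)  # weight decay: lives in parameter space
    return data_loss + penalty
```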
1 Answer
In machine learning loss is usually defined over the actual output and the predicted output $L(Y,\hat{Y}(X))$, while in statistics it's defined in the parameter space $L(\theta,\hat{\theta}(X))$.
That's not quite right. Loss is always defined by comparing a prediction from a potential model to the target in the data
$$L(Y, \hat{Y})$$
Sometimes our statistical model is defined by some small(ish) number of parameters $\hat \theta$ (*), which would allow us to express $\hat Y$ as a function of the data $X$ and the proposed parameters $\hat \theta$
$$\hat{Y} = \hat{Y}(X, \hat \theta)$$
which would make the loss a function of the target, the input data, and the parameters
$$L(Y, \hat{Y}) = L(Y, \hat{Y}(X, \hat \theta))$$
Notice that I did not write $L(\theta,\hat{\theta}(X))$; that would require knowing the true value of the parameter $\theta$, which you never know. You only ever have access to $Y$ and $\hat Y$.
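To make this concrete, here is a minimal sketch (assuming a linear model $\hat{Y} = X \hat\theta$ and squared-error loss; the data are simulated and the names are illustrative). The true parameter appears only to generate the synthetic data, never inside the loss:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))             # input data
theta_true = np.array([1.0, -2.0, 0.5])   # unknown in practice; used only to simulate Y
Y = X @ theta_true + rng.normal(scale=0.1, size=100)

def loss(theta_hat, X, Y):
    """L(Y, Y_hat(X, theta_hat)): mean squared error of the prediction."""
    Y_hat = X @ theta_hat                 # Y_hat as a function of the data and parameters
    return np.mean((Y - Y_hat) ** 2)

# Computable for any candidate theta_hat without ever knowing theta_true.
print(loss(np.zeros(3), X, Y))
```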
As an addendum:
I assume one reason is that statistics presumes a parametric model in this case, while the ML loss is more general and covers both parametric and non-parametric cases.
That's not a fair characterization of statistics. Statisticians are just as interested in non-parametric models as machine learning researchers, and have arguably been studying them for just as long.
(*) I'm following the notation from the question here. $\hat \theta$ is usually used for the estimated value of the parameter $\theta$, but the poster used the symbol $\theta$ for the true value of the parameter.