
Unlike the discrete case, the probability of any particular point under a continuous distribution is zero. We must integrate the pdf over a small range to obtain a non-zero value.

In a machine learning model that assumes continuous data (as is often done for sound or images), the probability of any particular training example is zero.

Some models (such as variational inference models) are evaluated by their log likelihood.

My question: if the data are continuous, how can the likelihood be non-zero? The likelihood is assumed to factor over the data points, and the probability of each data point is zero, ...

  • 4
    With continuous variables, likelihood is defined in terms of probability density functions. – Tim Jun 13 '18 at 08:10
  • In machine learning you will certainly have data that are granular to the level of the last significant figure. You do not need to deal with continuous models if you don't want to. – Michael Lew Jun 13 '18 at 08:18

1 Answer

2

The probability p(X|theta) of observing exactly X is indeed zero, but the likelihood function is defined as the probability density evaluated at X. That is in general non-zero.
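As a minimal sketch of this point (assuming, purely for illustration, that the model is a univariate Gaussian), the log likelihood is the sum of log *densities* over the data points, which is finite and well defined even though the probability of each exact observation is zero:

```python
import math

def gaussian_logpdf(x, mu=0.0, sigma=1.0):
    # log of the Gaussian density N(x; mu, sigma^2) -- a density, not a probability
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

data = [0.3, -1.2, 0.7]  # hypothetical continuous observations

# The likelihood factors over data points, so the log likelihood is a sum
log_lik = sum(gaussian_logpdf(x) for x in data)
print(log_lik)  # a finite number, even though P(X = x) = 0 for each x
```

The same value is what libraries report when they quote a model's "log likelihood" on continuous data: a sum of log-density evaluations, not of probabilities.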

  • Can you explain a little more? So you just evaluate the pdf and ignore the fact that it is an infinitely thin sample that integrates to zero? – matchingmoments Jun 14 '18 at 10:02
  • 1
    You can think of the observation not as Y=y but rather as something like (y-d)<Y<(y+d), where d is small enough that the density doesn't change within the range. Then it doesn't matter what d is, because you are not interested in the absolute probability of the observation under a single model; rather, you are interested in the ratio of the probabilities under different models, and that ratio is not affected by d. So we say that the likelihood of the observation is the density function, rather than the density function multiplied by some arbitrarily chosen small width of the observation range. – Helene Hoegsbro Thygesen Jun 15 '18 at 04:31