
For some continuous quantities (e.g. daily rainfall at a certain location), there is one exact value that occurs often (in the case of daily rainfall that's the value of zero: there are days on which it does not rain). However, for continuous distributions, the chance of finding some other exact number (say 3.2000... mm) is zero. I remember from statistics class that an exact number that does occur more often has a name. But I can't remember it! Can anyone help?

  • "A mixture between a point mass and a continuous distribution"? In this particular case, "zero inflation"? – Stephan Kolassa Jan 22 '24 at 15:05
  • "Point mass" is one term. – whuber Jan 22 '24 at 15:06
  • A different term may apply depending on why the value appears so often. Is it an upper/lower limit indicating an extreme of the range? Is it a dummy value representing a measurement failure? Is it a real effect? – Nuclear Hoagie Jan 22 '24 at 16:49
  • "probability atom" (or "atom") for a probability distribution. https://math.stackexchange.com/questions/4452774/if-a-random-variable-has-an-atom-at-zero-does-it-have-a-density https://math.stackexchange.com/questions/1710492/difference-between-the-support-of-a-discrete-random-variable-and-the-atoms-of-it – Curious_Reader Jan 22 '24 at 21:20
  • Not what you're asking for, but if like in your example, you have a continuous variable with a point mass at zero, you could try and fit a Tweedie distribution. For discrete processes, you could consider zero-inflated Poisson, zero-inflated negative binomial, etc. – Frans Rodenburg Jan 25 '24 at 17:38
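A Tweedie distribution with power parameter between 1 and 2 can be written as a compound Poisson-gamma sum, and that construction produces exactly this kind of point mass at zero: the sum is exactly zero whenever the Poisson count is zero, so $\Pr(Y=0)=e^{-\lambda}$. A minimal stdlib-only sketch, assuming illustrative values for the Poisson rate `lam` and the gamma shape and scale (none of these come from the comment; the helper names are hypothetical):

```python
import math
import random

def poisson(lam, rng):
    """Draw a Poisson(lam) count with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def compound_poisson_gamma(lam, shape, scale, rng):
    """Sum of a Poisson(lam) number of Gamma(shape, scale) amounts; exactly 0.0 when the count is 0."""
    n = poisson(lam, rng)
    return sum((rng.gammavariate(shape, scale) for _ in range(n)), 0.0)

rng = random.Random(1)
lam, shape, scale = 0.8, 2.0, 1.5   # hypothetical parameters
draws = [compound_poisson_gamma(lam, shape, scale, rng) for _ in range(100_000)]
zero_share = sum(d == 0.0 for d in draws) / len(draws)
print(f"share of exact zeros: {zero_share:.3f}  (theory: {math.exp(-lam):.3f})")
```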

5 Answers


The common term here is a "point mass" in an otherwise continuous distribution. In the specific case where there is a point mass at zero, we refer to the distribution as being zero-inflated.
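A small simulation shows what the point mass does to a sample. A minimal sketch, assuming a hypothetical probability `p0` of a dry day and exponentially distributed amounts on wet days (both are illustrative choices, not a rainfall model): the value 0.0 turns up in roughly a share `p0` of the draws, whereas values from the continuous part essentially never repeat.

```python
import random
from collections import Counter

def zero_inflated_exponential(p0, mean_wet, rng):
    """Return 0.0 with probability p0, else an Exponential draw with the given mean."""
    if rng.random() < p0:
        return 0.0                              # the point mass (atom) at zero
    return rng.expovariate(1.0 / mean_wet)      # continuous part: any single value has probability 0

rng = random.Random(42)
p0, mean_wet = 0.6, 4.0   # hypothetical: 60% dry days, 4 mm mean on wet days
sample = [zero_inflated_exponential(p0, mean_wet, rng) for _ in range(50_000)]

counts = Counter(sample)
print(f"share of exact zeros: {counts[0.0] / len(sample):.3f}")
print("most repeated nonzero value occurs",
      max(c for v, c in counts.items() if v != 0.0), "time(s)")
```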

Ben

I can't find a good technical reference for this, but I would call this a point mass (this has also been pointed out in comments). For example, this introduction to Bayesian statistics gives the example:

This time, the Bayesian believes that the probability p of an RU-486 baby is uniformly distributed between 0 and one-half, but has a point mass of 0.5 at one-half. That is, she believes there is a 50% chance that no difference exists between standard therapy and RU-486. But if a difference exists, she thinks that RU-486 is better, but she is completely unsure about how much better it would be.
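The prior in that quote can be written down exactly. A minimal sketch, reading the quote as 0.5 × (point mass at 1/2) + 0.5 × Uniform(0, 1/2) and using exact fractions; the names `prior_cdf`, `ATOM_AT` and `ATOM_WEIGHT` are hypothetical. The distribution function climbs linearly to 1/2 just below p = 1/2 and then jumps by the atom's weight to 1:

```python
from fractions import Fraction

ATOM_AT = Fraction(1, 2)      # location of the point mass
ATOM_WEIGHT = Fraction(1, 2)  # prior probability that p is exactly 1/2

def prior_cdf(p):
    """Pr(P <= p) for 0.5 * point mass at 1/2 + 0.5 * Uniform(0, 1/2)."""
    p = Fraction(p)
    if p < 0:
        return Fraction(0)
    if p >= ATOM_AT:
        return Fraction(1)
    # continuous part: the remaining weight 1/2 spread uniformly over [0, 1/2)
    return (1 - ATOM_WEIGHT) * p / ATOM_AT

# prior mean: atom contributes 1/2 * 1/2, uniform part contributes 1/2 * 1/4
prior_mean = ATOM_WEIGHT * ATOM_AT + (1 - ATOM_WEIGHT) * ATOM_AT / 2
print(prior_cdf(Fraction(1, 4)), prior_cdf(Fraction(499, 1000)), prior_cdf(ATOM_AT), prior_mean)
# prints: 1/4 499/1000 1 3/8
```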

Or, Trong et al. say:

Hence, the distribution of fitness effects of the mutations is a mixture of the point-mass and the continuous distribution.

In physics (since the question was originally asked on Physics SE) a point mass is sometimes referred to as (a multiple of) a Dirac delta function (there is an example/section on "probability theory" in the linked Wikipedia article ...)
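Written in that notation, and assuming (with symbols chosen here for illustration) a point mass of weight ${\rm p}_0$ at zero plus a continuous part with density $g$ and distribution function $G(r)=\int_0^r g(s)\,ds$, the generalized density of such a variable can be sketched as

$$f(r) = {\rm p}_0\,\delta(r) + (1-{\rm p}_0)\,g(r), \qquad \int_{-\infty}^{r} f(s)\,ds = {\rm p}_0 + (1-{\rm p}_0)\,G(r) \quad \text{for } r \ge 0,$$

with the convention that the integral includes the whole delta at $s=0$; that delta is what produces the jump of size ${\rm p}_0$ in the distribution function at zero.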


Trong, Dang Duc, Nguyen Hoang Thanh, Nguyen Dang Minh, and Nguyen Nhu Lan. “Density Estimation of a Mixture Distribution with Unknown Point-Mass and Normal Error.” Journal of Statistical Planning and Inference 215 (December 1, 2021): 268–88. https://doi.org/10.1016/j.jspi.2021.04.002.

Ben Bolker

As a comment noted, these situations are not well described by a continuous distribution but by a mixed one (part discrete, part continuous):

$$F_R(r) = \Pr(R\leq r) = \begin{cases} 0 & r<0 \\ {\rm p}_0 & r=0 \\ F_{Rc}(r) & r>0, \end{cases}$$

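where ${\rm p}_0$ is the strictly positive probability that the random variable $R$ takes the value $0$, and $F_{Rc}(r)$ has the properties of a distribution function, except that its range is $({\rm p}_0, 1]$. On $[0,\infty)$ the whole is continuous (in particular right-continuous at $0$, as a distribution function must be), since $\lim_{r\downarrow 0} F_{Rc}(r) = {\rm p}_0$; the point mass shows up as the jump of size ${\rm p}_0$ at $r=0$.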
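The piecewise definition translates directly into code. A minimal sketch, assuming a hypothetical ${\rm p}_0 = 0.6$ and a hypothetical exponential-shaped $F_{Rc}$ (any increasing function with range $({\rm p}_0, 1]$ and limit ${\rm p}_0$ at $0^+$ would do); `F_R` and `F_Rc` are names that merely mirror the formula, and negative arguments return 0 because rainfall cannot be negative:

```python
import math

P0 = 0.6  # hypothetical probability of a dry day, Pr(R = 0)

def F_Rc(r):
    """Hypothetical continuous part for r > 0: tends to P0 as r -> 0+ and to 1 as r -> infinity."""
    return 1.0 - (1.0 - P0) * math.exp(-r / 4.0)

def F_R(r):
    """Mixed distribution function of rainfall R, following the piecewise definition."""
    if r < 0:
        return 0.0
    if r == 0:
        return P0          # the jump of size P0 at zero is the point mass
    return F_Rc(r)

for r in (-0.1, 0.0, 1e-12, 4.0):
    print(r, F_R(r))
```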

Sometimes, for convenience, we take the distribution function $G$ of a continuous random variable on $[0,\infty)$ (so $G(0)=0$ and $G$ ranges in $[0,1]$) and define something compact like $$F_{R}(r) = {\rm p}_0 + (1-{\rm p}_0) G(r), \qquad r \ge 0.$$

In this formulation, the conceptual lines between a "mixed" distribution and a "mixture" distribution blur, since the formula above can also be read as a "mixture" (convex combination), with weights ${\rm p}_0$ and $1-{\rm p}_0$, of the distribution function of the degenerate distribution at $0$ (which equals $1$ for every $r \ge 0$) and of $G$.
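The mixture reading suggests a two-stage way to simulate: flip a biased coin to choose the component, then draw from it. A minimal sketch, assuming a hypothetical ${\rm p}_0 = 0.3$ and a Weibull distribution (scale 5, shape 0.8) as the continuous $G$; it checks that the empirical distribution function of the two-stage draws matches ${\rm p}_0 \cdot 1 + (1-{\rm p}_0)\,G(r)$ for $r \ge 0$:

```python
import math
import random

def sample_mixed(p0, rng, scale=5.0, shape=0.8):
    """Two-stage draw from p0 * (degenerate at 0) + (1 - p0) * Weibull(scale, shape)."""
    if rng.random() < p0:
        return 0.0                                  # degenerate component
    return rng.weibullvariate(scale, shape)         # continuous component G

def mixed_cdf(r, p0, scale=5.0, shape=0.8):
    """p0 * H(r) + (1 - p0) * G(r), where H(r) = 1 for r >= 0 is the degenerate distribution at 0."""
    if r < 0:
        return 0.0
    G = 1.0 - math.exp(-((r / scale) ** shape))     # Weibull distribution function
    return p0 * 1.0 + (1.0 - p0) * G

rng = random.Random(7)
p0 = 0.3
draws = [sample_mixed(p0, rng) for _ in range(40_000)]
for r in (0.0, 1.0, 5.0, 20.0):
    empirical = sum(d <= r for d in draws) / len(draws)
    print(f"r={r:5.1f}  empirical={empirical:.3f}  formula={mixed_cdf(r, p0):.3f}")
```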

Alternatively, you could use a purely discrete distribution, perhaps with a dense grid as its support, provided the resulting inaccuracy (when a day's rainfall falls between two grid points) does not matter for your purpose.
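One way to build such a grid approximation is to keep the atom at 0 and assign the continuous mass of each interval to a grid point. A minimal sketch, assuming a hypothetical grid step of 0.1 mm, a hypothetical exponential $G$ with mean 4 mm, and the convention (also an assumption) that mass in $((k-1)h, kh]$ goes to $kh$, with everything beyond the last point lumped into it:

```python
import math

def discretize(p0, G, step, k_max):
    """Probability mass function on the grid {0, step, 2*step, ..., k_max*step}.

    The atom at 0 keeps its mass p0; continuous mass in ((k-1)*step, k*step]
    is assigned to k*step, and mass beyond the last point is lumped into it.
    """
    pmf = {0.0: p0}
    for k in range(1, k_max + 1):
        upper = 1.0 if k == k_max else G(k * step)
        pmf[round(k * step, 10)] = (1.0 - p0) * (upper - G((k - 1) * step))
    return pmf

def exp_cdf(r, mean=4.0):
    """Hypothetical continuous part G: Exponential distribution function with the given mean."""
    return 1.0 - math.exp(-r / mean)

pmf = discretize(0.6, exp_cdf, step=0.1, k_max=500)
print(pmf[0.0], pmf[0.1], sum(pmf.values()))
```

Because the interval masses telescope, the probabilities add up to 1 (up to rounding) for any step size; a finer step only reduces the within-interval rounding error.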


A hurdle model has this characteristic. Here's the summary from the link:

Hurdle models were introduced by John G. Cragg in 1971, where the non-zero values of x were modelled using a normal model, and a probit model was used to model the zeros. The probit part of the model was said to model the presence of "hurdles" that must be overcome for the values of x to attain non-zero values, hence the designation hurdle model. Hurdle models were later developed for count data, with Poisson, geometric, and negative binomial models for the non-zero counts.
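A minimal sketch of a covariate-free version of that setup, assuming a constant probit index `gamma` for "zero vs. non-zero" and, for the non-zero values, a normal distribution truncated to $(0,\infty)$ (one common choice; Cragg's model also has covariates, which are left out here). The data and the values of `mu` and `sigma` are made up for illustration:

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def hurdle_loglik(y, gamma, mu, sigma):
    """Log-likelihood of a covariate-free hurdle model (sketch).

    Zero vs. positive: probit with constant index gamma, so Pr(y > 0) = Phi(gamma).
    Positive values: Normal(mu, sigma) truncated to (0, inf).
    """
    p_pos = STD_NORMAL.cdf(gamma)
    pos_part = NormalDist(mu, sigma)
    trunc = 1.0 - pos_part.cdf(0.0)          # normalising constant of the truncation
    ll = 0.0
    for v in y:
        if v == 0.0:
            ll += math.log(1.0 - p_pos)      # crossed no hurdle: exact zero
        else:
            ll += math.log(p_pos) + math.log(pos_part.pdf(v) / trunc)
    return ll

y = [0.0, 0.0, 0.0, 1.2, 0.0, 3.4, 0.7, 0.0, 2.1, 0.0]   # made-up example data
share_pos = sum(v > 0 for v in y) / len(y)
# The log-likelihood splits into a binary part and a positive part, so the
# probit constant's MLE depends only on the zero/non-zero split.
gamma_hat = STD_NORMAL.inv_cdf(share_pos)
print(round(gamma_hat, 4), round(hurdle_loglik(y, gamma_hat, mu=1.5, sigma=1.0), 4))
```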

Nobody

In a discrete distribution the value that has the highest probability of occurring is called the "mode" of the distribution. Similarly, the "mode" of a sample is the data value that occurs most often in that sample. The term "mode" can be applied to continuous distributions as well, but then it usually means a local maximum of the probability density function. The mode of a distribution is not necessarily unique: a distribution may have several modes, in which case it is a multimodal distribution.
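For a semi-continuous sample the sample mode behaves in a way that shows why it is not the term being sought: values from the continuous part essentially never repeat, so any atom that appears at least twice becomes the most frequent exact value, even when it carries little probability. A minimal sketch, assuming a hypothetical atom at 0 with probability 0.05 and a Normal(10, 2) continuous part:

```python
import random
import statistics

rng = random.Random(3)
p0 = 0.05   # hypothetical: the atom at 0 carries only 5% of the probability
# Semi-continuous sample: exact zeros, otherwise Normal(10, 2) values
sample = [0.0 if rng.random() < p0 else rng.gauss(10.0, 2.0) for _ in range(1_000)]

print("sample mode:", statistics.mode(sample))         # most frequent exact value
print("share of zeros:", sample.count(0.0) / len(sample))
print("median:", round(statistics.median(sample), 2))  # most of the mass sits near 10
```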

  • Yes, but I am not looking for the mode; maybe my question was not clear. I am in fact talking about a semi-continuous distribution, in which one exact value (0.00... in the case of daily total rain) can occur more often (but not necessarily most often; if it did, it would also be the mode). There is a term for such a value, and I'm looking for that term. – Bernard Postema Jan 22 '24 at 14:58