4

In continuous probability, $\text{Pr}(X=x) = 0$ for all $x$. However, $\text{Pr}(a \leqslant X \leqslant b)$ can be $> 0$ when $b > a$. The $>$ is key here: each individual outcome has zero probability, but the probability that an outcome occurs in a *set* of outcomes can be non-zero.
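For a concrete instance (a quick numerical sketch, assuming `scipy` is available), take $X \sim \text{Uniform}(0, 1)$:

```python
# Quick sketch (assumes scipy is installed): X ~ Uniform(0, 1).
from scipy.stats import uniform

X = uniform(loc=0, scale=1)

# An interval of positive length has positive probability:
a, b = 0.2, 0.4
print(X.cdf(b) - X.cdf(a))            # Pr(0.2 <= X <= 0.4) = 0.2

# A single point is the limit of ever-smaller intervals, whose probability -> 0:
x = 0.3
for eps in [0.1, 0.01, 0.001, 1e-6]:
    print(eps, X.cdf(x + eps) - X.cdf(x - eps))
```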

That the probability of each element of a set is zero but the cumulative probability of the set can exceed zero is quite paradoxical. What is this called and how is it resolved? Is it simply a property of probability limits by assumption?

(Posting this here rather than MathExchange as it is a statistical theory question)

socialscientist
  • 761
  • 5
  • 15

3 Answers

7

This is a consequence of the properties of integration: an integral can be viewed as a large sum of infinitesimally small areas. When dealing with a continuous random variable $X$, the probability that it falls within an interval is given by the integral of the probability density over that interval:

$$\mathbb{P}(a \leqslant X \leqslant b) = \int \limits_a^b f_X(x) \ dx.$$

One of the properties of the integral is that if you integrate over a single point, you get an infinitesimal value, which is indistinguishable from zero within the standard number system. To understand this property, you would need to learn a bit about integrals and infinitesimals, which is covered in calculus courses (though explicit use of infinitesimal methods is usually reserved for specialist courses). One interpretation of the phenomenon you are examining is that an integral is a large sum of infinitesimally small areas with base length $dx$ and height $f_X(x)$ at each point $x$. Taking $dx$ to be an infinitesimally small quantity, we can then think of the integral explicitly as a sum of infinitesimally small quantities:

$$\mathbb{P}(a \leqslant X \leqslant b) = \sum_a^b f_X(x) \cdot dx.$$

As noted above, if you integrate over a single point, you get an infinitesimally small value, which is indistinguishable from zero within the standard number system:

$$\mathbb{P}(a \leqslant X \leqslant a) = f_X(a) \cdot dx = 0.$$

(The second equals sign here reflects the transition from the infinitesimal quantities back to the standard number system.) The resolution of the issue you are considering comes from recognising that large sums of infinitesimals can be non-infinitesimal. Viewed from the perspective of the standard number system (where infinitesimals are indistinguishable from zero), this means that an integral can be zero over a single point but non-zero over a larger interval.
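To see the same thing numerically (a rough sketch on my part, using a finite $dx$ to stand in for a true infinitesimal, and assuming `numpy`/`scipy` are available):

```python
# Sketch: approximate P(a <= X <= b) as a finite Riemann sum f(x) * dx and watch
# the single-point contribution f(a) * dx vanish as dx shrinks, while the sum
# over [a, b] stays put. Uses the standard normal density as f_X.
import numpy as np
from scipy.stats import norm

a, b = -1.0, 1.0
for n in [10, 100, 1000, 10000]:
    dx = (b - a) / n
    grid = a + dx * np.arange(n)                # left endpoints of the n subintervals
    interval_sum = np.sum(norm.pdf(grid) * dx)  # approaches P(-1 <= X <= 1) ~ 0.6827
    point_term = norm.pdf(a) * dx               # contribution of the single point a -> 0
    print(f"dx={dx:.5f}  sum over [a,b]={interval_sum:.4f}  single-point term={point_term:.6f}")
```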


Note for mathematics pedants: In the above exposition I am attempting to give an intuitive answer for a non-specialist, so I am glossing over the transition between the standard calculus treatment within the real number system (e.g., using the Riemann integral) and the treatment of the same within a hyperreal number system that includes explicit infinitesimals. See Keisler (2022) for technical details building up the integral with explicit infinitesimals (esp. pp. 59-64).

Ben
  • 124,856
  • 1
    Nice response. Yes, I took a full calculus sequence a while back and busted out a textbook for a refresher, but that transition from the Riemann integral was what I was looking for. – socialscientist Sep 26 '22 at 09:42
  • I'm not sure there is anything intuitive about a "large sum of infinitesimals" ;-) – Stef Sep 26 '22 at 12:18
  • Note that also the "lack of uncountable additivity" response from @federicopoloni is quite useful for directing to more background on this. This post has a nice high level description of the traditional measure theoretic assumptions that underlie probability that lead to this https://math.stackexchange.com/questions/834140/is-the-exclusion-of-uncountable-additivity-a-drawback-of-lebesgue-measure – socialscientist Sep 26 '22 at 18:42
  • I like infinitesimals, but can't help feeling the situation is too simple to need them-- which is why I commented with a link to Zeno in the question. Following him, if we assume only that a nonnegative, finite, finitely additive measure $\lambda$ is "divisible" in the sense that on any interval $[a,b]$ there exist $a\lt c\le d\lt b$ for which $2\lambda([a,c))\le\lambda([a,b])$ and $2\lambda((d,b])\le\lambda([a,b]),$ we can--without resort to infinities or infinitesimals--demonstrate that, for any $x$, $\{x\}$ is not measurable or has measure less than any positive number. – whuber Sep 26 '22 at 19:28
  • @whuber: "...or has measure less than any positive number". Hmmm, that sounds like a familiar concept I have heard of. ; ) – Ben Sep 26 '22 at 22:14
  • Exactly--but it is elementary and doesn't require infinitesimals, although (in the way I phrased it) points in that direction. – whuber Sep 26 '22 at 23:09
  • @whuber: Well, what I was hinting at is that, IMHO, that essentially is an infinitesimal. In any case, thanks for the heads up! – Ben Sep 26 '22 at 23:34
3

What is this called and how is it resolved?

It's called "lack of uncountable additivity". The property that, for disjoint sets $A_i$, $$ P\left[\bigcup_{i\in S} A_i\right] = \sum_{i\in S} P[A_i] $$ (additivity) is required by the axioms of probability only when $S$ is a finite or countable index set.

Why is that? Because countable additivity is all we need to prove the results you know from probability. Requiring uncountable additivity would give you a more limited theory, since you could not include continuous probability measures in it: it would not be possible to treat models such as the Lebesgue measure on $[0,1]$ or Gaussian distributions. These are arguably only abstract models, but they have a role in approximating real-world experiments.
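As a small illustration (a sketch, not part of the formal argument; assumes Python), countable additivity works out fine for the uniform distribution on $[0,1)$, while "adding up" the zero probabilities of its uncountably many points clearly cannot recover the total:

```python
# Sketch: countable additivity for X ~ Uniform(0, 1).
# Partition (0, 1) into countably many disjoint intervals A_n = [2^-(n+1), 2^-n);
# the probabilities P(A_n) = 2^-(n+1) sum to 1, as the axiom requires.
probs = [2.0 ** -(n + 1) for n in range(60)]   # first 60 terms; the remaining tail is negligible
print(sum(probs))                              # ~1.0

# By contrast, "summing" P({x}) = 0 over the uncountably many points x in (0, 1)
# would give 0, not 1 -- which is why uncountable additivity is not taken as an axiom.
```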

So it's not a bug, it's a feature that allows for a richer probability theory and more setups that satisfy the axioms of probability.

Federico Poloni
  • 419
  • 3
  • 14
1

I can't really say much about the mathematics of infinitesimals and the probabilities of points in a continuous space, but a trivial answer is that even though the probability of any single point in that continuous space is defined as zero, as long as you can integrate over your set of points (i.e. the points form a contiguous interval), that integral can be non-zero.

Yes, that answer is probably not helpful, but this next bit might be. The question is only relevant in theory, because in practice you cannot observe, record, or compare any result with infinite precision. In practice, the real-world analogue of a continuous distribution is unavoidably granular, and the probability of each point in that granular distribution will be non-zero.
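To make the granularity point concrete (a hedged sketch of my own, assuming `scipy` and a hypothetical measurement recorded to two decimal places):

```python
# Sketch (hypothetical set-up): a continuous model, but measurements recorded to
# two decimal places. Each recordable value v really stands for the bin
# [v - 0.005, v + 0.005), and every such bin has strictly positive probability.
from scipy.stats import norm

X = norm(loc=0, scale=1)     # underlying continuous model
v = 1.23                     # a value as it would actually be recorded
p_bin = X.cdf(v + 0.005) - X.cdf(v - 0.005)
print(p_bin)                 # ~0.0019: small, but non-zero
```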

Michael Lew
  • 15,102
  • 1
    But you could take the integral of $X$ from $a$ to $a$, where $X$ is a continuous random variable, and still suffer the same issue, no? – socialscientist Sep 25 '22 at 21:15
  • @socialscientist Sorry, I don't know. I inhabit the practical world where infinities are not a problem. – Michael Lew Sep 25 '22 at 21:23
  • Sure, this is a statistical theory question. However, I have no idea what you mean by "infinities" here since we're already assuming uncountable sets with inference on $\beta$ in the simple linear regression $y \sim N(\beta X, \sigma)$. – socialscientist Sep 25 '22 at 21:43
  • 1
    I think the most important word in that answer is precision. If I draw a random real number $X$ uniformly between 0 and 10, then the event $X = \pi$ has probability 0, but all the events $| X - 3 | < 1$, $| X - 3.1 | < 0.1$, $| X - 3.14 | < 0.01$, $| X - 3.142 | < 0.001$, etc., have nonzero probability. – Stef Sep 26 '22 at 12:23
  • 1
    @socialscientist The reason that points under a continuous distribution are counted as having zero probability is that the total probability is finite (one) and the number of points contributing to that total is infinite. Many mathematical difficulties in dealing with probabilities and the like come from the presence of an infinity as a divisor, and that's why I wrote "infinities". – Michael Lew Sep 26 '22 at 20:41