> At a purely formal level, one could call probability theory the study of measure spaces with total measure one, but that would be like calling number theory the study of strings of digits which terminate.
>
> -- Terry Tao, *Topics in Random Matrix Theory*
I think this is the really fundamental thing. If we've got a probability space $(\Omega, \mathscr F, P)$ and a random variable $X : \Omega \to \mathbb R$ with pushforward measure $P_X := P \circ X^{-1}$, then the reason a density $f = \frac{\text d P_X}{\text d\mu}$ (with respect to a dominating measure $\mu$, e.g. Lebesgue measure for pdfs or counting measure for pmfs) integrates to one is precisely that $P(\Omega) = 1$. And that's more fundamental than the pdf vs pmf distinction.
Here's the proof:
$$
\int_{\mathbb R} f \,\text d\mu = \int_{\mathbb R} \,\text dP_X = P_X(\mathbb R) = P\left(\{\omega \in \Omega : X(\omega) \in \mathbb R\}\right) = P(\Omega) = 1.
$$
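To make the identity concrete numerically (a minimal sketch in Python; the standard normal density and the helper names `normal_pdf` and `integrate` are my own illustrative choices, not anything from the argument above), a Riemann sum of a density over a wide interval comes out to one:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    # density f = dP_X/dmu for a Normal(mu, sigma^2) pushforward measure,
    # with mu here being Lebesgue measure on the real line
    return math.exp(-((x - mu) / sigma) ** 2 / 2) / (sigma * math.sqrt(2 * math.pi))

def integrate(f, a, b, n=200_000):
    # midpoint Riemann sum approximating the integral of f over [a, b]
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# the mass outside [-10, 10] is negligible for the standard normal
total = integrate(normal_pdf, -10.0, 10.0)
print(round(total, 6))  # → 1.0
```

Of course the computation only *checks* the identity; the reason it holds is the chain of equalities above, ultimately $P(\Omega) = 1$.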
This is almost a rephrasing of AdamO's answer (+1) because all CDFs are càdlàg, and there's a one-to-one relationship between the set of CDFs on $\mathbb R$ and the set of all probability measures on $(\mathbb R, \mathbb B)$, but since the CDF of a RV is defined in terms of its distribution, I view probability spaces as the place to "start" with this kind of endeavor.
I'm updating to elaborate on the correspondence between CDFs and probability measures and how both are reasonable answers for this question.
We first start with two probability measures and analyze the corresponding CDFs; we then go the other way, starting with a CDF and looking at the measure it induces.
Let $Q$ and $R$ be probability measures on $(\mathbb R, \mathbb B)$ and let $F_Q$ and $F_R$ be their respective CDFs (i.e. $F_Q(a) = Q\left((-\infty, a]\right)$ and similarly for $R$). Both $Q$ and $R$ could arise as pushforward measures of random variables (i.e. as distributions), but for this argument it doesn't matter where they came from.
The key idea is this: if $Q$ and $R$ agree on a rich enough collection of sets, then they agree on the $\sigma$-algebra generated by those sets. Intuitively, if we've got a well-behaved collection of events that, through a countable number of complements, intersections, and unions forms all of $\mathbb B$, then agreeing on all of those sets leaves no wiggle room for disagreeing on any Borel set.
Let's formalize that. Let $\mathscr S = \{(-\infty, a] : a \in \mathbb R\}$ and let $\mathcal L = \{A \subseteq \mathbb R : Q(A) = R(A)\}$, i.e. $\mathcal L$ is the subset of $\mathcal P(\mathbb R)$ on which $Q$ and $R$ agree (and are defined). Note that we're allowing for them to agree on non-Borel sets since $\mathcal L$ as defined isn't necessarily a subset of $\mathbb B$. Our goal is to show that $\mathbb B \subseteq \mathcal L$.
It turns out that $\sigma(\mathscr S)$ (the $\sigma$-algebra generated by $\mathscr S$) is in fact $\mathbb B$, so we hope that $\mathscr S$ is a sufficiently big collection of events that if $Q = R$ everywhere on $\mathscr S$ then they're forced to be equal on all of $\mathbb B$.
Note that $\mathscr S$ is closed under finite intersections, so $\mathscr S$ is a $\pi$-system. Meanwhile $\mathcal L$ contains $\mathbb R$ (both measures assign it measure one) and is closed under complements and countable disjoint unions (this follows from countable additivity together with $Q(\mathbb R) = R(\mathbb R) = 1$), so $\mathcal L$ is a $\lambda$-system. If $F_Q = F_R$ then $\mathscr S \subseteq \mathcal L$, so by the $\pi$-$\lambda$ theorem we have $\sigma(\mathscr S) = \mathbb B \subseteq \mathcal L$. The elements of $\mathscr S$ are nowhere near as complex as an arbitrary Borel set, but because any Borel set can be formed from countably many complements, unions, and intersections of elements of $\mathscr S$, agreement of $Q$ and $R$ on every element of $\mathscr S$ carries through to agreement on every $B \in \mathbb B$.
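The practical upshot can be sketched in code: once the half-line probabilities $F(a) = Q\left((-\infty, a]\right)$ are fixed, the probabilities of intervals and complements (and, iterating, of any Borel set) are pinned down. A small Python illustration, using an Exponential(1) CDF as an assumed example (the helper names `exp_cdf`, `prob_interval`, and `prob_complement` are mine):

```python
import math

def exp_cdf(a, rate=1.0):
    # F(a) = Q((-inf, a]) for an Exponential(rate) measure; an assumed example
    return 0.0 if a < 0 else 1.0 - math.exp(-rate * a)

def prob_interval(F, a, b):
    # Q((a, b]) = Q((-inf, b]) - Q((-inf, a]): determined by half-lines alone
    return F(b) - F(a)

def prob_complement(F, a):
    # Q((a, inf)) = 1 - Q((-inf, a]): complements stay determined as well
    return 1.0 - F(a)

print(prob_interval(exp_cdf, 1.0, 2.0))   # e^{-1} - e^{-2}
print(prob_complement(exp_cdf, 1.0))      # e^{-1}
```

Two measures that agree on every $(-\infty, a]$ would return identical values from both helpers, which is the informal content of the $\pi$-$\lambda$ argument.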
We have just shown that if $F_Q = F_R$ then $Q = R$ (on $\mathbb B$), which means that the map $Q \mapsto F_Q$ from $\mathscr P := \{P : P \text { is a probability measure on } (\mathbb R, \mathbb B)\}$ to $\mathcal F := \{F : \mathbb R \to \mathbb R : F \text { is a CDF}\}$ is an injection.
Now if we want to think about going the other direction, we want to start with a CDF $F$ and show that there is a unique probability measure $Q$ such that $F(a) = Q\left((-\infty, a]\right)$. This will establish that our mapping $Q \mapsto F_Q$ is in fact a bijection. For this direction, we define $F$ without any reference to probability or measures.
We first define a Stieltjes measure function as a function $G : \mathbb R \to \mathbb R$ such that
- $G$ is non-decreasing
- $G$ is right-continuous
(note that being càdlàg follows from this definition, but because of the extra non-decreasing constraint "most" càdlàg functions are not Stieltjes measure functions).
It can be shown that each Stieltjes measure function $G$ induces a unique measure $\mu$ on $(\mathbb R, \mathbb B)$ defined by
$$
\mu\left((a, b]\right) = G(b) - G(a)
$$
(see e.g. Durrett's Probability: Theory and Examples for details on this). For example, Lebesgue measure is induced by $G(x) = x$.
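On half-open intervals the induced measure is almost a one-liner. Here is a minimal sketch, where `stieltjes_measure` is a hypothetical helper name of mine; with $G(x) = x$ it recovers Lebesgue measure (interval length):

```python
def stieltjes_measure(G):
    # measure of a half-open interval (a, b] induced by the Stieltjes
    # measure function G, i.e. mu((a, b]) = G(b) - G(a)
    def mu(a, b):
        return G(b) - G(a)
    return mu

# G(x) = x induces Lebesgue measure: the length of (2, 5] is 3
lebesgue = stieltjes_measure(lambda x: x)
print(lebesgue(2.0, 5.0))  # → 3.0
```

Extending this from half-open intervals to all of $\mathbb B$ (and proving uniqueness) is exactly where Carathéodory-style machinery comes in, which is the part deferred to the reference above.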
Now noting that a CDF is a Stieltjes function $F$ with the additional properties that $\lim_{x\to-\infty} F(x) := F(-\infty) = 0$ and $\lim_{x\to\infty} F(x) := F(\infty) = 1$, we can apply that result to show that for every CDF $F$ we get a unique measure $Q$ on $(\mathbb R, \mathbb B)$ defined by
$$
Q\left((a, b]\right) = F(b) - F(a).
$$
Note how $Q\left((-\infty, a]\right) = F(a) - F(-\infty) = F(a)$ and $Q\left(\mathbb R\right) = F(\infty) - F(-\infty) = 1$, so $Q$ is a probability measure and is exactly the one we would have used to define $F$ if we were going the other direction.
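As a numeric check of these two identities on a concrete CDF (the logistic CDF is an assumed example, with $\pm 50$ standing in for $\mp\infty$; the names `logistic_cdf` and `Q` are mine):

```python
import math

def logistic_cdf(x):
    # a concrete CDF: the standard logistic, F(x) = 1 / (1 + e^{-x})
    return 1.0 / (1.0 + math.exp(-x))

def Q(F, a, b):
    # the measure induced by the CDF F: Q((a, b]) = F(b) - F(a)
    return F(b) - F(a)

# Q((-inf, a]) ~ F(a): here F(0) = 0.5
print(round(Q(logistic_cdf, -50.0, 0.0), 6))   # → 0.5
# Q(R) ~ F(inf) - F(-inf) = 1, so Q is a probability measure
print(round(Q(logistic_cdf, -50.0, 50.0), 6))  # → 1.0
```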
Altogether we have now seen that the mapping $Q \mapsto F_Q$ is one-to-one and onto, so we really do have a bijection between $\mathscr P$ and $\mathcal F$. Bringing this back to the actual question: we could equivalently hold up either CDFs or probability measures as the object we declare probability to be the study of (while recognizing that this is a somewhat facetious endeavor). I personally still prefer probability spaces, because I feel the theory flows more naturally in that direction, but CDFs are not "wrong".