6

I am having a hard time interpreting the relationship, if any, between conditional probabilities and conditional probability distributions, in particular regarding the number of random variables required to define each concept.

As I understand it, a conditional probability can be defined for events involving a single random variable. For example, we could ask for the probability that a die roll resulted in 4 given that the number is larger than 3, i.e., $P(X=4 \mid X>3)$.
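For concreteness, with a fair six-sided die the usual definition gives

$$P(X=4 \mid X>3) = \frac{P(X=4 \cap X>3)}{P(X>3)} = \frac{1/6}{3/6} = \frac{1}{3}.$$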

However, the conditional probability distribution is based on the concept of a joint distribution, which requires two or more random variables.

My question is whether it makes sense to talk about the conditional probability distribution of a single random variable or, conversely, whether a conditional probability statement like the one above can be related to a conditional probability distribution.

Kavka
  • 493
  • 1
    Change $P(X=4\mid X>3)$ to $P(X \le x\mid X>3)$ for all $x$, and now you have a conditional distribution function. Of course, this conditional distribution will be zero for all $x \le 3$. – Mark L. Stone Dec 08 '17 at 20:41
  • 2
    If your concept of conditional probabilities truly requires two random variables, then let $Y=X$ and define $\Pr(X=4\mid X\gt 3) = \Pr(X=4\mid Y \gt 3)$. – whuber Dec 08 '17 at 20:57
  • 1
    Or define a second random variable $Y=\mathbb{I}_{(3,\infty)}(X)$. – Xi'an Dec 08 '17 at 21:45
  • 1
    To get a better sense of where @Xi'an and I are coming from, see some of the better threads on conditional probability. They take various points of view, some of which do not need random variables at all. – whuber Dec 08 '17 at 22:19
  • Yes, I thought about defining a new variable $Y=X$ to end up with two random variables, but then $Y$ and $X$ become perfectly "correlated", which made it even harder for me to wrap my head around. – Kavka Dec 08 '17 at 22:41
  • @Kavka If you have learned some advanced probability, one key motivation for introducing the concept of "conditional probability" is to deal with correlated cases. By its most general definition, the conditional probability of any event $A$ given $X$ -- $P(A\mid X)$ (note, though, that here I am conditioning on a r.v., not on a fixed event as you wrote) is in fact a random variable. – Zhanxiong Nov 15 '22 at 04:17
  • In the elementary probability sense, once you fix a positive-probability conditioning event (e.g., $X > 3$ in your question), there is no essential difference in calculation between a "conditional probability" and a "conditional probability distribution" (both collapse to the defining formula $P(A\mid B) = P(A\cap B)/P(B)$), except that people tend to think of the latter as a function on $\mathbb{R}$ while treating the former as a single value. – Zhanxiong Nov 15 '22 at 04:27
  • (continued) It is legitimate to refer to both $G(x) = P(Y \leq x \mid X > 3)$ and $F(x) = P(X \leq x \mid X > 3)$ as "conditional (probability) distributions". To be more precise, you can call $G(x)$ the "conditional distribution of $Y$ given $X > 3$", and $F(x)$ the "conditional distribution of $X$ given $X > 3$". Of course, as you said, to compute $G(x)$ you need to know the joint distribution of $(X, Y)$, while the marginal distribution of $X$ suffices to compute $F(x)$. – Zhanxiong Nov 15 '22 at 04:29

3 Answers

1

A conditional probability "distribution" is essentially just a collection of conditional probabilities, sufficient to fully characterise the conditional behaviour of one random variable given another event or random variable. A probability "distribution" can be characterised in various ways (e.g., by a probability measure, mass/density function, CDF, generating function, etc.), and while there is no single mathematical object that is the distribution, we may refer to any of these characterisations as "the distribution" as a shorthand.

In general, there are two main classes of mathematical objects which would characterise a "conditional probability distribution" and which we might refer to as such:

  • Conditional distribution at a given conditioning point: This is any function that fully characterises the probabilistic behaviour of $X$ conditional on a specific event $Y=y$.

  • Conditional distribution for any conditioning point: This is any function that fully characterises the probabilistic behaviour of $X$ conditional on an arbitrary value of another random variable $Y$.

Let me illustrate this by example. Suppose we have two random variables $X$ and $Y$ and suppose we define the conditional cumulative distribution function (CDF):

$$F(x|y) \equiv \mathbb{P}(X \leqslant x | Y=y) \quad \quad \quad \text{for all } x \in \mathscr{X} \text{ and } y \in \mathscr{Y}.$$

The function $F( \cdot |y)$ for a fixed value of $y$ fully characterises the distribution of $X$ given the conditioning point $Y=y$, so we would consider this to be a "conditional distribution" in the shorthand sense previously described. The function $F( \cdot | \cdot)$ fully characterises the distribution of $X$ given any conditioning point for $Y$, so we would also consider this to be a "conditional distribution", again, in the shorthand sense. (The latter object is much more general, and it actually gives a whole bunch of conditional distributions, corresponding to each of the possible values for $Y=y$.)
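To make this concrete in code, here is a minimal sketch (the particular joint distribution is an invented example, not anything from the question): take $Y$ to be a fair die and $X = Y$ plus an independent fair coin flip, tabulate the joint pmf, and read off both kinds of object described above. Fixing a column gives $F(\cdot|y)$ for one conditioning point, while the whole array is the family $F(\cdot|\cdot)$.

```python
import numpy as np

# Invented example: Y is a fair die (1..6) and X = Y + an independent fair
# coin flip, so X ranges over 1..7.  Tabulate the joint pmf p(x, y).
joint = np.zeros((7, 6))                 # rows: x = 1..7, columns: y = 1..6
for y in range(1, 7):
    for flip in (0, 1):
        joint[y + flip - 1, y - 1] += (1 / 6) * (1 / 2)

p_y = joint.sum(axis=0)                  # marginal pmf of Y (each entry 1/6)
cond_cdf = joint.cumsum(axis=0) / p_y    # F(x|y): column j is the conditional
                                         # CDF of X given Y = j + 1; the whole
                                         # array is the family over all y
print(cond_cdf[:, 2])                    # F(.|Y=3): [0, 0, 0.5, 1, 1, 1, 1]
```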

Ben
  • 124,856
1

It seems to me we can use what we already know, provided we have heard of distribution functions and conditional probabilities. Thus, the following remarks offer nothing new, but I hope that in making them the basic simplicity and familiarity of the situation will become apparent.


When you have any real-valued random variable $X$ and an event $\mathcal E$ (defined on the same probability space, of course), then you can extend the definition of a (cumulative) distribution function in the most natural way possible: namely, for any number $x,$ define

$$F_X(x;\mathcal E) = \Pr(X\le x\mid \mathcal E).$$

When $\mathcal E$ has positive probability you can even avoid all technicalities and apply the elementary formula for conditional probability,

$$\Pr(X\le x\mid \mathcal E) = \frac{\Pr(X\le x\,\cap\,\mathcal E)}{\Pr(\mathcal E)}.$$

The numerator, which might look strange to the mathematically sophisticated reader, is the probability of the intersection of two events. The conventional shorthand "$X\le x$" stands for the set of outcomes where $X$ does not exceed $x:$ $\{\omega\in\Omega\mid X(\omega)\le x\}.$

This extends the usual distribution function very nicely: when $\mathcal E = \Omega$ is the universal event (that is, the underlying set of all outcomes in the probability space), then since $(X\le x)\subseteq \Omega$ and $\Pr(\Omega)=1,$

$$F_X(x)= \Pr(X\le x) = \frac{\Pr(X\le x\,\cap\,\Omega)}{1} = \frac{\Pr(X\le x\,\cap\,\Omega)}{\Pr(\Omega)}= F_X(x;\Omega) .$$


Comments

Note that only one random variable $X$ is needed, showing that the concept of conditional distribution does not depend on a joint distribution. As a simple example, the right-truncated Normal distribution studied at Expected value of x in a normal distribution, GIVEN that it is below a certain value is determined by a Normally-distributed random variable $X$ and the event $X\le T$ (for the fixed truncation limit $T$).
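A quick sketch in Python makes this computable (the standard Normal and the truncation limit $T=1$ are illustrative choices of mine, not from the linked thread): by the elementary formula above, $\{X\le x\}\cap\{X\le T\} = \{X \le \min(x,T)\},$ so the conditional CDF is $\Phi(\min(x,T))/\Phi(T).$

```python
from scipy.stats import norm

T = 1.0  # illustrative truncation limit

def truncated_cdf(x, T=T):
    """F_X(x; {X <= T}) = Phi(min(x, T)) / Phi(T) for X ~ Normal(0, 1)."""
    return norm.cdf(min(x, T)) / norm.cdf(T)

print(truncated_cdf(0.0))   # ~ 0.594
print(truncated_cdf(2.0))   # 1.0 -- the conditional CDF saturates at T
```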

Another example, just to make these distinctions very clear, models a population of people where we are interested in their sex and age (at a specified time, because both these properties can change!). By agreeing on a unit of measure of age (seconds, say), and (for simplicity) focusing on those people with a definite sex, we may take the sample space to be

$$\Omega = \{\text{male}, \text{female}\}\times [0,\infty).$$

Elements of $\Omega$ represent people. A sample from $\Omega$ could be represented by rows in a two-column table: one for sex, the other for age. That's what the Cartesian product $\times$ in the definition of $\Omega$ means.

The probabilities of interest will attach to intervals of ages for each sex separately (or combined). Thus, relevant events will be composed out of age intervals of the form $\{\text{male}\}\times (a,b]$ (for lower and upper ages $a$ and $b$ of males) and $\{\text{female}\}\times (a,b]$ (an interval of ages for females).

As a shorthand, "$\{\text{male}\}$" is the event $\{\text{male}\}\times [0,\infty) = \{(\text{male},x)\mid x \ge 0\},$ and similarly for "$\{\text{female}\}.$" By definition, these are both events -- or "subpopulations" if you like.

Let $X$ be the random variable giving the age of a person rounded to the nearest year. Then (for instance) we might be interested in $F_X$ (the distribution of all ages), in $F_X(\,\cdot\,;\{\text{male}\})$ (the distribution of male ages), or in $F_X(\,\cdot\,;\{\text{female}\}).$

This nice example shows that the conditioning event $\mathcal E$ (the sex) needn't have anything to do with $X$ (the age).
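If it helps, here is a small simulation of this setup (the population proportions and the age distribution are made up purely for illustration); notice that the conditioning is done with an event, with no random variable for sex in sight.

```python
import numpy as np

# Made-up population: sex and age are simulated only for illustration.
rng = np.random.default_rng(1)
n = 100_000
sex = rng.choice(["male", "female"], size=n)
age = np.round(rng.uniform(0, 100, size=n))    # X: age to the nearest year

def F_X(x, event=None):
    """Empirical F_X(x; E): condition on a boolean event mask (None = Omega)."""
    mask = np.ones(n, dtype=bool) if event is None else event
    return np.mean(age[mask] <= x)

print(F_X(40))                          # F_X: all ages
print(F_X(40, event=(sex == "male")))   # F_X( . ; {male}): male ages only
```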

Clearly, this formulation of conditional distributions does not require us to define a random variable to condition on a characteristic like sex in the example. We could have done it that way, and there are some analytical and computational advantages to doing so, but conceptually such a construct would be artificial and superfluous.

When there are multiple random variables $(X,Y)$ (and $Y$ can be vector-valued), nothing new emerges because conditioning on $Y$ means conditioning on the events it defines.

whuber
  • 322,774
0

Well, I think the terms "conditional probability" and "conditional probability distribution" can both be extended to two or more variables (correct me if I am wrong). For example, suppose $X, Y, Z$ are i.i.d. random variables, uniformly distributed on $(0,1)$ for simplicity, and we want to find the conditional probability $P\left(X \ge YZ \mid Y>\frac{1}{2}\right)$. This is a valid question about a conditional probability.

Then,

\begin{align}P\left(X \ge YZ|Y>\frac{1}{2}\right) &= \frac{P\left(X \ge YZ,Y>\frac{1}{2}\right)}{P(Y>\frac{1}{2})} \\&= \frac{\int_0^1\int_\frac{1}{2}^1\int_{yz}^1 1 dxdydz}{\frac{1}{2}}\\&= 2 \int_0^1\int_\frac{1}{2}^1(1-yz)dydz \\&= 2\int_0^1 \left(\frac{1}{2}-\frac{3z}{8}\right)dz\\&=2 \left( \frac{1}{2}-\frac{3}{16}\right)\\&=\frac{5}{8}\end{align}

Similarly, $$P\left(X \ge YZ|Y \le\frac{1}{2}\right)= \frac{\int_0^1\int_0^\frac{1}{2}\int_{yz}^1 1 dxdydz}{\frac{1}{2}} = \frac{7}{8},$$ as you can verify.
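A quick Monte Carlo check of both values (the sample size and seed are arbitrary choices):

```python
import numpy as np

# Simulate X, Y, Z ~ i.i.d. Uniform(0, 1) and estimate both conditional
# probabilities by restricting to the relevant conditioning event.
rng = np.random.default_rng(2)
x, y, z = rng.uniform(size=(3, 1_000_000))

upper = y > 0.5
print(np.mean(x[upper] >= (y * z)[upper]))     # ~ 5/8 = 0.625
print(np.mean(x[~upper] >= (y * z)[~upper]))   # ~ 7/8 = 0.875
```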

In a nutshell, the concept of conditional probability is not restricted to the single-variable case.

User1865345
  • 8,202
son520804
  • 461
  • 1
    I think you meant to say $Y < 1/2$ in your latter example. – Kavka Dec 14 '17 at 16:49
  • My question is the converse of the question that you've answered: "...whether it makes sense to talk about the conditional probability distribution of a single random variable", the same way we can do for conditional probabilities. – Kavka Dec 14 '17 at 16:52
    Thanks Kavka ~ Yes, I meant $Y < \frac{1}{2}$. Oh, I got it. Then you may consider the exponential or geometric distribution and the memoryless property. For the exponential distribution, for example, you have $P(X \ge s+t \mid X \ge s) = P(X \ge t)$ for $t > 0, s > 0$. – son520804 Dec 14 '17 at 17:09