3

Chi (2014, "The Value of Information and Dispersion") defines a RV $X^*$ by the continuous distribution (footnote 7, p. 6):

$$ G(x) = \begin{cases} (1-p) U_1 & \mbox{if } x=0 \\ 1-p & \mbox{if } x \in (0,1)\\ 1-p + p U_2 & \mbox{if } x = 1\end{cases}$$

where $U_1$ and $U_2$ are i.i.d. RVs, distributed uniformly on $[0,1]$.

He claims that $X^*$ is statistically equivalent to a Bernoulli random variable with success probability $p$.

How should I think about $G(x)$? If $G(x)$ is not the CDF of $X^*$, what is $X^*$'s CDF? Is the density $g(x) = 0$ for all $x$? To compute an expected value, would I simply plug in the expected values of $U_1$ and $U_2$?

bonifaz
  • Your question is unclear, because this formula does not describe a CDF except in the case $U_2=1$, which has zero chance of occurring. You (or Chi) appear to have confused a CDF with a random variable. To avoid this problem it would be a good idea to transcribe the footnote exactly. – whuber May 13 '16 at 15:48
  • Thank you. I am still puzzling over what Chi might mean by "statistically equivalent." Does he define that anywhere in the paper? – whuber May 13 '16 at 16:36
  • No, can't find any definition. – bonifaz May 13 '16 at 16:43

1 Answer

3

Chi's notation makes little sense. Fortunately, this construction comes from a 1988 paper by E. L. Lehmann, Comparing Location Experiments, where it is more clearly (if very briefly) described.

The concepts are interesting. I will describe Lehmann's idea of equivalent experiments, then illustrate and explain his construction.


"Statistical equivalence"

Lehmann begins with the idea of equivalent experiments. His language is so pithy and clear that I shall just quote some of the definitions.

An experiment $\mathbf{E}$ is a random quantity $X$ and a family $\mathbf{P} = \{P_\theta,\,\theta\in\Omega\}$ of possible distributions of $X$. Let $\mathbf{F} = (Y,\,\mathbf{Q}=\{Q_\theta, \theta\in\Omega\})$ be another experiment, with the distributions $P_\theta$ and $Q_\theta$ corresponding to the same state of nature $\theta$.

Lehmann gives four nearly equivalent characterizations of what it means for an experiment $\mathbf{F}$ to be more informative than an experiment $\mathbf{E}$. The importance of the idea is made plain by the second characterization:

For any decision procedure $\delta$ based on $X$ and any loss function $L(\theta,d)$ there exists a (possibly randomized) procedure $\delta^\prime$ based on $Y$ such that $R(\theta,\delta^\prime) = R(\theta,\delta)$ for all $\theta$.

(Lehmann assumes his readers will know that $R$ is the risk function: $R(\theta,\delta)$ is the expected loss of the procedure $\delta$ when the state of nature is $\theta$.) Thus, in the sense of expected loss, we are never any worse off basing decisions on $Y$ instead of $X$ (and might, for some states of nature, actually be better off).

Two experiments are equivalent when each is more informative than the other. I believe this is what Chi intended "statistically equivalent" to mean.

Constructing $X^{*}$ from $X$

Suppose $X$ has a discontinuity at $x_0$, with $\Pr(X=x_0)=p$ when $\theta=\theta_0$. Define a new variable $X^{*}$ by $$\eqalign{X^{*}&= X, &\quad\text{if }X\lt x_0, \\ &= X + pU, &\quad\text{if }X = x_0, \\ &=X + p, &\quad\text{if }X \gt x_0,}$$ where $U$ is uniformly distributed on $(0,1)$.

[At pp 785-786.]
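To make the construction concrete, here is a minimal simulation sketch (my own illustration in Python, not anything appearing in Lehmann or Chi). It takes $X = \max(Z, 0)$ with $Z$ standard normal, which has a single atom of mass $p = 1/2$ at $x_0 = 0$, applies the displayed rule, and checks that $X^{*}$ has no atom.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# X = max(Z, 0): an atom of mass p = 0.5 at x0 = 0, continuous above 0.
x0, p = 0.0, 0.5
X = np.maximum(rng.standard_normal(n), 0.0)

# Spread the atom over (x0, x0 + p) using an independent uniform U,
# and shift everything above the atom up by p.
U = rng.uniform(size=n)
X_star = np.where(X < x0, X,
         np.where(X == x0, X + p * U, X + p))

print((X == x0).mean())               # about 0.5: the mass of the atom in X
print(np.unique(X_star).size == n)    # True: no repeated values, hence no atom in X*
```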

Let's work out the distribution of $X^{*}$ in terms of the distribution $F_X$ of $X$. To do so, notice $X$ is determined uniquely--without any randomness--in terms of $X^{*}$:

$$\eqalign{ X &= X^{*}, &\quad\text{if } X^{*} \lt x_0 \\ &= x_0, &\quad\text{if } x_0 \le X^{*} \lt x_0 + p \\ &= X^{*}-p, &\quad\text{if } X^{*} \ge x_0 + p. }$$

Figure: graph of the inverse

The graph of $X$ versus $X^{*}$ shows that $X$ is a continuous function of $X^{*}$, but it is not one-to-one. A total probability of $p$ is assigned to the interval from $x_0$ to $x_0+p$ for $X^{*}$. This causes the cumulative probability of $X$ to jump by $p$ at $X=x_0$, exactly as desired.
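Here is a self-contained check of that inverse, using the same toy example as above (an atom of mass $1/2$ at $0$, my own illustration): $X$ is recovered from $X^{*}$ exactly, with no reference to $U$.

```python
import numpy as np

rng = np.random.default_rng(0)
x0, p = 0.0, 0.5
X = np.maximum(rng.standard_normal(100_000), 0.0)   # atom of mass p at x0 = 0
# Forward construction (no values lie below x0 in this example, so that branch is omitted):
X_star = np.where(X == x0, x0 + p * rng.uniform(size=X.size), X + p)

# The inverse: collapse [x0, x0 + p) back onto the atom and shift the rest down by p.
X_back = np.where(X_star < x0, X_star,
         np.where(X_star < x0 + p, x0, X_star - p))

print(np.allclose(X_back, X))   # True (exact up to floating-point rounding)
```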

We can work out what this transformation does to the distribution $F_X$ to create the distribution $F_{X^{*}}$. They are the same for $x \lt x_0$. For $x_0 \le x^{*} \lt x_0+p$, $F_{X^{*}}$ increases with unit slope to cover the vertical jump of $p$ in $F_X$ at $x_0$. Thenceforth they agree once more, with $F_{X^{*}}(x^{*}) = F_X(x^{*}-p)$ when $x^{*} \ge x_0+p$.

Figure showing the transformation of the CDFs

Geometrically, the graph on the left is sliced vertically at $x=x_0$ and then pulled apart sideways by a distance $p$. That jump, shown with a dotted gray line, now stretches across at a unit slope, also shown with a dotted gray line at right. Consequently, $X^{*}$ has no jump at $x_0$.
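Numerically, the claimed form of $F_{X^{*}}$ is easy to verify against an empirical CDF. In the toy example above, $F_X(x_0^-) = 0$, so the claim is $F_{X^{*}}(x) = x - x_0$ on $[x_0, x_0 + p)$ and $F_{X^{*}}(x) = F_X(x - p) = \Phi(x - p)$ thereafter. A sketch (again my own illustration, using SciPy's normal CDF):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
x0, p = 0.0, 0.5
X = np.maximum(rng.standard_normal(200_000), 0.0)
X_star = np.where(X == x0, x0 + p * rng.uniform(size=X.size), X + p)

def F_star(x):
    """Claimed CDF of X*: F_X below the atom, unit slope across the old jump, then F_X shifted by p."""
    x = np.asarray(x, dtype=float)
    return np.where(x < x0, 0.0,                # F_X(x) = 0 below x0 in this example
           np.where(x < x0 + p, x - x0,         # F_X(x0-) + (x - x0), with F_X(x0-) = 0
                    norm.cdf(x - p)))           # F_X(x - p) beyond the stretched jump

grid = np.linspace(-0.5, 3.0, 15)
empirical = np.array([(X_star <= t).mean() for t in grid])
print(np.abs(empirical - F_star(grid)).max())   # small, on the order of 1e-3
```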

By repeating this construction at each discrete jump of $X$ we eventually obtain a variable $X^{*}$ with no jumps: it is continuous. This is what Chi was trying to describe.
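For the Bernoulli($p$) case the question asks about, there are two jumps: one of size $1-p$ at $0$ and one of size $p$ at $1$. The sketch below applies the construction twice, once per jump; it is my own illustration of that two-pass procedure (not necessarily the precise variable Chi had in mind, since his notation is unclear) and confirms that the result is atomless while the original Bernoulli draw remains recoverable.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200_000, 0.3
X = (rng.uniform(size=n) < p).astype(float)     # Bernoulli(p): atoms at 0 and 1

def smooth_jump(x, x0, mass, rng):
    """One pass of the construction at a single atom of the given mass located at x0."""
    u = rng.uniform(size=x.size)                # independent of x
    return np.where(x < x0, x,
           np.where(x == x0, x + mass * u, x + mass))

# First remove the atom at 0 (mass 1 - p); the atom formerly at 1 moves up to 1 + (1 - p).
X1 = smooth_jump(X, 0.0, 1 - p, rng)
# Then remove the remaining atom (mass p) at its new location.
X_star = smooth_jump(X1, 1.0 + (1 - p), p, rng)

X_back = (X_star >= 1.0 + (1 - p)).astype(float)    # recover the Bernoulli draw
print(np.all(X_back == X), np.unique(X_star).size == n)   # True True
```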

What the construction accomplishes

We saw earlier how the value of $X^{*}$ determines the value of $X$. Conversely, the value of $X$ usually determines one value of $X^{*}$, but wherever $X$ has a jump it is associated with a whole interval of values of $X^{*}$ (the particular value in that interval being picked out by the random variable $U$). Thus, $X^{*}$ is more informative than $X$. However, because the additional information in $X^{*}$ concerns only the value of $U$, which is independent of $X$ and carries no information about the state of nature, $X$ is in turn more informative than $X^{*}$: the two experiments are equivalent in Lehmann's sense.

The purpose of this construction is to avoid having to reason separately about discrete distributions. By applying this construction beforehand, all analysis may focus on continuous variables.

whuber