3

Consider a sequence of random numbers drawn IID from some distribution $g(x)$. How would I determine the distribution of the value of the first sample from that sequence which is not greater than all previous samples? What about the special case where $g(x)$ is uniform over $[0, 1]$?

Example: Some example IID sequences drawn from $U(0,1)$ are

0.35, 0.29, 0.19, 0.64, ...
      ^^^^
0.33, 0.35, 0.22, 0.62, ...
            ^^^^
0.21, 0.01, 0.98, 0.11, ...
      ^^^^
0.59, 0.77, 0.93, 0.17, ...
                  ^^^^

The first non-monotonic samples are 0.29, 0.22, 0.01, 0.17. I'm interested in the distribution function for these values, either in the general or the $U(0,1)$ case.

ChrisB
  • 133

2 Answers2

2

Let $X_1,X_2,X_3,... \sim \text{IID } G$ be your exchangeable sequence of random variables and define $N \equiv \max \{ n \in \mathbb{N} | X_1 < X_2 < \cdots < X_n \}$, which is the length of the largest increasing portion at the start of the sequence. You are looking for the distribution of the random variable $X_{N+1}$, which is the first value that is not greater than all the previous values. This question is tractable if the distribution $G$ is continuous, but it gets much more complicated if the distribution has any discrete part, since that requires dealing with ties. For simplicity, I am going to show you the answer that holds when you have a continuous distribution.


Some preliminary work: We will use $G$ to denote the (continuous) distribution function and $g$ to denote the corresponding density function. In this related question it was established that the mass function for $n$ is:

$$p_N(n) = \mathbb{P}(N=n) = \frac{n}{(n+1)!} \quad \quad \quad \quad \text{for all } n \in \mathbb{N}.$$

Since $N=n$ requires that $X_1< \cdots <X_n \geqslant X_{n+1}$ (and the maximum is invariant to the order) we also have:

$$\mathbb{P}(X_n \leqslant r | N=n) = \mathbb{P}(X_i \leqslant r)^{n+1} = G(r)^{n+1},$$

which gives the corresponding density:

$$p(X_n = r|N=n) = (n+1) g(r) G(r)^n.$$


Finding the target density: Using the above results we have:

$$\begin{equation} \begin{aligned} p(X_{N+1} = x|N=n) &= \int p(X_{n+1} = x | X_n = r, N=n) \ dP(X_{n} \leqslant r | N=n) \\[6pt] &= \int \limits_x^\infty p(X_{n+1} = x | X_n = r, N=n) \cdot p(X_{n} = r | N=n) \ dr \\[6pt] &= \int \limits_x^\infty p(X_{n+1} = x | X_{n+1} \leqslant r) \cdot p(X_{n} = r | N=n) \ dr \\[6pt] &= \int \limits_x^\infty \frac{g(x)}{G(r)} \cdot (n+1) g(r) G(r)^{n} \ dr \\[6pt] &= g(x) \int \limits_x^\infty (n+1) g(r) G(r)^{n-1} \ dr \\[6pt] &= g(x) \Bigg[ \frac{n+1}{n} \cdot G(r)^{n} \Bigg]_{r=x}^{r \rightarrow \infty} \\[12pt] &= \frac{n+1}{n} \cdot g(x) (1 - G(x)^n). \\[6pt] \end{aligned} \end{equation}$$

Hence, we have:

$$\begin{equation} \begin{aligned} p(X_{N+1} = x) &= \sum_{n=1}^\infty \ p(X_{n+1} = x | N=n) \cdot \mathbb{P}(N=n) \\[6pt] &= \sum_{n=1}^\infty \ \frac{n+1}{n} \cdot g(x) (1 - G(x)^n) \cdot \frac{n}{(n+1)!} \\[6pt] &= g(x) \sum_{n=1}^\infty \ \frac{1 - G(x)^n}{n!} \\[6pt] &= g(x) \sum_{n=0}^\infty \ \frac{1 - G(x)^n}{n!} \\[6pt] &= g(x) ( e - e^{G(x)} ). \\[6pt] \end{aligned} \end{equation}$$

The corresponding distribution function is:

$$\begin{equation} \begin{aligned} \mathbb{P}(X_{N+1} \leqslant x) &= \int \limits_{-\infty}^x g(r) ( e - e^{G(r)} ) \\[6pt] &= \Bigg[ G(r) \cdot e - e^{G(r)} \Bigg]_{r \rightarrow - \infty}^{r=x} \\[6pt] &= \Bigg[ G(x) \cdot e - e^{G(x)} - (-1) \Bigg] \\[6pt] &= 1 + G(x) \cdot e - e^{G(x)}. \\[6pt] \end{aligned} \end{equation}$$


Special case - uniform distribution: In the case where we use the standard uniform distribution we have $g(x) = 1$ and $G(x) = x$ for all $0 \leqslant x \leqslant 1$ so we have:

$$\begin{equation} \begin{aligned} p(X_{N+1} = x) &= e - e^{x}. \\[6pt] \end{aligned} \end{equation}$$

Ben
  • 124,856
  • Thank you! This all looks correct except the equation for (=|=) -- I believe this is (n+1)g(r)G(r)^n (because X_n is the maximum of n+1 samples in this context, not n samples). This simplifies the overall result to a sum of 1/n! g(x) (1 - G(x)^n). I confirmed this result with Monte Carlo simulation for the uniform case. – ChrisB Apr 17 '19 at 18:17
  • and furthermore, for the uniform case, the result simplifies to e - e^x – ChrisB Apr 17 '19 at 18:23
  • Well spotted - I have edited the answer to correct this, and that does indeed simplify the mathematics considerably. – Ben Apr 17 '19 at 22:43
0

@Ben provides most of the solution, here's a summary that fixes a minor error:

The sequence we are interested in consists of N monotonic elements ($X_1 < X_2 < ... < X_N$) followed by a non-monotonic element at position N+1. We are interested in $p(X_{N+1}=x)$

The PDF for the length of the monotonic sequence at the start of the chain (excluding the first non-monotonic sample) is $p(N=n) = \frac{n}{(n+1)!}$

The PDF for the maximum value among the first N+1 elements (which, in this context, is also the value of the element at position N) is $p(X_n=r | N=n) = (n+1)g(r)G(r)^n$, where G is the cumulative distribution function corresponding to g.

The distribution for $X_{N+1}$ is $p(X_{N+1} = x) = \sum_{n=1}^{\infty}\int_x^{\infty}{p(X_{N+1}=x|X_N=r, N=n)p(X_n=r|N=n)p(N=n)dr}$

Using the fact that $X_{N+1} < X_{N}$, the first term is equivalent to

$\sum_{n=1}^{\infty}\int_x^{\infty}{p(X_{N+1}=x|X_{N+1} < r, N=n)p(X_n=r|N=n)p(N=n)dr}$

Which we can flip around using Bayes' rule:

$\sum_{n=1}^{\infty}\int_x^{\infty}{\frac{p(X_{N+1}<r|X_{N+1}=x, N=n)p(X_{N+1}=x|N=n)}{p(X_{N+1} < r | N=n)}p(X_n=r|N=n)p(N=n)dr}$

Finally, plugging in $g$ and $G$ based on the above identities:

$ \sum_{n=1}^{\infty}\int_x^{\infty} \frac{g(x)}{G(r)}(n+1)g(r)G(r)^n\frac{n}{(n+1)!} dr$

$ \sum_{n=1}^{\infty}g(x)\frac{1}{n!}\int_x^{\infty}ng(r)G(r)^{n-1} dr$

$ \sum_{n=1}^{\infty}\frac{g(x)}{n!}[1 - G(x)^n]$

For the case where $X \sim U(0,1)$, $g(x)=1$, $G(x)=x$, and

$p(X_{N+1})=\sum_{n=1}^{\infty}\frac{1-x^n}{n!} = e-e^x$

Which we can confirm against Monte Carlo simulation:

def sample(g = np.random.uniform):    
    x = top = g()
    while x >= top:
        top = x
        x = g()
    return x

def f(x):
    return np.e - np.exp(x)

plt.hist([sample() for _ in range(100000)], bins=100, density=True)
x = np.linspace(0, 1, 20)
y = np.array([f(xx) for xx in x])
plt.plot(x, y)

enter image description here

ChrisB
  • 133