Riddler puzzle - distance from origin after two random jumps of equal length

Question

I love the FiveThirtyEight Riddler puzzle. I try to do it as often as possible, but I'm not really trained in maths or stats. This week's puzzle has you calculating the expected distance after two random "jumps" in a 2D plane. The jumps can be in any direction (0-360 degrees) and are always the same length $r$. The problem asks: what is the expected distance from the starting point?

I simplified this problem to start with the second jump (since the first will always be distance $r$). I tried to perform a change of variable to the probability function:

$$P[\Theta] = \frac{\Theta}{2\pi},$$

using the function:

$$L = 2 r \sin \left(\frac{\theta}{2}\right) \quad \quad \implies \quad \quad \theta=2 \text{arcsin}\left(\frac{L}{2r}\right),$$

where $L$ is the length from the starting point (see figure at bottom of post for my depiction of the problem). After I do the re-arranging and take the Jacobian of the new function, the probability function I end up with is: $$P(L) = \frac{2 \text{arcsin}\left(\frac{L}{2r}\right)}{\pi\sqrt{1-\frac{L^2}{4r}}}.$$

I multiply by 2 since the function is symmetric about 180-degrees. However, if I integrate over this function from $0$ to $2r$, the result is $\pi/2$, not one, as I would expect. If I simulate the jumps and plot the simulations over this probability density function, they are "similar" but far from identical. Where am I going wrong with calculating the probability density function?

Ben · Accepted Answer · 2021-09-07T22:45:32.957

The original 'riddler' problem at FiveThirtyEight appears to be asking about the mode of the distribution rather than its expected value. In any case, if we can find the distribution then we can find both the mode and the expected value.

Your initial trigonometric expression for the length looks wrong to me, and I think that is the problem with your approach. You can see from your plot that the problem is not just the constant-of-integration, since your posited density line is not proportionate to the simulated density.

I will show you how to solve this using the correct trigonometric rules. If we let $\theta$ be the angle between the two directions of the jumps then we can apply the law of cosines to determine that the total distance from the starting point after both jumps is:

$$\begin{align} T &= \sqrt{r^2 + r^2 - 2 \cdot r \cdot r \cos \theta} \\[8pt] &= \sqrt{2 r^2 - 2 r^2 \cos \theta} \\[6pt] &= r \sqrt{2 (1-\cos \theta)} \\[6pt] &= r \sqrt{2 (1-\cos |\theta|)}. \\[6pt] \end{align}$$

Now, since $|\theta| \sim \text{U}(0, \pi)$, for all $0 \leqslant t \leqslant 2r$ we have:

$$\begin{align} F_T(t) \equiv \mathbb{P}(T \leqslant t) &= \mathbb{P} \bigg( \sqrt{2 (1-\cos |\theta|)} \leqslant \frac{t}{r} \bigg) \\[6pt] &= \mathbb{P} \bigg( 2 (1-\cos |\theta|) \leqslant \frac{t^2}{r^2} \bigg) \\[6pt] &= \mathbb{P} \bigg( \cos |\theta| \geqslant 1 - \frac{t^2}{2 r^2} \bigg) \\[6pt] &= \mathbb{P} \bigg( |\theta| \leqslant \text{arccos} \Big(1 - \frac{t^2}{2 r^2} \Big) \bigg) \\[6pt] &= \frac{1}{\pi} \cdot \text{arccos} \Big(1 - \frac{t^2}{2 r^2} \Big). \\[6pt] \end{align}$$

Differentiating this expression then gives the corresponding density function:

$$\begin{align} f_T(t) \equiv \frac{d F_T(t)}{d t} &= \frac{1}{\pi} \cdot \frac{2t/2r^2}{\sqrt{1-(1-t^2/2r^2)^2}} \\[6pt] &= \frac{1}{\pi} \cdot \frac{2t/2r^2}{\sqrt{t^2/r^2 - t^4/4r^4}} \\[6pt] &= \frac{1}{\pi} \cdot \frac{2}{\sqrt{4 r^2 - t^2}}. \\[6pt] \end{align}$$
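As a quick sanity check (a worked step added here, not part of the derivation above), this density integrates to one over its support $0 \leqslant t \leqslant 2r$, which is the property the density posited in the question fails to satisfy:

$$\begin{align} \int_0^{2r} f_T(t) \, dt &= \frac{2}{\pi} \int_0^{2r} \frac{dt}{\sqrt{4 r^2 - t^2}} \\[6pt] &= \frac{2}{\pi} \bigg[ \text{arcsin} \Big( \frac{t}{2r} \Big) \bigg]_{t=0}^{t=2r} \\[6pt] &= \frac{2}{\pi} \cdot \frac{\pi}{2} = 1. \\[6pt] \end{align}$$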

The mode of this distribution occurs at $t=2r$, which solves the original 'riddler' problem. With a bit of extra work you can show that the mean and standard deviation of the distribution are:

$$\begin{align} \mathbb{E}(T) &= \frac{4}{\pi} \cdot r \approx 1.27324 r \\[6pt] \mathbb{S}(T) &= \frac{\sqrt{2(\pi^2-8)}}{\pi} \cdot r \approx 0.6155169 r \\[6pt] \end{align}$$
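One way to fill in the "extra work" is sketched below. It is only one possible route: the mean comes from integrating against the density, and the second moment comes straight from the law-of-cosines expression, using the fact that $\mathbb{E}(\cos |\theta|) = 0$ when $|\theta| \sim \text{U}(0, \pi)$. The symbol $\mathbb{V}$ for the variance is notation introduced here.

$$\begin{align} \mathbb{E}(T) &= \int_0^{2r} \frac{2t}{\pi \sqrt{4 r^2 - t^2}} \, dt = \frac{2}{\pi} \bigg[ - \sqrt{4 r^2 - t^2} \bigg]_{t=0}^{t=2r} = \frac{4}{\pi} \cdot r, \\[6pt] \mathbb{E}(T^2) &= \mathbb{E} \big( 2 r^2 (1-\cos |\theta|) \big) = 2 r^2, \\[6pt] \mathbb{V}(T) &= \mathbb{E}(T^2) - \mathbb{E}(T)^2 = 2 r^2 - \frac{16}{\pi^2} \cdot r^2 = \frac{2(\pi^2-8)}{\pi^2} \cdot r^2. \\[6pt] \end{align}$$

Taking the square root of the variance gives the standard deviation stated above.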

Incidentally, this distribution is essentially a folded-arcsine distribution (relating closely to the arcsine distribution). It can be characterised in an alternative way by taking $X \sim \text{arcsine}$ and then taking $T = 4r |X-\tfrac{1}{2}|$. (The interested reader may confirm that this transformation gives the same distribution shown above.)
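For readers who would rather check the folded-arcsine characterisation numerically than algebraically, here is a minimal sketch. It assumes the standard arcsine distribution on $(0, 1)$, which is the same as a $\text{Beta}(\tfrac{1}{2}, \tfrac{1}{2})$ distribution, so `rbeta` can be used to draw from it. The variable names are chosen for this example only.

```r
# Sketch: check the folded-arcsine characterisation by simulation.
# Assumption: the standard arcsine distribution on (0, 1) is Beta(1/2, 1/2).
set.seed(1)
r <- 1
n <- 10^5

# Route 1: T = 4r|X - 1/2| with X ~ arcsine
X        <- rbeta(n, shape1 = 0.5, shape2 = 0.5)
T_folded <- 4 * r * abs(X - 0.5)

# Route 2: law of cosines with a uniform angle between the two jumps
THETA    <- runif(n, min = 0, max = pi)
T_cosine <- r * sqrt(2 * (1 - cos(THETA)))

# Both sample means should be close to the exact mean 4r/pi
c(folded = mean(T_folded), cosine = mean(T_cosine), exact = 4 * r / pi)
```

Agreement of the means is of course weaker than agreement of the whole distribution; a quantile-quantile plot of `T_folded` against `T_cosine` would be a more thorough comparison.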

Another interesting aspect of this problem is to ask what happens to the distribution as we add more random jumps. It turns out that the mathematics gets rather nasty once we go beyond two jumps, and you end up with density formulae that are defined recursively using expressions that cannot be put into closed form. If we were to look at increments of a large number of jumps, the displacement from the origin should act like a kind of Brownian motion process, and so in the limit the distribution of the displacement vector will converge to an appropriately scaled bivariate normal distribution (something that we can easily demonstrate using the CLT). The distance from the origin, being the length of that vector, is then approximately Rayleigh distributed.
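As a rough illustration of the many-jump case, the sketch below simulates $k$ jumps and compares the mean distance with the mean of a Rayleigh distribution, which is what the length of an approximately bivariate normal displacement vector should follow. The helper `SIMULATE_K`, the choice $k = 50$, and the scale $\sigma = r\sqrt{k/2}$ (each coordinate of a single jump has variance $r^2/2$) are assumptions introduced for this example.

```r
# Sketch: distance from the origin after k jumps of length r.
# SIMULATE_K is a hypothetical helper written for this illustration.
SIMULATE_K <- function(n, k, r = 1) {
  # One row per simulated walk, one column per jump
  ANGLES <- matrix(2 * pi * runif(n * k), nrow = n, ncol = k)
  TOTALx <- rowSums(r * cos(ANGLES))
  TOTALy <- rowSums(r * sin(ANGLES))
  sqrt(TOTALx^2 + TOTALy^2)
}

set.seed(1)
k      <- 50
DIST_K <- SIMULATE_K(2 * 10^4, k)

# Rayleigh approximation with r = 1: sigma = sqrt(k/2), mean = sigma * sqrt(pi/2)
RAYLEIGH_MEAN <- sqrt(k / 2) * sqrt(pi / 2)
c(simulated = mean(DIST_K), rayleigh = RAYLEIGH_MEAN)
```

With `k = 2` the same helper reproduces the two-jump case, so its sample mean should sit near $4r/\pi$.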

Confirming this result by simulation

We can confirm this result by simulation in R. First we will program a function SIMULATE to simulate the jumps and compute the total displacement from the starting point.

SIMULATE <- function(n, r = 1) {
  #Generate two jumps in random directions (angles uniform on [0, 2*pi))
  ANGLE1 <- 2*pi*runif(n)
  ANGLE2 <- 2*pi*runif(n)
  JUMP1x <- r*sin(ANGLE1)
  JUMP1y <- r*cos(ANGLE1)
  JUMP2x <- r*sin(ANGLE2)
  JUMP2y <- r*cos(ANGLE2)
  #Determine total displacement and distance
  TOTALx <- JUMP1x + JUMP2x
  TOTALy <- JUMP1y + JUMP2y
  DIST   <- sqrt(TOTALx^2 + TOTALy^2)
  #Return the distance values
  DIST }

Now we will confirm that our density function matches the result from a large number of simulations.

#Generate the true density function
DENSITY <- function(x, r = 1) {
  n <- length(x)
  OUT <- rep(0, n)
  for (i in 1:n) {
    if ((x[i] >= 0)&(x[i] <= 2*r)) { OUT[i] <- 2/(pi*sqrt(4*r^2-x[i]^2)) } }
  OUT }
#Generate histogram of simulations
set.seed(1)
SIMS <- SIMULATE(10^6)
hist(SIMS, freq = FALSE, breaks = 100, col = 'LightBlue',
     xlab = 'Distance', main = 'Simulation versus True Density')
XX <- seq(0, 1.999, length = 100) 
YY <- DENSITY(XX)
lines(XX, YY, col = 'Blue', lwd = 3)
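A histogram overlay is a visual check. A numerical companion, sketched below under the assumption $r = 1$, compares the sample mean and standard deviation with the closed forms and measures the largest gap between the empirical CDF and $F_T$. The names `SIM_TWO`, `CDF_T` and `MAX_GAP` are hypothetical and chosen for this example; the block is self-contained and does not rely on the functions defined above.

```r
# Sketch: compare simulated distances with the closed-form results (r = 1).
SIM_TWO <- function(n, r = 1) {
  A1 <- 2 * pi * runif(n)
  A2 <- 2 * pi * runif(n)
  sqrt((r * cos(A1) + r * cos(A2))^2 + (r * sin(A1) + r * sin(A2))^2)
}
CDF_T <- function(t, r = 1) {
  # Clamp the argument to guard against rounding just outside [-1, 1]
  acos(pmax(-1, pmin(1, 1 - t^2 / (2 * r^2)))) / pi
}

set.seed(1)
SIMS2 <- SIM_TWO(10^5)
GRID  <- seq(0, 2, length.out = 201)

# Sample moments versus the closed forms, and the largest CDF discrepancy
MOMENTS <- c(mean = mean(SIMS2), sd = sd(SIMS2))
EXACT   <- c(mean = 4 / pi, sd = sqrt(2 * (pi^2 - 8)) / pi)
MAX_GAP <- max(abs(ecdf(SIMS2)(GRID) - CDF_T(GRID)))
rbind(MOMENTS, EXACT)
MAX_GAP
```

The gap is only evaluated on a grid, so it is a slight underestimate of the usual Kolmogorov-Smirnov statistic, but it is enough to flag a density that is wrong in shape rather than just in scale.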

Demetri Pananos · Answer 2 · 2021-08-29T04:07:26.007

That puzzle was asking for the mode, not the expected distance. Here is how I solved it.

Without loss of generality, assume that the cricket starts at (1,0) and jumps to the origin. The question is then about the density of distances from (1,0) to any point on the unit circle.

Let $\theta \sim \operatorname{U}(0, 1)$, so $p(\theta) = 1$. We are asked about the mode of the distribution of

$$l^2 = 2 - 2\cos(\pi \theta) \implies l = \sqrt{2(1-\cos(\pi \theta))} $$

Here, I have used the law of cosines and the fact that the jump distance is exactly 1 unit long and am only considering the top half of the circle (hence $\pi \theta$) so that $l$ is a monotonically increasing function on [0,1]. We know the distribution of $\theta$, so we can use some well known results from probability theory to derive the distribution of $l$.

$$g(\theta) = 1$$

$$f(l) = g(\theta(l)) \Big\vert \dfrac{d\theta}{dl} \Big\vert $$

$$ \theta(l) = \dfrac{1}{\pi}\arccos \Big( 1 - \dfrac{l^2}{2} \Big) $$

$$ \Big\vert \dfrac{d\theta}{dl} \Big\vert = \frac{2 l}{\pi \sqrt{(2l-l^2)(2l+l^2)}} $$

So the density for $l$ is "simply"

$$f(l) = \frac{2 l}{\pi \sqrt{(2l-l^2)(2l+l^2)}}$$
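The product under the square root is $(2l)^2 - (l^2)^2 = l^2 (4 - l^2)$, so the factor of $l$ cancels and the density reduces to $2/(\pi \sqrt{4-l^2})$, the simplification noted in the comments. A small numerical check of that equivalence is sketched below; the function names are hypothetical.

```r
# Sketch: numerically compare the density as derived with its simplified form.
F_DERIVED    <- function(l) 2 * l / (pi * sqrt((2 * l - l^2) * (2 * l + l^2)))
F_SIMPLIFIED <- function(l) 2 / (pi * sqrt(4 - l^2))

# Stay strictly inside (0, 2): the derived form is 0/0 at l = 0
# and both forms are unbounded at l = 2
L_GRID <- seq(0.01, 1.99, by = 0.01)
all.equal(F_DERIVED(L_GRID), F_SIMPLIFIED(L_GRID))
```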

We can plot this, and we notice that the majority of the probability is concentrated around 2.

So although the density is unbounded at 2 (it has a vertical asymptote there rather than a finite peak), it appears that 2 is the mode. Additionally, this is a proper probability density:

$$ \int_0^2 f(l) \, dl = \theta(2) - \theta(0) = \dfrac{1}{\pi} \pi - \dfrac{1}{\pi}0 = 1 $$

Another elegant way to consider this problem is to plot points around the circle and color them by their distance.

If we plot these distances along the interval from 0 to 2, we notice that the points are quite literally most dense around 2.
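The same idea can be sketched without a figure: take evenly spaced points on the top half of the unit circle, compute each point's distance from the starting point, and count how many distances land in each bin. The grid size of 1001 points and the bin width of 0.2 are arbitrary choices made for this example.

```r
# Sketch: evenly spaced points on the top half of the unit circle,
# binned by their distance from the starting point.
THETA <- seq(0, pi, length.out = 1001)
LEN   <- sqrt(2 * (1 - cos(THETA)))

# Count how many points fall in each distance bin of width 0.2
BINS   <- (0:10) / 5
COUNTS <- table(cut(LEN, breaks = BINS, include.lowest = TRUE))
COUNTS
```

The counts grow from the first bin to the last, and the bin ending at 2 holds the most points, which is the crowding effect described above.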

Credit where credit is due: Thank you to Corey Yanofsky for correcting my math on this and reminding me that this change of variables approach only applies when $l(\theta)$ is monotonic. Thank you to Jake Westfall (who is an active member on this forum) for the very clever plots of dots around the unit circle. That was all his idea, not mine.

(+1) Your density function (and Jacobian) can be further simplified to $f(l) = 2/(\pi \sqrt{4-l^2})$. — Ben, Aug 29 '21 at 22:03
(It is also generally bad practice to start the axis of a figure at a non-zero value; it gives a misleading impression of the shape of the function.) — Ben, Aug 30 '21 at 02:10
