5

I have an example from page 460 of the textbook Introduction to Probability (second edition) by Blitzstein and Hwang:

For example, suppose there are 14 people in a room. How likely is it that there are two people with the same birthday or birthdays one day apart? This is much harder to solve exactly than the birthday problem, so in Example 4.7.6 we used a Poisson approximation. But we may want a guarantee from a bound rather than worrying about whether the Poisson approximation is good enough. Let $X$ be the number of “near birthday” pairs. Using indicator r.v.s, we have $E(X) = \binom{14}2 \dfrac{3}{365}$. So

$$P (X = 0) \le \dfrac{1}{E(X) + 1} < 0.573$$

The true answer for $P(X = 0)$ turns out to be $0.46$ (to two decimal places), which is consistent with the bound.

How did the authors find that the true answer for $P(X = 0)$ is $0.46$? How do we do this? Is it something that must be calculated computationally using something like R? Thank you.

Dom Fomello
  • 213
  • 2
  • 8

1 Answers1

5

Analytical Solution

It turns out there's an analytical solution for the near-birthday problem.

Let's consider the example with 14 people in 365 days and calculate the probability of having no pair with birthdays on the same day or one say apart.

Let's assume the first person has a birthday on a fixed day, January 1st, which we'll call $b_1$. We'll later generalize that initial date.

Then we can distribute the 13 other birthdays $b_2$ to $b_{14}$ through the other 364 days, leaving 14 intervals of days, $a_1$ to $a_{14}$, which we expect to have at least one day. The sum of the intervals will be $365 - 14 = 351$ days.

diagram birthdays and intervals

So we have:

$$ \left\{ \begin{aligned} \sum_{i=1}^{14} a_i &= 351 \\[2ex] a_i &\ge 1 \\[2ex] \end{aligned} \right. $$

This is a familiar problem, a 14-part composition of 351. We can calculate those using a stars and bars technique using 13 bars in the 350 locations between the 351 stars. The total number of 14-part combinations of 351 is:

$$ {350}\choose{13} $$

If we consider the 13 birthdays in any order, then we have a total number of permutations of:

$$ {{350}\choose{13}} \times 13! = {}_{350}\mathrm{P}_{13} $$

All that's missing is to generalize the first birthday. In order to have it on any date, we can simply multiply the number of permutations by 365, by shifting the whole set of birthdays by a number of days going from 0 to 364:

$$ 365 \times {}_{350}\mathrm{P}_{13} $$

That's the total number of permutations of birthdays for 14 people, respecting order, where no pairs of birthdays are within one day of each other.

The total number of ways for 14 people to have birthdays in 365 days is $365^{14}$.

So the probability that in a room of 14 people, no two of them share a near-birthday is:

$$ \frac{{}_{350}\mathrm{P}_{13}}{365^{13}} = 0.46250721196335653$$

This matches the exact result you were expecting of 0.46 to two decimal places.

 

We can generalize this formula. In an year of $n$ days, in a group of $k$ people, the probability that none of them have near-birthdays is:

$$\frac{{}_{n-k-1}\mathrm{P}_{k-1}}{n^{k-1}}$$

We can further look at clashes of birthdays at least $d$ days apart, by having the $k$ intervals between them have be $\ge k$. We can reduce this back to the compositions problem by adding $(d-1)$ to the $k$ intervals, which is the same as subtracting $k(d-1)$ from the target value. Putting it all together and simplifying, we get that the probability that none of them have birthdays at most $d$ days apart is:

$$\frac{{}_{n-kd-1}\mathrm{P}_{k-1}}{n^{k-1}}$$

(I checked this formula for some examples that are easy to calculate by hand, such as 2 and 3 people at most one day apart, the 0.46 result matches the one from the book too. Checked limits when the probability is expected to drop to zero when it's impossible to fit as many people as days. This formula also gracefully degrades to the actual "birthday problem" when $d$ is zero.)

See also: The "Near-miss" Birthday Problem, by Gregory Quenell

 

 

Authors' Claim and Poisson Approximation

It's unclear to me why the authors used this particular example to explain the merits of using a distribution as an approximation or to set bounds to a specific probability.

By "much harder to solve exactly than the birthday problem", does that mean they didn't know about this solution to this problem? Did they expect it needed a computational solution? Did they use a brute-force approach to calculate the exact 0.42 probability for that specific case?

Or did they mean that even though an analytical solution is possible, it may be hard to reason about it, hard to figure out all the details, think about all the corner cases, double-check all the math, etc. In such cases, finding a rough approximation through a distribution can be quite useful.

 

For an example of a problem that requires a computational approach to calculate an exact probability, see number of names per draw to draw 90% of 200 names in 30 draws, with 90% probability, where an analytical formula is most likely not to be found and iterative calculations are needed to find the exact probability. But, in that particular case, the probability closely follows a binomial distribution, which makes it possible to quickly get approximate results that are quite useful. The approximation is so close that it makes a lot of sense to just use it in most cases.

  • So this needs to be calculated computationally? – Dom Fomello Feb 22 '20 at 19:36
  • @DomFomello Well, it surely can be solved computationally. That's likely what the author means by "much harder to solve exactly than the birthday problem". Even if you manage to find some shortcuts, for example calculate the last few levels together, you still end up with too many terms to calculate it by hand... – filbranden Feb 22 '20 at 20:41
  • In that case, that’s what I wanted someone to show: a computational solution. – Dom Fomello Feb 23 '20 at 04:55
  • @DomFomello I actually found an analytical solution to this problem. Reworked my whole answer to reflect that. Please take another look. – filbranden Feb 23 '20 at 23:47
  • 1
    Perhaps I am misunderstanding the argument, but this purported analytic solution is not convincing to me. If the second person has a birthday on 30 December then this only removes one adjacent day instead of two, and so on. That dependency does not seem to appear in your calculation. Can you explain? – Ben Feb 24 '20 at 00:18
  • @ReinstateMonica I just added a better explanation, similar to a stars and bars technique, we're splitting the remaining 364 days in 14 partitions of at least length 1 (the spans between birthdays) only the bars splitting the partitions are birthdays, so we subtract them from the 364 initial days. – filbranden Feb 24 '20 at 02:31
  • 1
    Okay, the link you supplied to Quenell has me convinced it is correct. However, this document, with the visual depiction of the gap sequence, is much clearer than your explanation. Perhaps you might consider adding that diagram, and writing your answer to reference it? – Ben Feb 24 '20 at 02:44
  • @ReinstateMonica Yeah that's fair, I'll rework that one more time. (Just give me some time to go through it.) One question is now bothering me though: Why did the authors of the book mentioned this problem as "much harder to solve exactly than the birthday problem"? The formula in the end is quite simple... This is actually at the core of the OP's question. Do you have an insight into why one would want to use a Poisson approximation in this case? – filbranden Feb 24 '20 at 03:06
  • 1
    Thanks for the edit. I started reading your edited answer, but there were a couple of things that I noticed at the beginning: 1. You begin by considering 14 people in 365 days, having no pair with birthdays on the same day or one day apart. But for the near-birthday problem, we're actually only interested in birthdays that are on the same day or one day apart. Now, what I suspect might be happening here is that you're purposely using this definition to encompass everyone who doesn't satisfy the near-birthday problem statement, to then use in solving the near-birthday problem? [...] – Dom Fomello Feb 25 '20 at 11:41
  • 1
    [...] 2. Immediately after the statement in 1., you assume that the other 13 need to have birthdays from January 3rd until the end of the year. But this violates your assumption in 1., since someone with a birthday on the last day of the year would be one day apart from the person who has their birthday on January 1? – Dom Fomello Feb 25 '20 at 11:43
  • 1
    @DomFomello 1. Correct, I'm calculating the cases where there are no matches. 2. Not really, since for each birthday I'm also taking the next day, I'm taking a block of two days. But I agree the explanation is confusing, so I'll reword it one more time. See the first link on a bullet for a better explanation of that same idea, for now. – filbranden Feb 25 '20 at 16:52
  • @filbranden Sorry, you're right. What I said in 2. even contradicts what I said in 1., so I must have confused myself. – Dom Fomello Feb 25 '20 at 20:22
  • 1
    @Ben-ReinstateMonica Alright, I reworked this one once again! Included a diagram with the intervals and explained it using the 14-part composition of the number of days that makes the intervals. Hopefully it's all clear now. Thanks a lot for your great input!!! – filbranden Feb 26 '20 at 06:08