A discrete distribution arising in a shorter sub-array problem

Question

I have a list $L=\{x_1,x_2,\cdots,x_N \}$ of $N$ random integer numbers $x_i\in[a,b]$, $b\gt a$, $a,b\in\mathbb{N}$, $N\gg b$ and they follow a discrete uniform distribution.

I need to scan the list from its beginning and "collect" each number from $a$ to $b$ and I am interested in how far in the list I had to go (reaching the index $F$, $b\leq F\leq N$) in order to collect the full set. I am not able to formulate the problem in a more formal way so please accept some examples.

$a=1$, $b=3$, $L=\{1,2,3\}=\{x_1,x_2,x_F\}$ $\Rightarrow F=3$
$a=1$, $b=3$, $L=\{3,2,1\}=\{x_1,x_2,x_F\}$ $\Rightarrow F=3$
$a=1$, $b=3$, $L=\{1,2,2,3,3,3\}=\{x_1,x_2,x_3,x_F,x_5,x_6\}$ $\Rightarrow F=4$
$a=1$, $b=3$, $L=\{1,1,1,1,2,3\}=\{x_1,x_2,x_3,x_4,x_5,x_F\}$ $\Rightarrow F=6$

I did some simulations in Python:

import random
import matplotlib.pyplot as plt
import numpy as np
a is fixed at 1
b = 3
M = 10000
def sim():
    s = set()
    cnt=0
    while(len(s)!=b):
        s.add(random.randint(1,b))
        cnt=cnt+1
    return cnt
F = []
for i in range(1,M+1):
    F.append(sim())
fig, ax = plt.subplots()
ax.hist(F,density=True,alpha=.5, color='grey',edgecolor='white',bins=np.arange(0, max(F) + 1.5) - 0.5,log=True)
ax.set_title("$a=1$, $b=$"+str(b)+", "+str(M)+" simulations")
ax.set_xlabel(str(min(F))+'$\leq F \leq$'+str(max(F)))
ax.set_ylabel('density')

and I get some fancy plot for the distribution of $F$:

Is it possible to theoretically derive the distribution of $F$?

What is the expected value for $F$?

What is the largest value for $F$? The worst case scenario for a fixed size list is that I need to reach the last value of $L$ at the end of the list in order to collect all the numbers... but what happen if $N\rightarrow \infty$?

What are the keywords to search for in order to find papers for this kind of problem? Or, what is a more suitable title for this question?

This is the coupon collector problem. We have an extensive set of threads on many variations of it, too. One detail is unclear based on your "largest value" question: what is the support of this uniform distribution? Your initial description makes it sound like all the numbers are randomly and independently drawn from $[a,b],$ which means you might not see the entire interval $[a,b]$ among your sample. — whuber, Sep 01 '22 at 14:03

A discrete distribution arising in a shorter sub-array problem

a is fixed at 1

0 Answers0