
I am currently studying the textbook In All Likelihood by Yudi Pawitan. Example 2.4 in Section 2.2 (Examples) says the following:

Example 2.4: Suppose $x$ is a sample from $N(\theta, 1)$; the likelihood of $\theta$ is $$L(\theta) = \phi(x - \theta) \equiv \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2} (x - \theta)^2}.$$ The dashed curve in Figure 2.3(d) is the likelihood based on observing $x = 2.45$.
Suppose it is known only that $0.9 < x < 4$; then the likelihood of $\theta$ is $$L(\theta) = P(0.9 < X < 4) = \Phi(4 - \theta) - \Phi(0.9 - \theta),$$ where $\Phi(x)$ is the standard normal distribution function. The likelihood is shown in solid line in Figure 2.3(d).
Suppose $x_1, \dots, x_n$ are an identically and independently distributed (iid) sample from $N(\theta, 1)$, and only the maximum $x_{(n)}$ is reported, while the others are missing. The distribution function of $x_{(n)}$ is $$\begin{align} F(t) &= P(x_{(n)} \le t) \\ &= P(X_i \le t, \ \text{for each $i$}) \\ &= \{\Phi(t - \theta)\}^n. \end{align}$$ So, the likelihood based on observing $x_{(n)}$ is $$L(\theta) = p_\theta (x_{(n)}) = n\{ \Phi(x_{(n)} - \theta)\}^{n - 1} \phi(x_{(n)} - \theta).$$

What is the reasoning behind the string of equalities $P(x_{(n)} \le t) = P(X_i \le t, \ \text{for each $i$}) = \{\Phi(t - \theta)\}^n$?
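
For concreteness, here is a minimal numerical sketch of the three likelihoods above (my own illustration, assuming numpy and scipy; the sample size $n$ and observed maximum used for the third likelihood are arbitrary made-up values):

```python
import numpy as np
from scipy.stats import norm

theta = np.linspace(-1, 6, 601)  # grid of candidate parameter values

# Likelihood from observing x = 2.45 exactly (dashed curve in Figure 2.3(d))
x = 2.45
L_exact = norm.pdf(x - theta)

# Likelihood from knowing only that 0.9 < x < 4 (solid curve in Figure 2.3(d))
L_interval = norm.cdf(4 - theta) - norm.cdf(0.9 - theta)

# Likelihood from observing only the maximum x_(n) of an iid N(theta, 1) sample
n, x_max = 5, 2.45  # illustrative values, not from the book
L_max = n * norm.cdf(x_max - theta) ** (n - 1) * norm.pdf(x_max - theta)
```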

The Pointer
  • What part of it didn't you get? – User1865345 Jan 20 '23 at 17:47
  • @User1865345 This string of equalities: $P(x_{(n)} \le t) = P(X_i \le t, \ \text{for each $i$}) = \{\Phi(t - \theta)\}^n$ – The Pointer Jan 20 '23 at 17:49
  • Do you understand the first equality? – jbowman Jan 20 '23 at 17:51
  • @jbowman I understand the reasoning behind the equality $F(t) = P(x_{(n)} \le t)$, if that's what you're referring to. – The Pointer Jan 20 '23 at 17:52
  • In words: the probability that the maximum is less than or equal to $t$ is the probability that each $x_i$ is less than or equal to $t$; they are independent, so the joint probability is the product of the individual probabilities. – seanv507 Jan 20 '23 at 17:52
  • So it seems you do understand it... – jbowman Jan 20 '23 at 17:52
  • In first year theory classes, many mathematicians become quite frustrated by what they might call "heuristic" arguments such as these treated as formal arguments. By contrast, first year analysis requires a delta-epsilon argument for each equality. Intuition counts for a lot in probability. – AdamO Jan 20 '23 at 18:41
  • Didn't you ask this exact same question a month ago? See "Making sense of this section on likelihood of order statistics". – dipetkov Jan 21 '23 at 11:39
  • @dipetkov Even though that is the same section of the textbook, that question is about the multinomial argument that the author uses after the part in this question. So these two questions are about different parts of the same section. – The Pointer Jan 21 '23 at 18:24
  • I remembered the answer you got previously, which seems to me to have concepts in common with the part of the question discussed here. So I think it would have been good to link to the other question and its answer. – dipetkov Jan 21 '23 at 18:36

2 Answers


Preliminary note: There is inconsistency in the use of lower/upper case notation for random variables in the question. I have chosen to correct this in the answer so my capitalisation is different to the question.

The first equation comes from the fact that $X_{(n)} = \max \{ X_1,...,X_n \}$, which gives the logical equivalence:

$$X_{(n)} \leqslant t \quad \quad \iff \quad \quad \begin{matrix} X_1 \leqslant t, \\ X_2 \leqslant t, \\ \vdots \\ X_n \leqslant t. \\ \end{matrix}$$

In simple terms, this is saying that if the maximum of $X_1,...,X_n$ is no greater than $t$ then there cannot be any value in $X_1,...,X_n$ that is greater than $t$, and likewise, if there is no value in $X_1,...,X_n$ that is greater than $t$ then the maximum of $X_1,...,X_n$ is no greater than $t$. Once you impose this logical equivalence for the underlying events, you then have:

$$\mathbb{P}(X_{(n)} \leqslant t) = \mathbb{P}(X_1 \leqslant t,X_2 \leqslant t,...,X_n \leqslant t).$$
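
As a quick sanity check of this equivalence (a minimal sketch assuming numpy; the values of $\theta$, $n$ and $t$ are arbitrary), one can confirm on simulated data that the two events always occur together:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, t = 1.0, 10, 1.5        # arbitrary illustrative values

for _ in range(1000):
    x = rng.normal(loc=theta, scale=1.0, size=n)
    # {max(X_1, ..., X_n) <= t} happens exactly when every X_i <= t
    assert (x.max() <= t) == bool(np.all(x <= t))
```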


The second equation comes from the fact that the random variables $X_1,...,X_n$ were stipulated to be independent and identically distributed (IID) normal random variables with mean $\theta$ and unit variance, which means that:

$$\begin{align} \mathbb{P}(X_1 \leqslant t,X_2 \leqslant t,...,X_n \leqslant t) &= \prod_{i=1}^n \mathbb{P}(X_i \leqslant t) \\[6pt] &= \prod_{i=1}^n \mathbb{P}(X_i-\theta \leqslant t-\theta) \\[6pt] &= \prod_{i=1}^n \Phi(t-\theta) \\[12pt] &= [\Phi(t-\theta)]^n. \\[6pt] \end{align}$$

(The penultimate step here comes from the fact that $X_1-\theta,...,X_n-\theta \sim \text{IID N}(0,1)$ and $\Phi$ is the notation used for the CDF of the standard normal distribution.)
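
A small Monte Carlo sketch of this calculation (assuming numpy and scipy; $\theta$, $n$ and $t$ are arbitrary illustrative values) compares the simulated probability that the maximum is at most $t$ with $[\Phi(t-\theta)]^n$:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
theta, n, t = 2.0, 5, 2.8          # arbitrary illustrative values

# Simulate many iid N(theta, 1) samples of size n and take the maximum of each
maxima = rng.normal(loc=theta, scale=1.0, size=(100_000, n)).max(axis=1)

empirical = np.mean(maxima <= t)         # Monte Carlo estimate of P(X_(n) <= t)
theoretical = norm.cdf(t - theta) ** n   # [Phi(t - theta)]^n
print(empirical, theoretical)            # the two values should agree closely
```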

Ben

The more general relation is

$$ \mathbb P[X_{i;n}\leq t]=\sum_{k=i}^n{{n}\choose {k}}F^k(t)[1-F(t)]^{n-k}$$

which has been derived in this CV post.

Breaking it down for $i=n$,
$$\begin{align}\mathbb P[X_{n;n}\leq t]&=\mathbb P[X_i\leq t; ~\forall i=1,2,\ldots,n]\\&=\mathbb P[(X_1\leq t) \cap~(X_2\leq t) \cap\cdots\cap (X_n\leq t) ]\\&=\mathbb P[X_1\leq t] \times \mathbb P[X_2\leq t]\times\cdots\times\mathbb P[X_n\leq t]\\&=[F(t)]^n,\end{align}$$
where the third equality uses independence and, for the present problem, $F(t)=\Phi(t-\theta)$.
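
As a quick check of both formulas (a sketch assuming numpy and scipy; the parameter values and the helper order_stat_cdf are purely illustrative), the binomial sum reduces to $[F(t)]^n$ when $i=n$ and matches simulation for an intermediate order statistic:

```python
import numpy as np
from scipy.stats import norm, binom

rng = np.random.default_rng(7)
theta, n, t = 0.0, 5, 0.5              # arbitrary illustrative values
F_t = norm.cdf(t - theta)              # F(t) for the N(theta, 1) distribution

def order_stat_cdf(i, n, F_t):
    """Illustrative helper: P(X_{i;n} <= t) = sum_{k=i}^{n} C(n,k) F(t)^k [1-F(t)]^(n-k)."""
    k = np.arange(i, n + 1)
    return binom.pmf(k, n, F_t).sum()

# The i = n case collapses to [F(t)]^n
print(order_stat_cdf(n, n, F_t), F_t ** n)

# Simulation check for an intermediate order statistic, e.g. i = 3
samples = np.sort(rng.normal(loc=theta, scale=1.0, size=(100_000, n)), axis=1)
print(np.mean(samples[:, 2] <= t), order_stat_cdf(3, n, F_t))
```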

User1865345
  • Where's that relation from? It resembles the PMF of a binomial random variable, but we're dealing with normal random variables here. – The Pointer Jan 20 '23 at 17:55
  • Not in this case; the variable $x_i \leq t$ is a Bernoulli (0 - 1) variable, so the variable representing how many out of $n$ i.i.d. variates are less than $t$ is a Binomial variate. – jbowman Jan 20 '23 at 18:03
  • @ThePointer: You will find this (and much more) in any text on order statistics, see https://stats.stackexchange.com/questions/294640/what-book-is-best-to-learn-order-statistics – kjetil b halvorsen Jan 20 '23 at 18:22
  • Don't we have a canonical post on order statistics that such questions can be marked as duplicates of, @kjetilbhalvorsen? Let me search a bit. – User1865345 Jan 20 '23 at 18:25
  • Why was this downvoted? – The Pointer Jan 20 '23 at 18:30
  • (-1) While this answer is technically true, I'm not sure how much help it is in explaining the equations for a user who is already having trouble with much simpler equations. I'm open to reconsidering the downvote, but my present view is that this just complicates the situation more, rather than simplifying it. – Ben Jan 20 '23 at 18:30
  • @Ben, I broke down the derivation step by step and used conventional notation. However, I agree your post explains it much better (+1). – User1865345 Jan 20 '23 at 18:35
  • @ThePointer, if you have any problem, do update me. I think it would be clearer now that you have got two answers echoing the same thing. As for the downvote, well, I wrote what I could do best. – User1865345 Jan 20 '23 at 18:37
  • @Ben I think your answer provides a better explanation overall, but I think it actually leaves out parts that are necessary for a fully clear explanation that this answer includes; in particular, the reasoning behind $\mathbb{P}(X_1 \leqslant t,X_2 \leqslant t,...,X_n \leqslant t) = \prod_{i=1}^n \mathbb{P}(X_i \leqslant t)$. Overall, I think both of these answers work well together to provide clarity. – The Pointer Jan 20 '23 at 18:39
  • On further consideration, I've decided to remove the downvote; I maintain my critique of the usefulness of this answer, but have decided that (+0) is more appropriate than (-1). – Ben Jan 20 '23 at 18:43
  • As I said, @Ben, yours articulates it much better than mine. I do accept your criticism, for I can see your answer is more intuitive for a naive reader (+1 for that). But I would rather leave this than delete it, for at least it addresses the OP's query. – User1865345 Jan 20 '23 at 18:46
  • Sounds like a good idea to me (and you now have a net upvote, so the present consensus is that this answer is useful). – Ben Jan 20 '23 at 18:53
  • @ThePointer, just another piece of advice: while I appreciate you studying the current book, you could consider reading other intro stat books like Casella & Berger, Rohatgi, or Pestman alongside it. – User1865345 Jan 20 '23 at 18:53
  • @User1865345 Casella and Berger (second edition) is my other textbook. But I don't remember seeing a section on order statistics. Does it contain such a section? – The Pointer Jan 20 '23 at 19:01
  • Well, there is. If you don't find it, ping me. :-) @ThePointer. – User1865345 Jan 20 '23 at 19:02
  • @User1865345 Found it (page 226). I think I skipped this section during my formal statistics class, since it wasn't really covered in the content of the course. Anyway, thanks for reminding me. – The Pointer Jan 20 '23 at 19:11