
I am currently working on a not-that-easy problem involving order statistics. As I am unsure how to solve it, I thought it might already possess a solution. So here I am; my question is: do you know of a pre-existing solution/approximation to (or do you see an easy way of solving) the probability below? $$\pi = \Pr\left[G_1 > c\right],$$ where $$ G_1 = \max_{1 \leq j \leq K} \left(Z_{(j)} - Z_{(j - 1)}\right), \quad Z_{(0)} = 0, \quad Z_j \sim N(\mu, \sigma^2) \text{ and } Z_i \perp Z_j, \ \forall i \ne j. $$ Just to clear things up, $Z_{(j)}$ is the $j$th order statistic of $Z_1, Z_2, \ldots, Z_K$. Also, if it helps, the problem can actually be reformulated this way (but I would prefer the one above):

$$G_2 = \max_{1 \leq j \leq K} \left(\tfrac{Z_{(j)}}{Z_{(j - 1)}}\right), \quad Z_{(0)} = 1, \quad Z_j \sim N(\mu, \sigma^2) \text{ and } Z_i \perp Z_j, \ \forall i \ne j.$$

I understand that mathematically speaking, those expressions are not equivalent, but for $\mu \in (0, 1)$ and $\sigma$ small enough, it should not matter.

//=======================================================================//

ETA: OK, I worked quite a bit on this damned problem, and found that for $\mu \in (0, 1)$, e.g. $\mu = 0.1$, and $\sigma$ small enough, e.g. $\sigma = 0.01$, $G_1$ actually simplifies to

$$G_1 = \max_{1 \leq j \leq K} \left(Z_{(j)} - Z_{(j - 1)}\right) \approx Z_{(1)}.$$
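This simplification can be sanity-checked by simulation. The sketch below (a quick check assuming numpy is available; $\mu = 0.1$ and $\sigma = 0.01$ are the values above, while $K = 10$ is an arbitrary illustrative choice) estimates how often the largest spacing is the very first gap, i.e. $Z_{(1)} - Z_{(0)} = Z_{(1)}$:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, K, n_sim = 0.1, 0.01, 10, 100_000

# n_sim independent samples of (Z_1, ..., Z_K), sorted to get order statistics
Z = np.sort(rng.normal(mu, sigma, size=(n_sim, K)), axis=1)

# spacings Z_(j) - Z_(j-1), with Z_(0) = 0 prepended
spacings = np.diff(np.hstack([np.zeros((n_sim, 1)), Z]), axis=1)

# fraction of replications in which the maximum spacing is the first gap,
# i.e. in which G_1 = Z_(1)
frac = np.mean(spacings.argmax(axis=1) == 0)
print(frac)
```

For these parameter values the fraction is essentially 1: $Z_{(1)}$ sits near $\mu = 0.1$, while the interior spacings are on the order of $\sigma$, far smaller.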

Thus, the probability $\pi$ is easily computed

$$\pi = \Pr\left[G_1 > c\right] \approx \Pr\left[Z_{(1)} > c\right] = {\left(1 - \Phi\left(\tfrac{c - \mu}{\sigma}\right)\right)}^K.$$

The critical value $c$ is also easily extracted

$${\left(1 - \Phi\left(\tfrac{c - \mu}{\sigma}\right)\right)}^K = \alpha \quad \Leftrightarrow \quad c = \mu + \Phi^{-1} \left(1 - \alpha^{\tfrac{1}{K}}\right)\sigma.$$
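As a sketch of how the two formulas above can be used together (assuming numpy and scipy are available; $\mu = 0.1$ and $\sigma = 0.01$ are from above, while $K = 10$ and $\alpha = 0.05$ are illustrative choices), one can compute $c$ from the closed form and then verify by Monte Carlo that $\Pr[G_1 > c]$ is indeed close to $\alpha$:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
mu, sigma, K, alpha = 0.1, 0.01, 10, 0.05

# closed-form critical value: c = mu + sigma * Phi^{-1}(1 - alpha^(1/K))
c = mu + sigma * norm.ppf(1 - alpha ** (1 / K))

# Monte Carlo estimate of pi = Pr[G_1 > c]
n_sim = 200_000
Z = np.sort(rng.normal(mu, sigma, size=(n_sim, K)), axis=1)
spacings = np.diff(np.hstack([np.zeros((n_sim, 1)), Z]), axis=1)
pi_hat = np.mean(spacings.max(axis=1) > c)

print(c, pi_hat)  # pi_hat should be close to alpha
```

The agreement holds exactly to the extent that $G_1 = Z_{(1)}$; any excess of `pi_hat` over $\alpha$ would come from interior spacings exceeding $c$, which is negligible for these parameter values.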

All I have to do now (other than finding a more general solution to the above problem) is find values of $\mu$ and $\sigma$, which are context-dependent.

If anyone feels I should pile on and add some more details, please let me know. I am, after all, new to this forum.

RSMax
  • There is something strange about these formulations of your problem, because clearly they are not equivalent. For instance, about half the $Z_{(j)}$ will be negative and in that range the ratios of order statistics will be less than unity, so the second formulation effectively ignores the first half of the gaps on average! Are you perhaps assuming in the second formulation that the $Z_j$ are lognormal? Also, could you explain why you include a constant $Z_{(0)}$ in the list of order statistics? – whuber Jan 24 '14 at 20:11
  • Well, technically you are right, but here it is: the variables $Z_j, j = 1, 2, \ldots, K$ are in truth proportions (whose support $\Omega$ is $(0, 1)$, which renders both formulations equivalent) that are asymptotically distributed as normal distributions (as stated by the central limit theorem).

    Also, I included the variable $Z_{(0)}$ because it was easier this way. Otherwise, I would have had to state $G$ as:

    $$G = \max \left(Z_{(1)}, \max_{2 \leq j \leq K} \left(Z_{(j)} - Z_{(j - 1)} \right)\right).$$

    Hope this clears things up.

    – RSMax Jan 25 '14 at 02:12
  • The underlying problem is certainly interesting; I don't recall seeing it for the normal case. – Glen_b Jan 25 '14 at 03:14
  • Note that $Z_{(1)}-0$ can be negative (and in large samples is almost certain to be negative!). Please edit your post to reflect your comment where you change the second definition of $G$. Please also change your notation so the two $G$'s are not referred to by exactly the same symbol (if you want to use $G$ for both, consider calling them $G_1$ and $G_2$ for example). – Glen_b Jan 25 '14 at 03:50
  • Dear Glen_b,

    I understand that $G_1$ (or $G_2$) could be negative under the normal approximation, but that's a cross we all must bear when using the central limit theorem.

    The exact underlying distributions would be nightmarish to compute and I know what my limits are.

Of course, one could argue that the first two moments of the underlying proportions (from which the variables $Z_j$ are derived) are by no means 0 and 1, but, hey, maybe I still got some thinking to do... OK, coming right up!

    – RSMax Jan 25 '14 at 04:11
  • Largest/maximum spacing statistics are used in different statistical fields. An asymptotic result is e.g. http://www.jstor.org/discover/10.2307/2946462?uid=3737760&uid=2129&uid=2&uid=70&uid=4&sid=21103293232591 – Michael M Jan 25 '14 at 10:15
  • After your edit it's even less clear what you are doing. Your comments indicate that some kind of asymptotic result is needed, but the nature of the asymptotics is not in evidence. The question itself still is internally contradictory, because the distributions of $G_1$ and $G_2$ have no direct relationship to each other. – whuber Jan 25 '14 at 19:30
  • Dear whuber,

    I hear you, but still am confused about what I should do to remedy the situation. Should I start all over again, and in terms of Cross Validated, does that mean starting a new thread?

    Were I to remove $G_2$ from the OP, would that dissipate the ambiguity that seems to linger around?

Truth be told, the variables $Z_i$ are not normally distributed (they are proportion estimators, thus contained between $0$ and $1$), but, as stated by the central limit theorem, they are asymptotically normal.

    – RSMax Jan 25 '14 at 20:06
  • All in all, the equality between $G_1$ and $G_2$ holds for the underlying proportions, and, in my humble opinion, using the normal approximation should not change that one bit.

    Hope this was a tiny bit more informative and precise.

    – RSMax Jan 25 '14 at 20:10
  • If they are reasonable proportion estimators, then they definitely are not asymptotically Normal: asymptotically, they converge to constants (the correct underlying proportion). You can obtain asymptotic Normality only by standardizing them. That process will push $Z_{(0)}$ out to $-\infty$, leading to a trivial (and useless) solution. – whuber Jun 24 '16 at 01:35

0 Answers