I am trying to understand the Q–Q plot. Suppose I draw a sample from an exponential distribution:

> n = 10
> v = rexp(n)
> v
 [1] 4.20905976 0.03434381 0.45344443 0.90680895 1.08972583 0.76248370 0.30751413 2.59427493 0.25876480 0.74835029
> sort(v)
 [1] 0.03434381 0.25876480 0.30751413 0.45344443 0.74835029 0.76248370 0.90680895 1.08972583 2.59427493 4.20905976

Consider

qqnorm(v, pch=20, col="blue")
qqline(v, col="red", lwd=2)

I would like to know how to interpret each point in relation to the horizontal axis. I understand the relationship of each point to the vertical axis perfectly well, since the points are just the sorted values of v. What I cannot read is their relation to the quantiles of the normal N(0,1). For example, in the graph below I have circled in blue the second-largest value of v, 2.59427493, as can be seen on the ordinate axis. How should I read or interpret its abscissa, which is close to 1.0, given that the abscissa axis represents the quantiles of the normal N(0,1)?

[Figure: normal Q–Q plot of v with the qqline; the second-largest point is circled in blue]
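A minimal sketch of where the horizontal coordinates come from, assuming R's default `qqnorm()` behaviour: plotting positions from `ppoints()`, converted to standard Normal quantiles with `qnorm()`. The vector is typed in from the sorted output above rather than regenerated, since a fresh call to `rexp()` would give different numbers.

```r
# The ten sorted sample values quoted above
v <- c(0.03434381, 0.25876480, 0.30751413, 0.45344443, 0.74835029,
       0.76248370, 0.90680895, 1.08972583, 2.59427493, 4.20905976)
n <- length(v)

# Plotting positions (i - a) / (n + 1 - 2a); ppoints() uses a = 3/8 when n <= 10
p <- ppoints(n)

# Horizontal coordinates: standard Normal quantiles at those probabilities
x <- qnorm(p)

# qqnorm() returns the coordinates without drawing when plot.it = FALSE
qq <- qqnorm(v, plot.it = FALSE)
print(all.equal(qq$x, x))

# The second-largest value (2.59427493) is paired with the 9th position
print(round(c(p = p[9], x = x[9]), 4))
```

So the abscissa near 1.0 says nothing about the value 2.59427493 itself. It depends only on the rank (9th of 10): it is roughly where the 9th smallest of ten standard Normal draws would be expected to fall.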

  • 2
    Quantile plots with very small samples can't do magic any more than any other method. But the plot is signalling that a sample from an exponential isn't a good fit to a normal, as the overall curvature shows. That seems to be what you would expect. Perhaps the key point is that the pattern of points needs to be considered as a whole. The theoretical quantiles are all interdependent insofar as they are all based on a normal distribution with the same mean and standard deviation, but the software is just using mean 0 and SD 1: it's not scaling to have the same SD as the data. – Nick Cox May 14 '22 at 09:13
  • 2
    .. the same mean and SD as the data. – Nick Cox May 14 '22 at 10:41
  • The recipe for interpreting the qqplot I understand. I just wanted to understand what is the Cartesian relationship between the points (even though they are many) and the abscissa axis. The interpretation of the whole I can understand well and I can decide whether or not the distribution is a normal one, but that is not my concern. – PSE May 14 '22 at 16:20
  • 2
    Ideally, the horizontal locations are the expected values of order statistics from an iid Normal sample. Because computing those values (for samples greater than size 5, anyway) requires numerical integration, usually approximate values are used. For all except the two extremes these are quantiles of the standard Normal distribution. Various approximate methods exist for the two extreme locations. Consult your software for details. – whuber May 15 '22 at 14:21
  • I still don't understand the sample's relationship with the x-axis. I think I understand well that the horizontal axis represents the quantiles of a normal N(0,1); I just don't know how to read this against the data from the sample v. Perhaps a more specific question is this: forget about the y-axis and look only at the plotted points and their relationship to the x-axis. What does that tell me? – PSE May 15 '22 at 17:04
  • 1
    The position of the $i^{\text{th}}$ point from the left approximates the expected value of the $i^{\text{th}}$ smallest value in a sample of $n=10$ standard Normal variables. A formula is $$\frac{n!}{(i-1)!(n-i)!}\int_{\mathbb R}z\,\Phi(z)^{i-1}\left(1-\Phi(z)\right)^{n-i}\,\mathrm{d}\Phi(z)$$ where $\Phi$ is the standard Normal cdf, $$\Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^z e^{-t^2/2}\,\mathrm{d}t.$$ For any sizable value of $n$ and $i\ne 1,$ $i\ne n,$ this is close to $\Phi^{-1}\left((i-a)/(n+1-2a)\right)$ where often $a = 3/8$ is used. See https://stats.stackexchange.com/a/193246/919. – whuber May 16 '22 at 13:05
  • Yes, thanks for the answers! – PSE May 17 '22 at 00:26
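
As a rough check of the formula in whuber's comment, the sketch below evaluates the expected order statistics for n = 10 by numerical integration and compares them with the approximation using a = 3/8. The helper name `expected_order_stat` is made up for this illustration, and `integrate()` is only one of several ways to do the quadrature.

```r
n <- 10

# Expected value of the i-th smallest of n iid standard Normal variables.
# Hypothetical helper: integrates z times the density of the i-th order statistic.
expected_order_stat <- function(i, n) {
  const <- i * choose(n, i)  # equals n! / ((i - 1)! (n - i)!)
  integrand <- function(z) {
    const * z * pnorm(z)^(i - 1) *
      pnorm(z, lower.tail = FALSE)^(n - i) * dnorm(z)
  }
  integrate(integrand, -Inf, Inf)$value
}

exact <- vapply(seq_len(n), function(i) expected_order_stat(i, n), numeric(1))

# Approximation Phi^{-1}((i - a) / (n + 1 - 2a)) with a = 3/8
a <- 3 / 8
approx <- qnorm((seq_len(n) - a) / (n + 1 - 2 * a))

# Side-by-side comparison, one row per rank i
print(round(cbind(i = seq_len(n), exact = exact, approx = approx,
                  diff = exact - approx), 4))
```

For the middle ranks the two columns should agree closely, with the largest gaps at i = 1 and i = n, which is the behaviour the comments describe.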
