
Using R, I am trying to understand whether the 0.25 quantile means a value that is < 25 percent of the values or a value that is <= 25 percent of the values.

The same question applies to the 0.75 quantile.

I tried the following code:

test <- c(1, 2, 3, 4, 5, 6, 7, 8)
quantile(test)

  0%  25%  50%  75% 100%
1.00 2.75 4.50 6.25 8.00

I'm unable to explain why the 25% value is 2.75 and not 2, 2.5, or 3.

I checked the documentation at https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/quantile and found that several algorithms are available. However, I don't really understand type=7 in simple terms. Could someone please explain it?

I also checked the question "Understanding The Algorithms Behind Quantile() in R", but the example used there seems to be different (it results in a value taken from the numbers provided). Also, according to the answer to that question, "and the proportion greater than or equal to qp is at most 1−p". But 7 numbers are greater than or equal, i.e. 7/9 = 0.778, while 1 - 1/4 = 0.75, and 0.778 is not at most 0.75. So is the definition in that answer not really correct?

abhivij

1 Answer


The documentation you linked is nearly self-explanatory: you only need to substitute the values from your data into the documented parameters. In your case, for Type 7 (linear interpolation), with $x_j$ denoting the $j$-th smallest observation:
\begin{align}
& n = 8, \quad p = 0.25, \quad m = 1 - p = 0.75, \\
& j = \lfloor np + m \rfloor = \lfloor 2 + 0.75 \rfloor = 2, \\
& g = np + m - j = 2.75 - 2 = 0.75 = \gamma,
\end{align}
which gives
\begin{align}
Q_7(0.25) = (1 - \gamma) x_2 + \gamma x_3 = 0.25 \times 2 + 0.75 \times 3 = 2.75,
\end{align}
matching the R output.
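The substitution above can also be checked numerically. The following is a minimal sketch, not R's internal implementation: `quantile_type7` is a hypothetical helper name introduced here for illustration, and it assumes a numeric vector without missing values and a single probability `p` between 0 and 1.

```r
# Hand-rolled type 7 sample quantile, following the documented formula.
# NOTE: quantile_type7 is a hypothetical helper, not part of base R.
quantile_type7 <- function(x, p) {
  x <- sort(x)            # order statistics x_1 <= ... <= x_n
  n <- length(x)
  m <- 1 - p              # type 7 uses m = 1 - p
  j <- floor(n * p + m)   # index of the lower order statistic
  g <- n * p + m - j      # fractional part, used as gamma
  lower <- x[max(j, 1)]       # clamp indices so p = 0 and p = 1 stay in range
  upper <- x[min(j + 1, n)]
  (1 - g) * lower + g * upper # linear interpolation between x_j and x_{j+1}
}

test <- c(1, 2, 3, 4, 5, 6, 7, 8)
manual  <- quantile_type7(test, 0.25)
builtin <- unname(quantile(test, probs = 0.25, type = 7))
print(c(manual = manual, builtin = builtin))
```

For the data in the question, both values should come out as 2.75, the same number obtained by hand above.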

While the breakdown above resolves your immediate confusion, it is more important to understand why R offers so many different "types" of sample quantile for the same $p$. The reason is that a value satisfying either the "<" or the "<=" condition you propose for 25 percent of the values is not unique, because the data are discrete.
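To see this non-uniqueness on the data from the question, one can compare all nine types for the same $p = 0.25$. This is only an illustrative sketch; the variable name `q25_by_type` is an assumption made for this example.

```r
test <- c(1, 2, 3, 4, 5, 6, 7, 8)

# Compute the 0.25 quantile under each of the nine documented types.
q25_by_type <- sapply(1:9, function(type) {
  unname(quantile(test, probs = 0.25, type = type))
})
names(q25_by_type) <- paste0("type", 1:9)
print(q25_by_type)
```

Types 1, 2, and 7, for example, give 2, 2.5, and 2.75 respectively, so several of the candidate answers listed in the question are each produced by some definition.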

Zhanxiong