I understand that the first quartile is defined as the value that is greater than 25% of the data set. Below I have defined a vector in R (which is already ordered):

datold = c(12,
32, 32, 32, 32, 32,
35, 35, 35, 35, 35, 35, 35, 
38, 38, 38, 38, 38, 38, 38, 38, 
42, 42, 42, 42, 42)

Notice that this set of data is already ordered from lowest to greatest and there are 26 elements in this data set; since the 1st quartile is the 25th percentile, I take 26/4, which is 6.5. In order to find the 1st quartile now, would it not be correct to take the average of the 6th and 7th value? That being $(32+35)/2 = 33.5$?

The value that R outputs when you do summary(datold) is 35. I'm just trying to understand the error in my understanding; any help would be appreciated.

Just editing in to add that the question from the textbook asks what the IQR of the data set is (and the answer provided was 3 which I assume they got from 38-35).

User1865345
Wallace
  • Please tell us the textbook. On the face of it, asking for the IQR when there are just four distinct values is ignoring a detail that should not be ignored! – Nick Cox Jul 24 '23 at 07:08
  • Also https://stats.stackexchange.com/questions/24112, https://stats.stackexchange.com/questions/134229, etc. – whuber Jul 24 '23 at 13:07

1 Answer

There is no single correct method for evaluating quartiles. To put it loosely, most methods interpolate linearly between two order statistics, with weights determined by the convention being used.

Using quantile() yields

  0%  25%  50%  75% 100% 
12.0 35.0 36.5 38.0 42.0 

What is actually happening is that $\mathtt R$ uses type $7$ linear interpolation by default to evaluate the quantiles.

> quantile(datold, type = 7)
  0%  25%  50%  75% 100% 
12.0 35.0 36.5 38.0 42.0
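
As a rough sketch of where the 35 comes from, the type 7 rule can be reproduced by hand. This assumes the Hyndman and Fan position formula h = (n - 1)p + 1 for type 7; the helper name `type7_by_hand` is made up for illustration and is not part of R.

```r
# The question's data: 26 ordered values.
datold <- c(12, rep(32, 5), rep(35, 7), rep(38, 8), rep(42, 5))

# Hypothetical helper: type 7 quantile computed by hand.
# Position h = (n - 1) * p + 1, then linear interpolation
# between the order statistics on either side of h.
type7_by_hand <- function(x, p) {
  x <- sort(x)
  n <- length(x)
  h <- (n - 1) * p + 1
  lo <- floor(h)
  hi <- min(lo + 1, n)          # guard the upper end when p = 1
  x[lo] + (h - lo) * (x[hi] - x[lo])
}

# For p = 0.25, h = 25 * 0.25 + 1 = 7.25, so the result lies between
# the 7th and 8th values, which are both 35.
q1_hand <- type7_by_hand(datold, 0.25)
q1_r <- unname(quantile(datold, 0.25, type = 7))
print(c(by_hand = q1_hand, quantile = q1_r))
```

Because the 7th and 8th ordered values are equal, the interpolation weight does not matter here and the first quartile is 35.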

If you use type $4$ linear interpolation, you would get

> quantile(datold, type = 4)
  0%  25%  50%  75% 100% 
12.0 33.5 35.0 38.0 42.0
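
The hand calculation in the question (position 26/4 = 6.5, halfway between the 6th and 7th values) appears to match the type 4 rule. A minimal sketch, assuming the Hyndman and Fan position formula h = np for type 4; `type4_by_hand` is a made-up helper name, not an R function.

```r
# The question's data: 26 ordered values.
datold <- c(12, rep(32, 5), rep(35, 7), rep(38, 8), rep(42, 5))

# Hypothetical helper: type 4 quantile computed by hand.
# Position h = n * p, then linear interpolation between neighbours;
# positions below 1 fall back to the smallest value.
type4_by_hand <- function(x, p) {
  x <- sort(x)
  n <- length(x)
  h <- n * p
  lo <- floor(h)
  if (lo < 1) return(x[1])
  hi <- min(lo + 1, n)          # guard the upper end when p = 1
  x[lo] + (h - lo) * (x[hi] - x[lo])
}

# For p = 0.25, h = 6.5: halfway between the 6th value (32) and the 7th (35).
q1_type4 <- type4_by_hand(datold, 0.25)

# IQR() accepts the same type argument as quantile().
iqr_type7 <- IQR(datold, type = 7)
iqr_type4 <- IQR(datold, type = 4)
print(c(q1_type4 = q1_type4, iqr_type7 = iqr_type7, iqr_type4 = iqr_type4))
```

Under these assumptions the first quartile is 33.5 with type 4, and the IQR is 3 with type 7 versus 4.5 with type 4. The textbook's answer of 3 is consistent with type 7, though other conventions may give the same figure on this data.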

If you need more detail on what these methods are doing, I can readily expand, but that would be based primarily on the paper by Hyndman and Fan.

Nick Cox
User1865345