2

It is trivial to create a boxplot in R with a full dataset. However, I have limited access to the whole dataset and only have five data points: the minimum, 25%, 50%, 75%, and maximum. Is there an easy way to reproduce the boxplot with only these five values?

Nick Stauner
divy

1 Answer

8

It's still pretty trivial. You can't reproduce the whiskers of a default boxplot effectively if the minimum and maximum values exceed Tukey's fences, but the box itself should remain unaltered. E.g., with x=rnorm(9999), compare boxplot(x) vs. boxplot(quantile(x)):

[Figure: boxplot of the full dataset (left) vs. boxplot of your five values (right)]
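To make the comparison concrete, here is a minimal sketch in R. The seed and the names `five` and `s` are only illustrative, and `rnorm` stands in for the real data. One step goes beyond the answer as written: passing `range = 0` makes the whiskers run to the smallest and largest of the five values, so a minimum or maximum beyond Tukey's fences is drawn as a whisker end rather than as an outlier point.

```r
# Simulated stand-in for the full dataset (the real data is not available)
set.seed(1)
x <- rnorm(9999)

# The five values the asker has: min, 25%, 50%, 75%, max
five <- quantile(x)

# Box statistics computed from the five values alone;
# coef = 0 means the whiskers extend to the extremes
s <- boxplot.stats(five, coef = 0)$stats

# Check that the box statistics match the five input values
print(all.equal(s, unname(five)))

# Side by side: full data (default whiskers) vs. five values (range = 0)
op <- par(mfrow = c(1, 2))
boxplot(x, main = "full dataset")
boxplot(five, range = 0, main = "five values")
par(op)
```

With exactly five input values, the hinges are the second and fourth values, so the box and median line are unchanged. Only the whisker and outlier display can differ from the full-data plot.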

Nick Stauner
  • 2
    boxplot(fivenum(x)) is a lot shorter than boxplot(c(min(x),quantile(x,c(.25,.5,.75)),max(x))) (though if the quartiles don't match the definition of hinge in fivenum that might not be suitable) – Glen_b Apr 02 '14 at 05:20
  • Slick! Updated. – Nick Stauner Apr 02 '14 at 05:22
  • 1
    See the edit to my comment; your original has the advantage of being able to use any of the 9 definitions of quantiles. Then again, boxplot(quantile(x)) would work in place of fivenum and probably matches the original post better; I'm just used to associating boxplots with fivenum. – Glen_b Apr 02 '14 at 05:25
  • Wow. Never knew that one either! We'll go with that one then. – Nick Stauner Apr 02 '14 at 05:39
  • Yep. In large samples like that you can't see any difference between them, of course – Glen_b Apr 02 '14 at 05:47