What is the minimum "recommended" sample size to generate boxplots?

If I'm comparing different methods and each method has a different sample size, is it fine to use boxplots for this comparison? If not, what is the best way to compare methods with different sample sizes?

Nick Cox
shiny

1 Answer

[I thought I had written an answer to the first question but I can't locate one.]

With 5 or fewer observations, you might as well just plot the actual points.

Unequal sample sizes are not a problem in themselves when comparing boxplots across groups, but if some groups are much larger you should expect to see more points outside the ends of the whiskers on those, simply because they contain more observations.
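To put a rough number on that last point, here is a minimal sketch. It assumes normally distributed data and the usual 1.5 × IQR whisker rule, and it uses population quartiles rather than sample quartiles, so it is only a large-sample approximation. The function names are made up for illustration. Under those assumptions about 0.7% of observations fall beyond the whiskers, so the expected count of such points grows in proportion to the group size.

```python
from statistics import NormalDist


def prob_outside_whiskers(k: float = 1.5) -> float:
    """Chance that one observation lies beyond the k * IQR fences.

    Assumption: standard normal data and population quartiles
    (a large-sample approximation, not an exact small-sample result).
    """
    z = NormalDist()
    q1, q3 = z.inv_cdf(0.25), z.inv_cdf(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return z.cdf(lower) + (1.0 - z.cdf(upper))


def expected_flagged(n: int, k: float = 1.5) -> float:
    """Expected number of points drawn beyond the whiskers in a group of size n."""
    return n * prob_outside_whiskers(k)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, round(expected_flagged(n), 2))
```

A group of 1000 is therefore expected to show a handful of "outlying" points even when nothing unusual is going on, while a group of 10 will usually show none.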

what is the best way to compare between methods with different sample size

You might compare quantile plots, perhaps, or (as Nick Cox has suggested in at least one of his answers, which I also can't locate right now -- edit: see here) you might combine such a plot with a boxplot by plotting the quantile plot under the boxplot.

Nick shows an example of a quantile plot here.
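For readers who have not met quantile plots, the sketch below shows how the plotted coordinates might be computed: each group's sorted values are paired with a plotting position between 0 and 1, so groups of different sizes share the same horizontal axis. The (i - 0.5)/n plotting position is one common convention and is an assumption here, not necessarily the one used in the linked examples. The data and the function name are hypothetical.

```python
def quantile_plot_points(values):
    """Return (plotting position, sorted value) pairs for a quantile plot."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumption: (i - 0.5) / n plotting positions; other conventions exist.
    return [((i - 0.5) / n, x) for i, x in enumerate(ordered, start=1)]


# Hypothetical measurements for two methods with different sample sizes.
method_a = [4.1, 3.8, 5.0, 4.4]
method_b = [3.9, 4.2, 4.0, 4.8, 4.5, 4.3, 5.1, 3.7]

points_a = quantile_plot_points(method_a)
points_b = quantile_plot_points(method_b)

if __name__ == "__main__":
    for name, pts in (("A", points_a), ("B", points_b)):
        print(name, pts)
```

Because every observation is shown, a small group appears as a few points and a large group as many, which makes the difference in sample size visible without distorting the comparison.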

Glen_b
  • Other examples at http://stats.stackexchange.com/questions/114744/how-to-present-box-plot-with-an-extreme-outlier and http://stats.stackexchange.com/questions/181501/how-to-use-boxplots-to-find-the-point-where-values-are-more-likely-to-come-from – Nick Cox Feb 10 '16 at 09:17
  • This issue usually comes up only when generating side-by-side boxplots and one or more of the groups is small. In such cases, for visual comparison, it may be advisable to draw boxplots even for groups of one! Also, have you considered including a visual representation of the group size, such as (a) making box widths proportional to the size or (b) including a notch or other device to indicate the standard error of the median? – whuber Feb 10 '16 at 14:25
  • https://www.nature.com/articles/nmeth.2813.pdf?origin=ppub suggests making the box width proportional to the square root of the sample size. Of course, this won't be useful for extreme size differences. – Joachim Wagner Jul 05 '22 at 12:55
  • Sure, that's definitely a thing -- that suggestion has been around a very long time (I first saw it in the early to mid 80s but it comes from a paper Tukey was an author on in the 70s). Many packages can do this. I don't think this does a lot to help detect differences (per the question), though the notched boxplot (from the same article) does help with spotting location-shifts as long as we're not too fussy about what we mean by location and don't mind not-very-accurate significance levels. I have had some value from those, but relatively less from the variable-width one. – Glen_b Jul 05 '22 at 22:35
  • This paper: McGill, R., Tukey, J.W. and Larsen, W.A. (1978), "Variations of Box Plots." American Statistician, 32, 12-16. ... (edit ... oh, I see your linked article does reference that paper but omits titles; that makes it harder to spot.) – Glen_b Jul 05 '22 at 22:36
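The two devices discussed in these comments can be sketched numerically. The code below is only an illustration under stated assumptions: box width is scaled by the square root of the group size relative to the largest group, and the notch is taken as median ± 1.57 × IQR / √n, the interval given in McGill, Tukey and Larsen (1978). The function name, the `max_width` parameter and the "inclusive" quartile convention are choices made here; plotting packages differ in how they compute quartiles.

```python
from math import sqrt
from statistics import median, quantiles


def box_summary(values, max_n, max_width=0.8):
    """Variable-width and notch quantities for one group (needs >= 2 values).

    Assumptions: width scales with sqrt(n / max_n); quartiles use the
    'inclusive' convention; notch = median +/- 1.57 * IQR / sqrt(n).
    """
    n = len(values)
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    med = median(values)
    half_notch = 1.57 * (q3 - q1) / sqrt(n)
    return {
        "n": n,
        "width": max_width * sqrt(n / max_n),
        "median": med,
        "notch": (med - half_notch, med + half_notch),
    }


if __name__ == "__main__":
    # Hypothetical small group compared against a largest group of 20.
    print(box_summary([1, 2, 3, 4, 5], max_n=20))
```

In matplotlib, `Axes.boxplot` accepts `widths` and `notch` arguments, so the widths computed this way could be passed in directly. Notches that do not overlap are usually read as rough evidence of a difference in medians, with the caveat about approximate significance levels noted above.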